Papers

Excitation-by-SampleRNN Model for Text-to-Speech

International Conference
2016~2020
Posted by: 한혜원
Date posted: 2019-06-01 16:42
Authors : Kyungguen Byun, Eunwoo Song, Jinseob Kim, Jae-Min Kim, Hong-Goo Kang

Year : 2019

Publisher / Conference : ITC-CSCC

In this paper, we propose a neural vocoder-based text-to-speech (TTS) system that effectively utilizes a source-filter modeling framework. Although neural vocoder algorithms such as SampleRNN and WaveNet are well known to generate high-quality speech, their generation speed is too slow for real-world applications. By first decomposing the speech signal into spectral and excitation components using the source-filter framework, we train the two components separately, i.e., the spectral or acoustic parameters with a long short-term memory model and the excitation component with a SampleRNN-based generative model. Unlike a conventional generative model, which must represent the complicated probability distribution of the speech waveform, the proposed approach needs to generate only the glottal movement of the human speech production mechanism. Therefore, it is possible to obtain high-quality speech signals using a small, pitch interval-oriented SampleRNN network. The objective and subjective test results confirm the superiority of the proposed system over both a glottal modeling-based parametric system and the original SampleRNN-based speech synthesis system.
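For readers unfamiliar with the source-filter decomposition mentioned in the abstract, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes plain linear prediction (LPC) as the spectral model, which the abstract does not specify, and uses a toy impulse-train signal. All function names, the filter coefficients, and the pitch period are hypothetical choices made for illustration. In the proposed system, the spectral parameters would be predicted by an LSTM and the excitation would be generated by SampleRNN rather than computed from recorded speech.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for the prediction-error filter A(z) = [1, a1, ..., ap]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error
    return a

def inverse_filter(x, a):
    """Excitation (residual): e[n] = sum_j a[j] * x[n-j]."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]

def synthesis_filter(e, a):
    """All-pole synthesis: x[n] = e[n] - sum_{j>=1} a[j] * x[n-j]."""
    x = []
    for n in range(len(e)):
        x.append(e[n] - sum(a[j] * x[n - j]
                            for j in range(1, len(a)) if n - j >= 0))
    return x

# Toy "voiced" frame (assumed values): pulses every 80 samples
# passed through a stable two-pole resonator.
PITCH_PERIOD = 80
N = 400
pulses = [1.0 if n % PITCH_PERIOD == 0 else 0.0 for n in range(N)]
frame = synthesis_filter(pulses, [1.0, -1.3, 0.8])

# Decompose: spectral envelope (filter) and excitation (source).
a = levinson_durbin(autocorr(frame, 2), 2)
excitation = inverse_filter(frame, a)

# Recombine: filtering the excitation with 1/A(z) restores the frame.
rebuilt = synthesis_filter(excitation, a)
```

In this sketch the excitation carries less energy than the frame itself and is dominated by pitch-period pulses, which gives some intuition for why a generative model of the excitation can be smaller than one that models the full waveform.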
Total: 355
106 International Conference Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework" in ITC-CSCC, 2019
105 International Conference Kyungguen Byun, Eunwoo Song, Jinseob Kim, Jae-Min Kim, Hong-Goo Kang "Excitation-by-SampleRNN Model for Text-to-Speech" in ITC-CSCC, 2019
104 International Journal Seung-Chul Shin, Jinkyu Lee, Soyeon Choe, Hyuk In Yang, Jihee Min, Ki-Yong Ahn, Justin Y. Jeon, Hong-Goo Kang "Dry Electrode-Based Body Fat Estimation System with Anthropometric Data for Use in a Wearable Device" in Sensors, vol.19, issue 9, 2019
103 International Conference Yang Yuan, Soo-Whan Chung, Hong-Goo Kang "Gradient-based active learning query strategy for end-to-end speech recognition" in ICASSP, 2019
102 International Conference Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang "Perfect match: Improved cross-modal embeddings for audio-visual synchronisation" in ICASSP, 2019
101 International Conference Hyewon Han, Kyungguen Byun, Hong-Goo Kang "A Deep Learning-based Stress Detection Algorithm with Speech Signal" in Workshop on Audio-Visual Scene Understanding for Immersive Multimedia (AVSU’18), 2018
100 International Conference Min-Jae Hwang, Eunwoo Song, Jin-Seob Kim, Hong-Goo Kang "A Unified Framework for the Generation of Glottal Signals in Deep Learning-based Parametric Speech Synthesis Systems" in INTERSPEECH, 2018
99 International Journal Jinkyu Lee, Jan Skoglund, Turaj Shabestary, Hong-Goo Kang "Phase-Sensitive Joint Learning Algorithms for Deep Learning-Based Speech Enhancement" in IEEE Signal Processing Letters, vol.25, issue 8, pp.1276-1280, 2018
98 International Conference Haemin Yang, Soyeon Choe, Keulbit Kim, Hong-Goo Kang "Deep learning-based speech presence probability estimation for noise PSD estimation in single-channel speech enhancement" in ICSigSys, 2018
97 International Conference Min-Jae Hwang, Eunwoo Song, Kyungguen Byun, Hong-Goo Kang "Modeling-by-Generation-Structured Noise Compensation Algorithm for Glottal Vocoding Speech Synthesis System" in ICASSP, 2018