Papers

Excitation-by-SampleRNN Model for Text-to-Speech

International Conference
2016~2020
Posted by
한혜원
Posted on
2019-06-01 16:42
Authors : Kyungguen Byun, Eunwoo Song, Jinseob Kim, Jae-Min Kim, Hong-Goo Kang

Year : 2019

Publisher / Conference : ITC-CSCC

In this paper, we propose a neural vocoder-based text-to-speech (TTS) system that effectively utilizes a source-filter modeling framework. Although neural vocoder algorithms such as SampleRNN and WaveNet are well known to generate high-quality speech, their generation speed is too slow for real-world applications. By first decomposing the speech signal into spectral and excitation components using the source-filter framework, we train the two components separately: the spectral (acoustic) parameters with a long short-term memory model and the excitation component with a SampleRNN-based generative model. Unlike a conventional generative model, which must represent the complicated probability distribution of the speech waveform, the proposed approach needs to generate only the glottal movement of the human speech production mechanism. Therefore, it is possible to obtain high-quality speech signals using a small, pitch interval-oriented SampleRNN network. The objective and subjective test results confirm the superiority of the proposed system over both a glottal modeling-based parametric system and the original SampleRNN-based speech synthesis system.
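To make the source-filter decomposition mentioned in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes linear prediction (LPC) as the spectral model and a synthetic impulse-train signal in place of real speech; all function and variable names are hypothetical. It splits a signal into filter coefficients and an excitation, then shows that passing the excitation back through the filter recovers the signal.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Estimate LPC coefficients a[0..order] (a[0] = 1), autocorrelation method."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations: R a[1:] = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # tiny regularization (an assumption)
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))


def inverse_filter(signal, a):
    """Excitation e[n] = sum_k a[k] * s[n-k], i.e. the FIR analysis filter A(z)."""
    return np.convolve(signal, a)[: len(signal)]


def synthesis_filter(excitation, a):
    """s[n] = e[n] - sum_{k>=1} a[k] * s[n-k], i.e. the all-pole filter 1/A(z)."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * out[n - k]
        out[n] = acc
    return out


# Synthetic stand-in for voiced speech: a 100 Hz impulse train ("source")
# shaped by a single 500 Hz resonance ("filter"). Values are illustrative.
sample_rate = 16000
period = 160
impulses = np.zeros(1600)
impulses[::period] = 1.0
radius, theta = 0.95, 2 * np.pi * 500 / sample_rate
a_true = np.array([1.0, -2 * radius * np.cos(theta), radius**2])
speech_like = synthesis_filter(impulses, a_true)

# Decompose into spectral (filter) and excitation parts, then resynthesize.
a_est = lpc_coefficients(speech_like, order=2)
excitation = inverse_filter(speech_like, a_est)
reconstructed = synthesis_filter(excitation, a_est)

print("signal energy:    ", float(np.sum(speech_like**2)))
print("excitation energy:", float(np.sum(excitation**2)))
print("max reconstruction error:", float(np.max(np.abs(reconstructed - speech_like))))
```

In this sketch the excitation carries much less energy than the waveform, which loosely illustrates why modeling only the excitation can be an easier task for a generative network than modeling the full speech waveform.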
Total: 364
108 International Conference Hyewon Han, Soo-Whan Chung, Hong-Goo Kang "MIRNet: Learning multiple identities representations in overlapped speech" in INTERSPEECH, 2020
107 International Conference Yoohwan Kwon, Soo-Whan Chung, Hong-Goo Kang "Intra-Class Variation Reduction of Speaker Representation in Disentanglement Framework" in INTERSPEECH, 2020
106 International Conference Minh-Tri Ho, Jinyoung Lee, Bong-Ki Lee, Dong Hoon Yi, Hong-Goo Kang "A Cross-channel Attention-based Wave-U-Net for Multi-channel Speech Enhancement" in INTERSPEECH, 2020
105 International Conference Seyun Um, Sangshin Oh, Kyungguen Byun, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "Emotional Speech Synthesis with Rich and Granularized Control" in ICASSP, 2020
104 International Conference Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang "Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network" in ICASSP, 2020
103 International Conference Hyeonjoo Kang, Young-Sun Joo, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang "A Study on Acoustic Parameter Selection Strategies to Improve Deep Learning-Based Speech Synthesis" in APSIPA, 2019
102 International Conference Min-Jae Hwang, Hong-Goo Kang "Parameter enhancement for MELP speech codec in noisy communication environment" in INTERSPEECH, 2019
101 International Conference Keulbit Kim, Jinkyu Lee, Jan Skoglund, Hong-Goo Kang "Model Order Selection for Wind Noise Reduction in Non-negative Matrix Factorization" in ITC-CSCC, 2019
100 International Conference Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework" in ITC-CSCC, 2019
99 International Conference Kyungguen Byun, Eunwoo Song, Jinseob Kim, Jae-Min Kim, Hong-Goo Kang "Excitation-by-SampleRNN Model for Text-to-Speech" in ITC-CSCC, 2019