Excitation-by-SampleRNN Model for Text-to-Speech

International Conference
2019-06-01 16:42
Authors : Kyungguen Byun, Eunwoo Song, Jinseob Kim, Jae-Min Kim, Hong-Goo Kang

Year : 2019

Publisher / Conference : ITC-CSCC

In this paper, we propose a neural vocoder-based text-to-speech (TTS) system that effectively utilizes a source-filter modeling framework. Although neural vocoder algorithms such as SampleRNN and WaveNet are well known to generate high-quality speech, their generation speed is too slow for real-world applications. By first decomposing the speech signal into spectral and excitation components using the source-filter framework, we train the two components separately, i.e., the spectrum or acoustic parameters with a long short-term memory model and the excitation component with a SampleRNN-based generative model. Unlike a conventional generative model, which needs to represent the complicated probability distribution of the speech waveform, the proposed approach needs to generate only the glottal movement of the human speech production mechanism. Therefore, it is possible to obtain high-quality speech signals using a small, pitch interval-oriented SampleRNN network. The objective and subjective test results confirm the superiority of the proposed system over glottal modeling-based parametric and original SampleRNN-based speech synthesis systems.
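The source-filter decomposition mentioned in the abstract can be illustrated with a textbook linear-prediction (LPC) sketch. This is not the paper's implementation: the abstract does not specify the analysis method, and the function names, the toy signal, and the filter order below are assumptions chosen for illustration. The sketch estimates an all-pole spectral filter, inverse-filters the signal to obtain an excitation (residual), and checks that passing the excitation back through the filter recovers the signal.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin. Returns a with a[0] = 1."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    r[0] += 1e-9  # tiny regularization so a silent frame does not divide by zero
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def inverse_filter(signal, a):
    """Excitation e[n] = sum_k a[k] * s[n-k] (FIR analysis filter A(z))."""
    return np.convolve(signal, a)[: len(signal)]


def synthesis_filter(excitation, a):
    """All-pole filter 1/A(z): s[n] = e[n] - sum_{k>=1} a[k] * s[n-k]."""
    order = len(a) - 1
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, min(order, n) + 1):
            acc -= a[k] * out[n - k]
        out[n] = acc
    return out


# Toy "speech-like" signal (an assumption, not real speech): a pulse train
# with a little noise, shaped by a fixed two-pole resonator.
rng = np.random.default_rng(0)
source = np.zeros(800)
source[::80] = 1.0  # one pulse every 80 samples
source += 0.01 * rng.standard_normal(800)
speech_like = synthesis_filter(source, np.array([1.0, -1.3, 0.8]))

a = lpc_coefficients(speech_like, order=2)          # spectral (filter) part
excitation = inverse_filter(speech_like, a)         # excitation (source) part
reconstructed = synthesis_filter(excitation, a)     # filter applied to source
```

In a system like the one described, the spectral part and the excitation part would each be predicted by a separate model (an LSTM and a SampleRNN-based network, per the abstract). Here both are simply computed from the signal to show that the two components together carry the waveform.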