Papers

Excitation-by-SampleRNN Model for Text-to-Speech

International Conference
2016~2020
작성자
한혜원
작성일
2019-06-01 16:42
조회
1502
Authors : Kyungguen Byun, Eunwoo Song, Jinseob Kim, Jae-Min Kim, Hong-Goo Kang

Year : 2019

Publisher / Conference : ITC-CSCC

In this paper, we propose a neural vocoder-based textto-speech (TTS) system that effectively utilizes a source-filter modeling framework. Although neural vocoder algorithms such as SampleRNN and WaveNet are well-known to generate high quality speech, its generation speed is too slow to be used for real-world applications. By first decomposing speech signal into spectral and excitation components using the sourcefilter framework, we train the two components separately, i.e. training the spectrum or acoustic parameters with a long short-term memory model and the excitation component with a SampleRNN-based generative model. Unlike the conventional generative model that needs to represent the complicated probabilistic distribution of speech waveform, the proposed approach needs to generate only the glottal movement of human production mechanism. Therefore, it is possible to obtain high quality speech signals using a small-size of the pitch interval-oriented SampleRNN network. The objective and subjective test results confirm the superiority of the proposed system over a glottal modeling-based parametric and original SampleRNN-based speech synthesis systems.
전체 355
122 International Conference Miseul Kim, Minh-Tri Ho, Hong-Goo Kang "Self-supervised Complex Network for Machine Sound Anomaly Detection" in EUSIPCO, 2021
121 International Conference Kihyuk Jeong, Huu-Kim Nguyen, Hong-Goo Kang "A Fast and Lightweight Text-To-Speech Model with Spectrum and Waveform Alignment Algorithms" in EUSIPCO, 2021
120 International Conference Jiyoung Lee*, Soo-Whan Chung*, Sunok Kim, Hong-Goo Kang**, Kwanghoon Sohn** "Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation" in CVPR, 2021
119 International Conference Zainab Alhakeem, Hong-Goo Kang "Confidence Learning from Noisy Labels for Arabic Dialect Identification" in ITC-CSCC, 2021
118 International Conference Huu-Kim Nguyen, Kihyuk Jeong, Hong-Goo Kang "Fast and Lightweight Speech Synthesis Model based on FastSpeech2" in ITC-CSCC, 2021
117 International Conference Yoohwan Kwon*, Hee-Soo Heo*, Bong-Jin Lee, Joon Son Chung "The ins and outs of speaker recognition: lessons from VoxSRC 2020" in ICASSP, 2021
116 International Conference You Jin Kim, Hee Soo Heo, Soo-Whan Chung, Bong-Jin Lee "End-to-end Lip Synchronisation Based on Pattern Classification" in IEEE Spoken Language Technology Workshop (SLT), 2020
115 International Conference Seong Min Kye, Yoohwan Kwon, Joon Son Chung "Cross Attentive Pooling for Speaker Verification" in IEEE Spoken Language Technology Workshop (SLT), 2020
114 International Conference Suhyeon Oh, Hyungseob Lim, Kyungguen Byun, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang "ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis" in APSIPA (*awarded Best Paper), 2020
113 International Conference Hyeon-Kyeong Shin, Hyewon Han, Kyungguen Byun, Hong-Goo Kang "Speaker-invariant Psychological Stress Detection Using Attention-based Network" in APSIPA, 2020