Papers

LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis

International Conference
2016~2020
작성자
한혜원
작성일
2020-12-01 16:57
조회
295
Authors : Min-Jae Hwang, Frank Soong, Eunwoo Song, Xi Wang, Hyeonjoo Kang, Hong-Goo Kang

Year : 2020

Publisher / Conference : APSIPA

We propose a linear prediction (LP)-based wave-form generation method via WaveNet vocoding framework. A WaveNet-based neural vocoder has significantly improved the quality of parametric text-to-speech (TTS) systems. However, it is challenging to effectively train the neural vocoder when the target database contains massive amount of acoustical information such as prosody, style or expressiveness. As a solution, the approaches that only generate the vocal source component by a neural vocoder have been proposed. However, they tend to generate synthetic noise because the vocal source component is independently handled without considering the entire speech production process; where it is inevitable to come up with a mismatch between vocal source and vocal tract filter. To address this problem, we propose an LP-WaveNet vocoder, where the complicated interactions between vocal source and vocal tract components are jointly trained within a mixture density networkbased WaveNet model. The experimental results verify that the proposed system outperforms the conventional WaveNet vocoders both objectively and subjectively. In particular, the proposed method achieves 4.47 MOS within the TTS framework.
전체 319
309 International Conference Seong Min Kye, Yoohwan Kwon, Joon Son Chung "Cross Attentive Pooling for Speaker Verification" in IEEE Spoken Language Technology Workshop (SLT), 2020
308 International Conference Suhyeon Oh, Hyungseob Lim, Kyungguen Byun, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang "ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis" in APSIPA (*awarded Best Paper), 2020
307 International Conference Hyeon-Kyeong Shin, Hyewon Han, Kyungguen Byun, Hong-Goo Kang "Speaker-invariant Psychological Stress Detection Using Attention-based Network" in APSIPA, 2020
306 International Conference Min-Jae Hwang, Frank Soong, Eunwoo Song, Xi Wang, Hyeonjoo Kang, Hong-Goo Kang "LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis" in APSIPA, 2020
305 International Conference Hyungseob Lim, Suhyeon Oh, Kyungguen Byun, Hong-Goo Kang "A Study on Conditional Features for a Flow-based Neural Vocoder" in Asilomar Conference on Signals, Systems, and Computers, 2020
304 International Conference Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang "FaceFilter: Audio-visual speech separation using still images" in INTERSPEECH (*awarded Best Student Paper), 2020
303 International Conference Soo-Whan Chung, Hong-Goo Kang, Joon Son Chung "Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision" in INTERSPEECH, 2020
302 International Conference Hyewon Han, Soo-Whan Chung, Hong-Goo Kang "MIRNet: Learning multiple identities representations in overlapped speech" in INTERSPEECH, 2020
301 International Conference Yoohwan Kwon, Soo-Whan Chung, Hong-Goo Kang "Intra-Class Variation Reduction of Speaker Representation in Disentanglement Framework" in INTERSPEECH, 2020
300 International Conference Minh-Tri Ho, Jinyoung Lee, Bong-Ki Lee, Dong Hoon Yi, Hong-Goo Kang "A Cross-channel Attention-based Wave-U-Net for Multi-channel Speech Enhancement" in INTERSPEECH, 2020