Papers

A Study on Acoustic Parameter Selection Strategies to Improve Deep Learning-Based Speech Synthesis

International Conference
2016~2020
Posted by
Hyewon Han
Date posted
2019-11-01 16:48
Authors : Hyeonjoo Kang, Young-Sun Joo, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang

Year : 2019

Publisher / Conference : APSIPA

In this paper, we investigate how the performance of a deep learning-based speech synthesis (DLSS) system varies with the configuration of its output acoustic parameters. Our method is mainly applicable to vocoding-based statistical parametric speech synthesis (SPSS), which has advantages in low-resource scenarios. Given the source-filter model's assumption that the spectral and fundamental frequency (F0) parameters are independent, we propose a reliable network architecture for training these acoustic parameters. The F0 parameter in particular suffers from high fluctuation and an extremely low number of dimensions. To relieve these problems, we introduce a context-window approach. Furthermore, we apply data augmentation to the proposed structure to overcome a lack of training data, which is a frequent issue in multi-speaker TTS systems. Experimental results confirm the superiority of the proposed algorithm over conventional ones in both single-speaker and multi-speaker TTS setups.
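The abstract does not say exactly how the context window is built. The sketch below shows one plausible reading, offered only as an illustration and not as the authors' method: each frame's one-dimensional F0 target is widened to include its neighbouring frames, which raises the output dimensionality. At synthesis time the overlapping predictions for each frame are averaged, which also smooths frame-to-frame fluctuation. The function names `stack_context` and `unstack_context`, the `half_width` parameter, and the edge-padding choice (repeating the first or last value) are all hypothetical.

```python
from typing import List


def stack_context(f0: List[float], half_width: int) -> List[List[float]]:
    """Hypothetical: widen each F0 frame into a (2*half_width+1)-dim target.

    Frames beyond the track edges are padded by repeating the first/last value.
    """
    n = len(f0)
    stacked = []
    for t in range(n):
        window = []
        for k in range(-half_width, half_width + 1):
            idx = min(max(t + k, 0), n - 1)  # clamp to valid frame indices
            window.append(f0[idx])
        stacked.append(window)
    return stacked


def unstack_context(frames: List[List[float]], half_width: int) -> List[float]:
    """Hypothetical: recover a 1-D F0 track by averaging every prediction
    (from any window) that covers each frame; out-of-range offsets are ignored."""
    n = len(frames)
    sums = [0.0] * n
    counts = [0] * n
    for t, window in enumerate(frames):
        for k, value in zip(range(-half_width, half_width + 1), window):
            idx = t + k
            if 0 <= idx < n:
                sums[idx] += value
                counts[idx] += 1
    # Every frame is covered at least once (by its own window at offset 0).
    return [s / c for s, c in zip(sums, counts)]


if __name__ == "__main__":
    f0 = [100.0, 110.0, 120.0, 130.0]
    targets = stack_context(f0, half_width=1)
    print(targets[0])                           # [100.0, 100.0, 110.0]
    print(unstack_context(targets, half_width=1))  # [100.0, 110.0, 120.0, 130.0]
```

With clean targets the round trip is lossless. When a network's window predictions disagree, each output frame becomes the mean of all estimates that cover it.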
Total: 371
311 International Conference Yoohwan Kwon*, Hee-Soo Heo*, Bong-Jin Lee, Joon Son Chung "The ins and outs of speaker recognition: lessons from VoxSRC 2020" in ICASSP, 2021
310 International Conference You Jin Kim, Hee Soo Heo, Soo-Whan Chung, Bong-Jin Lee "End-to-end Lip Synchronisation Based on Pattern Classification" in IEEE Spoken Language Technology Workshop (SLT), 2020
309 International Conference Seong Min Kye, Yoohwan Kwon, Joon Son Chung "Cross Attentive Pooling for Speaker Verification" in IEEE Spoken Language Technology Workshop (SLT), 2020
308 International Conference Suhyeon Oh, Hyungseob Lim, Kyungguen Byun, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang "ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis" in APSIPA (*awarded Best Paper), 2020
307 International Conference Hyeon-Kyeong Shin, Hyewon Han, Kyungguen Byun, Hong-Goo Kang "Speaker-invariant Psychological Stress Detection Using Attention-based Network" in APSIPA, 2020
306 International Conference Min-Jae Hwang, Frank Soong, Eunwoo Song, Xi Wang, Hyeonjoo Kang, Hong-Goo Kang "LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis" in APSIPA, 2020
305 International Conference Hyungseob Lim, Suhyeon Oh, Kyungguen Byun, Hong-Goo Kang "A Study on Conditional Features for a Flow-based Neural Vocoder" in Asilomar Conference on Signals, Systems, and Computers, 2020
304 International Conference Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang "FaceFilter: Audio-visual speech separation using still images" in INTERSPEECH (*awarded Best Student Paper), 2020
303 International Conference Soo-Whan Chung, Hong-Goo Kang, Joon Son Chung "Seeing Voices and Hearing Voices: Learning Discriminative Embeddings Using Cross-Modal Self-Supervision" in INTERSPEECH, 2020
302 International Conference Hyewon Han, Soo-Whan Chung, Hong-Goo Kang "MIRNet: Learning multiple identities representations in overlapped speech" in INTERSPEECH, 2020