Papers

Effective Spectral and Excitation Modeling Techniques for LSTM-RNN-Based Speech Synthesis Systems

International Journal
2016~2020
작성자
이진영
작성일
2017-11-01 22:07
조회
3931
Authors : Eunwoo Song, Frank K. Soong, Hong-Goo Kang

Year : 2017

Publisher / Conference : IEEE/ACM Transactions on Audio, Speech, and Language Processing

Volume : 25, issue 11

Page : 2152-2161

In this paper, we report research results on modeling the parameters of an improved time-frequency trajectory excitation (ITFTE) and spectral envelopes of an LPC vocoder with a long short-term memory (LSTM)-based recurrent neural network (RNN) for high-quality text-to-speech (TTS) systems. The ITFTE vocoder has been shown to significantly improve the perceptual quality of statistical parameter-based TTS systems in our prior works. However, a simple feed-forward deep neural network (DNN) with a finite window length is inadequate to capture the time evolution of the ITFTE parameters. We propose to use the LSTM to exploit the time-varying nature of both trajectories of the excitation and filter parameters, where the LSTM is implemented to use the linguistic text input and to predict both ITFTE and LPC parameters holistically. In the case of LPC parameters, we further enhance the generated spectrum by applying LP bandwidth expansion and line spectral frequency-sharpening filters. These filters are not only beneficial for reducing unstable synthesis filter conditions but also advantageous toward minimizing the muffling problem in the generated spectrum. Experimental results have shown that the proposed LSTM-RNN system with the ITFTE vocoder significantly outperforms both similarly configured band aperiodicity-based systems and our best prior DNN-trainecounterpart, both objectively and subjectively.
전체 372
100 International Conference Min-Jae Hwang, Eunwoo Song, Jin-Seob Kim, Hong-Goo Kang "A Unified Framework for the Generation of Glottal Signals in Deep Learning-based Parametric Speech Synthesis Systems" in INTERSPEECH, 2018
99 International Journal Jinkyu Lee, Jan Skoglund, Turaj Shabestary, Hong-Goo Kang "Phase-Sensitive Joint Learning Algorithms for Deep Learning-Based Speech Enhancement" in IEEE Signal Processing Letters, vol.25, issue 8, pp.1276-1280, 2018
98 International Conference Haemin Yang, Soyeon Choe, Keulbit Kim, Hong-Goo Kang "Deep learning-based speech presence probability estimation for noise PSD estimation in single-channel speech enhancement" in ICSigSys, 2018
97 International Conference Min-Jae Hwang, Eunwoo Song, Kyungguen Byun, Hong-Goo Kang "Modeling-by-Generation-Structured Noise Compensation Algorithm for Glottal Vocoding Speech Synthesis System" in ICASSP, 2018
96 International Conference Jinyoung Lee, Chahyeon Eom, Youngsu Kwak, Hong-Goo Kang, Chungyoung Lee "DNN-based Wireless Positioning in An Outdoor Environment" in ICASSP, 2018
95 International Conference Seung-chul Shin, Sangyeop Lee, Taeho Lee, Kyoungwoo Lee, Yong Seung Lee, Hong-Goo Kang "Two electrode based healthcare device for continuously monitoring ECG and BIA signals" in BHI, 2018
94 International Journal JeeSok Lee, Soo-Whan Chung, Min-Seok Choi, Hong-Goo Kang "Generic uniform search grid generation algorithm for far-field source localization" in The Journal of the Acoustical Society of America, vol.143, 2018
93 International Journal Min-Jae Hwang, JeeSok Lee, MiSuk Lee, Hong-Goo Kang "SVD-Based Adaptive QIM Watermarking on Stereo Audio Signals" in IEEE Transactions on Multimedia, vol.20, issue 1, pp.45-54, 2018
92 International Conference Eunwoo Song, Frank K. Soong, Hong-Goo Kang "Perceptual quality and modeling accuracy of excitation parameters in DLSTM-based speech synthesis systems" in ASRU, 2017
91 International Journal Eunwoo Song, Frank K. Soong, Hong-Goo Kang "Effective Spectral and Excitation Modeling Techniques for LSTM-RNN-Based Speech Synthesis Systems" in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue 11, pp.2152-2161, 2017