Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network
In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN).
The recently proposed LPCNet vocoder has successfully achieved high-quality and lightweight speech synthesis systems by combining a vocal tract LP filter with a WaveRNN-based vocal source (i.e., excitation) generator.
However, the quality of synthesized speech is often unstable because the vocal source component is insufficiently represented by the mu-law quantization method, and the model is trained without considering the entire speech production mechanism.
To address this problem, we first introduce LP-MDN, which enables the autoregressive neural vocoder to structurally represent the interactions between the vocal tract and vocal source components.
Then, we propose to incorporate the LP-MDN to the LPCNet vocoder by replacing the conventional discretized output with continuous density distribution.
The experimental results verify that the proposed system provides high quality synthetic speech by achieving a mean opinion score of 4.41 within a text-to-speech framework.
|299||International Journal||Young-Sun Joo, Hanbin Bae, Young-Ik Kim, Hoon-Young Cho, Hong-Goo Kang "Effective Emotion Transplantation in an End-to-End Text-to-Speech System" in IEEE Access, vol.8, pp.161713-161719, 2020|
|298||Domestic Journal||권유환, 정수환, 강홍구 "화자 인식을 위한 적대학습 기반음성 분리 프레임워크에 대한 연구" in 한국음향학회지, vol.39, 제 5호, pp.447-453, 2020|
|297||Domestic Conference||오태양, 정기혁, 강홍구 "화자 및 발화 스타일 임베딩을 통한 다화자 음성합성 시스템 음질 향상" in 전자공학회 하계학술대회, pp.980-982, 2020|
|296||Domestic Conference||이성현, 강홍구 "딥러닝 기반 종단 간 다채널 음질 개선 알고리즘" in 전자공학회 하계학술대회, pp.968-970, 2020|
|295||Domestic Conference||임정운, 김지현, 강홍구 "메타러닝을 이용한 SAR 영상 자동표적 인식" in 한국항공우주학회 2020 춘계학술대회, pp.353-354, 2020|
|294||International Conference||Se-Yun Um, Sangshin Oh, Kyungguen Byun, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "Emotional Speech Synthesis with Rich and Granularized Control" in ICASSP, 2020|
|293||International Conference||Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang "Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network" in ICASSP, 2020|
|292||International Journal||Soo-Whan Chung, Joon Son Chung, Hong Goo Kang "Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval" in IEEE Journal of Selected Topics in Signal Processing, vol.14, issue 3, 2020|
|291||International Conference||Hyeonjoo Kang, Young-Sun Joo, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang "A Study on Acoustic Parameter Selection Strategies to Improve Deep Learning-Based Speech Synthesis" in APSIPA, 2019|
|290||International Journal||Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "An Effective Style Token Weight Control Technique for End-to-End Emotional Speech Synthesis" in IEEE Signal Processing Letters, vol.26, issue 9, pp.1383-1387, 2019|