Papers

Perceptual quality and modeling accuracy of excitation parameters in DLSTM-based speech synthesis systems

International Conference
2016~2020
작성자
한혜원
작성일
2017-12-01 16:38
조회
1776
Authors : Eunwoo Song, Frank K. Soong, Hong-Goo Kang

Year : 2017

Publisher / Conference : ASRU

This paper investigates how the perceptual quality of the synthesized speech is affected by reconstruction errors in excitation signals generated by a deep learning-based statistical model. In this framework, the excitation signal obtained by an LPC inverse filter is first decomposed into harmonic and noise components using an improved time-frequency trajectory excitation (ITFTE) scheme, then they are trained and generated by a deep long short-term memory (DLSTM)-based speech synthesis system. By controlling the parametric dimension of the ITFTE vocoder, we analyze the impact of the harmonic and noise components to the perceptual quality of the synthesized speech. Both objective and subjective experimental results confirm that the maximum perceptually allowable spectral distortion for the harmonic spectrum of the generated excitation is ~0.08 dB. On the other hand, the absolute spectral distortion in the noise components is meaningless, and only the spectral envelope is relevant to the perceptual quality.
전체 364
284 International Conference Kyungguen Byun, Eunwoo Song, Jinseob Kim, Jae-Min Kim, Hong-Goo Kang "Excitation-by-SampleRNN Model for Text-to-Speech" in ITC-CSCC, 2019
283 International Journal Seung-Chul Shin, Jinkyu Lee, Soyeon Choe, Hyuk In Yang, Jihee Min, Ki-Yong Ahn, Justin Y. Jeon, Hong-Goo Kang "Dry Electrode-Based Body Fat Estimation System with Anthropometric Data for Use in a Wearable Device" in Sensors, vol.19, issue 9, 2019
282 International Conference Yang Yuan, Soo-Whan Chung, Hong-Goo Kang "Gradient-based active learning query strategy for end-to-end speech recognition" in ICASSP, 2019
281 International Conference Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang "Perfect match: Improved cross-modal embeddings for audio-visual synchronisation" in ICASSP, 2019
280 International Conference Hyewon Han, Kyungguen Byun, Hong-Goo Kang "A Deep Learning-based Stress Detection Algorithm with Speech Signal" in Workshop on Audio-Visual Scene Understanding for Immersive Multimedia (AVSU’18), 2018
279 International Conference Min-Jae Hwang, Eunwoo Song, Jin-Seob Kim, Hong-Goo Kang "A Unified Framework for the Generation of Glottal Signals in Deep Learning-based Parametric Speech Synthesis Systems" in INTERSPEECH, 2018
278 Domestic Conference 양원, 정수환, 강홍구 "비학습 데이터 적응화 기법을 이용한 딥러닝 기반 한국어 음성 인식 기술" in 한국음향학회 추계발표대회, 2018
277 Domestic Conference 최소연, 정수환, 강홍구 "임베딩 매트릭스를 기반으로 한 비정상적 잡음 제거 알고리즘의 분석과 딥러닝 음질개선 방법들과의 성능비교" in 한국음향학회 추계발표대회, 2018
276 International Journal Jinkyu Lee, Jan Skoglund, Turaj Shabestary, Hong-Goo Kang "Phase-Sensitive Joint Learning Algorithms for Deep Learning-Based Speech Enhancement" in IEEE Signal Processing Letters, vol.25, issue 8, pp.1276-1280, 2018
275 International Conference Haemin Yang, Soyeon Choe, Keulbit Kim, Hong-Goo Kang "Deep learning-based speech presence probability estimation for noise PSD estimation in single-channel speech enhancement" in ICSigSys, 2018