Papers

A Unified Framework for the Generation of Glottal Signals in Deep Learning-based Parametric Speech Synthesis Systems

International Conference
2016~2020
작성자
한혜원
작성일
2018-09-01 16:37
조회
355
Authors : Min-Jae Hwang, Eunwoo Song, Jin-Seob Kim, Hong-Goo Kang

Year : 2018

Publisher / Conference : INTERSPEECH

In this paper, we propose a unified training framework for the generation of glottal signals in deep learning (DL)-based parametric speech synthesis systems. The glottal vocoding-based speech synthesis system, especially the modeling-by-generation (MbG) structure that we proposed recently, significantly improves the naturalness of synthesized speech by faithfully representing the noise component of the glottal excitation with an additional DL structure. Because the MbG method introduces a multistage processing pipeline, however, its training process is complicated and inefficient. To alleviate this problem, we propose a unified training approach that directly generates speech parameters by merging all the required models, such as acoustic, glottal and noise models into a single unified network. Considering the fact that noise analysis should be performed after training the glottal model, we also propose a stochastic noise analysis method that enables noise modeling to be included in the unified training process by iteratively analyzing the noise component in every epoch. Both objective and subjective test results verify the superiority of the proposed algorithm compared to conventional methods.
전체 327
287 International Journal Jinkyu Lee, Hong-Goo Kang "A Joint Learning Algorithm for Complex-Valued T-F Masks in Deep Learning-Based Single-Channel Speech Enhancement Systems" in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.27, issue 6, pp.1098-1108, 2019
286 International Conference Keulbit Kim, Jinkyu Lee, Jan Skoglund, Hong-Goo Kang "Model Order Selection for Wind Noise Reduction in Non-negative Matrix Factorization" in ITC-CSCC, 2019
285 International Conference Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework" in ITC-CSCC, 2019
284 International Conference Kyungguen Byun, Eunwoo Song, Jinseob Kim, Jae-Min Kim, Hong-Goo Kang "Excitation-by-SampleRNN Model for Text-to-Speech" in ITC-CSCC, 2019
283 International Journal Seung-Chul Shin, Jinkyu Lee, Soyeon Choe, Hyuk In Yang, Jihee Min, Ki-Yong Ahn, Justin Y. Jeon, Hong-Goo Kang "Dry Electrode-Based Body Fat Estimation System with Anthropometric Data for Use in a Wearable Device" in Sensors, vol.19, issue 9, 2019
282 International Conference Yang Yuan, Soo-Whan Chung, Hong-Goo Kang "Gradient-based active learning query strategy for end-to-end speech recognition" in ICASSP, 2019
281 International Conference Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang "Perfect match: Improved cross-modal embeddings for audio-visual synchronisation" in ICASSP, 2019
280 International Conference Hyewon Han, Kyungguen Byun, Hong-Goo Kang "A Deep Learning-based Stress Detection Algorithm with Speech Signal" in Workshop on Audio-Visual Scene Understanding for Immersive Multimedia (AVSU’18), 2018
279 International Conference Min-Jae Hwang, Eunwoo Song, Jin-Seob Kim, Hong-Goo Kang "A Unified Framework for the Generation of Glottal Signals in Deep Learning-based Parametric Speech Synthesis Systems" in INTERSPEECH, 2018
278 Domestic Conference 양원, 정수환, 강홍구 "비학습 데이터 적응화 기법을 이용한 딥러닝 기반 한국어 음성 인식 기술" in 한국음향학회 추계발표대회, 2018