Papers

A pitch-synchronous speech analysis and synthesis method for DNN-SPSS system

International Conference
2016~2020
작성자
한혜원
작성일
2016-10-01 00:52
조회
3826
Authors : Jin-Seob Kim, Young-Sun Joo, Inseon Jang, ChungHyun Ahn, Jeongil Seo, Hong-Goo Kang

Year : 2016

Publisher / Conference : 21th International Conference on Digital Signal Processing (DSP)

This paper proposes a pitch-synchronous deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. The pitch-synchronous frames defined by the locations of glottal closure instants (GCIs) are used to extract speech parameters, which significantly reduce coupling effects between vocal tract and excitation signals. As a result, the distribution of spectral parameters within the same context of phonetic classes becomes more uniform, which improves a model trainability especially for a small-scaled DNN framework. Although the effectiveness of pitch-synchronous approach has been proven in other applications, it is not trivial to integrate the method into the typical DNN-based SPSS systems that have regularized structures, i.e. fixed frame rate and fixed dimension of features. In this paper, we design a new DNN-based SPSS system that pitch-synchronously trains and generates speech parameters. Objective and subjective test results verify the superiority of the proposed system compared to the conventional approach.
전체 371
103 International Conference Hyeonjoo Kang, Young-Sun Joo, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang "A Study on Acoustic Parameter Selection Strategies to Improve Deep Learning-Based Speech Synthesis" in APSIPA, 2019
102 International Conference Min-Jae Hwang, Hong-Goo Kang "Parameter enhancement for MELP speech codec in noisy communication environment" in INTERSPEECH, 2019
101 International Conference Keulbit Kim, Jinkyu Lee, Jan Skoglund, Hong-Goo Kang "Model Order Selection for Wind Noise Reduction in Non-negative Matrix Factorization" in ITC-CSCC, 2019
100 International Conference Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework" in ITC-CSCC, 2019
99 International Conference Kyungguen Byun, Eunwoo Song, Jinseob Kim, Jae-Min Kim, Hong-Goo Kang "Excitation-by-SampleRNN Model for Text-to-Speech" in ITC-CSCC, 2019
98 International Conference Yang Yuan, Soo-Whan Chung, Hong-Goo Kang "Gradient-based active learning query strategy for end-to-end speech recognition" in ICASSP, 2019
97 International Conference Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang "Perfect match: Improved cross-modal embeddings for audio-visual synchronisation" in ICASSP, 2019
96 International Conference Hyewon Han, Kyungguen Byun, Hong-Goo Kang "A Deep Learning-based Stress Detection Algorithm with Speech Signal" in Workshop on Audio-Visual Scene Understanding for Immersive Multimedia (AVSU’18), 2018
95 International Conference Min-Jae Hwang, Eunwoo Song, Jin-Seob Kim, Hong-Goo Kang "A Unified Framework for the Generation of Glottal Signals in Deep Learning-based Parametric Speech Synthesis Systems" in INTERSPEECH, 2018
94 International Conference Haemin Yang, Soyeon Choe, Keulbit Kim, Hong-Goo Kang "Deep learning-based speech presence probability estimation for noise PSD estimation in single-channel speech enhancement" in ICSigSys, 2018