Papers

ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis

International Conference
2016~2020
Posted by
한혜원
Date posted
2020-12-01 16:59
Authors : Suhyeon Oh, Hyungseob Lim, Kyungguen Byun, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang

Year : 2020

Publisher / Conference : APSIPA (*awarded Best Paper)

Research area : Speech Signal Processing, Text-to-Speech

Presentation/Publication date : 2020.12.10

Related project : Embedded Neural TTS

Presentation : Oral

In this paper, we propose ExcitGlow, a vocoder that incorporates the source-filter model of voice production theory into a flow-based deep generative model. By targeting the distribution of the excitation signal instead of the speech waveform itself, we significantly reduce the size of the flow-based generative model. To further reduce the number of parameters, we apply a parameter sharing technique in which a single affine coupling layer is used for several flow layers. To avoid quality degradation, we also introduce a closed-loop training framework that optimizes the flow model for both the speech and the excitation signal generation processes. Specifically, we use a negative log-likelihood (NLL) loss for the excitation signal and a multi-resolution spectral distance for the speech signal. As a result, we are able to reduce the model size from 87.73M to 15.60M parameters while maintaining the perceptual quality of the synthesized speech.
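To make the notion of an excitation signal concrete, the following is a minimal sketch of classic linear prediction (LP) analysis and synthesis in plain Python. It is not the authors' implementation: the function names, the prediction order of 4, and the synthetic two-sinusoid input are all assumptions chosen for illustration, and a real vocoder would run the analysis frame by frame on recorded speech. The sketch only shows that the LP residual (the excitation) carries less energy than the waveform it was derived from, and that passing it back through the LP synthesis filter recovers the waveform.

```python
import math


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal (zero outside its range)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations; returns a_1..a_order so that
    s[n] is approximated by sum_k a_k * s[n - k]."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy of the order-0 model
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lp_residual(s, a):
    """Analysis filter: excitation e[n] = s[n] - sum_k a_k * s[n - k]."""
    p = len(a)
    return [s[n] - sum(a[k] * s[n - 1 - k] for k in range(min(p, n)))
            for n in range(len(s))]


def lp_synthesis(e, a):
    """Synthesis filter: s[n] = e[n] + sum_k a_k * s[n - k]."""
    p = len(a)
    s = []
    for n in range(len(e)):
        pred = sum(a[k] * s[n - 1 - k] for k in range(min(p, n)))
        s.append(e[n] + pred)
    return s


# Hypothetical stand-in for a speech frame: two sinusoids (assumption).
ORDER = 4
speech_like = [math.sin(0.3 * n) + 0.5 * math.sin(0.8 * n) for n in range(200)]

coeffs = levinson_durbin(autocorrelation(speech_like, ORDER), ORDER)
excitation = lp_residual(speech_like, coeffs)
rebuilt = lp_synthesis(excitation, coeffs)
```

In this framing, a generative model only has to produce `excitation`, while the LP synthesis filter restores the spectral envelope. The abstract gives this as the reason a smaller flow model suffices.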

* Awarded Best Paper at APSIPA 2020

Total: 365
119 International Conference Zainab Alhakeem, Hong-Goo Kang "Confidence Learning from Noisy Labels for Arabic Dialect Identification" in ITC-CSCC, 2021
118 International Conference Huu-Kim Nguyen, Kihyuk Jeong, Hong-Goo Kang "Fast and Lightweight Speech Synthesis Model based on FastSpeech2" in ITC-CSCC, 2021
117 International Conference Yoohwan Kwon*, Hee-Soo Heo*, Bong-Jin Lee, Joon Son Chung "The ins and outs of speaker recognition: lessons from VoxSRC 2020" in ICASSP, 2021
116 International Conference You Jin Kim, Hee Soo Heo, Soo-Whan Chung, Bong-Jin Lee "End-to-end Lip Synchronisation Based on Pattern Classification" in IEEE Spoken Language Technology Workshop (SLT), 2020
115 International Conference Seong Min Kye, Yoohwan Kwon, Joon Son Chung "Cross Attentive Pooling for Speaker Verification" in IEEE Spoken Language Technology Workshop (SLT), 2020
114 International Conference Suhyeon Oh, Hyungseob Lim, Kyungguen Byun, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang "ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis" in APSIPA (*awarded Best Paper), 2020
113 International Conference Hyeon-Kyeong Shin, Hyewon Han, Kyungguen Byun, Hong-Goo Kang "Speaker-invariant Psychological Stress Detection Using Attention-based Network" in APSIPA, 2020
112 International Conference Min-Jae Hwang, Frank Soong, Eunwoo Song, Xi Wang, Hyeonjoo Kang, Hong-Goo Kang "LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis" in APSIPA, 2020
111 International Conference Hyungseob Lim, Suhyeon Oh, Kyungguen Byun, Hong-Goo Kang "A Study on Conditional Features for a Flow-based Neural Vocoder" in Asilomar Conference on Signals, Systems, and Computers, 2020
110 International Conference Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang "FaceFilter: Audio-visual speech separation using still images" in INTERSPEECH (*awarded Best Student Paper), 2020