Papers

ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis

International Conference
2016~2020
Posted by
한혜원
Posted on
2020-12-01 16:59
Authors : Suhyeon Oh, Hyungseob Lim, Kyungguen Byun, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang

Year : 2020

Publisher / Conference : APSIPA (*awarded Best Paper)

Research area : Speech Signal Processing, Text-to-Speech

Presentation/Publication date : 2020.12.10

Related project : Embedded Neural TTS

Presentation : Oral

In this paper, we propose ExcitGlow, a vocoder that incorporates the source-filter model of voice production into a flow-based deep generative model. By modeling the distribution of the excitation signal instead of the speech waveform itself, we significantly reduce the size of the flow-based generative model. To further reduce the number of parameters, we apply a parameter-sharing technique in which a single affine coupling layer is reused across several flow layers. To avoid quality degradation, we also introduce a closed-loop training framework that optimizes the flow model for both the speech and excitation signal generation processes. Specifically, we use a negative log-likelihood (NLL) loss for the excitation signal and a multi-resolution spectral distance for the speech signal. As a result, we reduce the model size from 87.73M to 15.60M parameters while maintaining the perceptual quality of the synthesized speech.
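The source-filter idea in the abstract can be illustrated with a small linear prediction (LP) sketch. It is not the authors' implementation: the toy AR(2) signal, the single whole-signal analysis frame, the order of 2, and all function names are assumptions made for illustration. The sketch estimates LP coefficients with the autocorrelation method (Levinson-Durbin recursion), inverse-filters the signal to obtain an excitation (residual), and then passes that excitation back through the LP synthesis filter to recover the waveform. A vocoder of the kind described would generate the excitation with the flow model instead of computing it from real speech.

```python
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, err) where s[n] is predicted as sum_k a[k-1] * s[n-k]."""
    a = [0.0] * (order + 1)  # a[0] is unused; a[1..order] are the predictors
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lp_analysis(speech, a):
    """Inverse filter: excitation e[n] = s[n] - sum_k a[k] * s[n-1-k]."""
    exc = []
    for n in range(len(speech)):
        pred = sum(a[k] * speech[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        exc.append(speech[n] - pred)
    return exc


def lp_synthesis(exc, a):
    """Synthesis filter: s[n] = e[n] + sum_k a[k] * s[n-1-k]."""
    out = []
    for n in range(len(exc)):
        pred = sum(a[k] * out[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        out.append(exc[n] + pred)
    return out


def make_toy_speech(n_samples=2000, seed=0):
    """Hypothetical stand-in for speech: a stable AR(2) process driven by noise."""
    rng = random.Random(seed)
    s = [0.0, 0.0]
    for _ in range(n_samples):
        s.append(1.3 * s[-1] - 0.6 * s[-2] + rng.gauss(0.0, 1.0))
    return s[2:]


speech = make_toy_speech()
coeffs, _ = levinson_durbin(autocorrelation(speech, 2), 2)
excitation = lp_analysis(speech, coeffs)
rebuilt = lp_synthesis(excitation, coeffs)

print("estimated LP coefficients:", coeffs)
print("speech energy:    ", sum(v * v for v in speech))
print("excitation energy:", sum(v * v for v in excitation))
print("max reconstruction error:",
      max(abs(x - y) for x, y in zip(speech, rebuilt)))
```

In this toy setting the excitation carries less energy and has a flatter spectrum than the waveform, which is the usual intuition for why a smaller generative model may suffice for it. Real systems typically run LP analysis frame by frame with higher orders.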

* Awarded Best Paper at APSIPA 2020

8 International Conference Zhenyu Piao, Hyungseob Lim, Miseul Kim, Hong-Goo Kang "PDF-NET: Pitch-adaptive Dynamic Filter Network for Intra-gender Speaker Verification" in APSIPA ASC, 2023
7 International Conference Byeong Hyeon Kim, Hyungseob Lim, Jihyun Lee, Inseon Jang, Hong-Goo Kang "Progressive Multi-Stage Neural Audio Codec with Psychoacoustic Loss and Discriminator" in ICASSP, 2023
6 International Conference Hyungseob Lim, Jihyun Lee, Byeong Hyeon Kim, Inseon Jang, Hong-Goo Kang "End-to-End Neural Audio Coding in the MDCT Domain" in ICASSP, 2023
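5 Domestic Conference Hyungseob Lim, Hong-Goo Kang, Inseon Jang "Optimization of a Deep Neural Network-Based Audio Compression Model Using an Entropy Model" in the Korean Institute of Broadcast and Media Engineers (한국방송·미디어공학회) 2022 Summer Conference, 2022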
4 International Conference Chanwoo Lee, Hyungseob Lim, Jihyun Lee, Inseon Jang, Hong-Goo Kang "Progressive Multi-Stage Neural Audio Coding with Guided References" in ICASSP, 2022
3 International Conference Jihyun Lee, Hyungseob Lim, Chanwoo Lee, Inseon Jang, Hong-Goo Kang "Adversarial Audio Synthesis Using a Harmonic-Percussive Discriminator" in ICASSP, 2022
2 International Conference Suhyeon Oh, Hyungseob Lim, Kyungguen Byun, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang "ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis" in APSIPA (*awarded Best Paper), 2020
1 International Conference Hyungseob Lim, Suhyeon Oh, Kyungguen Byun, Hong-Goo Kang "A Study on Conditional Features for a Flow-based Neural Vocoder" in Asilomar Conference on Signals, Systems, and Computers, 2020