Papers

A Study on Conditional Features for a Flow-based Neural Vocoder

International Conference
2016~2020
작성자
한혜원
작성일
2020-11-01 17:03
조회
1908
Authors : Hyungseob Lim, Suhyeon Oh, Kyungguen Byun, Hong-Goo Kang

Year : 2020

Publisher / Conference : Asilomar Conference on Signals, Systems, and Computers

In this paper, we propose an effective way of providing conditional features for a flow-based neural vocoder. Most conventional approaches utilize mel-spectrograms for conditioning neural vocoders, but this significantly increases the size of neural networks due to their high dimensional behavior. We show that the network size of a flow-based generative model can be reduced when we use acoustic parameters for a sinusoidal speech analysis-and-synthesis framework such as voiced/unvoiced flag, fundamental frequency, mel-cepstral coefficients, and energy of each analysis frame. We also conclude that training becomes much easier if we feed the fundamental frequency by an embedded vector representation after quantizing it with a small number of bits. Experimental results verify that the performance of the proposed algorithm is comparable to that of flow-based neural vocoders conditioned on mel-spectrograms while the required information for the feature representations and network complexity for generating speech become lower.
전체 364
128 International Conference Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang "Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement" in ICASSP, 2022
127 International Conference Chanwoo Lee, Hyungseob Lim, Jihyun Lee, Inseon Jang, Hong-Goo Kang "Progressive Multi-Stage Neural Audio Coding with Guided References" in ICASSP, 2022
126 International Conference Jihyun Lee, Hyungseob Lim, Chanwoo Lee, Inseon Jang, Hong-Goo Kang "Adversarial Audio Synthesis Using a Harmonic-Percussive Discriminator" in ICASSP, 2022
125 International Conference Jinyoung Lee and Hong-Goo Kang "Stacked U-Net with High-level Feature Transfer for Parameter Efficient Speech Enhancement" in APSIPA ASC, 2021
124 International Conference Huu-Kim Nguyen, Kihyuk Jeong, Seyun Um, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang "LiteTTS: A Decoder-free Light-weight Text-to-wave Synthesis Based on Generative Adversarial Networks" in INTERSPEECH, 2021
123 International Conference Zainab Alhakeem, Yoohwan Kwon, Hong-Goo Kang "Disentangled Representations for Arabic Dialect Identification based on Supervised Clustering with Triplet Loss" in EUSIPCO, 2021
122 International Conference Miseul Kim, Minh-Tri Ho, Hong-Goo Kang "Self-supervised Complex Network for Machine Sound Anomaly Detection" in EUSIPCO, 2021
121 International Conference Kihyuk Jeong, Huu-Kim Nguyen, Hong-Goo Kang "A Fast and Lightweight Text-To-Speech Model with Spectrum and Waveform Alignment Algorithms" in EUSIPCO, 2021
120 International Conference Jiyoung Lee*, Soo-Whan Chung*, Sunok Kim, Hong-Goo Kang**, Kwanghoon Sohn** "Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation" in CVPR, 2021
119 International Conference Zainab Alhakeem, Hong-Goo Kang "Confidence Learning from Noisy Labels for Arabic Dialect Identification" in ITC-CSCC, 2021