Papers

A Fast and Lightweight Text-To-Speech Model with Spectrum and Waveform Alignment Algorithms

International Conference
2021~
Posted by : 한혜원
Posted on : 2021-08-30 11:05
Authors : Kihyuk Jeong, Huu-Kim Nguyen, Hong-Goo Kang

Year : 2021

Publisher / Conference : EUSIPCO

Research area : Speech Signal Processing, Text-to-Speech

In this paper, we propose a fast and lightweight text-to-speech (TTS) model that generates high-quality speech even in CPU-only environments. Building on the front-end architecture of FastSpeech2, we adopt a generative adversarial network (GAN) framework for waveform synthesis, which allows the proposed model to be trained in a fully end-to-end manner. Because the waveform generator consists of small convolutional networks, inference is very fast and the number of network parameters can be reduced by half compared to the FastSpeech2 model. However, because the model relies on predicted durations, the generated waveform segments are often not time-aligned with the reference segments, which reduces the reliability of the discriminator module in the GAN framework. To solve this time-misalignment problem, we propose a waveform alignment algorithm that synchronizes timing information between the reference and generated waveforms. In addition to the waveform alignment task, we include an auxiliary mel-spectrogram prediction task to further enhance perceptual quality. Since this task is required only for training, it does not increase computational complexity at inference. Objective and subjective experimental results show that the quality of speech synthesized by the proposed model is comparable to that of conventional approaches.
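The abstract does not spell out the waveform alignment algorithm, so the following is only an illustrative sketch of the timing mismatch it addresses, not the authors' method. It assumes per-phoneme durations counted in frames, a hop size of 256 samples per frame, and hypothetical helper names (`boundaries`, `map_position`). A sample position in the reference waveform is located inside its phoneme and then placed at the same relative offset inside that phoneme in the generated waveform, whose boundaries come from the predicted durations.

```python
from bisect import bisect_right
from itertools import accumulate

HOP = 256  # assumed number of waveform samples per frame


def boundaries(durations, hop=HOP):
    """Cumulative phoneme boundaries in samples, starting at 0."""
    return [0] + [d * hop for d in accumulate(durations)]


def map_position(ref_pos, ref_durations, pred_durations, hop=HOP):
    """Map a sample index in the reference waveform to the generated one.

    The position keeps its relative offset within its phoneme, so both
    indices point at the same linguistic content even when the reference
    and predicted durations differ.
    """
    ref_b = boundaries(ref_durations, hop)
    gen_b = boundaries(pred_durations, hop)
    # Index of the phoneme that contains ref_pos (clamped to the last one).
    i = min(bisect_right(ref_b, ref_pos) - 1, len(ref_durations) - 1)
    ref_len = ref_b[i + 1] - ref_b[i]
    gen_len = gen_b[i + 1] - gen_b[i]
    frac = (ref_pos - ref_b[i]) / ref_len if ref_len else 0.0
    return gen_b[i] + round(frac * gen_len)


# Three phonemes; the predicted durations differ from the reference ones.
ref_durations = [2, 4, 3]   # frames per phoneme in the reference
pred_durations = [3, 2, 4]  # frames per phoneme predicted by the model

# The second phoneme starts at sample 512 in the reference but at 768 in
# the generated waveform, so equal sample indices are not time-aligned.
start = map_position(512, ref_durations, pred_durations)
```

Under these assumptions, a discriminator that compares a reference window starting at sample 512 with a generated window starting at `start` would see the same phoneme in both, which is the kind of synchronization the abstract calls for.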