Papers

Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech

International Conference
2021~
작성자
dsp
작성일
2023-08-11 11:10
조회
1449
Authors : Hyungchan Yoon, Seyun Um, Changhwan Kim, Hong-Goo Kang

Year : 2023

Publisher / Conference : INTERSPEECH

Research area : Speech Signal Processing, Text-to-Speech

Presentation : Oral

To simplify the generation process, several text-to-speech (TTS) systems implicitly learn intermediate latent representations instead of relying on predefined features (e.g., mel-spectrogram). However, their generation quality is unsatisfactory as these representations lack speech variances. In this paper, we improve TTS performance by adding prosody embeddings to the latent representations. During training, we extract reference prosody embeddings from mel-spectrograms, and during inference, we estimate these embeddings from text using generative adversarial networks (GANs). Using GANs, we reliably estimate the prosody embeddings in a fast way, which have complex distributions due to the dynamic nature of speech. We also show that the prosody embeddings work as efficient features for learning a robust alignment between text and acoustic features. Our proposed model surpasses several publicly available models with less parameters and computational complexity in comparative experiments.
전체 365
355 International Conference Hyewon Han, Naveen Kumar "A cross-talk robust multichannel VAD model for multiparty agent interactions trained using synthetic re-recordings" in Hands-free Speech Communication and Microphone Arrays (HSCMA, Satellite workshop in ICASSP), 2024
354 International Conference Yanjue Song, Doyeon Kim, Nilesh Madhu, Hong-Goo Kang "On the Disentanglement and Robustness of Self-Supervised Speech Representations" in International Conference on Electronics, Information, and Communication (ICEIC) (*awarded Best Paper), 2024
353 International Conference Yeona Hong, Miseul Kim, Woo-Jin Chung, Hong-Goo Kang "Contextual Learning for Missing Speech Automatic Speech Recognition" in International Conference on Electronics, Information, and Communication (ICEIC), 2024
352 International Conference Juhwan Yoon, Seyun Um, Woo-Jin Chung, Hong-Goo Kang "SC-ERM: Speaker-Centric Learning for Speech Emotion Recognition" in International Conference on Electronics, Information, and Communication (ICEIC), 2024
351 International Conference Hejung Yang, Hong-Goo Kang "On Fine-Tuning Pre-Trained Speech Models With EMA-Target Self-Supervised Loss" in ICASSP, 2024
350 International Journal Zainab Alhakeem, Se-In Jang, Hong-Goo Kang "Disentangled Representations in Local-Global Contexts for Arabic Dialect Identification" in Transactions on Audio, Speech, and Language Processing, 2024
349 International Conference Hong-Goo Kang, W. Bastiaan Kleijn, Jan Skoglund, Michael Chinen "Convolutional Transformer for Neural Speech Coding" in Audio Engineering Society Convention, 2023
348 International Conference Hong-Goo Kang, Jan Skoglund, W. Bastiaan Kleijn, Andrew Storus, Hengchin Yeh "A High-Rate Extension to Soundstream" in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
347 International Conference Zhenyu Piao, Hyungseob Lim, Miseul Kim, Hong-goo Kang "PDF-NET: Pitch-adaptive Dynamic Filter Network for Intra-gender Speaker Verification" in APSIPA ASC, 2023
346 International Conference WooSeok Ko, Seyun Um, Zhenyu Piao, Hong-goo Kang "Consideration of Varying Training Lengths for Short-Duration Speaker Verification" in APSIPA ASC, 2023