Papers

Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech

International Conference
2021~
작성자
dsp
작성일
2023-08-11 11:08
조회
501
Authors : Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang

Year : 2023

Publisher / Conference : INTERSPEECH

Research area : Speech Signal Processing, Text-to-Speech

Presentation : Poster

For personalized speech generation, a neural text-to-speech (TTS) model must be successfully implemented with limited data from a target speaker. To this end, the baseline TTS model needs to be amply generalized to out-of-domain data (i.e., target speaker’s speech). However, approaches to address this out-of-domain generalization problem in TTS have yet to be thoroughly studied. In this work, we propose an effective pruning method for a transformer known as sparse attention, to improve the TTS model’s generalization abilities. In particular, we prune off redundant connections from self-attention layers whose attention weights are below the threshold. To flexibly determine the pruning strength for searching optimal degree of generalization, we also propose a new differentiable pruning method that allows the model to automatically learn the thresholds. Evaluations on zero-shot multi-speaker TTS verify the effectiveness of our method in terms of voice quality and speaker similarity.
전체 355
142 International Conference Seyun Um, Jihyun Kim, Jihyun Lee, Hong-Goo Kang "Facetron: A Multi-speaker Face-to-Speech Model based on Cross-Modal Latent Representations" in EUSIPCO, 2023
141 International Conference Hejung Yang, Hong-Goo Kang "Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement" in INTERSPEECH, 2023
140 International Conference Jihyun Kim, Hong-Goo Kang "Contrastive Learning based Deep Latent Masking for Music Source Seperation" in INTERSPEECH, 2023
139 International Conference Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang "MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion" in INTERSPEECH, 2023
138 International Conference Hyungchan Yoon, Seyun Um, Changhwan Kim, Hong-Goo Kang "Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech" in INTERSPEECH, 2023
137 International Conference Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang "Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech" in INTERSPEECH, 2023
136 International Conference Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang "HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders" in INTERSPEECH, 2023
135 International Conference Zhenyu Piao, Miseul Kim, Hyungchan Yoon, Hong-Goo Kang "HappyQuokka System for ICASSP 2023 Auditory EEG Challenge" in ICASSP, 2023
134 International Conference Byeong Hyeon Kim, Hyungseob Lim, Jihyun Lee, Inseon Jang, Hong-Goo Kang "Progressive Multi-Stage Neural Audio Codec with Psychoacoustic Loss and Discriminator" in ICASSP, 2023
133 International Conference Hyungseob Lim, Jihyun Lee, Byeong Hyeon Kim, Inseon Jang, Hong-Goo Kang "End-to-End Neural Audio Coding in the MDCT Domain" in ICASSP, 2023