Papers

Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech

International Conference
2021~
Posted by
dsp
Posted on
2023-08-11 11:08
Authors : Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang

Year : 2023

Publisher / Conference : INTERSPEECH

Research area : Speech Signal Processing, Text-to-Speech

Presentation : Poster

For personalized speech generation, a neural text-to-speech (TTS) model must be successfully implemented with limited data from a target speaker. To this end, the baseline TTS model needs to generalize well to out-of-domain data (i.e., the target speaker’s speech). However, approaches to this out-of-domain generalization problem in TTS have yet to be thoroughly studied. In this work, we propose sparse attention, an effective pruning method for transformers, to improve the TTS model’s generalization abilities. In particular, we prune redundant connections from self-attention layers, namely those whose attention weights fall below a threshold. To flexibly determine the pruning strength when searching for the optimal degree of generalization, we also propose a new differentiable pruning method that allows the model to learn the thresholds automatically. Evaluations on zero-shot multi-speaker TTS verify the effectiveness of our method in terms of voice quality and speaker similarity.
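The threshold-based pruning described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the renormalization of surviving weights, the sigmoid gate used as a differentiable stand-in for the hard cut, and the `temperature` parameter are all assumptions made for illustration, since the abstract does not give the exact formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prune_attention(weights, threshold):
    """Hard pruning: drop attention weights below `threshold`.

    Renormalizing the surviving weights so they sum to 1 is an
    assumption of this sketch, not something the abstract states.
    """
    kept = [w if w >= threshold else 0.0 for w in weights]
    total = sum(kept)
    if total == 0.0:
        # The threshold removed every connection; fall back to the input.
        return list(weights)
    return [w / total for w in kept]


def soft_prune_attention(weights, threshold, temperature=0.01):
    """Soft pruning: a sigmoid gate approximates the hard cut.

    The gate is smooth in `threshold`, so a gradient-based optimizer
    could in principle learn the threshold. This relaxation is a
    hypothetical stand-in for the paper's differentiable method.
    """
    gates = [1.0 / (1.0 + math.exp(-(w - threshold) / temperature))
             for w in weights]
    gated = [w * g for w, g in zip(weights, gates)]
    total = sum(gated)
    return [g / total for g in gated]


# Attention scores of one query over four keys (made-up numbers).
weights = softmax([2.0, 1.0, 0.1, -1.0])
pruned = prune_attention(weights, threshold=0.1)
soft = soft_prune_attention(weights, threshold=0.1)
```

With a threshold of 0.1, the two weakest connections are removed in `pruned` and the remaining weight is redistributed over the two strongest keys. In `soft`, the same connections are strongly attenuated rather than set exactly to zero, which keeps the operation differentiable with respect to the threshold.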
18 International Conference Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang "Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech" in INTERSPEECH, 2023
17 International Conference Huu-Kim Nguyen, Kihyuk Jeong, Seyun Um, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang "LiteTTS: A Decoder-free Light-weight Text-to-wave Synthesis Based on Generative Adversarial Networks" in INTERSPEECH, 2021
16 International Conference Suhyeon Oh, Hyungseob Lim, Kyungguen Byun, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang "ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis" in APSIPA (*awarded Best Paper), 2020
15 International Conference Min-Jae Hwang, Frank Soong, Eunwoo Song, Xi Wang, Hyeonjoo Kang, Hong-Goo Kang "LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis" in APSIPA, 2020
14 International Conference Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang "Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network" in ICASSP, 2020
13 International Conference Kyungguen Byun, Eunwoo Song, Jinseob Kim, Jae-Min Kim, Hong-Goo Kang "Excitation-by-SampleRNN Model for Text-to-Speech" in ITC-CSCC, 2019
12 International Conference Min-Jae Hwang, Eunwoo Song, Jin-Seob Kim, Hong-Goo Kang "A Unified Framework for the Generation of Glottal Signals in Deep Learning-based Parametric Speech Synthesis Systems" in INTERSPEECH, 2018
11 International Conference Min-Jae Hwang, Eunwoo Song, Kyungguen Byun, Hong-Goo Kang "Modeling-by-Generation-Structured Noise Compensation Algorithm for Glottal Vocoding Speech Synthesis System" in ICASSP, 2018
10 International Conference Eunwoo Song, Frank K. Soong, Hong-Goo Kang "Perceptual quality and modeling accuracy of excitation parameters in DLSTM-based speech synthesis systems" in ASRU, 2017
9 International Journal Eunwoo Song, Frank K. Soong, Hong-Goo Kang "Effective Spectral and Excitation Modeling Techniques for LSTM-RNN-Based Speech Synthesis Systems" in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.25, issue 11, pp.2152-2161, 2017