Papers

Fast and Lightweight Speech Synthesis Model based on FastSpeech2

International Conference

2021~

Posted by

한혜원

Posted on

2021-06-28 10:28


Authors : Huu-Kim Nguyen, Kihyuk Jeong, Hong-Goo Kang

Year : 2021

Publisher / Conference : ITC-CSCC

Research area : Speech Signal Processing, Text-to-Speech

Presentation : Oral

In this paper, we present a fast and lightweight speech synthesis model that is suitable for on-device applications. By leveraging long-short range attention, depth-wise separable convolution, and linear attention, we significantly reduce the model size and complexity of the baseline FastSpeech2-based Transformer framework. The baseline model requires O(N^2) computations for its attention and convolution operations because they are computed with nested loops. Our proposed model requires only O(N) computations, because each nested loop is replaced by two cascaded single loops.
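The step from a nested loop to two cascaded single loops can be illustrated with a minimal sketch of kernelized (linear) attention. This is not the authors' implementation. The feature map elu(x) + 1 is an assumption borrowed from the general linear-attention literature, and the function names `feature_map`, `quadratic_attention`, and `linear_attention` are hypothetical. Both functions compute the same kernelized attention output (not softmax attention); only the loop structure differs.

```python
import math


def feature_map(x):
    # elu(x) + 1, a positive feature map (assumed choice, not from the abstract)
    return x + 1.0 if x > 0 else math.exp(x)


def quadratic_attention(Q, K, V):
    """Nested loop over every (query, key) pair: O(N^2) in sequence length N."""
    out = []
    for q in Q:
        phi_q = [feature_map(a) for a in q]
        num = [0.0] * len(V[0])
        den = 0.0
        for k, v in zip(K, V):
            # similarity between this query and this key
            w = sum(a * feature_map(b) for a, b in zip(phi_q, k))
            den += w
            for j, vj in enumerate(v):
                num[j] += w * vj
        out.append([n / den for n in num])
    return out


def linear_attention(Q, K, V):
    """Two cascaded single loops: O(N) in sequence length N."""
    d, m = len(K[0]), len(V[0])
    # Loop 1: summarize all keys and values once
    S = [[0.0] * m for _ in range(d)]  # sum of phi(k) v^T
    z = [0.0] * d                      # sum of phi(k)
    for k, v in zip(K, V):
        phi_k = [feature_map(b) for b in k]
        for i in range(d):
            z[i] += phi_k[i]
            for j in range(m):
                S[i][j] += phi_k[i] * v[j]
    # Loop 2: each query reads the fixed-size summary
    out = []
    for q in Q:
        phi_q = [feature_map(a) for a in q]
        den = sum(phi_q[i] * z[i] for i in range(d))
        out.append(
            [sum(phi_q[i] * S[i][j] for i in range(d)) / den for j in range(m)]
        )
    return out


if __name__ == "__main__":
    Q = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.7]]
    K = [[0.4, 0.0], [-0.3, 0.9], [0.2, -0.6]]
    V = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
    print(quadratic_attention(Q, K, V))
    print(linear_attention(Q, K, V))
```

The second version is linear in N because the summaries `S` and `z` have a size that depends only on the feature dimensions, so the per-query cost no longer grows with the number of keys.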


315	International Conference	Kihyuk Jeong, Huu-Kim Nguyen, Hong-Goo Kang "A Fast and Lightweight Text-To-Speech Model with Spectrum and Waveform Alignment Algorithms" in EUSIPCO, 2021
314	International Conference	Jiyoung Lee, Soo-Whan Chung, Sunok Kim, Hong-Goo Kang, Kwanghoon Sohn "Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation" in CVPR, 2021
313	International Conference	Zainab Alhakeem, Hong-Goo Kang "Confidence Learning from Noisy Labels for Arabic Dialect Identification" in ITC-CSCC, 2021
312	International Conference	Huu-Kim Nguyen, Kihyuk Jeong, Hong-Goo Kang "Fast and Lightweight Speech Synthesis Model based on FastSpeech2" in ITC-CSCC, 2021
311	International Conference	Yoohwan Kwon, Hee-Soo Heo, Bong-Jin Lee, Joon Son Chung "The ins and outs of speaker recognition: lessons from VoxSRC 2020" in ICASSP, 2021
310	International Conference	You Jin Kim, Hee Soo Heo, Soo-Whan Chung, Bong-Jin Lee "End-to-end Lip Synchronisation Based on Pattern Classification" in IEEE Spoken Language Technology Workshop (SLT), 2020
309	International Conference	Seong Min Kye, Yoohwan Kwon, Joon Son Chung "Cross Attentive Pooling for Speaker Verification" in IEEE Spoken Language Technology Workshop (SLT), 2020
308	International Conference	Suhyeon Oh, Hyungseob Lim, Kyungguen Byun, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang "ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis" in APSIPA (*awarded Best Paper), 2020
307	International Conference	Hyeon-Kyeong Shin, Hyewon Han, Kyungguen Byun, Hong-Goo Kang "Speaker-invariant Psychological Stress Detection Using Attention-based Network" in APSIPA, 2020
306	International Conference	Min-Jae Hwang, Frank Soong, Eunwoo Song, Xi Wang, Hyeonjoo Kang, Hong-Goo Kang "LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis" in APSIPA, 2020
