Papers

Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework

International Conference
2016~2020
Posted by : 한혜원
Date : 2019-06-01 16:43
Views : 4048
Authors : Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang

Year : 2019

Publisher / Conference : ITC-CSCC

In this paper, we propose a speech synthesis system that effectively generates multiple types of emotional speech using the concept of global style tokens (GST), in which the emotion-related style information is represented by an additional style embedding vector. Although GST is not a new idea, it has not previously been applied to the emotional speech synthesis task. We explicitly combine the GST idea with the Tacotron2 framework to implement an emotional text-to-speech system. The analysis results demonstrate that the proposed GST structure successfully transfers various types of emotional information to the synthesized speech. Subjective listening tests evaluating the naturalness and emotional expressiveness of the synthesized speech are conducted to verify the superiority of the proposed algorithm.
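The style embedding idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single-head, scaled dot-product attention over a bank of style tokens, and it assumes the resulting embedding is concatenated to every text encoder frame before decoding. All names and sizes (`style_embedding`, `condition_encoder`, `NUM_TOKENS`, `DIM`, `TEXT_LEN`) are hypothetical, and random values stand in for learned parameters and real encoder outputs.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def style_embedding(query, tokens):
    """Return the attention-weighted sum of style tokens and the weights.

    Assumption: single-head scaled dot-product attention; a real GST layer
    would typically use learned projections and several heads.
    """
    dim = len(query)
    scores = [
        sum(q * t for q, t in zip(query, tok)) / math.sqrt(dim)
        for tok in tokens
    ]
    weights = softmax(scores)
    embedding = [
        sum(w * tok[d] for w, tok in zip(weights, tokens))
        for d in range(dim)
    ]
    return embedding, weights


def condition_encoder(encoder_outputs, embedding):
    """Append the same style embedding to every encoder time step."""
    return [frame + embedding for frame in encoder_outputs]


# Hypothetical sizes; random values replace learned parameters.
rng = random.Random(0)
NUM_TOKENS, DIM, TEXT_LEN = 4, 8, 5
tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]
reference_query = [rng.gauss(0, 1) for _ in range(DIM)]
encoder_outputs = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(TEXT_LEN)]

embedding, weights = style_embedding(reference_query, tokens)
conditioned = condition_encoder(encoder_outputs, embedding)
```

In a sketch like this, the attention weights could also be set by hand at synthesis time (for example, putting all weight on one token) to select a style without a reference utterance. Whether the paper controls emotion this way is not stated in the abstract.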