Papers

An Effective Style Token Weight Control Technique for End-to-End Emotional Speech Synthesis

International Journal
2016~2020
작성자
이진영
작성일
2019-09-01 22:16
조회
1945
Authors : Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang

Year : 2019

Publisher / Conference : IEEE Signal Processing Letters

Volume : 26, issue 9

Page : 1383-1387

In this letter, we propose a high-quality emotional speech synthesis system, using emotional vector space, i.e., the weighted sum of global style tokens (GSTs). Our previous research verified the feasibility of GST-based emotional speech synthesis in an end-to-end text-to-speech synthesis framework. However, selecting appropriate reference audio (RA) signals to extract emotion embedding vectors to the specific types of target emotions remains problematic. To ameliorate the selection problem, we propose an effective way of generating emotion embedding vectors by utilizing the trained GSTs. By assuming that the trained GSTs represent an emotional vector space, we first investigate the distribution of all the training samples depending on the type of each emotion. We then regard the centroid of the distribution as an emotion-specific weighting value, which effectively controls the expressiveness of synthesized speech, even without using the RA for guidance, as it did before. Finally, we confirm that the proposed controlled weight-based method is superior to the conventional emotion label-based methods in terms of perceptual quality and emotion classification accuracy.
전체 363
323 International Journal Kyungguen Byun, Seyun Um, Hong-Goo Kang "Length-Normalized Representation Learning for Speech Signals" in IEEE Access, vol.10, pp.60362-60372, 2022
322 International Conference Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang "Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement" in ICASSP, 2022
321 International Conference Chanwoo Lee, Hyungseob Lim, Jihyun Lee, Inseon Jang, Hong-Goo Kang "Progressive Multi-Stage Neural Audio Coding with Guided References" in ICASSP, 2022
320 International Conference Jihyun Lee, Hyungseob Lim, Chanwoo Lee, Inseon Jang, Hong-Goo Kang "Adversarial Audio Synthesis Using a Harmonic-Percussive Discriminator" in ICASSP, 2022
319 International Conference Jinyoung Lee and Hong-Goo Kang "Stacked U-Net with High-level Feature Transfer for Parameter Efficient Speech Enhancement" in APSIPA ASC, 2021
318 International Conference Huu-Kim Nguyen, Kihyuk Jeong, Seyun Um, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang "LiteTTS: A Decoder-free Light-weight Text-to-wave Synthesis Based on Generative Adversarial Networks" in INTERSPEECH, 2021
317 International Conference Zainab Alhakeem, Yoohwan Kwon, Hong-Goo Kang "Disentangled Representations for Arabic Dialect Identification based on Supervised Clustering with Triplet Loss" in EUSIPCO, 2021
316 International Conference Miseul Kim, Minh-Tri Ho, Hong-Goo Kang "Self-supervised Complex Network for Machine Sound Anomaly Detection" in EUSIPCO, 2021
315 International Conference Kihyuk Jeong, Huu-Kim Nguyen, Hong-Goo Kang "A Fast and Lightweight Text-To-Speech Model with Spectrum and Waveform Alignment Algorithms" in EUSIPCO, 2021
314 International Conference Jiyoung Lee*, Soo-Whan Chung*, Sunok Kim, Hong-Goo Kang**, Kwanghoon Sohn** "Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation" in CVPR, 2021