Papers

An Effective Style Token Weight Control Technique for End-to-End Emotional Speech Synthesis

International Journal
2016~2020
작성자
이진영
작성일
2019-09-01 22:16
조회
1674
Authors : Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang

Year : 2019

Publisher / Conference : IEEE Signal Processing Letters

Volume : 26, issue 9

Page : 1383-1387

In this letter, we propose a high-quality emotional speech synthesis system, using emotional vector space, i.e., the weighted sum of global style tokens (GSTs). Our previous research verified the feasibility of GST-based emotional speech synthesis in an end-to-end text-to-speech synthesis framework. However, selecting appropriate reference audio (RA) signals to extract emotion embedding vectors to the specific types of target emotions remains problematic. To ameliorate the selection problem, we propose an effective way of generating emotion embedding vectors by utilizing the trained GSTs. By assuming that the trained GSTs represent an emotional vector space, we first investigate the distribution of all the training samples depending on the type of each emotion. We then regard the centroid of the distribution as an emotion-specific weighting value, which effectively controls the expressiveness of synthesized speech, even without using the RA for guidance, as it did before. Finally, we confirm that the proposed controlled weight-based method is superior to the conventional emotion label-based methods in terms of perceptual quality and emotion classification accuracy.
전체 355
48 International Journal Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "An Effective Style Token Weight Control Technique for End-to-End Emotional Speech Synthesis" in IEEE Signal Processing Letters, vol.26, issue 9, pp.1383-1387, 2019
47 International Conference Min-Jae Hwang, Hong-Goo Kang "Parameter enhancement for MELP speech codec in noisy communication environment" in INTERSPEECH, 2019
46 Domestic Journal 오상신, 엄세연, 장인선, 안충현, 강홍구 "k-평균 알고리즘을 활용한 음성의 대표 감정 스타일 결정 방법" in 한국음향학회지, vol.38, 제 5호, pp.614-620, 2019
45 International Journal Jinkyu Lee, Hong-Goo Kang "A Joint Learning Algorithm for Complex-Valued T-F Masks in Deep Learning-Based Single-Channel Speech Enhancement Systems" in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.27, issue 6, pp.1098-1108, 2019
44 International Conference Keulbit Kim, Jinkyu Lee, Jan Skoglund, Hong-Goo Kang "Model Order Selection for Wind Noise Reduction in Non-negative Matrix Factorization" in ITC-CSCC, 2019
43 International Conference Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework" in ITC-CSCC, 2019
42 International Conference Kyungguen Byun, Eunwoo Song, Jinseob Kim, Jae-Min Kim, Hong-Goo Kang "Excitation-by-SampleRNN Model for Text-to-Speech" in ITC-CSCC, 2019
41 International Journal Seung-Chul Shin, Jinkyu Lee, Soyeon Choe, Hyuk In Yang, Jihee Min, Ki-Yong Ahn, Justin Y. Jeon, Hong-Goo Kang "Dry Electrode-Based Body Fat Estimation System with Anthropometric Data for Use in a Wearable Device" in Sensors, vol.19, issue 9, 2019
40 International Conference Yang Yuan, Soo-Whan Chung, Hong-Goo Kang "Gradient-based active learning query strategy for end-to-end speech recognition" in ICASSP, 2019
39 International Conference Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang "Perfect match: Improved cross-modal embeddings for audio-visual synchronisation" in ICASSP, 2019