Papers

FluentTTS: Text-dependent Fine-grained Style Control for Multi-style TTS

International Conference
Posted by : dsp

Posted on : 2022-06-16 17:07

Views : 118
Authors : Changhwan Kim, Se-yun Um, Hyungchan Yoon, Hong-goo Kang

Year : 2022

Publisher / Conference : INTERSPEECH

Research area : Speech Signal Processing, Text-to-Speech, Speech Synthesis


Presentation : Poster

In this paper, we propose a method to flexibly control the local prosodic variation of a neural text-to-speech (TTS) model. To provide expressiveness in synthesized speech, conventional TTS models utilize utterance-wise global style embeddings that are obtained by compressing frame-level embeddings along the time axis. However, since utterance-wise global features do not contain sufficient information to represent the characteristics of word-level local features, they are not appropriate for directly controlling prosody at a fine scale. In multi-style TTS models, the capability to control local prosody is very important because it plays a key role in finding the most appropriate text-to-speech pair among many one-to-many mapping candidates.
To explicitly match local prosodic characteristics to the contextual information of the corresponding input text, we propose a module that predicts the fundamental frequency (F0) of each text unit by conditioning on the utterance-wise global style embedding. We also estimate multi-style embeddings using a multi-style encoder, which takes as inputs both a global utterance-wise embedding and a local F0 embedding. Our multi-style embedding enhances the naturalness and expressiveness of synthesized speech and is able to control prosody styles at the word or phoneme level.
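To illustrate the data flow the abstract describes, the toy sketch below shows how a phoneme-level F0 contour could be predicted conditioned on a global style embedding, then embedded and fused with that global embedding to yield one style vector per phoneme. All function names, dimensions, and the linear projections are hypothetical placeholders, not the paper's actual architecture; the real model uses learned neural networks rather than random matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

def f0_predictor(text_emb, global_style, w):
    # Hypothetical F0 predictor: condition each phoneme-level text
    # embedding on the utterance-wise global style embedding, then
    # regress one F0 value per phoneme.
    num_phonemes = text_emb.shape[0]
    cond = np.concatenate(
        [text_emb,
         np.broadcast_to(global_style, (num_phonemes, global_style.shape[0]))],
        axis=-1,
    )
    return cond @ w  # shape: (num_phonemes, 1) predicted F0 contour

def multi_style_encoder(global_style, f0_emb, w_global, w_local):
    # Hypothetical multi-style encoder: fuse the global utterance-wise
    # embedding with the local F0 embedding, giving a fine-grained
    # style embedding per phoneme.
    return f0_emb @ w_local + global_style @ w_global

# Toy dimensions (illustrative only).
num_phonemes, text_dim, style_dim, f0_dim = 5, 8, 4, 3
text_emb = rng.standard_normal((num_phonemes, text_dim))
global_style = rng.standard_normal(style_dim)

# Predict a per-phoneme F0 contour conditioned on the global style.
w_f0 = rng.standard_normal((text_dim + style_dim, 1))
f0 = f0_predictor(text_emb, global_style, w_f0)

# Embed the predicted F0 (stand-in for a learned projection) and fuse.
f0_emb = f0 @ rng.standard_normal((1, f0_dim))
multi_style = multi_style_encoder(
    global_style, f0_emb,
    rng.standard_normal((style_dim, style_dim)),
    rng.standard_normal((f0_dim, style_dim)),
)
print(multi_style.shape)  # one style vector per phoneme
```

Because the style embedding is computed per phoneme, editing the predicted F0 for a single word or phoneme changes only that unit's style vector, which is what enables the fine-grained prosody control described above.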