Authors : 오태양, 정기혁, 강홍구
Year : 2020
Publisher / Conference : Summer Conference of the Institute of Electronics and Information Engineers (IEIE)
Pages : 980-982
In this paper, we improve the speech quality of a multi-speaker text-to-speech (TTS) system by adding two embedding networks that represent speaker and speaking-style characteristics. The speaker embedding is extracted from a d-vector based encoder, and the speaking-style embedding from a global style token (GST) encoder. Since the two encoders complement each other in representing speaker and speaking-style characteristics, the quality of the synthesized speech improves. Subjective listening tests show that our proposed model outperforms the d-vector based Tacotron2 system.
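A minimal sketch of how such conditioning is commonly implemented: the fixed-length speaker (d-vector) and style (GST) embeddings are broadcast across time and concatenated with the text-encoder outputs before attention and decoding. The dimensions and the concatenation strategy here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper).
T, enc_dim = 50, 256          # text-encoder timesteps and width
spk_dim, style_dim = 128, 128  # d-vector and GST embedding sizes

def condition_encoder_outputs(enc_out, d_vector, style_embedding):
    """Tile the utterance-level speaker and style embeddings across
    all timesteps and concatenate them with the encoder outputs,
    a common way to condition a Tacotron2-style decoder."""
    steps = enc_out.shape[0]
    spk = np.tile(d_vector, (steps, 1))           # (T, spk_dim)
    sty = np.tile(style_embedding, (steps, 1))    # (T, style_dim)
    return np.concatenate([enc_out, spk, sty], axis=-1)

enc_out = rng.standard_normal((T, enc_dim))
d_vec = rng.standard_normal(spk_dim)
gst = rng.standard_normal(style_dim)

cond = condition_encoder_outputs(enc_out, d_vec, gst)
print(cond.shape)  # (50, 512)
```

The decoder then attends over the conditioned sequence, so every decoding step sees both who is speaking (d-vector) and how they are speaking (GST weights).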