Papers

Content-Aware Style Augmentation for Zero-Shot Voice Conversion With Short Target Speech

International Journal
2021~
작성자
차현진
작성일
2026-01-21 01:06
조회
718
Authors : Hyeonjin Cha, Seyun Um, Miseul Kim, Changhwan Kim, Seungshin Lee, Hong-Goo Kang

Year : 2025

Publisher / Conference : IEEE Signal Processing Letters

Volume : 33

Page : 66-70

Research area : Speech Signal Processing, Speech Synthesis


Presentation/Publication date : 2025.11.24

Related project : 전 차종 음성 안내 다양성 확보를 위한 개인화 음성 합성 엔진 알고리즘 연구 (현대자동차(주)남양연구소)

Presentation : None

In this letter, we propose a neural zero-shot voice conversion (ZS-VC) system that simultaneously achieves high speaker similarity and speech intelligibility by incorporating a content-aware style generation module. Although recent neural ZS-VC systems have shown strong performance in either speaker similarity or speech intelligibility, attaining high performance in both remains challenging, especially when only a short target speech sample is available. We attribute this limitation to the insufficient content problem—where the linguistic content of the target speech fails to fully cover that of the source speech. To address this issue, we introduce a method that augments the target speaker’s style features for underrepresented content using self-supervised feature generation. Experimental results demonstrate that the proposed system, when integrated with the feature matching-based approach kNN-VC, outperforms existing methods in both key metrics. Demo samples are available at https://hyeonjincha.github.io/.
전체 381
67 International Journal Hyeonjin Cha, Seyun Um, Miseul Kim, Changhwan Kim, Seungshin Lee, Hong-Goo Kang "Content-Aware Style Augmentation for Zero-Shot Voice Conversion With Short Target Speech" in IEEE Signal Processing Letters, vol.33, pp.66-70, 2025
66 International Journal Hyewon Han, Xiulian Peng, Doyeon Kim, Yan Lu, Hong-Goo Kang "Dual-Branch Guidance Encoder for Robust Acoustic Echo Suppression" in IEEE Transactions on Audio, Speech and Language Processing (TASLP), vol.33, pp.627 - 639, 2025
65 International Journal Hyungseob Lim, Jihyun Lee, Byeong Hyeon Kim, Inseon Jang, Hong-Goo Kang "Perceptual Neural Audio Coding with Modified Discrete Cosine Transform" in IEEE Journal of Special Topics in Signal Processing (JSTSP), 2024
64 International Journal Zainab Alhakeem, Se-In Jang, Hong-Goo Kang "Disentangled Representations in Local-Global Contexts for Arabic Dialect Identification" in Transactions on Audio, Speech, and Language Processing, 2024
63 International Journal Hyungchan Yoon, Changhwan Kim, Seyun Um, Hyun-Wook Yoon, Hong-Goo Kang "SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to-Speech Systems" in IEEE Signal Processing Letters, vol.30, pp.593-597, 2023
62 International Journal Jinyoung Lee, Hong-Goo Kang "Real-Time Neural Speech Enhancement Based on Temporal Refinement Network and Channel-Wise Gating Methods" in Digital Signal Processing, vol.133, 2023
61 International Journal Taemin Kim, Yejee Shin, Kyowon Kang, Kiho Kim, Gwanho Kim, Yunsu Byeon, Hwayeon Kim, Yuyan Gao, Jeong Ryong Lee, Geonhui Son, Taeseong Kim, Yohan Jun, Jihyun Kim, Jinyoung Lee, Seyun Um, Yoohwan Kwon, Byung Gwan Son, Myeongki Cho, Mingyu Sang, Jongwoon Shin, Kyubeen Kim, Jungmin Suh, Heekyeong Choi, Seokjun Hong, Huanyu Cheng, Hong-Goo Kang, Dosik Hwang & Ki Jun Yu "Ultrathin crystalline-silicon-based strain gauges with deep learning algorithms for silent speech interfaces" in Nature Communications, vol.13, 2022
60 International Journal Jinyoung Lee, Hong-Goo Kang "Two-Stage Refinement of Magnitude and Complex Spectra for Real-Time Speech Enhancement" in IEEE Signal Processing Letters, vol.29, pp.2188-2192, 2022
59 International Journal Kyungguen Byun, Seyun Um, Hong-Goo Kang "Length-Normalized Representation Learning for Speech Signals" in IEEE Access, vol.10, pp.60362-60372, 2022
58 International Journal Young-Sun Joo, Hanbin Bae, Young-Ik Kim, Hoon-Young Cho, Hong-Goo Kang "Effective Emotion Transplantation in an End-to-End Text-to-Speech System" in IEEE Access, vol.8, pp.161713-161719, 2020