Papers

A Joint Learning Algorithm for Complex-Valued T-F Masks in Deep Learning-Based Single-Channel Speech Enhancement Systems

International Journal
2016~2020
작성자
이진영
작성일
2019-06-01 22:12
조회
359
Authors : Jinkyu Lee, Hong-Goo Kang

Year : 2019

Publisher / Conference : IEEE/ACM Transactions on Audio, Speech, and Language Processing

Volume : 27, issue 6

Page : 1098-1108

This paper presents a joint learning algorithm for complex-valued time-frequency (T-F) masks in single-channel speech enhancement systems. Most speech enhancement algorithms operating in a single-channel microphone environment aim to enhance the magnitude component in a T-F domain, while the input noisy phase component is used directly without any processing. Consequently, the mismatch between the processed magnitude and the unprocessed phase degrades the sound quality. To address this issue, a learning method of targeting a T-F mask that is defined in a complex domain has recently been proposed. However, due to a wide dynamic range and an irregular spectrogram pattern of the complex-valued T-F mask, the learning process is difficult even with a large-scale deep learning network. Moreover, the learning process targeting the T-F mask itself does not directly minimize the distortion in spectra or time domains. In order to address these concerns, we focus on three issues: 1) an effective estimation of complex numbers with a wide dynamic range; 2) a learning method that is directly related to speech enhancement performance; and 3) a way to resolve the mismatch between the estimated magnitude and phase spectra. In this study, we propose objective functions that can solve each of these issues and train the network by minimizing them with a joint learning framework. The evaluation results demonstrate that the proposed learning algorithm achieves significant performance improvement in various objective measures and subjective preference listening test.
전체 326
296 Domestic Conference 이성현, 강홍구 "딥러닝 기반 종단 간 다채널 음질 개선 알고리즘" in 전자공학회 하계학술대회, pp.968-970, 2020
295 Domestic Conference 임정운, 김지현, 강홍구 "메타러닝을 이용한 SAR 영상 자동표적 인식" in 한국항공우주학회 2020 춘계학술대회, pp.353-354, 2020
294 International Conference Se-Yun Um, Sangshin Oh, Kyungguen Byun, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "Emotional Speech Synthesis with Rich and Granularized Control" in ICASSP, 2020
293 International Conference Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang "Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network" in ICASSP, 2020
292 International Journal Soo-Whan Chung, Joon Son Chung, Hong Goo Kang "Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval" in IEEE Journal of Selected Topics in Signal Processing, vol.14, issue 3, 2020
291 International Conference Hyeonjoo Kang, Young-Sun Joo, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang "A Study on Acoustic Parameter Selection Strategies to Improve Deep Learning-Based Speech Synthesis" in APSIPA, 2019
290 International Journal Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "An Effective Style Token Weight Control Technique for End-to-End Emotional Speech Synthesis" in IEEE Signal Processing Letters, vol.26, issue 9, pp.1383-1387, 2019
289 International Conference Min-Jae Hwang, Hong-Goo Kang "Parameter enhancement for MELP speech codec in noisy communication environment" in INTERSPEECH, 2019
288 Domestic Journal 오상신, 엄세연, 장인선, 안충현, 강홍구 "k-평균 알고리즘을 활용한 음성의 대표 감정 스타일 결정 방법" in 한국음향학회지, vol.38, 제 5호, pp.614-620, 2019
287 International Journal Jinkyu Lee, Hong-Goo Kang "A Joint Learning Algorithm for Complex-Valued T-F Masks in Deep Learning-Based Single-Channel Speech Enhancement Systems" in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.27, issue 6, pp.1098-1108, 2019