Papers

Phase-Sensitive Joint Learning Algorithms for Deep Learning-Based Speech Enhancement

International Journal
2016~2020
작성자
이진영
작성일
2018-08-01 22:09
조회
1967
Authors : Jinkyu Lee, Jan Skoglund, Turaj Shabestary, Hong-Goo Kang

Year : 2018

Publisher / Conference : IEEE Signal Processing Letters

Volume : 25, issue 8

Page : 1276-1280

This letter presents a phase-sensitive joint learning algorithm for single-channel speech enhancement. Although a deep learning framework that estimates the time-frequency (T-F) domain ideal ratio masks demonstrates a strong performance, it is limited in the sense that the enhancement process is performed only in the magnitude domain, while the phase spectra remain unchanged. Thus, recent studies have been conducted to involve phase spectra in speech enhancement systems. A phase-sensitive mask (PSM) is a T-F mask that implicitly represents phase-related information. However, since the PSM has an unbounded value, the networks are trained to target its truncated values rather than directly estimating it. To effectively train the PSM, we first approximate it to have a bounded dynamic range under the assumption that speech and noise are uncorrelated. We then propose a joint learning algorithm that trains the approximated value through its parameterized variables in order to minimize the inevitable error caused by the truncation process. Specifically, we design a network that explicitly targets three parameterized variables: 1) speech magnitude spectra; 2) noise magnitude spectra; and 3) phase difference of clean to noisy spectra. To further improve the performance, we also investigate how the dynamic range of magnitude spectra controlled by a warping function affects the final performance in joint learning algorithms. Finally, we examined how the proposed additional constraint that preserves the sum of the estimated speech and noise power spectra affects the overall system performance. The experimental results show that the proposed learning algorithm outperforms the conventional learning algorithm with the truncated phase-sensitive approximation.
전체 364
294 International Conference Seyun Um, Sangshin Oh, Kyungguen Byun, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "Emotional Speech Synthesis with Rich and Granularized Control" in ICASSP, 2020
293 International Conference Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang "Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network" in ICASSP, 2020
292 International Journal Soo-Whan Chung, Joon Son Chung, Hong Goo Kang "Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval" in IEEE Journal of Selected Topics in Signal Processing, vol.14, issue 3, 2020
291 International Conference Hyeonjoo Kang, Young-Sun Joo, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang "A Study on Acoustic Parameter Selection Strategies to Improve Deep Learning-Based Speech Synthesis" in APSIPA, 2019
290 International Journal Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "An Effective Style Token Weight Control Technique for End-to-End Emotional Speech Synthesis" in IEEE Signal Processing Letters, vol.26, issue 9, pp.1383-1387, 2019
289 International Conference Min-Jae Hwang, Hong-Goo Kang "Parameter enhancement for MELP speech codec in noisy communication environment" in INTERSPEECH, 2019
288 Domestic Journal 오상신, 엄세연, 장인선, 안충현, 강홍구 "k-평균 알고리즘을 활용한 음성의 대표 감정 스타일 결정 방법" in 한국음향학회지, vol.38, 제 5호, pp.614-620, 2019
287 International Journal Jinkyu Lee, Hong-Goo Kang "A Joint Learning Algorithm for Complex-Valued T-F Masks in Deep Learning-Based Single-Channel Speech Enhancement Systems" in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.27, issue 6, pp.1098-1108, 2019
286 International Conference Keulbit Kim, Jinkyu Lee, Jan Skoglund, Hong-Goo Kang "Model Order Selection for Wind Noise Reduction in Non-negative Matrix Factorization" in ITC-CSCC, 2019
285 International Conference Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework" in ITC-CSCC, 2019