Papers

Phase-Sensitive Joint Learning Algorithms for Deep Learning-Based Speech Enhancement

International Journal
2016~2020
Posted by
이진영
Date posted
2018-08-01 22:09
Authors: Jinkyu Lee, Jan Skoglund, Turaj Shabestary, Hong-Goo Kang

Year: 2018

Publisher / Conference: IEEE Signal Processing Letters

Volume: 25, Issue 8

Pages: 1276-1280

This letter presents a phase-sensitive joint learning algorithm for single-channel speech enhancement. Although a deep learning framework that estimates ideal ratio masks in the time-frequency (T-F) domain performs strongly, it is limited in that enhancement is performed only in the magnitude domain while the phase spectra remain unchanged. Recent studies have therefore sought to incorporate phase spectra into speech enhancement systems. A phase-sensitive mask (PSM) is a T-F mask that implicitly represents phase-related information. However, because the PSM is unbounded, networks are trained to target its truncated values rather than to estimate it directly. To train the PSM effectively, we first approximate it with a bounded dynamic range under the assumption that speech and noise are uncorrelated. We then propose a joint learning algorithm that trains this approximated value through its parameterized variables, in order to minimize the inevitable error caused by truncation. Specifically, we design a network that explicitly targets three parameterized variables: 1) the speech magnitude spectra; 2) the noise magnitude spectra; and 3) the phase difference between the clean and noisy spectra. To further improve performance, we also investigate how the dynamic range of the magnitude spectra, controlled by a warping function, affects the final performance of the joint learning algorithms. Finally, we examine how an additional proposed constraint, which preserves the sum of the estimated speech and noise power spectra, affects overall system performance. Experimental results show that the proposed learning algorithm outperforms the conventional learning algorithm that uses the truncated phase-sensitive approximation.
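The abstract names several quantities without giving formulas. The sketch that follows illustrates them for a single T-F bin in plain Python, under stated assumptions: it uses the commonly cited PSM definition |S|/|Y| * cos(theta_S - theta_Y), and it reads "bounded under the assumption that speech and noise are uncorrelated" as replacing |Y|^2 with |S|^2 + |N|^2. The warping exponent `p`, the squared-error form of the power-sum constraint, and all function names are hypothetical illustrations, not the authors' formulation or code.

```python
# Sketch under stated assumptions; not the authors' implementation.
import cmath
import math


def psm(clean: complex, noisy: complex, eps: float = 1e-12) -> float:
    """Standard phase-sensitive mask for one T-F bin: |S|/|Y| * cos(theta_S - theta_Y).
    Unbounded, because |Y| can be much smaller than |S| when noise cancels speech."""
    return abs(clean) / (abs(noisy) + eps) * math.cos(cmath.phase(clean) - cmath.phase(noisy))


def truncated_psm(clean: complex, noisy: complex, lo: float = 0.0, hi: float = 1.0) -> float:
    """Conventional training target: the PSM clipped to [lo, hi]."""
    return min(max(psm(clean, noisy), lo), hi)


def approx_psm(speech_mag: float, noise_mag: float, phase_diff: float, eps: float = 1e-12) -> float:
    """Assumed bounded form: replace |Y|^2 with |S|^2 + |N|^2 (uncorrelated speech and noise).
    The magnitude ratio lies in [0, 1], so the mask lies in [-1, 1]."""
    ratio = speech_mag / math.sqrt(speech_mag ** 2 + noise_mag ** 2 + eps)
    return ratio * math.cos(phase_diff)


def warp(mag: float, p: float = 0.5) -> float:
    """Hypothetical power-law warping that compresses the magnitude dynamic range."""
    return mag ** p


def unwarp(warped: float, p: float = 0.5) -> float:
    """Inverse of warp(): maps a warped estimate back to the linear magnitude domain."""
    return warped ** (1.0 / p)


def power_sum_penalty(speech_mag_hat: float, noise_mag_hat: float, noisy_mag: float) -> float:
    """Hypothetical squared-error penalty encouraging |S_hat|^2 + |N_hat|^2 == |Y|^2."""
    return (speech_mag_hat ** 2 + noise_mag_hat ** 2 - noisy_mag ** 2) ** 2


if __name__ == "__main__":
    clean = 2.0 + 0.0j
    noise = -1.5 + 0.0j          # strongly anti-correlated with speech in this bin
    noisy = clean + noise        # 0.5 + 0j
    phase_diff = cmath.phase(clean) - cmath.phase(noisy)
    print(round(psm(clean, noisy), 6))                               # 4.0 (outside [0, 1])
    print(truncated_psm(clean, noisy))                               # 1.0
    print(round(approx_psm(abs(clean), abs(noise), phase_diff), 6))  # 0.8
```

In the demo bin the noise nearly cancels the speech, so the exact PSM reaches 4.0 and must be clipped to 1.0, while the approximation stays in range at 0.8. That bin deliberately violates the uncorrelated assumption, which is where the approximation and the exact PSM diverge.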
Total: 368
106 International Conference Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework" in ITC-CSCC, 2019
105 International Conference Kyungguen Byun, Eunwoo Song, Jinseob Kim, Jae-Min Kim, Hong-Goo Kang "Excitation-by-SampleRNN Model for Text-to-Speech" in ITC-CSCC, 2019
104 International Journal Seung-Chul Shin, Jinkyu Lee, Soyeon Choe, Hyuk In Yang, Jihee Min, Ki-Yong Ahn, Justin Y. Jeon, Hong-Goo Kang "Dry Electrode-Based Body Fat Estimation System with Anthropometric Data for Use in a Wearable Device" in Sensors, vol.19, issue 9, 2019
103 International Conference Yang Yuan, Soo-Whan Chung, Hong-Goo Kang "Gradient-based active learning query strategy for end-to-end speech recognition" in ICASSP, 2019
102 International Conference Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang "Perfect match: Improved cross-modal embeddings for audio-visual synchronisation" in ICASSP, 2019
101 International Conference Hyewon Han, Kyungguen Byun, Hong-Goo Kang "A Deep Learning-based Stress Detection Algorithm with Speech Signal" in Workshop on Audio-Visual Scene Understanding for Immersive Multimedia (AVSU’18), 2018
100 International Conference Min-Jae Hwang, Eunwoo Song, Jin-Seob Kim, Hong-Goo Kang "A Unified Framework for the Generation of Glottal Signals in Deep Learning-based Parametric Speech Synthesis Systems" in INTERSPEECH, 2018
99 International Journal Jinkyu Lee, Jan Skoglund, Turaj Shabestary, Hong-Goo Kang "Phase-Sensitive Joint Learning Algorithms for Deep Learning-Based Speech Enhancement" in IEEE Signal Processing Letters, vol.25, issue 8, pp.1276-1280, 2018
98 International Conference Haemin Yang, Soyeon Choe, Keulbit Kim, Hong-Goo Kang "Deep learning-based speech presence probability estimation for noise PSD estimation in single-channel speech enhancement" in ICSigSys, 2018
97 International Conference Min-Jae Hwang, Eunwoo Song, Kyungguen Byun, Hong-Goo Kang "Modeling-by-Generation-Structured Noise Compensation Algorithm for Glottal Vocoding Speech Synthesis System" in ICASSP, 2018