Papers

A Joint Learning Algorithm for Complex-Valued T-F Masks in Deep Learning-Based Single-Channel Speech Enhancement Systems

International Journal
2016~2020
작성자
이진영
작성일
2019-06-01 22:12
조회
1439
Authors : Jinkyu Lee, Hong-Goo Kang

Year : 2019

Publisher / Conference : IEEE/ACM Transactions on Audio, Speech, and Language Processing

Volume : 27, issue 6

Page : 1098-1108

This paper presents a joint learning algorithm for complex-valued time-frequency (T-F) masks in single-channel speech enhancement systems. Most speech enhancement algorithms operating in a single-channel microphone environment aim to enhance the magnitude component in a T-F domain, while the input noisy phase component is used directly without any processing. Consequently, the mismatch between the processed magnitude and the unprocessed phase degrades the sound quality. To address this issue, a learning method of targeting a T-F mask that is defined in a complex domain has recently been proposed. However, due to a wide dynamic range and an irregular spectrogram pattern of the complex-valued T-F mask, the learning process is difficult even with a large-scale deep learning network. Moreover, the learning process targeting the T-F mask itself does not directly minimize the distortion in spectra or time domains. In order to address these concerns, we focus on three issues: 1) an effective estimation of complex numbers with a wide dynamic range; 2) a learning method that is directly related to speech enhancement performance; and 3) a way to resolve the mismatch between the estimated magnitude and phase spectra. In this study, we propose objective functions that can solve each of these issues and train the network by minimizing them with a joint learning framework. The evaluation results demonstrate that the proposed learning algorithm achieves significant performance improvement in various objective measures and subjective preference listening test.
전체 355
64 International Journal Zainab Alhakeem, Se-In Jang, Hong-Goo Kang "Disentangled Representations in Local-Global Contexts for Arabic Dialect Identification" in Transactions on Audio, Speech, and Language Processing, 2024
63 International Journal Hyungchan Yoon, Changhwan Kim, Seyun Um, Hyun-Wook Yoon, Hong-Goo Kang "SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to-Speech Systems" in IEEE Signal Processing Letters, vol.30, pp.593-597, 2023
62 International Journal Jinyoung Lee, Hong-Goo Kang "Real-Time Neural Speech Enhancement Based on Temporal Refinement Network and Channel-Wise Gating Methods" in Digital Signal Processing, vol.133, 2023
61 International Journal Taemin Kim, Yejee Shin, Kyowon Kang, Kiho Kim, Gwanho Kim, Yunsu Byeon, Hwayeon Kim, Yuyan Gao, Jeong Ryong Lee, Geonhui Son, Taeseong Kim, Yohan Jun, Jihyun Kim, Jinyoung Lee, Seyun Um, Yoohwan Kwon, Byung Gwan Son, Myeongki Cho, Mingyu Sang, Jongwoon Shin, Kyubeen Kim, Jungmin Suh, Heekyeong Choi, Seokjun Hong, Huanyu Cheng, Hong-Goo Kang, Dosik Hwang & Ki Jun Yu "Ultrathin crystalline-silicon-based strain gauges with deep learning algorithms for silent speech interfaces" in Nature Communications, vol.13, 2022
60 International Journal Jinyoung Lee, Hong-Goo Kang "Two-Stage Refinement of Magnitude and Complex Spectra for Real-Time Speech Enhancement" in IEEE Signal Processing Letters, vol.29, pp.2188-2192, 2022
59 International Journal Kyungguen Byun, Seyun Um, Hong-Goo Kang "Length-Normalized Representation Learning for Speech Signals" in IEEE Access, vol.10, pp.60362-60372, 2022
58 International Journal Young-Sun Joo, Hanbin Bae, Young-Ik Kim, Hoon-Young Cho, Hong-Goo Kang "Effective Emotion Transplantation in an End-to-End Text-to-Speech System" in IEEE Access, vol.8, pp.161713-161719, 2020
57 International Journal Soo-Whan Chung, Joon Son Chung, Hong Goo Kang "Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval" in IEEE Journal of Selected Topics in Signal Processing, vol.14, issue 3, 2020
56 International Journal Ohsung Kwon, Inseon Jang, ChungHyun Ahn, Hong-Goo Kang "An Effective Style Token Weight Control Technique for End-to-End Emotional Speech Synthesis" in IEEE Signal Processing Letters, vol.26, issue 9, pp.1383-1387, 2019
55 International Journal Jinkyu Lee, Hong-Goo Kang "A Joint Learning Algorithm for Complex-Valued T-F Masks in Deep Learning-Based Single-Channel Speech Enhancement Systems" in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.27, issue 6, pp.1098-1108, 2019