Two-Stage Refinement of Magnitude and Complex Spectra for Real-Time Speech Enhancement
In this letter, we propose a two-stage speech enhancement network that predicts magnitude spectra in the first stage and complex spectra in the second stage. To maximize performance at each stage, we propose two convolutional modules: magnitude spectral masking (MSM) and complex spectra refinement (CSR). Each module is designed to take into account the specific characteristics of the signal type it handles. The MSM estimates multiplicative masks to remove noise from the magnitude component of the convolutional features, and the CSR refines the complex component of the convolutional features using additive features. With these modules, the proposed two-stage enhancement model outperforms previously proposed state-of-the-art algorithms. In addition, the model has only 2.63 million parameters, and it can operate in real time thanks to its causal structure and low computational complexity.
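The difference between the multiplicative step and the additive step can be illustrated with a toy sketch. This is not the authors' implementation: in the letter both operations act on convolutional features inside the network, whereas the sketch applies them directly to a single made-up spectral frame. The function names `magnitude_mask_stage` and `complex_refine_stage`, the clamping of the mask to [0, 1], and all numeric values are assumptions chosen for illustration; in a real system the mask and the residual would be predicted by the model.

```python
import cmath

def magnitude_mask_stage(noisy_spec, mask):
    """Stage 1 (sketch): scale each bin's magnitude by a mask, keep the noisy phase."""
    out = []
    for x, m in zip(noisy_spec, mask):
        m = min(max(m, 0.0), 1.0)  # assumption: mask is bounded to [0, 1]
        out.append(cmath.rect(m * abs(x), cmath.phase(x)))
    return out

def complex_refine_stage(stage1_spec, residual):
    """Stage 2 (sketch): add a complex-valued correction to each bin."""
    return [s + r for s, r in zip(stage1_spec, residual)]

# One toy frame with three frequency bins (all values are made up).
noisy = [complex(3, 4), complex(0, 2), complex(-1, 0)]
mask = [0.5, 1.0, 0.0]  # hypothetical stage-1 output
residual = [complex(0.1, -0.2), complex(0, 0), complex(0.3, 0)]  # hypothetical stage-2 output

stage1 = magnitude_mask_stage(noisy, mask)
enhanced = complex_refine_stage(stage1, residual)

for name, frame in (("noisy", noisy), ("stage 1", stage1), ("enhanced", enhanced)):
    print(name, [f"{abs(v):.3f}" for v in frame])
```

Masking alone can only shrink a bin's magnitude and leaves its phase untouched, which is why a bin masked to zero stays at zero after the first stage. The additive step can change both the real and the imaginary part, so it can correct phase and restore energy that the mask removed.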
|4||International Journal||Jinyoung Lee, Hong-Goo Kang, "Real-Time Neural Speech Enhancement Based on Temporal Refinement Network and Channel-Wise Gating Methods," in Digital Signal Processing, vol. 133, 2023|
|3||International Journal||Taemin Kim, Yejee Shin, Kyowon Kang, Kiho Kim, Gwanho Kim, Yunsu Byeon, Hwayeon Kim, Yuyan Gao, Jeong Ryong Lee, Geonhui Son, Taeseong Kim, Yohan Jun, Jihyun Kim, Jinyoung Lee, Seyun Um, Yoohwan Kwon, Byung Gwan Son, Myeongki Cho, Mingyu Sang, Jongwoon Shin, Kyubeen Kim, Jungmin Suh, Heekyeong Choi, Seokjun Hong, Huanyu Cheng, Hong-Goo Kang, Dosik Hwang, and Ki Jun Yu, "Ultrathin crystalline-silicon-based strain gauges with deep learning algorithms for silent speech interfaces," in Nature Communications, vol. 13, 2022|
|2||International Journal||Jinyoung Lee, Hong-Goo Kang, "Two-Stage Refinement of Magnitude and Complex Spectra for Real-Time Speech Enhancement," in IEEE Signal Processing Letters, vol. 29, pp. 2188-2192, 2022|
|1||International Journal||Kyungguen Byun, Se-yun Um, Hong-Goo Kang, "Length-Normalized Representation Learning for Speech Signals," in IEEE Access, vol. 10, pp. 60362-60372, 2022|