Real-Time Neural Speech Enhancement Based on Temporal Refinement Network and Channel-Wise Gating Methods

International Journal
2023-01-30 16:43
Authors : Jinyoung Lee, Hong-Goo Kang

Year : 2023

Publisher / Conference : Digital Signal Processing

Volume : 133

Research area : Speech Signal Processing, Speech Enhancement

Presentation/Publication date : 08 December 2022

Presentation : None

Neural speech enhancement systems have seen dramatic improvements in performance recently. However, it is still difficult to create systems that can operate in real time, with low delay, low complexity, and causality. In this paper, we propose a temporal and channel attention framework for a U-Net-based speech enhancement architecture that uses short analysis frame lengths. Specifically, we propose an attention-based temporal refinement network (TRN) that estimates convolutional features subject to the importance of temporal location. By adding the TRN output to the channel-attentive convolution output, we can further enhance speech-related features even in low-attentive channel outputs. To further improve the representation power of the convolutional features, we also apply a squeeze-and-excitation (SE)-based channel attention mechanism to three different network modules: main convolutional blocks after processing the TRN, skip connections, and residual connections in the bottleneck recurrent neural network (RNN) layer. In particular, a channel-wise gate architecture placed on the skip connections and residual connections reliably controls the data flow, which avoids transferring redundant information to the following stages. We show the effectiveness of the proposed TRN and channel-wise gating methods by visualizing the spectral characteristics of the corresponding features, evaluating overall enhancement performance, and performing ablation studies in various configurations. Our proposed real-time enhancement system outperforms several recent neural enhancement models in terms of quality, model size, and complexity.
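As a rough illustration of the squeeze-and-excitation (SE) channel gating mentioned in the abstract, the following NumPy sketch shows the generic SE idea: summarize each channel, pass the summary through a small bottleneck, and rescale the channels with the resulting gate values in (0, 1). This is not the authors' implementation. The function name `se_channel_gate`, the weight shapes, the reduction size, and the random weights are all assumptions made for the example. The sketch also averages over every frame, which is not causal; the abstract does not say how the paper computes the per-channel summary in a real-time setting.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def se_channel_gate(features, w1, b1, w2, b2):
    """Generic SE-style channel gate (hypothetical helper, not from the paper).

    features: array of shape (channels, frames)
    w1, b1:   bottleneck projection, shapes (reduced, channels) and (reduced,)
    w2, b2:   expansion projection, shapes (channels, reduced) and (channels,)
    Returns the gated features and the per-channel gate values.
    """
    # Squeeze: one summary value per channel (mean over all frames;
    # an assumption for illustration, not a causal operation).
    squeezed = features.mean(axis=1)
    # Excitation: bottleneck with ReLU, then sigmoid to get gates in (0, 1).
    hidden = np.maximum(0.0, w1 @ squeezed + b1)
    gate = sigmoid(w2 @ hidden + b2)
    # Scale: broadcast each channel's gate across its frames.
    return features * gate[:, None], gate


# Example with random, untrained weights (illustrative values only).
rng = np.random.default_rng(0)
channels, frames, reduced = 8, 20, 2
features = rng.standard_normal((channels, frames))
w1 = rng.standard_normal((reduced, channels)) * 0.1
b1 = np.zeros(reduced)
w2 = rng.standard_normal((channels, reduced)) * 0.1
b2 = np.zeros(channels)

gated, gate = se_channel_gate(features, w1, b1, w2, b2)
print(gated.shape, gate.round(3))
```

On a skip or residual connection, a gate of this kind would be applied to the features being passed along, so channels with small gate values contribute less to the following stage.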