Papers

Real-Time Neural Speech Enhancement Based on Temporal Refinement Network and Channel-Wise Gating Methods

International Journal
Posted by : dsp

Posted on : 2023-01-30 16:43
Authors : Jinyoung Lee, Hong-Goo Kang

Year : 2023

Publisher / Conference : Digital Signal Processing

Volume : 133

Research area : Speech Signal Processing, Speech Enhancement

Presentation/Publication date : 08 December 2022

Presentation : None

Neural speech enhancement systems have improved dramatically in performance in recent years. However, it remains difficult to build systems that operate in real time with low delay, low complexity, and causality. In this paper, we propose a temporal and channel attention framework for a U-Net-based speech enhancement architecture that uses short analysis frames. Specifically, we propose an attention-based temporal refinement network (TRN) that estimates convolutional features according to the importance of each temporal location. By adding the TRN output to the channel-attentive convolution output, we can further enhance speech-related features even in channels that receive low attention weights. To further improve the representational power of the convolutional features, we also apply a squeeze-and-excitation (SE)-based channel attention mechanism to three different network modules: the main convolutional blocks that follow TRN processing, the skip connections, and the residual connections in the bottleneck recurrent neural network (RNN) layer. In particular, a channel-wise gate architecture placed on the skip connections and residual connections reliably controls the data flow, which avoids transferring redundant information to the following stages. We show the effectiveness of the proposed TRN and channel-wise gating methods by visualizing the spectral characteristics of the corresponding features, evaluating overall enhancement performance, and performing ablation studies in various configurations. Our proposed real-time enhancement system outperforms several recent neural enhancement models in terms of quality, model size, and complexity.
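The abstract does not give layer sizes or the internal design of the TRN, so the NumPy sketch below illustrates only the general ideas under stated assumptions and is not the authors' implementation. It shows (1) an SE-style channel gate, (2) a simplified per-frame temporal weighting standing in for the TRN, (3) the TRN output added to the channel-attentive output, and (4) the same kind of gate applied to a skip connection. Assumptions: features are shaped (channels, frames, frequency bins); the squeeze averages over frequency only, so every frame is processed without look-ahead, which is consistent with the causality requirement; the convolution itself is omitted (the input is treated as already-convolved features); and all names (`se_gate`, `temporal_refinement`, `encoder_block`, the `params` keys) are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_gate(x, w1, w2):
    """Hypothetical causal SE-style channel gate.

    x:  (C, T, F) feature map = (channels, frames, frequency bins).
    w1: (C_r, C) squeeze weights; w2: (C, C_r) excitation weights.
    Squeezing over frequency only keeps each frame's gate independent
    of future frames.
    """
    s = x.mean(axis=2)             # squeeze over frequency -> (C, T)
    z = np.maximum(w1 @ s, 0.0)    # bottleneck + ReLU -> (C_r, T)
    g = sigmoid(w2 @ z)            # per-channel, per-frame gate in (0, 1) -> (C, T)
    return x * g[:, :, None], g


def temporal_refinement(x, v):
    """Hypothetical TRN stand-in: scale each frame by a learned importance score.

    v: (C,) projection that maps a per-frame channel descriptor to one score.
    """
    frame_desc = x.mean(axis=2)    # (C, T) per-frame descriptor
    t = sigmoid(v @ frame_desc)    # (T,) temporal importance in (0, 1)
    return x * t[None, :, None], t


def encoder_block(x, skip, params):
    """Combine the pieces as the abstract describes, under the assumptions above."""
    conv_att, _ = se_gate(x, params["w1"], params["w2"])         # channel-attentive output
    trn_out, _ = temporal_refinement(x, params["v"])             # temporal refinement
    y = conv_att + trn_out                                       # add TRN output to it
    skip_gated, _ = se_gate(skip, params["ws1"], params["ws2"])  # gate on the skip path
    return y, skip_gated


# Usage with random weights (illustrative sizes only).
rng = np.random.default_rng(0)
C, C_r, T, F = 8, 2, 5, 16
x = rng.standard_normal((C, T, F))
skip = rng.standard_normal((C, T, F))
params = {
    "w1": rng.standard_normal((C_r, C)), "w2": rng.standard_normal((C, C_r)),
    "ws1": rng.standard_normal((C_r, C)), "ws2": rng.standard_normal((C, C_r)),
    "v": rng.standard_normal(C),
}
y, skip_gated = encoder_block(x, skip, params)
print(y.shape, skip_gated.shape)  # (8, 5, 16) (8, 5, 16)
```

Because every operation acts on one frame at a time, changing a later frame leaves earlier outputs untouched, which is the property a real-time, causal system needs. The same `se_gate` could be placed on the bottleneck RNN's residual connection. That placement is not shown here.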