Papers

Real-Time Neural Speech Enhancement Based on Temporal Refinement Network and Channel-Wise Gating Methods

International Journal
2021~
작성자
dsp
작성일
2023-01-30 16:43
조회
2525
Authors : Jinyoung Lee, Hong-Goo Kang

Year : 2023

Publisher / Conference : Digital Signal Processing

Volume : 133

Research area : Speech Signal Processing, Speech Enhancement

Presentation/Publication date : 08 December 2022

Presentation : None

Neural speech enhancement systems have seen dramatic improvements in performance recently. However, it is still difficult to create systems that can operate in real-time, with low delay, low complexity, and causality. In this paper, we propose a temporal and channel attention framework for a U-Net-based speech enhancement architecture that uses short analysis frame lengths. Specifically, we propose an attention-based temporal refinement network (TRN) that estimates convolutional features subject to the importance of temporal location. By adding the TRN output to the channel-attentive convolution output, we can further enhance speech-related features even in low-attentive channel outputs. To further improve the representation power of the convolutional features, we also apply a squeeze-and-excitation (SE)-based channel attention mechanism for three different network modules: main convolutional blocks after processing the TRN, skip connections, and residual connections in the bottleneck recurrent neural network (RNN) layer. In particular, a channel-wise gate architecture placed on the skip connections and residual connections reliably controls the data flow, which avoids transferring redundant information to the following stages. We show the effectiveness of the proposed TRN and channel-wise gating methods by visualizing the spectral characteristics of the corresponding features, evaluating overall enhancement performance, and performing ablation studies in various configurations. Our proposed real-time enhancement system outperforms several recent neural enhancement models in terms of quality, model size, and complexity.
전체 370
370 International Conference Yeona Hong, Hyewon Han, Woo-jin Chung, Hong-Goo Kang "StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models" in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
369 International Conference Sangmin Lee, Woojin Chung, Hong-Goo Kang "LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration" in Association for the Advancement of Artificial Intelligence (AAAI), 2025
368 International Journal Hyewon Han, Xiulian Peng, Doyeon Kim, Yan Lu, Hong-Goo Kang "Dual-Branch Guidance Encoder for Robust Acoustic Echo Suppression" in IEEE Transactions on Audio, Speech and Language Processing (TASLP), 2024
367 International Journal Hyungseob Lim, Jihyun Lee, Byeong Hyeon Kim, Inseon Jang, Hong-Goo Kang "Perceptual Neural Audio Coding with Modified Discrete Cosine Transform" in IEEE Journal of Special Topics in Signal Processing (JSTSP), 2025
366 International Conference Juhwan Yoon, Hyungseob Lim, Hyeonjin Cha, Hong-Goo Kang "StylebookTTS: Zero-Shot Text-to-Speech Leveraging Unsupervised Style Representation" in APSIPA ASC, 2024
365 International Conference Doyeon Kim, Yanjue Song, Nilesh Madhu, Hong-Goo Kang "Enhancing Neural Speech Embeddings for Generative Speech Models" in APSIPA ASC, 2024
364 Domestic Conference 최웅집, 김병현, 강홍구 "자기 지도 학습 특징을 활용한 음성 신호의 논 블라인드 대역폭 확장" in 대한전자공학회 2024년도 하계종합학술대회, 2024
363 Domestic Conference 홍연아, 정우진, 강홍구 "효율적인 양자화 기법을 통한 DNN 기반 화자 인식 모델 최적화" in 대한전자공학회 2024년도 하계종합학술대회, 2024
362 Domestic Conference 김병현, 강홍구, 장인선 "저지연 조건하의 심층신경망 기반 음성 압축" in 한국방송·미디어공학회 2024년 하계학술대회, 2024
361 International Conference Miseul Kim, Soo-Whan Chung, Youna Ji, Hong-Goo Kang, Min-Seok Choi "Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation" in INTERSPEECH, 2024