Papers

Perceptual Neural Audio Coding with Modified Discrete Cosine Transform

International Journal
2021~
Posted by : Hyungseob Lim
Posted on : 2024-10-21 17:26
Authors : Hyungseob Lim, Jihyun Lee, Byeong Hyeon Kim, Inseon Jang, Hong-Goo Kang

Year : 2024

Publisher / Conference : IEEE Journal of Selected Topics in Signal Processing (JSTSP)

Research area : Audio Signal Processing, Coding

Presentation : None

Despite efforts to leverage the modeling power of deep neural networks (DNNs) on audio coding, effectively deploying them in real-world applications is still problematic due to their high computational cost and the restricted range of target signals or achievable bit-rates. In this paper, we propose an alternative approach for integrating DNNs into a perceptual audio codec that allows for the optimization of the whole system in a data-driven, end-to-end manner. The key idea of the proposed method is to make DNNs control the quantization noise in the classic transform coding framework, specifically based on the modified discrete cosine transform (MDCT). The proposal includes a new DNN-based mechanism for adaptively adjusting the quantization step sizes of frequency bands targeting an arbitrary bit-rate, eventually acting as a data-driven differentiable psychoacoustic model. The side information regarding the adaptive quantization is also encoded and decoded by DNNs via latent variables. The perceptual distortion during training is evaluated by a perceptual quality estimation model trained on actual human ratings so that the proposed audio codec can effectively allocate bits considering their effect on the perceptual quality. Through comparisons with legacy audio codecs (MP3 and AAC) and a neural audio codec (EnCodec), we show that our method can achieve further coding gains over the legacy codecs with a substantially lower computational load on the decoder compared to other neural audio codecs.
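As a rough illustration of the transform-coding step that the abstract refers to, the sketch below computes the MDCT of one windowed frame and quantizes the coefficients with one uniform step size per frequency band. It is not the paper's implementation. The function names (`mdct`, `quantize_bands`), the band edges, and the step sizes are all assumptions made for this example; in the proposed codec the step sizes would be predicted by a DNN rather than set by hand.

```python
import numpy as np


def mdct(frame):
    """Direct O(N^2) MDCT: a frame of 2N samples gives N coefficients."""
    two_n = len(frame)
    n_half = two_n // 2
    n = np.arange(two_n)
    k = np.arange(n_half)
    # Standard MDCT basis: cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
    basis = np.cos(
        np.pi / n_half * (n[None, :] + 0.5 + n_half / 2) * (k[:, None] + 0.5)
    )
    return basis @ frame


def quantize_bands(coeffs, band_edges, step_sizes):
    """Uniform quantization with one step size per frequency band.

    band_edges holds len(step_sizes) + 1 coefficient indices.
    Returns the integer indices and the dequantized coefficients.
    """
    indices = np.zeros(len(coeffs), dtype=np.int64)
    recon = np.zeros(len(coeffs))
    for lo, hi, step in zip(band_edges[:-1], band_edges[1:], step_sizes):
        indices[lo:hi] = np.round(coeffs[lo:hi] / step).astype(np.int64)
        recon[lo:hi] = indices[lo:hi] * step
    return indices, recon


# Toy usage with hand-picked values (placeholders, not from the paper).
rng = np.random.default_rng(0)
frame_len = 64
window = np.sin(np.pi * (np.arange(frame_len) + 0.5) / frame_len)  # sine window
coeffs = mdct(window * rng.standard_normal(frame_len))
band_edges = [0, 4, 8, 16, 32]
step_sizes = [0.05, 0.1, 0.2, 0.4]  # coarser steps in higher bands
indices, recon = quantize_bands(coeffs, band_edges, step_sizes)
```

A larger step size in a band lowers the bit cost of that band's indices and raises its quantization noise, which is the trade-off the abstract describes the DNN as controlling.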