Papers

End-to-End Neural Audio Coding in the MDCT Domain

International Conference
Posted by
이지현
Posted on
2023-02-21 14:06
Authors : Hyungseob Lim, Jihyun Lee, Byeong Hyeon Kim, Inseon Jang, Hong-Goo Kang

Year : 2023

Publisher / Conference : ICASSP

Research area : Audio Signal Processing, Coding

Related project : Research on Generative Model-Based Audio Compression Technology (3/5)

Presentation : Poster

Modern deep neural network (DNN)-based audio coding approaches utilize complicated non-linear functions (e.g., convolutional neural networks and non-linear activations), which leads to high complexity and memory usage. However, their decoded signal quality is still not much higher than that of signal processing-based legacy codecs. In this paper, we propose an effective frequency-domain neural audio coding paradigm that adopts the modified discrete cosine transform (MDCT) for analysis and synthesis and DNNs for the quantization of encoded parameters. It includes an efficient method for encoding MDCT bins as well as a mechanism to adapt the quantization level for each bin. Our neural audio codec is trained in an end-to-end manner with the help of a psychoacoustics-based perceptual loss, removing the burden of module-by-module fine-tuning. Experimental results show that our proposed model’s performance is comparable with that of other state-of-the-art audio codecs.
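The MDCT analysis/synthesis stage named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the sine window, the frame size `N`, and the helper names (`sine_window`, `mdct`, `imdct`, `analyze`, `synthesize`, `quantize`) are assumptions made for illustration, and `quantize` is a plain uniform scalar quantizer with a per-bin step size standing in for the paper's DNN-based quantization, which is not described on this page.

```python
import math


def sine_window(N):
    """Sine window of length 2N; it satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def _basis(N, n, k):
    # MDCT cosine kernel for time index n and frequency bin k.
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))


def mdct(frame, window):
    """Map a windowed frame of 2N samples to N MDCT coefficients."""
    N = len(frame) // 2
    return [sum(window[n] * frame[n] * _basis(N, n, k) for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs, window):
    """Map N coefficients back to 2N windowed, time-aliased samples."""
    N = len(coeffs)
    return [2.0 / N * window[n] * sum(coeffs[k] * _basis(N, n, k) for k in range(N))
            for n in range(2 * N)]


def analyze(signal, N):
    """Split the signal into 50%-overlapping frames (hop N) and transform each."""
    if len(signal) % N:
        raise ValueError("signal length must be a multiple of N in this sketch")
    padded = [0.0] * N + list(signal) + [0.0] * N
    window = sine_window(N)
    return [mdct(padded[i:i + 2 * N], window)
            for i in range(0, len(padded) - N, N)]


def synthesize(frames, N):
    """Overlap-add the inverse transforms; the time-domain aliasing cancels."""
    window = sine_window(N)
    out = [0.0] * (N * (len(frames) + 1))
    for i, coeffs in enumerate(frames):
        for n, value in enumerate(imdct(coeffs, window)):
            out[i * N + n] += value
    return out[N:-N]  # drop the zero padding added in analyze()


def quantize(coeffs, steps):
    """Uniform scalar quantization with one (hypothetical) step size per bin."""
    return [round(c / s) * s for c, s in zip(coeffs, steps)]
```

Without quantization, `synthesize(analyze(x, N), N)` returns `x` up to floating-point error, which is the property that makes the MDCT attractive as a fixed analysis/synthesis pair. Applying `quantize` to each frame with larger steps in some bins gives a rough feel for what adapting the quantization level per bin means, under the stated assumption of a simple uniform quantizer.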
Total 360
137 International Conference Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang "Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech" in INTERSPEECH, 2023
136 International Conference Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang "HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders" in INTERSPEECH, 2023
135 International Conference Zhenyu Piao, Miseul Kim, Hyungchan Yoon, Hong-Goo Kang "HappyQuokka System for ICASSP 2023 Auditory EEG Challenge" in ICASSP, 2023
134 International Conference Byeong Hyeon Kim, Hyungseob Lim, Jihyun Lee, Inseon Jang, Hong-Goo Kang "Progressive Multi-Stage Neural Audio Codec with Psychoacoustic Loss and Discriminator" in ICASSP, 2023
133 International Conference Hyungseob Lim, Jihyun Lee, Byeong Hyeon Kim, Inseon Jang, Hong-Goo Kang "End-to-End Neural Audio Coding in the MDCT Domain" in ICASSP, 2023
132 International Conference Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang "Style Modeling for Multi-Speaker Articulation-to-Speech" in ICASSP, 2023
131 International Conference Hyeon-Kyeong Shin, Hyewon Han, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang "Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting" in INTERSPEECH (*Best Student Paper Finalist), 2022
130 International Conference Changhwan Kim, Seyun Um, Hyungchan Yoon, Hong-goo Kang "FluentTTS: Text-dependent Fine-grained Style Control for Multi-style TTS" in INTERSPEECH, 2022
129 International Conference Miseul Kim, Zhenyu Piao, Seyun Um, Ran Lee, Jaemin Joh, Seungshin Lee, Hong-Goo Kang "Light-Weight Speaker Verification with Global Context Information" in INTERSPEECH, 2022
128 International Conference Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang "Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement" in ICASSP, 2022