Authors : Hyungseob Lim, Jihyun Lee, Byeong Hyeon Kim, Inseon Jang, Hong-Goo Kang
Year : 2023
Publisher / Conference : ICASSP
Research area : Audio Signal Processing, Coding
Related project : Research on Generative Model-Based Audio Compression Technology (3/5)
Presentation : Poster
Modern deep neural network (DNN)-based audio coding approaches utilize complicated non-linear functions (e.g., convolutional neural networks and non-linear activations), which leads to high complexity and memory usage. However, their decoded signal quality is still not much higher than that of signal processing-based legacy codecs. In this paper, we propose an effective frequency-domain neural audio coding paradigm that adopts a modified discrete cosine transform (MDCT) for analysis and synthesis and DNNs for the quantization of encoded parameters. It includes an efficient method for encoding MDCT bins as well as a mechanism to adapt the quantization level for each bin. Our neural audio codec is trained in an end-to-end manner with the help of a psychoacoustics-based perceptual loss, removing the burden of module-by-module fine-tuning. Experimental results show that our proposed model’s performance is comparable with other state-of-the-art audio codecs.
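As a rough illustration of the signal path the abstract describes (MDCT analysis, per-bin quantization, MDCT synthesis), the following is a minimal NumPy sketch. It is not the paper's model: the DNN-based quantizer is replaced here by plain uniform rounding, and the per-bin step sizes in `steps`, the sine window, the frame size `N`, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def sine_window(N):
    # Sine window of length 2N; it satisfies w[n]^2 + w[n+N]^2 = 1.
    return np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))

def mdct_basis(N):
    # Cosine basis of shape (N, 2N): N coefficients per 2N-sample frame.
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))

def mdct(x, N):
    # Zero-pad so every input sample is covered by two overlapping frames.
    pad = (-len(x)) % N
    xp = np.concatenate([np.zeros(N), x, np.zeros(pad + N)])
    n_frames = len(xp) // N - 1
    frames = np.stack([xp[i * N:i * N + 2 * N] for i in range(n_frames)])
    return (frames * sine_window(N)) @ mdct_basis(N).T  # (n_frames, N)

def imdct(X, N, length):
    # Inverse transform, synthesis window, then overlap-add (hop size N).
    frames = (2.0 / N) * (X @ mdct_basis(N)) * sine_window(N)
    out = np.zeros((X.shape[0] + 1) * N)
    for i, frame in enumerate(frames):
        out[i * N:i * N + 2 * N] += frame
    return out[N:N + length]

def quantize_per_bin(X, steps):
    # Uniform quantization with a separate step size for each MDCT bin.
    # (Hypothetical stand-in for the learned, adaptive quantizer.)
    indices = np.round(X / steps).astype(int)
    return indices, indices * steps

N = 64                                  # assumed number of bins per frame
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)           # stand-in for an audio signal
X = mdct(x, N)
steps = np.linspace(0.05, 1.0, N)       # assumed: coarser steps at high bins
indices, X_hat = quantize_per_bin(X, steps)
x_hat = imdct(X_hat, N, len(x))         # decoded signal after quantization
```

Without the quantization step, `imdct(mdct(x, N), N, len(x))` recovers `x` up to floating-point error, so in this sketch all distortion comes from the choice of `steps`. Adapting those step sizes per bin is the part the paper assigns to its learned mechanism.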