Papers

End-to-End Neural Audio Coding in the MDCT Domain

International Conference
Posted by : 이지현

Posted on : 2023-02-21 14:06
Authors : Hyungseob Lim, Jihyun Lee, Byeong Hyeon Kim, Inseon Jang, Hong-Goo Kang

Year : 2023

Publisher / Conference : ICASSP

Research area : Audio Signal Processing, Coding

Related project : Research on Generative Model-Based Audio Compression Technology (3/5)

Presentation : Poster

Modern deep neural network (DNN)-based audio coding approaches utilize complicated non-linear functions (e.g., convolutional neural networks and non-linear activations), which leads to high complexity and memory usage. However, their decoded signal quality is still not much higher than that of signal processing-based legacy codecs. In this paper, we propose an effective frequency-domain neural audio coding paradigm that adopts a modified discrete cosine transform (MDCT) for analysis and synthesis and DNNs for the quantization of encoded parameters. It includes an efficient method for encoding MDCT bins as well as a mechanism to adapt the quantization level for each bin. Our neural audio codec is trained in an end-to-end manner with the help of a psychoacoustics-based perceptual loss, removing the burden of module-by-module fine-tuning. Experimental results show that our proposed model’s performance is comparable to that of other state-of-the-art audio codecs.
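To make the analysis/synthesis step in the abstract concrete, here is a minimal NumPy sketch of an MDCT and its inverse with a sine window, followed by a plain uniform quantizer whose step size differs per bin. It is an illustration only and not the paper's codec: the function names (`mdct`, `imdct`, `quantize_per_bin`), the frame size, and the hand-picked step sizes are all assumptions, and the fixed quantizer merely stands in for the DNN-based quantization and adaptive level selection that the abstract describes.

```python
import numpy as np

def sine_window(N):
    # Sine window of length 2N; it satisfies the Princen-Bradley condition.
    return np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))

def mdct_basis(N):
    # Cosine basis of shape (N, 2N): N frequency bins, 2N time samples.
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))

def mdct(x, N):
    # x must have a length that is a multiple of N.
    # Returns an array of shape (num_frames, N) with hop size N.
    w, C = sine_window(N), mdct_basis(N)
    padded = np.concatenate([np.zeros(N), x, np.zeros(N)])
    num_frames = len(padded) // N - 1
    frames = np.stack([padded[t * N : t * N + 2 * N] for t in range(num_frames)])
    return (frames * w) @ C.T

def imdct(X, N):
    # Inverse transform with windowed overlap-add; the time-domain
    # aliasing of adjacent frames cancels in the overlap.
    w, C = sine_window(N), mdct_basis(N)
    frames = (2.0 / N) * (X @ C) * w
    out = np.zeros((X.shape[0] + 1) * N)
    for t, f in enumerate(frames):
        out[t * N : t * N + 2 * N] += f
    return out[N:-N]  # drop the zero padding added in mdct()

def quantize_per_bin(X, step):
    # Hypothetical stand-in for the learned quantizer:
    # uniform rounding with one step size per MDCT bin.
    return np.round(X / step) * step

N = 16
rng = np.random.default_rng(0)
x = rng.standard_normal(8 * N)

X = mdct(x, N)
x_rec = imdct(X, N)                  # reconstruction without quantization

step = np.linspace(0.05, 0.5, N)     # assumed: coarser steps for higher bins
X_q = quantize_per_bin(X, step)
x_q = imdct(X_q, N)                  # lossy reconstruction
```

Without quantization the overlap-add output matches the input up to floating-point error, which is the property that lets a codec place all of its loss in the quantizer. In the proposed system the per-bin quantization level is adapted by the model rather than fixed in advance as it is here.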
Total 364
344 International Conference Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang "BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0" in The IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), 2023
343 International Conference Seyun Um, Jihyun Kim, Jihyun Lee, Hong-Goo Kang "Facetron: A Multi-speaker Face-to-Speech Model based on Cross-Modal Latent Representations" in EUSIPCO, 2023
342 International Conference Hejung Yang, Hong-Goo Kang "Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement" in INTERSPEECH, 2023
341 International Conference Jihyun Kim, Hong-Goo Kang "Contrastive Learning based Deep Latent Masking for Music Source Separation" in INTERSPEECH, 2023
340 International Conference Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang "MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion" in INTERSPEECH, 2023
339 International Conference Hyungchan Yoon, Seyun Um, Changhwan Kim, Hong-Goo Kang "Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech" in INTERSPEECH, 2023
338 International Conference Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang "Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech" in INTERSPEECH, 2023
337 International Conference Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang "HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders" in INTERSPEECH, 2023
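336 Domestic Conference Jihyun Lee, Wootaek Lim, Hong-Goo Kang "Deep Neural Network-Based Long-Term Prediction in Speech Compression" in 2023 Summer Conference of the Korean Institute of Broadcast and Media Engineers, 2023
335 Domestic Conference Hwayeon Kim, Hong-Goo Kang "Band-Split based Dual-Path Convolution Recurrent Network for Music Source Separation" in 2023 Spring Conference of the Acoustical Society of Korea and the 38th Underwater Acoustics Symposium, 2023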