Papers

End-to-End Neural Audio Coding in the MDCT Domain

International Conference
Posted by : 이지현
Posted on : 2023-02-21 14:06
Authors : Hyungseob Lim, Jihyun Lee, Byeong Hyeon Kim, Inseon Jang, Hong-Goo Kang

Year : 2023

Publisher / Conference : ICASSP

Research area : Audio Signal Processing, Coding

Related project : Research on Generative Model-Based Audio Compression Technology (3/5)

Presentation : Poster

Modern deep neural network (DNN)-based audio coding approaches utilize complicated non-linear functions (e.g., convolutional neural networks and non-linear activations), which leads to high complexity and memory usage. However, their decoded signal quality is still not much higher than that of signal processing-based legacy codecs. In this paper, we propose an effective frequency-domain neural audio coding paradigm that adopts a modified discrete cosine transform (MDCT) for analysis and synthesis and DNNs for the quantization of encoded parameters. It includes an efficient method for encoding MDCT bins as well as a mechanism to adapt the quantization level for each bin. Our neural audio codec is trained in an end-to-end manner with the help of psychoacoustics-based perceptual loss, removing the burden of module-by-module fine-tuning. Experimental results show that our proposed model’s performance is comparable with other state-of-the-art audio codecs.
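As a rough illustration of the MDCT analysis and synthesis stage mentioned in the abstract, the following is a minimal pure-Python sketch of a textbook sine-windowed MDCT with overlap-add reconstruction. It is not the authors' implementation: the function names, the hop size N, and the `quantize` helper are all assumptions made for this example. In particular, `quantize` is a hypothetical fixed-step uniform quantizer that only marks where per-bin quantization would sit; the paper learns this stage with DNNs.

```python
import math


def sine_window(N):
    # Sine window of length 2N; it satisfies w[n]^2 + w[n+N]^2 = 1,
    # which is required for alias cancellation in overlap-add.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(frame, window):
    # Forward MDCT: 2N windowed samples -> N coefficients (bins).
    N = len(frame) // 2
    return [
        sum(
            window[n] * frame[n]
            * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N)
        )
        for k in range(N)
    ]


def imdct(coeffs, window):
    # Inverse MDCT: N coefficients -> 2N windowed, time-aliased samples.
    # The 2/N factor matches the window normalization used above.
    N = len(coeffs)
    return [
        window[n] * (2.0 / N)
        * sum(
            coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for k in range(N)
        )
        for n in range(2 * N)
    ]


def analysis(signal, N):
    # Split the signal into 50%-overlapping frames and transform each one.
    # Assumes len(signal) is a multiple of N; N zeros are padded on each side
    # so every real sample is covered by two frames.
    w = sine_window(N)
    padded = [0.0] * N + list(signal) + [0.0] * N
    return [mdct(padded[s:s + 2 * N], w) for s in range(0, len(padded) - N, N)]


def synthesis(frames, N):
    # Overlap-add the inverse-transformed frames, then drop the padding.
    w = sine_window(N)
    out = [0.0] * (N * (len(frames) + 1))
    for i, coeffs in enumerate(frames):
        for n, v in enumerate(imdct(coeffs, w)):
            out[i * N + n] += v
    return out[N:-N]


def quantize(coeffs, steps):
    # Hypothetical stand-in: uniform quantization with one step size per bin.
    return [round(c / s) * s for c, s in zip(coeffs, steps)]
```

Without quantization, `synthesis(analysis(signal, N), N)` returns the input signal up to floating-point error. Passing each frame through `quantize` with coarser steps for some bins trades reconstruction accuracy for fewer distinct values in those bins, which is the kind of trade-off a per-bin quantization mechanism controls.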
Total : 355
132 International Conference Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang "Style Modeling for Multi-Speaker Articulation-to-Speech" in ICASSP, 2023
131 International Conference Hyeon-Kyeong Shin, Hyewon Han, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang "Learning Audio-Text Agreement for Open-vocabulary Keyword Spotting" in INTERSPEECH (*Best Student Paper Finalist), 2022
130 International Conference Changhwan Kim, Seyun Um, Hyungchan Yoon, Hong-Goo Kang "FluentTTS: Text-dependent Fine-grained Style Control for Multi-style TTS" in INTERSPEECH, 2022
129 International Conference Miseul Kim, Zhenyu Piao, Seyun Um, Ran Lee, Jaemin Joh, Seungshin Lee, Hong-Goo Kang "Light-Weight Speaker Verification with Global Context Information" in INTERSPEECH, 2022
128 International Conference Doyeon Kim, Hyewon Han, Hyeon-Kyeong Shin, Soo-Whan Chung, Hong-Goo Kang "Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement" in ICASSP, 2022
127 International Conference Chanwoo Lee, Hyungseob Lim, Jihyun Lee, Inseon Jang, Hong-Goo Kang "Progressive Multi-Stage Neural Audio Coding with Guided References" in ICASSP, 2022
126 International Conference Jihyun Lee, Hyungseob Lim, Chanwoo Lee, Inseon Jang, Hong-Goo Kang "Adversarial Audio Synthesis Using a Harmonic-Percussive Discriminator" in ICASSP, 2022
125 International Conference Jinyoung Lee and Hong-Goo Kang "Stacked U-Net with High-level Feature Transfer for Parameter Efficient Speech Enhancement" in APSIPA ASC, 2021
124 International Conference Huu-Kim Nguyen, Kihyuk Jeong, Seyun Um, Min-Jae Hwang, Eunwoo Song, Hong-Goo Kang "LiteTTS: A Decoder-free Light-weight Text-to-wave Synthesis Based on Generative Adversarial Networks" in INTERSPEECH, 2021
123 International Conference Zainab Alhakeem, Yoohwan Kwon, Hong-Goo Kang "Disentangled Representations for Arabic Dialect Identification based on Supervised Clustering with Triplet Loss" in EUSIPCO, 2021