Papers

Progressive Multi-Stage Neural Audio Codec with Psychoacoustic Loss and Discriminator

International Conference
Posted by : 이지현

Posted on : 2023-02-21 14:09
Authors : Byeong Hyeon Kim, Hyungseob Lim, Jihyun Lee, Inseon Jang, Hong-Goo Kang

Year : 2023

Publisher / Conference : ICASSP

Research area : Audio Signal Processing, Coding

Related project : Research on Generative Model-Based Audio Compression Technology (3/5)

Presentation : Poster

In this paper, we improve the efficiency of the progressive multi-stage neural audio codec (PR-Codec) by utilizing perceptually motivated training criteria. Although our baseline PR-Codec successfully reconstructs full-band signals by progressively decoding pre-defined subband signals, transparent quality can be guaranteed only at high bit rates. To reduce the bit rate while maintaining perceptually transparent quality, we adopt a psychoacoustic model (PAM)-based loss and propose a perceptual weighting discriminator (PWD), which enables us to synthesize and discriminate audio signals in a perceptually motivated domain. We also introduce scalar quantization with an entropy model to further enhance quantization efficiency. Our experimental results show that, compared to our previous model, the proposed model significantly improves perceptual reconstruction quality at the cost of increased waveform disparity in the time domain.
Total : 370
158 International Conference WooSeok Ko, Seyun Um, Zhenyu Piao, Hong-Goo Kang "Consideration of Varying Training Lengths for Short-Duration Speaker Verification" in APSIPA ASC, 2023
157 International Journal Hyungchan Yoon, Changhwan Kim, Seyun Um, Hyun-Wook Yoon, Hong-Goo Kang "SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to-Speech Systems" in IEEE Signal Processing Letters, vol.30, pp.593-597, 2023
156 International Conference Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang "BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0" in The IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), 2023
155 International Conference Seyun Um, Jihyun Kim, Jihyun Lee, Hong-Goo Kang "Facetron: A Multi-speaker Face-to-Speech Model based on Cross-Modal Latent Representations" in EUSIPCO, 2023
154 International Conference Hejung Yang, Hong-Goo Kang "Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement" in INTERSPEECH, 2023
153 International Conference Jihyun Kim, Hong-Goo Kang "Contrastive Learning based Deep Latent Masking for Music Source Separation" in INTERSPEECH, 2023
152 International Conference Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang "MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion" in INTERSPEECH, 2023
151 International Conference Hyungchan Yoon, Seyun Um, Changhwan Kim, Hong-Goo Kang "Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech" in INTERSPEECH, 2023
150 International Conference Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang "Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech" in INTERSPEECH, 2023
149 International Conference Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang "HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders" in INTERSPEECH, 2023