Papers

A High-Rate Extension to Soundstream

International Conference
2021~
작성자
dsp
작성일
2023-10-16 18:18
조회
254
Authors : Hong-Goo Kang, Jan Skoglund, W. Bastiaan Kleijn, Andrew Storus, Hengchin Yeh

Year : 2023

Publisher / Conference : IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

Research area : Speech Signal Processing, Coding

Presentation/Publication date : 2023.10.25

Presentation : Oral

In this paper, we propose a high-rate extension of the SoundStream codec which is able to generate almost transparent quality audio at 16 kbps for wideband speech signals. SoundStream shows reasonably good performance at low bit-rates (e.g. around 9 kbps), but its performance does not improve much when more bits are used for encoding the latent embeddings. Motivated by experimental results showing that neural audio codec performance is highly related to the characteristics of latent embeddings such as dimensionality, dependency, and probability density function shape, we propose a convolutional transformer architecture and an attention-based multi-scale latent decomposition method that significantly enhances codec performance when quantizing high-dimensional embeddings. Experimental results show the superiority of our proposed model over conventional approaches.
전체 355
7 International Conference Yeona Hong, Miseul Kim, Woo-Jin Chung, Hong-Goo Kang "Contextual Learning for Missing Speech Automatic Speech Recognition" in International Conference on Electronics, Information, and Communication (ICEIC), 2024
6 International Conference Zhenyu Piao, Hyungseob Lim, Miseul Kim, Hong-goo Kang "PDF-NET: Pitch-adaptive Dynamic Filter Network for Intra-gender Speaker Verification" in APSIPA ASC, 2023
5 International Conference Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang "BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0" in The IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), 2023
4 International Conference Zhenyu Piao, Miseul Kim, Hyungchan Yoon, Hong-Goo Kang "HappyQuokka System for ICASSP 2023 Auditory EEG Challenge" in ICASSP, 2023
3 International Conference Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang "Style Modeling for Multi-Speaker Articulation-to-Speech" in ICASSP, 2023
2 International Conference Miseul Kim, Zhenyu Piao, Seyun Um, Ran Lee, Jaemin Joh, Seungshin Lee, Hong-Goo Kang "Light-Weight Speaker Verification with Global Context Information" in INTERSPEECH, 2022
1 International Conference Miseul Kim, Minh-Tri Ho, Hong-Goo Kang "Self-supervised Complex Network for Machine Sound Anomaly Detection" in EUSIPCO, 2021