Papers

BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0

International Conference
2021~
작성자
dsp
작성일
2023-08-11 11:32
조회
1858
Authors : Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang

Year : 2023

Publisher / Conference : The IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)

Research area : Speech Signal Processing, Etc


Presentation : Poster

Decoding spoken speech from neural activity in the brain is a fast-emerging research topic, as it could enable communication for people who have difficulties with producing audible speech. For this task, electrocorticography (ECoG) is a common method for recording brain activity with high temporal resolution and high spatial precision. However, due to the risky surgical procedure required for obtaining ECoG recordings, relatively little of this data has been collected, and the amount is insufficient to train a neural network-based Brain-to-Speech (BTS) system. To address this problem, we propose BrainTalker—a novel BTS framework that generates intelligible spoken speech from ECoG signals under extremely low-resource scenarios. We apply a transfer learning approach utilizing a pre-trained self-supervised model, Wav2Vec 2.0. Specifically, we train an encoder module to map ECoG signals to latent embeddings that match Wav2Vec 2.0 representations of the corresponding spoken speech. These embeddings are then transformed into mel-spectrograms using stacked convolutional and transformer-based layers, which are fed into a neural vocoder to synthesize speech waveform. Experimental results demonstrate our proposed framework achieves outstanding performance in terms of subjective and objective metrics, including a Pearson correlation coefficient of 0.9 between generated and ground truth mel-spectrograms. We share publicly available Demos and Code.
전체 367
165 International Conference Yeona Hong, Miseul Kim, Woo-Jin Chung, Hong-Goo Kang "Contextual Learning for Missing Speech Automatic Speech Recognition" in International Conference on Electronics, Information, and Communication (ICEIC), 2024
164 International Conference Juhwan Yoon, Seyun Um, Woo-Jin Chung, Hong-Goo Kang "SC-ERM: Speaker-Centric Learning for Speech Emotion Recognition" in International Conference on Electronics, Information, and Communication (ICEIC), 2024
163 International Conference Hejung Yang, Hong-Goo Kang "On Fine-Tuning Pre-Trained Speech Models With EMA-Target Self-Supervised Loss" in ICASSP, 2024
162 International Journal Zainab Alhakeem, Se-In Jang, Hong-Goo Kang "Disentangled Representations in Local-Global Contexts for Arabic Dialect Identification" in Transactions on Audio, Speech, and Language Processing, 2024
161 International Conference Hong-Goo Kang, W. Bastiaan Kleijn, Jan Skoglund, Michael Chinen "Convolutional Transformer for Neural Speech Coding" in Audio Engineering Society Convention, 2023
160 International Conference Hong-Goo Kang, Jan Skoglund, W. Bastiaan Kleijn, Andrew Storus, Hengchin Yeh "A High-Rate Extension to Soundstream" in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
159 International Conference Zhenyu Piao, Hyungseob Lim, Miseul Kim, Hong-goo Kang "PDF-NET: Pitch-adaptive Dynamic Filter Network for Intra-gender Speaker Verification" in APSIPA ASC, 2023
158 International Conference WooSeok Ko, Seyun Um, Zhenyu Piao, Hong-goo Kang "Consideration of Varying Training Lengths for Short-Duration Speaker Verification" in APSIPA ASC, 2023
157 International Journal Hyungchan Yoon, Changhwan Kim, Seyun Um, Hyun-Wook Yoon, Hong-Goo Kang "SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to-Speech Systems" in IEEE Signal Processing Letters, vol.30, pp.593-597, 2023
156 International Conference Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang "BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0" in The IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), 2023