Papers

Contextual Learning for Missing Speech Automatic Speech Recognition

International Conference
2021~
Posted by
dsp
Posted on
2024-01-22 16:15
Authors : Yeona Hong, Miseul Kim, Woo-Jin Chung, Hong-Goo Kang

Year : 2024

Publisher / Conference : International Conference on Electronics, Information, and Communication (ICEIC)

Research area : Speech Signal Processing, Speech Recognition

Presentation/Publication date : 2024.01.29

Presentation : Poster

In this paper, we present an automatic speech recognition (ASR) system that can decode complete transcriptions from speech even when segments of the audio are missing. To predict complete transcriptions from such speech, we use a contextual learning approach inspired by recent language model training approaches, in which our model leverages the surrounding speech segments as cues for the prediction. Our model consists of two modules: a contextual feature extractor designed with the structure of wav2vec 2.0, and a projection layer. We further explore various masking lengths for model training so as to optimally benefit the ASR system without compromising its performance. Our proposed methodology demonstrates high-quality ASR performance on missing speech segments of various lengths, ranging from a word error rate (WER) of 4.7% on 0.25-second segments to 18.5% on 1-second segments.
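The evaluation setup described in the abstract can be illustrated with a minimal sketch: remove one contiguous span from a waveform, then score a transcription with WER. This is not the authors' implementation. The function names `mask_segment` and `word_error_rate`, the 16 kHz sample rate, and the choice to represent a missing segment by zero-filling are all assumptions made for illustration; the abstract does not specify them.

```python
import random

SAMPLE_RATE = 16_000  # assumption: 16 kHz audio; not stated in the abstract


def mask_segment(waveform, mask_seconds, sample_rate=SAMPLE_RATE, rng=None):
    """Return a copy of `waveform` with one contiguous span set to zero.

    Zero-filling is an assumption; the abstract does not say how the
    missing segments are represented.
    """
    rng = rng or random.Random()
    mask_len = min(int(round(mask_seconds * sample_rate)), len(waveform))
    start = rng.randint(0, len(waveform) - mask_len)  # inclusive bounds
    masked = list(waveform)
    masked[start:start + mask_len] = [0.0] * mask_len
    return masked, (start, start + mask_len)


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, `mask_segment(wave, 0.25)` zeroes 4,000 samples of a 16 kHz signal, which corresponds to the shortest segment length reported above, and `word_error_rate("the cat sat down", "the cat down")` counts one deleted word out of four reference words.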
Total: 370
49 International Conference Yeona Hong, Hyewon Han, Woo-jin Chung, Hong-Goo Kang "StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models" in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
48 International Conference Sangmin Lee, Woojin Chung, Hong-Goo Kang "LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration" in Association for the Advancement of Artificial Intelligence (AAAI), 2025
47 International Journal Hyewon Han, Xiulian Peng, Doyeon Kim, Yan Lu, Hong-Goo Kang "Dual-Branch Guidance Encoder for Robust Acoustic Echo Suppression" in IEEE Transactions on Audio, Speech and Language Processing (TASLP), vol. 33, pp. 627-639, 2025
46 International Journal Hyungseob Lim, Jihyun Lee, Byeong Hyeon Kim, Inseon Jang, Hong-Goo Kang "Perceptual Neural Audio Coding with Modified Discrete Cosine Transform" in IEEE Journal of Selected Topics in Signal Processing (JSTSP), 2025
45 International Conference Juhwan Yoon, Hyungseob Lim, Hyeonjin Cha, Hong-Goo Kang "StylebookTTS: Zero-Shot Text-to-Speech Leveraging Unsupervised Style Representation" in APSIPA ASC, 2024
44 International Conference Doyeon Kim, Yanjue Song, Nilesh Madhu, Hong-Goo Kang "Enhancing Neural Speech Embeddings for Generative Speech Models" in APSIPA ASC, 2024
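43 Domestic Conference Byeong Hyeon Kim, Hong-Goo Kang, Inseon Jang "Deep Neural Network-Based Speech Compression under Low-Latency Conditions" in The Korean Institute of Broadcast and Media Engineers 2024 Summer Conference, 2024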
42 International Conference Miseul Kim, Soo-Whan Chung, Youna Ji, Hong-Goo Kang, Min-Seok Choi "Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation" in INTERSPEECH, 2024
41 International Conference Woo-Jin Chung, Hong-Goo Kang "Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator" in INTERSPEECH, 2024
40 International Conference Juhwan Yoon, Woo Seok Ko, Seyun Um, Sungwoong Hwang, Soojoong Hwang, Changhwan Kim, Hong-Goo Kang "UNIQUE : Unsupervised Network for Integrated Speech Quality Evaluation" in INTERSPEECH, 2024