Prosody dependent speech recognition on radio news corpus of American English

International Journal
2006-01-01 14:00
Authors : Ken Chen, Mark Hasegawa-johnson, Aaron Cohen, Sarah Borys, Sung-Suk Kim, Jennifer Cole, Jeung-Yoon Choi

Year : 2006

Publisher / Conference : IEEE Transactions on Audio, Speech, and Language Processing

Volume : 14, issue 1

Page : 232-245

Does prosody help word recognition? This paper proposes a novel probabilistic framework in which word and phoneme are dependent on prosody in a way that reduces word error rates (WER) relative to a prosody-independent recognizer with comparable parameter count. In the proposed prosody-dependent speech recognizer, word and phoneme models are conditioned on two important prosodic variables: the intonational phrase boundary and the pitch accent. An information-theoretic analysis is provided to show that prosody dependent acoustic and language modeling can increase the mutual information between the true word hypothesis and the acoustic observation by exciting the interaction between prosody dependent acoustic model and prosody dependent language model. Empirically, results indicate that the influence of these prosodic variables on allophonic models are mainly restricted to a small subset of distributions: the duration PDFs (modeled using an explicit duration hidden Markov model or EDHMM) and the acoustic-prosodic observation PDFs (normalized pitch frequency). Influence of prosody on cepstral features is limited to a subset of phonemes: for example, vowels may be influenced by both accent and phrase position, but phrase-initial and phrase-final consonants are independent of accent. Leveraging these results, effective prosody dependent allophonic models are built with minimal increase in parameter count. These prosody dependent speech recognizers are able to reduce word error rates by up to 11% relative to prosody independent recognizers with comparable parameter count, in experiments based on the prosodically-transcribed Boston Radio News corpus.
전체 355
355 International Conference Hyewon Han, Naveen Kumar "A cross-talk robust multichannel VAD model for multiparty agent interactions trained using synthetic re-recordings" in Hands-free Speech Communication and Microphone Arrays (HSCMA, Satellite workshop in ICASSP), 2024
354 International Conference Yanjue Song, Doyeon Kim, Nilesh Madhu, Hong-Goo Kang "On the Disentanglement and Robustness of Self-Supervised Speech Representations" in International Conference on Electronics, Information, and Communication (ICEIC) (*awarded Best Paper), 2024
353 International Conference Yeona Hong, Miseul Kim, Woo-Jin Chung, Hong-Goo Kang "Contextual Learning for Missing Speech Automatic Speech Recognition" in International Conference on Electronics, Information, and Communication (ICEIC), 2024
352 International Conference Juhwan Yoon, Seyun Um, Woo-Jin Chung, Hong-Goo Kang "SC-ERM: Speaker-Centric Learning for Speech Emotion Recognition" in International Conference on Electronics, Information, and Communication (ICEIC), 2024
351 International Conference Hejung Yang, Hong-Goo Kang "On Fine-Tuning Pre-Trained Speech Models With EMA-Target Self-Supervised Loss" in ICASSP, 2024
350 International Journal Zainab Alhakeem, Se-In Jang, Hong-Goo Kang "Disentangled Representations in Local-Global Contexts for Arabic Dialect Identification" in Transactions on Audio, Speech, and Language Processing, 2024
349 International Conference Hong-Goo Kang, W. Bastiaan Kleijn, Jan Skoglund, Michael Chinen "Convolutional Transformer for Neural Speech Coding" in Audio Engineering Society Convention, 2023
348 International Conference Hong-Goo Kang, Jan Skoglund, W. Bastiaan Kleijn, Andrew Storus, Hengchin Yeh "A High-Rate Extension to Soundstream" in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
347 International Conference Zhenyu Piao, Hyungseob Lim, Miseul Kim, Hong-goo Kang "PDF-NET: Pitch-adaptive Dynamic Filter Network for Intra-gender Speaker Verification" in APSIPA ASC, 2023
346 International Conference WooSeok Ko, Seyun Um, Zhenyu Piao, Hong-goo Kang "Consideration of Varying Training Lengths for Short-Duration Speaker Verification" in APSIPA ASC, 2023