Papers

Disentangled Representations in Local-Global Contexts for Arabic Dialect Identification

International Journal
2021~
Posted by: 김병현
Posted on: 2023-12-05 18:45
Views: 2769
Authors : Zainab Alhakeem, Se-In Jang, Hong-Goo Kang

Year : 2024

Publisher / Conference : Transactions on Audio, Speech, and Language Processing

Research area : Speech Signal Processing, Speech Recognition

Presentation : None

In this article, we propose a locally and globally informed disentanglement network for Arabic dialect identification (ADI). The network aims to remove dialect-irrelevant information (e.g., speaker, gender, and channel) from the source utterance and extract only dialect-related representations suited to the ADI problem. It consists of local convolutional backbone modules that learn low-resolution feature maps and self-attention-based bottleneck transformers that efficiently aggregate this local information into a global context, which serves as the learned dialect embedding. We also propose a novel supervised clustering loss that minimizes intra-class variation and maximizes inter-class variation in the latent space. Our model achieves state-of-the-art results in qualitative and quantitative evaluations, outperforming other competitive solutions on the ADI-17 dataset. In particular, we show that the local-global awareness of the proposed network improves feature representation and identification performance.
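The abstract does not give the form of the supervised clustering loss, so the following is only a minimal sketch of the general idea (small intra-class spread, large inter-class separation), not the authors' formulation. The function name `clustering_loss`, the `margin` parameter, and the centroid-based hinge term are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def clustering_loss(embeddings, labels, margin=1.0):
    """Illustrative loss (an assumption, not the paper's definition):
    pull each embedding toward its class centroid and push centroids
    of different classes at least `margin` apart."""
    # Group embeddings by their class (dialect) label.
    groups = {}
    for vector, label in zip(embeddings, labels):
        groups.setdefault(label, []).append(vector)

    # Class centroid = coordinate-wise mean of the class members.
    centroids = {
        label: [sum(column) / len(members) for column in zip(*members)]
        for label, members in groups.items()
    }

    # Intra-class term: mean squared distance from a sample to its centroid.
    intra = sum(
        dist(vector, centroids[label]) ** 2
        for vector, label in zip(embeddings, labels)
    ) / len(embeddings)

    # Inter-class term: squared hinge on centroid pairs closer than `margin`.
    pairs = list(combinations(centroids.values(), 2))
    inter = 0.0
    if pairs:
        inter = sum(max(0.0, margin - dist(a, b)) ** 2 for a, b in pairs) / len(pairs)

    return intra + inter


# Toy 2-D "embeddings": two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 0.0), (5.0, 5.0), (5.0, 5.0)]
well_clustered = clustering_loss(points, [0, 0, 1, 1])
badly_clustered = clustering_loss(points, [0, 1, 0, 1])
print(well_clustered < badly_clustered)
```

With labels that match the two groups, both terms vanish. With labels that mix the groups, the class centroids coincide and both terms are penalized. In a real training setup the same quantity would be computed on batched tensors in an automatic-differentiation framework so that gradients flow back to the encoder.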
Total: 372
51 International Conference Byeong Hyeon Kim, Hyungseob Lim, Inseon Jang, Hong-Goo Kang "Towards an Ultra-Low-Delay Neural Audio Coding with Computational Efficiency" in INTERSPEECH, 2025
50 International Conference Stijn Kindt, Jihyun Kim, Hong-Goo Kang, Nilesh Madhu "Efficient, Cluster-Informed, Deep Speech Separation with Cross-Cluster Information in AD-HOC Wireless Acoustic Sensor Networks" in International Workshop on Acoustic Signal Enhancement (IWAENC), 2024
49 International Conference Yeona Hong, Hyewon Han, Woo-jin Chung, Hong-Goo Kang "StableQuant: Layer Adaptive Post-Training Quantization for Speech Foundation Models" in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
48 International Conference Sangmin Lee, Woojin Chung, Hong-Goo Kang "LAMA-UT: Language Agnostic Multilingual ASR through Orthography Unification and Language-Specific Transliteration" in Association for the Advancement of Artificial Intelligence (AAAI), 2025
47 International Journal Hyewon Han, Xiulian Peng, Doyeon Kim, Yan Lu, Hong-Goo Kang "Dual-Branch Guidance Encoder for Robust Acoustic Echo Suppression" in IEEE Transactions on Audio, Speech and Language Processing (TASLP), vol. 33, pp. 627-639, 2025
46 International Journal Hyungseob Lim, Jihyun Lee, Byeong Hyeon Kim, Inseon Jang, Hong-Goo Kang "Perceptual Neural Audio Coding with Modified Discrete Cosine Transform" in IEEE Journal of Selected Topics in Signal Processing (JSTSP), 2024
45 International Conference Juhwan Yoon, Hyungseob Lim, Hyeonjin Cha, Hong-Goo Kang "StylebookTTS: Zero-Shot Text-to-Speech Leveraging Unsupervised Style Representation" in APSIPA ASC, 2024
44 International Conference Doyeon Kim, Yanjue Song, Nilesh Madhu, Hong-Goo Kang "Enhancing Neural Speech Embeddings for Generative Speech Models" in APSIPA ASC, 2024
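43 Domestic Conference 김병현, 강홍구, 장인선 "Deep Neural Network-Based Speech Compression under Low-Delay Conditions" (저지연 조건하의 심층신경망 기반 음성 압축) in Korean Institute of Broadcast and Media Engineers 2024 Summer Conference (한국방송·미디어공학회 2024년 하계학술대회), 2024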
42 International Conference Miseul Kim, Soo-Whan Chung, Youna Ji, Hong-Goo Kang, Min-Seok Choi "Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation" in INTERSPEECH, 2024