Papers

Disentangled Representations for Arabic Dialect Identification based on Supervised Clustering with Triplet Loss

International Conference
2021~
작성자
한혜원
작성일
2021-08-30 14:38
조회
2151
Authors : Zainab Alhakeem, Yoohwan Kwon, Hong-Goo Kang

Year : 2021

Publisher / Conference : EUSIPCO

Research area : Speech Signal Processing, 기타

In this paper, we propose a novel supervised clustering with triplet (SCT) loss that effectively learns disentangled representations for Arabic dialect identification (ADI). To improve the performance of ADI using latent representation-based approaches, we need to extract embeddings that include only dialect related information by dissociating all the irrelevant information such as gender, channel, and speaker. In consideration of the embedding-level distribution, our proposed SCT loss minimizes intra-class variations and maximizes inter-class variations. Specifically, it uses the centroid of each dialect as a triplet component, thereby avoiding the issue of choosing an undesirable triplet component due to random sampling. Experimental results on the ADI-17 dataset show that our proposed method significantly outperforms conventional state-of-the-art methods in terms of the identification accuracy.
전체 364
23 International Conference Hong-Goo Kang, Jan Skoglund, W. Bastiaan Kleijn, Andrew Storus, Hengchin Yeh "A High-Rate Extension to Soundstream" in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
22 International Conference Zhenyu Piao, Hyungseob Lim, Miseul Kim, Hong-goo Kang "PDF-NET: Pitch-adaptive Dynamic Filter Network for Intra-gender Speaker Verification" in APSIPA ASC, 2023
21 International Conference Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang "BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0" in The IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), 2023
20 International Conference Seyun Um, Jihyun Kim, Jihyun Lee, Hong-Goo Kang "Facetron: A Multi-speaker Face-to-Speech Model based on Cross-Modal Latent Representations" in EUSIPCO, 2023
19 International Conference Hejung Yang, Hong-Goo Kang "Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement" in INTERSPEECH, 2023
18 International Conference Jihyun Kim, Hong-Goo Kang "Contrastive Learning based Deep Latent Masking for Music Source Seperation" in INTERSPEECH, 2023
17 International Conference Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang "MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion" in INTERSPEECH, 2023
16 International Conference Hyungchan Yoon, Seyun Um, Changhwan Kim, Hong-Goo Kang "Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech" in INTERSPEECH, 2023
15 International Conference Hyungchan Yoon, Changhwan Kim, Eunwoo Song, Hyun-Wook Yoon, Hong-Goo Kang "Pruning Self-Attention for Zero-Shot Multi-Speaker Text-to-Speech" in INTERSPEECH, 2023
14 International Conference Doyeon Kim, Soo-Whan Chung, Hyewon Han, Youna Ji, Hong-Goo Kang "HD-DEMUCS: General Speech Restoration with Heterogeneous Decoders" in INTERSPEECH, 2023