Papers

HANUI: Harnessing Distributional Discrepancies for Singing Voice Deepfake Detection

International Conference

Posted by

dsp

Posted on

2026-02-19 13:23


Authors : Seyun Um, Doyeon Kim, Hong-Goo Kang

Year : 2026

Publisher / Conference : IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Research area : Speech Signal Processing, Speech Synthesis

Presentation : Poster

In this work, we propose a novel framework for deepfake detection in singing voices, based on explicitly capturing the distributional differences between bona fide and spoofed signals. While prior approaches employing self-supervised models or graph neural networks have shown promising results, they remain vulnerable when tested on unseen singers, musical styles, and languages. Motivated by anomaly detection, our method integrates an autoencoder with a GAN-based architecture to exploit probabilistic distributional discrepancies between ground-truth and reconstructed signals. In particular, we leverage a discriminator to extract informative feature maps that highlight distinctive characteristics of bona fide and spoofed samples, which are subsequently utilized by a detector to perform classification. Experimental results demonstrate that our proposed framework not only improves detection accuracy over recent methods, but also achieves substantial relative reductions in error rates, confirming its robustness and generalizability under challenging unseen conditions.
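The reconstruction-based idea in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: a linear autoencoder (PCA) stands in for the autoencoder, a fixed random projection with ReLU stands in for the discriminator's feature maps, and a simple distance score stands in for the trained detector. The data, dimensions, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, LATENT, FEAT = 16, 4, 32  # hypothetical sizes for this toy example

# Assumed toy data: bona fide vectors lie near a 4-D subspace of a 16-D
# space; spoofed vectors carry an extra off-subspace component.
mixing = rng.standard_normal((LATENT, DIM))

def make_samples(n, spoofed):
    x = rng.standard_normal((n, LATENT)) @ mixing
    x += 0.05 * rng.standard_normal((n, DIM))
    if spoofed:
        x += 0.5 * rng.standard_normal((n, DIM))
    return x

def fit_linear_autoencoder(x, k):
    """Fit a rank-k linear autoencoder (PCA) on bona fide samples only."""
    mean = x.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:k]

def reconstruct(x, mean, basis):
    """Encode onto the learned subspace, then decode back to input space."""
    return (x - mean) @ basis.T @ basis + mean

# Stand-in for discriminator feature maps: a fixed random projection + ReLU.
# A real system would use intermediate activations of a trained discriminator.
proj = rng.standard_normal((DIM, FEAT)) / np.sqrt(DIM)

def discriminator_features(x):
    return np.maximum(x @ proj, 0.0)

def discrepancy_score(x, mean, basis):
    """Distance between the features of an input and of its reconstruction."""
    f_in = discriminator_features(x)
    f_rec = discriminator_features(reconstruct(x, mean, basis))
    return np.linalg.norm(f_in - f_rec, axis=1)

mean, basis = fit_linear_autoencoder(make_samples(500, spoofed=False), LATENT)
bona_scores = discrepancy_score(make_samples(200, spoofed=False), mean, basis)
spoof_scores = discrepancy_score(make_samples(200, spoofed=True), mean, basis)

print(f"mean score, bona fide: {bona_scores.mean():.3f}")
print(f"mean score, spoofed:   {spoof_scores.mean():.3f}")
```

Because the autoencoder is fitted on bona fide samples only, it reconstructs them well and reconstructs off-distribution samples poorly, so the feature-space discrepancy tends to be larger for spoofed inputs. The abstract describes a detector that classifies from the discriminator's feature maps; the distance score here is a simplification of that step.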


Total 381

381	International Conference	Seyun Um, Doyeon Kim, Hong-Goo Kang "HANUI: Harnessing Distributional Discrepancies for Singing Voice Deepfake Detection" in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2026
380	International Conference	Miseul Kim, Soo jin Park, Kyungguen Byun, Hyeon-Kyeong Shin, Sunkuk Moon, Shuhua Zhang, Erik Visser "Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation" in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2026
379	International Conference	Woongjib Choi, Sangmin Lee, Hyungseob Lim, Hong-Goo Kang "UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching" in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2026
378	International Journal	Hyeonjin Cha, Seyun Um, Miseul Kim, Changhwan Kim, Seungshin Lee, Hong-Goo Kang "Content-Aware Style Augmentation for Zero-Shot Voice Conversion With Short Target Speech" in IEEE Signal Processing Letters, vol.33, pp.66-70, 2025
377	Domestic Conference	신재훈, 최웅집, 김병현, 장인선, 강홍구 "Deep Neural Network-Based Speech Codec Enhancement Using Conditional Flow Matching" in Korean Institute of Broadcast and Media Engineers 2025 Summer Conference, 2025
376	International Conference	Miseul Kim, Seyun Um, Hyeonjin Cha, Hong-Goo Kang "SpeechMLC: Speech Multi-Label Classification" in INTERSPEECH, 2025
375	International Conference	Sangmin Lee, Woojin Chung, Seyun Um, Hong-Goo Kang "UniCoM: A Universal Code-Switching Speech Generator" in EMNLP Findings, 2025
374	International Conference	Woongjib Choi, Byeong Hyeon Kim, Hyungseob Lim, Inseon Jang, Hong-Goo Kang "Neural Spectral Band Generation for Audio Coding" in INTERSPEECH, 2025
373	International Conference	Jihyun Kim, Doyeon Kim, Hyewon Han, Jinyoung Lee, Jonguk Yoo, Chang Woo Han, Jeongook Song, Hoon-Young Cho, Hong-Goo Kang "Quadruple Path Modeling with Latent Feature Transfer for Permutation-free Continuous Speech Separation" in INTERSPEECH, 2025
372	International Conference	Byeong Hyeon Kim, Hyungseob Lim, Inseon Jang, Hong-Goo Kang "Towards an Ultra-Low-Delay Neural Audio Coding with Computational Efficiency" in INTERSPEECH, 2025
