Adversarial Audio Synthesis Using a Harmonic-Percussive Discriminator
In this paper, we propose a discriminator design scheme for generative adversarial network (GAN)-based audio signal generation.
Unlike conventional discriminators, which take the entire signal as input, our discriminator separates the audio signal into harmonic and percussive components and analyzes each component independently.
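The abstract does not say how the harmonic and percussive components are obtained. One widely used option is median-filtering harmonic-percussive separation (FitzGerald, DAFx 2010). In a magnitude spectrogram, harmonic sounds form horizontal ridges and percussive sounds form vertical ones, so a median filter along time keeps the former and a median filter along frequency keeps the latter. The NumPy sketch below assumes that approach. The function names `stft_mag`, `median_filter_axis` and `hpss` and all default parameters are hypothetical and are not taken from the paper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def stft_mag(x, n_fft=1024, hop=256):
    """Magnitude STFT with a Hann window; returns an array of shape (bins, frames)."""
    window = np.hanning(n_fft)
    x = np.pad(x, n_fft // 2, mode="reflect")  # center frames on the signal
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def median_filter_axis(S, size, axis):
    """Odd-length median filter along one axis; edge padding keeps the shape of S."""
    half = size // 2
    pad = [(0, 0)] * S.ndim
    pad[axis] = (half, half)
    padded = np.pad(S, pad, mode="edge")
    windows = sliding_window_view(padded, size, axis=axis)  # window dim appended last
    return np.median(windows, axis=-1)


def hpss(x, kernel=17, n_fft=1024, hop=256, power=2.0, eps=1e-10):
    """Hypothetical helper: split |STFT(x)| into harmonic and percussive magnitudes."""
    S = stft_mag(x, n_fft, hop)
    H = median_filter_axis(S, kernel, axis=1)  # smooth over time: keeps steady tones
    P = median_filter_axis(S, kernel, axis=0)  # smooth over frequency: keeps transients
    Hp, Pp = H ** power, P ** power
    denom = Hp + Pp + eps
    # Soft (Wiener-like) masks: each bin is shared according to which filter dominates.
    return S * (Hp / denom), S * (Pp / denom)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    x = 0.5 * np.sin(2 * np.pi * 440 * t)  # steady tone (harmonic)
    x[::4000] += 1.0                        # sparse clicks (percussive)
    harmonic, percussive = hpss(x)
    print(harmonic.shape, percussive.shape)
```

Each of the two returned magnitudes could then be passed to its own discriminator branch, in the spirit of the abstract's "analyzes each component independently"; how the paper actually wires those branches is not specified here.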
The rationale behind this idea is that conventional discriminators cannot reliably capture subtle distortions in general audio signals, which have complicated time-frequency characteristics.
By considering the time-frequency resolution of audio signals, our proposed method encourages the generator to better reconstruct harmonic and percussive features, which are critical for the quality of the generated signals.
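The time-frequency trade-off mentioned above can be made concrete with standard STFT arithmetic: the bin spacing is `sr / n_fft` Hz and the analysis window lasts `n_fft / sr` seconds. A long window separates closely spaced harmonics, while a short one localizes onsets in time. The sketch below only illustrates this trade-off. The helper `stft_resolution` and the sample rate, window, and hop sizes are hypothetical, and the abstract does not state that the proposed discriminator uses these (or any particular) STFT settings for each branch.

```python
def stft_resolution(sr, n_fft, hop):
    """Frequency-bin spacing (Hz), window length (ms) and hop (ms) of an STFT."""
    return sr / n_fft, 1000.0 * n_fft / sr, 1000.0 * hop / sr


# Illustrative settings only; not the paper's actual parameters.
long_cfg = stft_resolution(16000, 2048, 512)  # fine frequency, coarse time
short_cfg = stft_resolution(16000, 256, 64)   # coarse frequency, fine time
print(long_cfg)   # (7.8125, 128.0, 32.0)
print(short_cfg)  # (62.5, 16.0, 4.0)
```

With the long window, adjacent bins are under 8 Hz apart, which suits sustained pitched content. With the short window, frames advance every 4 ms, which suits sharp attacks.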
Listening tests show that, compared with a baseline, our framework significantly improves pitch stability and produces clearer audio.
|345||International Journal||Zainab Alhakeem, Se-In Jang, Hong-Goo Kang "Disentangled Representations in Local-Global Contexts for Arabic Dialect Identification" in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024|
|344||International Conference||Zhenyu Piao, Hyungseob Lim, Miseul Kim, Hong-Goo Kang "PDF-NET: Pitch-adaptive Dynamic Filter Network for Intra-gender Speaker Verification" in APSIPA ASC, 2023|
|343||International Conference||WooSeok Ko, Seyun Um, Zhenyu Piao, Hong-Goo Kang "Consideration of Varying Training Lengths for Short-Duration Speaker Verification" in APSIPA ASC, 2023|
|342||International Journal||Hyungchan Yoon, Changhwan Kim, Seyun Um, Hyun-Wook Yoon, Hong-Goo Kang "SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to-Speech Systems" in IEEE Signal Processing Letters, vol.30, pp.593-597, 2023|
|341||International Conference||Miseul Kim, Zhenyu Piao, Jihyun Lee, Hong-Goo Kang "BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0" in The IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), 2023|
|340||International Conference||Seyun Um, Jihyun Kim, Jihyun Lee, Hong-Goo Kang "Facetron: A Multi-speaker Face-to-Speech Model based on Cross-Modal Latent Representations" in EUSIPCO, 2023|
|339||International Conference||Hejung Yang, Hong-Goo Kang "Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement" in INTERSPEECH, 2023|
|338||International Conference||Jihyun Kim, Hong-Goo Kang "Contrastive Learning based Deep Latent Masking for Music Source Separation" in INTERSPEECH, 2023|
|337||International Conference||Woo-Jin Chung, Doyeon Kim, Soo-Whan Chung, Hong-Goo Kang "MF-PAM: Accurate Pitch Estimation through Periodicity Analysis and Multi-level Feature Fusion" in INTERSPEECH, 2023|
|336||International Conference||Hyungchan Yoon, Seyun Um, Changhwan Kim, Hong-Goo Kang "Adversarial Learning of Intermediate Acoustic Feature for End-to-End Lightweight Text-to-Speech" in INTERSPEECH, 2023|