Papers

Perfect match: Improved cross-modal embeddings for audio-visual synchronisation

International Conference
2016~2020
작성자
한혜원
작성일
2019-05-01 16:39
조회
1507
Authors : Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang

Year : 2019

Publisher / Conference : ICASSP

This paper proposes a new strategy for learning powerful cross-modal embeddings for audio-to-video synchronization. Here, we set up the problem as one of cross-modal retrieval, where the objective is to find the most relevant audio segment given a short video clip. The method builds on the recent advances in learning representations from cross-modal self-supervision. The main contributions of this paper are as follows: (1) we propose a new learning strategy where the embeddings are learnt via a multi-way matching problem, as opposed to a binary classification (matching or non-matching) problem as proposed by recent papers; (2) we demonstrate that performance of this method far exceeds the existing baselines on the synchronization task; (3) we use the learnt embeddings for visual speech recognition in self-supervision, and show that the performance matches the representations learnt end-to-end in a fully-supervised manner.
전체 360
360 International Conference Seyun Um, Doyeon Kim, Hong-Goo Kang "PARAN: Variational Autoencoder-based End-to-End Articulation-to-Speech System for Speech Intelligibility" in INTERSPEECH, 2024
359 International Conference Jihyun Kim, Stijn Kindt, Nilesh Madhu, Hong-Goo Kang "Enhanced Deep Speech Separation in Clustered Ad Hoc Distributed Microphone Environments" in INTERSPEECH, 2024
358 International Conference Woo-Jin Chung, Hong-Goo Kang "Speaker-Independent Acoustic-to-Articulatory Inversion through Multi-Channel Attention Discriminator" in INTERSPEECH, 2024
357 International Conference Juhwan Yoon, Woo Seok Ko, Seyun Um, Sungwoong Hwang, Soojoong Hwang, Changhwan Kim, Hong-Goo Kang "UNIQUE : Unsupervised Network for Integrated Speech Quality Evaluation" in INTERSPEECH, 2024
356 International Conference Yanjue Song, Doyeon Kim, Hong-Goo Kang, Nilesh Madhu "Spectrum-aware neural vocoder based on self-supervised learning for speech enhancement" in EUSIPCO, 2024
355 International Conference Hyewon Han, Naveen Kumar "A cross-talk robust multichannel VAD model for multiparty agent interactions trained using synthetic re-recordings" in Hands-free Speech Communication and Microphone Arrays (HSCMA, Satellite workshop in ICASSP), 2024
354 International Conference Yanjue Song, Doyeon Kim, Nilesh Madhu, Hong-Goo Kang "On the Disentanglement and Robustness of Self-Supervised Speech Representations" in International Conference on Electronics, Information, and Communication (ICEIC) (*awarded Best Paper), 2024
353 International Conference Yeona Hong, Miseul Kim, Woo-Jin Chung, Hong-Goo Kang "Contextual Learning for Missing Speech Automatic Speech Recognition" in International Conference on Electronics, Information, and Communication (ICEIC), 2024
352 International Conference Juhwan Yoon, Seyun Um, Woo-Jin Chung, Hong-Goo Kang "SC-ERM: Speaker-Centric Learning for Speech Emotion Recognition" in International Conference on Electronics, Information, and Communication (ICEIC), 2024
351 International Conference Hejung Yang, Hong-Goo Kang "On Fine-Tuning Pre-Trained Speech Models With EMA-Target Self-Supervised Loss" in ICASSP, 2024