No.
329 International Conference Disentangled Representations for Arabic Dialect Identification based on Supervised Clustering with Triplet Loss 2021-05-06
In this paper, we propose a novel supervised clustering with triplet (SCT) loss that effectively learns disentangled representations for Arabic dialect identification (ADI). To improve the performan...  
328 International Conference Self-supervised Complex Network for Machine Sound Anomaly Detection 2021-05-06
In this paper, we propose an anomaly detection algorithm for machine sounds with a deep complex network trained by self-supervision. Using the fact that phase continuity information is crucial for de...  
327 International Conference A Fast and Lightweight Text-To-Speech Model with Spectrum and Waveform Alignment Algorithms 2021-05-06
In this paper, we propose a fast and lightweight text-to-speech (TTS) model that generates high-quality speech even in CPU-only environments. By leveraging the front-end architecture of FastSpeech2, w...  
326 International Conference Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation 2021-03-04
Abstract: In this paper, we address the problem of separating individual speech signals from videos using audio-visual neural processing. Most conventional approaches utilize frame-wise matching criteria to ext...
325 International Conference The ins and outs of speaker recognition: lessons from VoxSRC 2020 2021-02-03
The VoxCeleb Speaker Recognition Challenge (VoxSRC) at Interspeech 2020 offers a challenging evaluation for speaker recognition systems, which includes celebrities playing different parts in movies. Th...  
324 International Conference A Study on Conditional Features for a Flow-based Neural Vocoder 2020-12-06
Abstract: In this paper, we propose an effective way of providing conditional features for a flow-based neural vocoder. Most conventional approaches utilize mel-spectrograms for conditioning neural voc...  
323 International Conference End-to-end Lip Synchronisation Based on Pattern Classification 2020-11-03
Abstract: The goal of this work is to synchronise audio and video of a talking face using deep neural network models. Existing works have trained networks on proxy tasks such as cross-modal similarity learn...
322 International Conference Cross Attentive Pooling for Speaker Verification 2020-11-03
The goal of this paper is text-independent speaker verification where utterances come from 'in the wild' videos and may contain irrelevant signal. While speaker verification is naturally a pair-wise p...
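321 Domestic Journal A Study on an Adversarial Learning-based Speech Separation Framework for Speaker Recognition (화자 인식을 위한 적대학습 기반 음성 분리 프레임워크에 대한 연구) 2020-10-29
Abstract: This paper proposes a system that uses deep learning techniques to extract effective speaker vectors from speech signals. Based on the observation that speech signals contain information unrelated to speaker characteristics, such as spoken content, emotion, and background noise, ...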
320 International Conference ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis 2020-10-08
In this paper we propose ExcitGlow, a vocoder that incorporates the source-filter model of voice production theory into a flow-based deep generative model. By targeting the distribution of the ex...  
319 International Conference Speaker-invariant Psychological Stress Detection Using Attention-based Network 2020-10-07
When people get stressed in nervous or unfamiliar situations, their speaking styles or acoustic characteristics change. These changes are particularly emphasized in certain regions of speech, so a model tha...  
318 International Journal Effective Emotion Transplantation in an End-to-End Text-to-Speech System 2020-10-07
Abstract: In this paper, we propose an effective technique to transplant a source speaker’s emotional expression to a new target speaker’s voice within an end-to-end text-to-speech (TTS) framework. We ...
317 International Conference LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis 2020-10-07
We propose a linear prediction (LP)-based waveform generation method via the WaveNet vocoding framework. A WaveNet-based neural vocoder has significantly improved the quality of parametric text-to-speech (TT...
316 Domestic Conference Improving the Speech Quality of a Multi-speaker Speech Synthesis System through Speaker and Speaking Style Embeddings (화자 및 발화 스타일 임베딩을 통한 다화자 음성합성 시스템 음질 향상) 2020-09-15
In this paper, we improve the speech quality of a multi-speaker text-to-speech (TTS) system by adding two embedding networks that represent speaker and speaking style characteristics. The speaker embedding is ...
315 International Journal Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval 2020-09-14
Abstract: This paper proposes a new strategy for learning effective cross-modal joint embeddings using self-supervision. We set up the problem as one of cross-modal retrieval, where the objective is to fin...
314 Domestic Conference Deep Learning-based End-to-end Multi-channel Speech Enhancement Algorithm (딥러닝 기반 종단 간 다채널 음질 개선 알고리즘) 2020-08-28
Abstract: In this paper, we propose a deep learning-based multi-channel speech enhancement algorithm. The proposed system consists of three sub-modules such as magnitude estimation, phase estimation, and spatial ...
313 International Conference FaceFilter: Audio-visual speech separation using still images 2020-08-13
Abstract: The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network. Unlike previous works that used lip movement ...
312 International Conference Seeing Voices and Hearing Voices: Learning discriminative embeddings using cross-modal self-supervision 2020-08-13
Abstract: The goal of this work is to train discriminative cross-modal embeddings without access to manually annotated data. Recent advances in self-supervised learning have shown that effective representatio...
311 International Conference MIRNet: Learning multiple identities representations in overlapped speech 2020-08-13
Many approaches can derive information about a single speaker's identity from the speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determin...  
310 International Conference Intra-class variation reduction of speaker representation in disentanglement framework 2020-08-13
In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent represe...