324 |
International Conference
A Study on Conditional Features for a Flow-based Neural Vocoder
|
2020-12-06 |
Abstract: In this paper, we propose an effective way of providing conditional features for a flow-based neural vocoder. Most conventional approaches utilize mel-spectrograms for conditioning neural voc...
|
323 |
International Conference
End-to-end Lip Synchronisation Based on Pattern Classification
|
2020-11-03 |
Abstract: The goal of this work is to synchronise audio and video of a talking face using deep neural network models. Existing works have trained networks on proxy tasks such as cross-modal similarity learn...
|
322 |
International Conference
Cross Attentive Pooling for Speaker Verification
|
2020-11-03 |
The goal of this paper is text-independent speaker verification where utterances come from 'in the wild' videos and may contain irrelevant signal. While speaker verification is naturally a pair-wise p...
|
321 |
Domestic Journal
A Study on an Adversarial Learning-Based Speech Separation Framework for Speaker Recognition (화자 인식을 위한 적대학습 기반 음성 분리 프레임워크에 대한 연구)
|
2020-10-29 |
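Abstract: This paper proposes a system that uses deep learning techniques to extract effective speaker vectors from speech signals. Given that speech signals contain information unrelated to the speaker's characteristics, such as spoken content, emotion, and background noise, ...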
|
320 |
International Conference
ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis
|
2020-10-08 |
In this paper we propose ExcitGlow, a vocoder that incorporates the source-filter model of voice production theory into a flow-based deep generative model. By targeting the distribution of the ex...
|
319 |
International Conference
Speaker-invariant Psychological Stress Detection Using Attention-based Network
|
2020-10-07 |
When people get stressed in nervous or unfamiliar situations, their speaking styles or acoustic characteristics change. These changes are particularly emphasized in certain regions of speech, so a model tha...
|
318 |
International Journal
Effective Emotion Transplantation in an End-to-End Text-to-Speech System
|
2020-10-07 |
Abstract: In this paper, we propose an effective technique to transplant a source speaker’s emotional expression to a new target speaker’s voice within an end-to-end text-to-speech (TTS) framework. We ...
|
317 |
International Conference
LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis
|
2020-10-07 |
We propose a linear prediction (LP)-based waveform generation method via a WaveNet vocoding framework. A WaveNet-based neural vocoder has significantly improved the quality of parametric text-to-speech (TT...
|
316 |
Domestic Conference
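Improving the Speech Quality of a Multi-Speaker Speech Synthesis System via Speaker and Speaking Style Embeddings (화자 및 발화 스타일 임베딩을 통한 다화자 음성합성 시스템 음질 향상)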
|
2020-09-15 |
In this paper, we improve the speech quality of a multi-speaker text-to-speech (TTS) system by adding two embedding networks that represent speaker and speaking style characteristics. The speaker embedding is ...
|
315 |
International Journal
Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval
|
2020-09-14 |
Abstract: This paper proposes a new strategy for learning effective cross-modal joint embeddings using self-supervision. We set up the problem as one of cross-modal retrieval, where the objective is to fin...
|
314 |
Domestic Conference
Deep Learning-Based End-to-End Multi-Channel Speech Enhancement Algorithm (딥러닝 기반 종단 간 다채널 음질 개선 알고리즘)
 |
2020-08-28 |
Abstract: In this paper, we propose a deep learning-based multi-channel speech enhancement algorithm. The proposed system consists of three sub-modules: magnitude estimation, phase estimation, and spatial ...
|
313 |
International Conference
FaceFilter: Audio-visual speech separation using still images
|
2020-08-13 |
Abstract: The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network. Unlike previous works that used lip movement ...
|
312 |
International Conference
Seeing Voices and Hearing Voices: Learning discriminative embeddings using cross-modal self-supervision
|
2020-08-13 |
Abstract: The goal of this work is to train discriminative cross-modal embeddings without access to manually annotated data. Recent advances in self-supervised learning have shown that effective representatio...
|
311 |
International Conference
MIRNet: Learning multiple identities representations in overlapped speech
|
2020-08-13 |
Many approaches can derive information about a single speaker's identity from the speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determin...
|
310 |
International Conference
Intra-class variation reduction of speaker representation in disentanglement framework
|
2020-08-13 |
In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent represe...
|
309 |
International Conference
A Cross-channel Attention-based Wave-U-Net for Multi-channel Speech Enhancement
|
2020-08-11 |
In this paper, we present a novel architecture for multi-channel speech enhancement using a cross-channel attention-based Wave-U-Net structure. Despite the advantages of utilizing spatial information as we...
|
308 |
Domestic Conference
Automatic Target Recognition in SAR Images Using Meta-Learning (메타러닝을 이용한 SAR 영상 자동표적 인식)
|
2020-07-13 |
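In air force air-to-ground operations, accurately identifying objects on the ground is very important. However, because most missions are by their nature flown at high altitude, it is difficult for pilots to identify targets accurately with the naked eye, and clouds, fog, and similar ...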
|
307 |
International Conference
Emotional Speech Synthesis with Rich and Granularized Control
|
2020-04-19 |
This paper proposes an effective emotion control method for an end-to-end text-to-speech (TTS) system. To flexibly control the distinct characteristic of a target emotion category, it is essential to ...
|
306 |
Domestic Conference
IIR Filter-Based Frequency Equalizer for Low-End TV Sound Design Environments (저사양 TV 사운드 설계환경을 위한 IIR 필터 기반 주파수 등화기)
|
2020-04-01 |
In regions where low-end TVs are developed (e.g., India and Africa), the lack of a development environment and infrastructure often means that the sound environment of the TV is not taken into account. To solve ...
|
305 |
International Conference
Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network
|
2020-01-31 |
In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN). The recently proposed LPCNet vocoder has successfully achieved high-quality ...
|