121 |
International Conference
A Study on Conditional Features for a Flow-based Neural Vocoder
|
2020-12-06 |
In this paper, we propose an effective way of providing conditional features for a flow-based neural vocoder. Most conventional approaches utilize mel-spectrograms for conditioning neural voc...
|
120 |
International Conference
End-to-end Lip Synchronisation Based on Pattern Classification
|
2020-11-03 |
The goal of this work is to synchronise audio and video of a talking face using deep neural network models. Existing works have trained networks on proxy tasks such as cross-modal similarity learn...
|
119 |
International Conference
Cross Attentive Pooling for Speaker Verification
|
2020-11-03 |
The goal of this paper is text-independent speaker verification where utterances come from 'in the wild' videos and may contain irrelevant signal. While speaker verification is naturally a pair-wise p...
|
118 |
International Conference
ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis
|
2020-10-08 |
APSIPA ASC 2020
|
117 |
International Conference
Speaker-invariant Psychological Stress Detection Using Attention-based Network
|
2020-10-07 |
When people get stressed in nervous or unfamiliar situations, their speaking styles or acoustic characteristics change. These changes are particularly emphasized in certain regions of speech, so a model tha...
|
116 |
International Conference
LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis
|
2020-10-07 |
We propose a linear prediction (LP)-based waveform generation method via WaveNet vocoding framework. A WaveNet-based neural vocoder has significantly improved the quality of parametric text-to-speech (TT...
|
115 |
International Conference
FaceFilter: Audio-visual speech separation using still images
|
2020-08-13 |
The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network. Unlike previous works that used lip movement ...
|
114 |
International Conference
Seeing Voices and Hearing Voices: Learning discriminative embeddings using cross-modal self-supervision
|
2020-08-13 |
The goal of this work is to train discriminative cross-modal embeddings without access to manually annotated data. Recent advances in self-supervised learning have shown that effective representatio...
|
113 |
International Conference
MIRNet: Learning multiple identities representations in overlapped speech
|
2020-08-13 |
Many approaches can derive information about a single speaker's identity from the speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determin...
|
112 |
International Conference
Intra-class variation reduction of speaker representation in disentanglement framework
|
2020-08-13 |
In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent represe...
|
111 |
International Conference
A Cross-channel Attention-based Wave-U-Net for Multi-channel Speech Enhancement
|
2020-08-11 |
In this paper, we present a novel architecture for multi-channel speech enhancement using a cross-channel attention-based Wave-U-Net structure. Despite the advantages of utilizing spatial information as we...
|
110 |
International Conference
Emotional Speech Synthesis with Rich and Granularized Control
|
2020-04-19 |
This paper proposes an effective emotion control method for an end-to-end text-to-speech (TTS) system. To flexibly control the distinct characteristic of a target emotion category, it is essential to ...
|
109 |
International Conference
Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network
|
2020-01-31 |
In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN). The recently proposed LPCNet vocoder has successfully achieved high-quality ...
|
108 |
International Conference
A Study on Acoustic Parameter Selection Strategies to Improve Deep Learning-Based Speech Synthesis
|
2019-11-25 |
In this paper, we investigate the variation in the performance of a deep learning-based speech synthesis (DLSS) system based on the configuration of output acoustic parameters. Our method is mainly applicable...
|
107 |
International Conference
Model Order Selection for Wind Noise Reduction in Non-negative Matrix Factorization
|
2019-06-18 |
In this paper, we propose a wind noise reduction method based on various types of non-negative matrix factorization (NMF) approaches. Since wind noise has highly non-stationary spectral characterist...
|
106 |
International Conference
Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework
|
2019-06-18 |
In this paper, we propose a speech synthesis system that effectively generates multiple types of emotional speech using the concept of global style token (GST), where the emotion-related style informati...
|
105 |
International Conference
Excitation-by-SampleRNN Model for Text-to-Speech
|
2019-06-18 |
In this paper, we propose a neural vocoder-based text-to-speech (TTS) system that effectively utilizes a source-filter modeling framework. Although neural vocoder algorithms such as SampleRNN and WaveNet ar...
|
104 |
International Conference
Parameter Enhancement for MELP Speech Codec in Noisy Communication Environment
|
2019-06-18 |
In this paper, we propose a deep learning (DL)-based parameter enhancement method for a mixed excitation linear prediction (MELP) speech codec in noisy communication environment. Unlike conventional...
|
103 |
International Conference
Gradient-based active learning query strategy for end-to-end speech recognition
|
2019-02-07 |
In this paper, we propose an effective active learning query strategy for an automatic speech recognition system with the aim of reducing the training cost. Generally, training a deep neural network ...
|
102 |
International Conference
Perfect match: Improved cross-modal embeddings for audio-visual synchronisation
|
2019-02-07 |
This paper proposes a new strategy for learning powerful cross-modal embeddings for audio-to-video synchronization. Here, we set up the problem as one of cross-modal retrieval, where the objective is to ...
|