No.
134 International Conference ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis 2020-10-08
APSIPA ASC 2020  
133 International Conference Speaker-invariant Psychological Stress Detection Using Attention-based Network 2020-10-07
When people get stressed in nervous or unfamiliar situations, their speaking styles or acoustic characteristics change. These changes are particularly emphasized in certain regions of speech, so a model tha...  
132 International Journal Effective Emotion Transplantation in an End-to-End Text-to-Speech System 2020-10-07
In this paper, we propose an effective technique to transplant a source speaker’s emotional expression to a new target speaker’s voice within an end-to-end text-to-speech (TTS) framework. We ...  
131 International Conference LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis 2020-10-07
We propose a linear prediction (LP)-based waveform generation method via WaveNet vocoding framework. A WaveNet-based neural vocoder has significantly improved the quality of parametric text-to-speech (TT...  
130 International Conference FaceFilter: Audio-visual speech separation using still images 2020-08-13
The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network. Unlike previous works that used lip movement ...  
129 International Conference FaceFilter: Audio-visual speech separation using still images 2020-08-13
The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network. Unlike previous works that used lip movement ...  
128 International Conference Seeing Voices and Hearing Voices: Learning discriminative embeddings using cross-modal self-supervision 2020-08-13
The goal of this work is to train discriminative cross-modal embeddings without access to manually annotated data. Recent advances in self-supervised learning have shown that effective representatio...  
127 International Conference MIRNet: Learning multiple identities representations in overlapped speech 2020-08-13
Many approaches can derive information about a single speaker's identity from the speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determin...  
126 International Conference Intra-class variation reduction of speaker representation in disentanglement framework 2020-08-13
In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent represe...  
125 International Conference A Cross-channel Attention-based Wave-U-Net for Multi-channel Speech Enhancement 2020-08-11
In this paper, we present a novel architecture for multi-channel speech enhancement using a cross-channel attention-based Wave-U-Net structure. Despite the advantages of utilizing spatial information as we...  
124 International Conference Emotional Speech Synthesis with Rich and Granularized Control 2020-04-19
This paper proposes an effective emotion control method for an end-to-end text-to-speech (TTS) system. To flexibly control the distinct characteristic of a target emotion category, it is essential to ...  
123 International Conference Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network 2020-01-31
In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN). The recently proposed LPCNet vocoder has successfully achieved high-quality ...  
122 International Conference A Study on Acoustic Parameter Selection Strategies to Improve Deep Learning-Based Speech Synthesis 2019-11-25
In this paper, we investigate the variation in the performance of a deep learning-based speech synthesis (DLSS) system based on the configuration of output acoustic parameters. Our method is mainly applicable...  
121 International Journal An Effective Style Token Weight Control Technique for End-to-End Emotional Speech Synthesis 2019-08-10
In this letter, we propose a high-quality emotional speech synthesis system, using emotional vector space, i.e., the weighted sum of global style tokens (GSTs). Our previous research verified the feasibilit...  
120 International Journal Dry Electrode-Based Body Fat Estimation System with Anthropometric Data for Use in a Wearable Device 2019-07-18
The bioelectrical impedance analysis (BIA) method is widely used to predict percent body fat (PBF). However, it requires four to eight electrodes, and it takes a few minutes to accurately obtain the mea...  
119 International Conference Model Order Selection for Wind Noise Reduction in Non-negative Matrix Factorization 2019-06-18
In this paper, we propose a wind noise reduction method based on various types of non-negative matrix factorization (NMF) approaches. Since wind noise has highly non-stationary spectral characterist...  
118 International Conference Emotional Speech Synthesis Based on Style Embedded Tacotron2 Framework 2019-06-18
In this paper, we propose a speech synthesis system that effectively generates multiple types of emotional speech using the concept of global style token (GST); where the emotion-related style informati...  
117 International Conference Excitation-by-SampleRNN Model for Text-to-Speech 2019-06-18
In this paper, we propose a neural vocoder-based text-to-speech (TTS) system that effectively utilizes a source-filter modeling framework. Although neural vocoder algorithms such as SampleRNN and WaveNet ar...  
116 International Conference Parameter Enhancement for MELP Speech Codec in Noisy Communication Environment 2019-06-18
In this paper, we propose a deep learning (DL)-based parameter enhancement method for a mixed excitation linear prediction (MELP) speech codec in a noisy communication environment. Unlike conventional...  
115 International Journal A Joint Learning Algorithm for Complex-Valued T-F Masks in Deep Learning-Based Single-Channel Speech Enhancement Systems 2019-05-02
This paper presents a joint learning algorithm for complex-valued time-frequency (T-F) masks in single-channel speech enhancement systems. Most speech enhancement algorithms operating in a single-channel micro...