No.
320 International Conference ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis 2020-10-08
APSIPA ASC 2020  
319 International Conference Speaker-invariant Psychological Stress Detection Using Attention-based Network 2020-10-07
When people get stressed in nervous or unfamiliar situations, their speaking styles or acoustic characteristics change. These changes are particularly emphasized in certain regions of speech, so a model tha...  
318 International Journal Effective Emotion Transplantation in an End-to-End Text-to-Speech System 2020-10-07
Abstract: In this paper, we propose an effective technique to transplant a source speaker’s emotional expression to a new target speaker’s voice within an end-to-end text-to-speech (TTS) framework. We ...  
317 International Conference LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis 2020-10-07
We propose a linear prediction (LP)-based waveform generation method via the WaveNet vocoding framework. A WaveNet-based neural vocoder has significantly improved the quality of parametric text-to-speech (TT...  
316 Domestic Conference Improving the Speech Quality of a Multi-speaker Speech Synthesis System Using Speaker and Speaking Style Embeddings 2020-09-15
In this paper, we improve the speech quality of a multi-speaker text-to-speech (TTS) system by adding two embedding networks that represent speaker and speaking style characteristics. The speaker embedding is ...  
315 International Journal Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval 2020-09-14
Abstract: This paper proposes a new strategy for learning effective cross-modal joint embeddings using self-supervision. We set up the problem as one of cross-modal retrieval, where the objective is to fin...  
314 Domestic Conference Deep Learning-based End-to-End Multi-channel Speech Enhancement Algorithm 2020-08-28
Abstract: In this paper, we propose a deep learning-based multi-channel speech enhancement algorithm. The proposed system consists of three sub-modules such as magnitude estimation, phase estimation, and spatial ...  
313 International Conference FaceFilter: Audio-visual speech separation using still images 2020-08-13
Abstract: The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network. Unlike previous works that used lip movement ...  
312 International Conference Seeing Voices and Hearing Voices: Learning discriminative embeddings using cross-modal self-supervision 2020-08-13
Abstract: The goal of this work is to train discriminative cross-modal embeddings without access to manually annotated data. Recent advances in self-supervised learning have shown that effective representatio...  
311 International Conference MIRNet: Learning multiple identities representations in overlapped speech 2020-08-13
Many approaches can derive information about a single speaker's identity from the speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determin...  
310 International Conference Intra-class variation reduction of speaker representation in disentanglement framework 2020-08-13
In this paper, we propose an effective training strategy to extract robust speaker representations from a speech signal. One of the key challenges in speaker recognition tasks is to learn latent represe...  
309 International Conference A Cross-channel Attention-based Wave-U-Net for Multi-channel Speech Enhancement 2020-08-11
In this paper, we present a novel architecture for multi-channel speech enhancement using a cross-channel attention-based Wave-U-Net structure. Despite the advantages of utilizing spatial information as we...  
308 Domestic Conference Automatic Target Recognition in SAR Images Using Meta-learning 2020-07-13
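In air force air-to-ground operations, accurately identifying objects on the ground is very important. However, because most missions are by their nature flown at high altitude, it is difficult for pilots to identify targets accurately with the naked eye, and such things as clouds or fog ...  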
307 International Conference Emotional Speech Synthesis with Rich and Granularized Control 2020-04-19
This paper proposes an effective emotion control method for an end-to-end text-to-speech (TTS) system. To flexibly control the distinct characteristic of a target emotion category, it is essential to ...  
306 Domestic Conference IIR Filter-based Frequency Equalizer for Low-end TV Sound Design Environments 2020-04-01
In countries that are developing low-end TVs (e.g., India, Africa, etc.), the lack of a development environment and infrastructure often means that the sound environment of the TV is not taken into account. To solve ...  
305 International Conference Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network 2020-01-31
In this paper, we propose an improved LPCNet vocoder using a linear prediction (LP)-structured mixture density network (MDN). The recently proposed LPCNet vocoder has successfully achieved high-quality ...  
304 Domestic Journal A Method for Determining the Representative Emotional Style of Speech Using the k-means Algorithm 2019-12-17
In this paper, we propose a method to effectively determine the representative style embedding of each emotion class to improve the global style token-based end-to-end speech synthesis system. The emot...  
303 International Conference A Study on Acoustic Parameter Selection Strategies to Improve Deep Learning-Based Speech Synthesis 2019-11-25
In this paper, we investigate the variation in the performance of a deep learning-based speech synthesis (DLSS) system based on the configuration of output acoustic parameters. Our method is mainly applicable...  
302 International Journal An Effective Style Token Weight Control Technique for End-to-End Emotional Speech Synthesis 2019-08-10
In this letter, we propose a high-quality emotional speech synthesis system, using emotional vector space, i.e., the weighted sum of global style tokens (GSTs). Our previous research verified the feasibilit...  
301 International Journal Dry Electrode-Based Body Fat Estimation System with Anthropometric Data for Use in a Wearable Device 2019-07-18
The bioelectrical impedance analysis (BIA) method is widely used to predict percent body fat (PBF). However, it requires four to eight electrodes, and it takes a few minutes to accurately obtain the mea...