Authors : Eunwoo Song, Young-Sun Joo, Hong-Goo Kang
Year : 2015
Publisher / Conference : ICASSP
This paper proposes an improved time-frequency trajectory excitation (TFTE) modeling method for a statistical parametric speech synthesis system. The proposed approach overcomes the dimensional variation problem of the training process caused by the inherent nature of the pitch-dependent analysis paradigm. By reducing the redundancies of the parameters using predicted average block coefficients (PABC), the proposed algorithm efficiently models excitation, even if its dimension is varied. Objective and subjective test results verify that the proposed algorithm provides not only robustness to the training process but also naturalness to the synthesized speech.