This paper analyzes the pitch fluctuations of different notes in Taiwanese singing in order to build a note-type-based F0 control model that improves the naturalness of synthesized Taiwanese singing voice by producing more natural F0 contours. The factors that significantly differentiate singing synthesis from speech synthesis must be taken into consideration when designing a singing synthesizer. Among these, the fundamental frequency (F0) contour is an important feature that deeply affects singing voice perception and needs to be controlled precisely. The F0 contour of a sung voice contains fluctuations rather than following a predefined stepwise pitch curve derived from the musical notes. These fluctuations are important features that should be taken into consideration in singing-related applications such as singing synthesis, singing voice detection, performance analysis, singing/music recognition, singing style identification, and query-by-humming. Overshoot percentage and preparation percentage are proposed as measures for quantifying the extent of these fluctuations. Statistics for each note category were established from a corpus of Taiwanese nursery rhymes. The differing extents of overshoot and preparation for separate categories of notes were modeled for males, females, and children according to the statistical results. A PID controller that controls a second-order system is proposed so that the F0 quickly reaches the correct level for each note and remains sufficiently steady at that level to produce a pleasant singing voice.
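The idea of a PID controller driving a second-order system toward each note's F0 can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `simulate_note_transition` and `overshoot_percentage`, the gains `kp`, `ki`, `kd`, the plant parameters `omega` and `zeta`, the Euler integration, and in particular the overshoot formula (peak excursion past the target, as a percentage of the note-to-note step), which may differ from the authors' definition.

```python
def simulate_note_transition(start_hz, target_hz, kp=4.0, ki=200.0, kd=0.01,
                             omega=60.0, zeta=0.6, dt=0.001, duration=1.0):
    """Return an F0 contour (Hz) moving from start_hz toward target_hz.

    Hypothetical plant: x'' + 2*zeta*omega*x' + omega^2*x = omega^2*u,
    where x is the F0 deviation from start_hz and u is the PID output.
    All parameter values are illustrative assumptions.
    """
    step = target_hz - start_hz   # size of the note-to-note jump
    x, v = 0.0, 0.0               # deviation from start_hz and its velocity
    integral = 0.0
    prev_err = step               # avoids a derivative spike on the first sample
    contour = []
    for _ in range(int(round(duration / dt))):
        err = step - x
        integral += err * dt
        deriv = (err - prev_err) / dt
        prev_err = err
        u = kp * err + ki * integral + kd * deriv   # PID control signal
        acc = omega ** 2 * (u - x) - 2.0 * zeta * omega * v
        v += acc * dt             # semi-implicit Euler: velocity first,
        x += v * dt               # then position with the updated velocity
        contour.append(start_hz + x)
    return contour


def overshoot_percentage(contour, start_hz, target_hz):
    """Peak excursion beyond the target, as a percentage of the step size.

    This formula is an assumed definition for illustration only.
    """
    step = target_hz - start_hz
    if step == 0:
        return 0.0
    peak = max(contour) if step > 0 else min(contour)
    return max(0.0, (peak - target_hz) / step * 100.0)


if __name__ == "__main__":
    # Example: a rising transition from roughly A3 (220 Hz) to C4 (261.63 Hz).
    f0 = simulate_note_transition(220.0, 261.63)
    print("final F0 (Hz):", round(f0[-1], 2))
    print("overshoot (%):", round(overshoot_percentage(f0, 220.0, 261.63), 1))
```

In a sketch like this, the gains trade off how quickly the contour reaches the target against how far it overshoots, so separate gain sets could in principle be chosen per note category and per singer group to match measured overshoot statistics.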
Lai, Wen-Hsing; Liang, Sen-Fu
Affiliation: Dept. of Computer and Communication Engineering, National Kaohsiung University of Science and Technology, Kaohsiung City, Taiwan
JAES Volume 66 Issue 5 pp. 343-359; May 2018
Publication Date: May 24, 2018