AES Conference Papers Forum

SyncTS: Automatic Synchronization of Speech and Text Documents

Document Thumbnail

In this paper, we present an automatic approach for aligning speech signals to corresponding text documents. For this sake, we propose to first use text-to-speech synthesis (TTS) to obtain a speech signal from the textual representation. Subsequently, both speech signals are transformed to sequences of audio features which are then time-aligned using a variant of greedy dynamic time-warping (DTW). The proposed approach is both efficient (with linear running time), computationally simple, and does not rely on a prior training phase as it is necessary when using HMM-based approaches. It benefits from the combination of a) a novel type of speech feature, being correlated to the phonetic progression of speech, b) a greedy left-to-right variant of DTW, and c) the TTS-based approach for creating a feature representation from the input text documents. The feasibility of the proposed method is demonstrated in several experiments.

AES Conference:
Paper Number:
Publication Date:

Click to purchase paper as a non-member or you can login as an AES member to see more options.

No AES members have commented on this paper yet.

Subscribe to this discussion

RSS Feed To be notified of new comments on this paper you can subscribe to this RSS feed. Forum users should login to see additional options.

Start a discussion!

If you would like to start a discussion about this paper and are an AES member then you can login here:

If you are not yet an AES member and have something important to say about this paper then we urge you to join the AES today and make your voice heard. You can join online today by clicking here.

AES - Audio Engineering Society