Unsupervised optimal phoneme segmentation: theory and experimental evaluation

2013 
Automatic phoneme segmentation of a speech sequence is a basic problem in speech engineering. This study investigates unsupervised phoneme segmentation, which uses no prior information about the linguistic content of the input sequence and no acoustic models. The authors formulate unsupervised segmentation as a maximum-likelihood optimisation problem and show that the optimal segmentation corresponds to minimising the coding length of the input sequence. Under different assumptions, five objective functions are developed: log determinant, rate distortion (RD), Bayesian log determinant, Mahalanobis distance and Euclidean distance. The authors prove that the optimal segmentations are transformation-invariant, introduce a time-constrained agglomerative clustering algorithm to find them, and propose an efficient implementation of the algorithm using integration functions. Experiments on the TIMIT database compare the five objective functions. The results show that RD achieves the best performance and that the proposed method outperforms previous unsupervised segmentation methods.
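To make the idea of time-constrained agglomerative clustering concrete, the following is a minimal sketch and not the authors' implementation. It assumes the simplest of the five objectives (Euclidean distance, taken here to mean within-segment squared error), greedily merges only time-adjacent segments, and interprets "integration functions" as cumulative sums that give each segment's cost in constant time. The function name `segment_greedy`, its parameters, and the stopping rule (a fixed number of segments) are hypothetical choices made for illustration.

```python
import numpy as np


def segment_greedy(features, n_segments):
    """Greedily merge time-adjacent segments until n_segments remain.

    features: array of shape (frames, dims), e.g. one feature vector per frame.
    Returns the start frame of every segment except the first.
    """
    x = np.asarray(features, dtype=float)
    # Cumulative sums (assumed reading of "integration functions"):
    # s1[t] is the sum of frames [0, t), s2[t] the sum of their squared norms.
    s1 = np.vstack([np.zeros(x.shape[1]), np.cumsum(x, axis=0)])
    s2 = np.concatenate([[0.0], np.cumsum((x ** 2).sum(axis=1))])

    def sse(a, b):
        # Squared error of frames [a, b) around their own mean, in O(1).
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total @ total / (b - a)

    # Start with one segment per frame; each segment is (start, end), end exclusive.
    bounds = [(i, i + 1) for i in range(len(x))]
    while len(bounds) > n_segments:
        # Cost increase of merging each pair of neighbouring segments.
        costs = [
            sse(bounds[i][0], bounds[i + 1][1]) - sse(*bounds[i]) - sse(*bounds[i + 1])
            for i in range(len(bounds) - 1)
        ]
        i = int(np.argmin(costs))
        # The time constraint: only adjacent segments are ever merged.
        bounds[i:i + 2] = [(bounds[i][0], bounds[i + 1][1])]
    return [start for start, _ in bounds[1:]]
```

For a toy one-dimensional sequence such as `[[0.0], [0.1], [0.0], [5.0], [5.1], [5.0]]` with `n_segments=2`, the sketch places the single boundary at frame 3, where the signal level changes. Swapping in one of the other objectives would mean replacing `sse` with the corresponding segment cost.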