Selecting frames for automatic speech recognition based on acoustic landmarks

Di He, Boon Pang Lim, Xuesong Yang, Mark Allan Hasegawa-Johnson, Deming Chen

2017

Most mainstream Automatic Speech Recognition (ASR) systems based on Mel-frequency cepstral coefficients (MFCCs) treat all feature frames as equally important. Acoustic landmark theory disagrees with this idea. It exploits the quantal non-linear articulatory-acoustic relationships observed in human speech perception experiments and provides a theoretical basis for extracting acoustic features in the vicinity of landmark regions, where an abrupt change occurs in the spectrum of the speech signal. In this work, we conducted experiments, using the TIMIT corpus, on both GMM-based and DNN-based ASR systems and found that frames containing landmarks are more informative than others during the recognition process. We showed that altering the level of emphasis on landmark and non-landmark frames, by re-weighting or removing the corresponding frame acoustic likelihoods, can change the phone error rate (PER) of the ASR system in a way dramatically different from making similar changes to random frames. Fu...
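The idea of re-weighting or removing frame acoustic likelihoods can be illustrated with a minimal sketch. This excerpt does not give the exact scheme used in the experiments, so everything below is an assumption made for illustration: the function name `reweight_frames`, its parameters, the toy scores, and the choice to scale per-frame log-likelihoods by a multiplicative weight (a weight of 0 makes a frame uninformative to the decoder, which approximates removing it).

```python
from typing import List, Sequence


def reweight_frames(
    loglikes: Sequence[Sequence[float]],
    is_landmark: Sequence[bool],
    landmark_weight: float = 1.0,
    other_weight: float = 1.0,
) -> List[List[float]]:
    """Scale per-frame acoustic log-likelihoods by a frame-dependent weight.

    loglikes[t][s] is the log-likelihood of state s at frame t.
    is_landmark[t] marks frames assumed to contain a landmark.
    Hypothetical helper: not taken from the paper.
    """
    if len(loglikes) != len(is_landmark):
        raise ValueError("loglikes and is_landmark must have the same length")

    weighted = []
    for frame_scores, flag in zip(loglikes, is_landmark):
        weight = landmark_weight if flag else other_weight
        # Multiplying a log-likelihood by w raises the likelihood to the power w;
        # w = 0 gives every state the same score, so the frame carries no evidence.
        weighted.append([weight * score for score in frame_scores])
    return weighted


# Toy example: three frames, two states, only the middle frame is a landmark.
toy_loglikes = [[-1.0, -2.0], [-0.5, -3.0], [-2.0, -0.25]]
toy_landmarks = [False, True, False]

# Keep landmark frames untouched and suppress all non-landmark frames.
landmark_only = reweight_frames(toy_loglikes, toy_landmarks, other_weight=0.0)

# Emphasise landmark frames instead of dropping the others.
emphasised = reweight_frames(toy_loglikes, toy_landmarks, landmark_weight=2.0)
```

Comparing the PER obtained with `toy_landmarks` against the PER obtained with a random mask of the same density would mirror the kind of contrast the abstract describes, under the assumptions stated above.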
