F0 range and peak alignment across speakers and emotions
2011
We present an analysis of F0 range and peak alignment in emotional speech from a heterogeneous group of speakers varying in age and gender. Both speaker and emotion had a strong effect on F0 range. Despite these large changes in the F0 trajectory, peak alignment was remarkably stable. Using the Linear Alignment Model (LAM) [1], we show that the effects of emotion and speaker differences on alignment, although statistically significant, are small. This stability suggests that peak alignment, unlike F0 range, carries little information about speaker identity or emotional state. The LAM is effective: on average it explains 42% of the variance in peak location, and it predicts the time of F0 peaks with an average RMS error of 12 ms.
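The two figures of merit quoted above, proportion of variance explained and RMS error, can be illustrated with a minimal sketch. It assumes a plain one-predictor least-squares fit rather than the LAM's actual formulation, which is not given here; the function names and the duration and peak-time values are hypothetical and are not data from the study.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def variance_explained(observed, predicted):
    """Proportion of variance explained (R^2) by the predictions."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical illustration values (ms), NOT measurements from the paper:
# a segment duration as the single predictor, peak time as the response.
durations_ms = [110.0, 140.0, 175.0, 200.0, 240.0, 260.0]
peak_times_ms = [72.0, 95.0, 101.0, 130.0, 139.0, 160.0]

slope, intercept = fit_line(durations_ms, peak_times_ms)
predicted_ms = [slope * d + intercept for d in durations_ms]

print(f"RMS error: {rmse(peak_times_ms, predicted_ms):.1f} ms")
print(f"Variance explained: {variance_explained(peak_times_ms, predicted_ms):.2f}")
```

The two measures answer different questions: RMS error is in milliseconds and says how far predicted peaks fall from observed ones, while variance explained is unitless and depends on how much the peak location varies in the first place. A model can therefore have a small RMS error and still explain a moderate share of the variance when the peaks themselves vary little.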
Keywords:
- Speaker recognition
- Emotion recognition
- Robustness (computer science)
- Analysis of variance
- Artificial intelligence
- Speech synthesis
- Pattern recognition
- Root-mean-square deviation
- Linear regression
- Computer science
- Statistical significance
- Trajectory
- Peak alignment
- Heterogeneous group
- Correlation
- Speech recognition
- Human voice