LSTM neural network-based speaker segmentation using acoustic and language modelling

Miquel India, José A. R. Fonollosa, Javier Hernando

2017

This paper presents a new speaker change detection system based on Long Short-Term Memory (LSTM) neural networks that uses both acoustic data and linguistic content. Language modelling is combined with two different Joint Factor Analysis (JFA) acoustic approaches: i-vectors and speaker factors. Both are compared with a baseline algorithm that uses cosine distance to detect speaker turn changes. LSTM neural networks fed with both linguistic and acoustic features produce a robust speaker segmentation, and the experimental results show that our proposal clearly outperforms the baseline system.
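
As an illustration of the kind of cosine-distance baseline mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes each speech segment has already been reduced to a fixed-length vector (for example an i-vector) and that a change is hypothesised whenever the cosine distance between consecutive segments exceeds a threshold. The function names, the toy 2-D vectors, and the threshold value of 0.5 are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def detect_speaker_changes(embeddings, threshold=0.5):
    """Return each index i where a change is hypothesised between
    segment i - 1 and segment i (threshold is an assumed value)."""
    changes = []
    for i in range(1, len(embeddings)):
        if cosine_distance(embeddings[i - 1], embeddings[i]) > threshold:
            changes.append(i)
    return changes


# Toy 2-D vectors standing in for per-segment speaker embeddings.
segments = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
print(detect_speaker_changes(segments))  # prints [2]
```

In this toy input the first two vectors point in nearly the same direction, as do the last two, so the only hypothesised speaker turn lies between the second and third segments. In practice the threshold would have to be tuned on held-out data.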

Keywords:

Speech recognition
Cosine Distance
Segmentation
Artificial intelligence
Pattern recognition
Change detection
Artificial neural network
Computer science
