Admitting the addressee detection faultiness of voice assistants to improve the activation performance using a continuous learning framework

Ingo Siegert, Norman Weißkirchen, Julia Krüger, Oleg Akhtiamov, Andreas Wendemuth

2021

Abstract The main promise of voice assistants is their ability to correctly interpret and learn from user input, and to use this knowledge to achieve specific goals and tasks. These systems need predetermined activation actions to start a conversation. Unfortunately, the typical solution, the wake-word, forces an unnatural interaction. Furthermore, this method can cause confusion when the wake-word, or a phonetically similar phrase, has been said but the user does not intend to interact with the system. Thus, the system not only lacks the adequacy of interpersonal interaction but also suffers from addressee detection faultiness. Although various aspects of acoustic addressee detection have already been investigated, we demonstrated that the test data used so far rely on ideal conditions: the dialog complexity of human–human and human–device interactions differs substantially, whereas in reality the behavior of individuals addressing either another human or a device varies widely. The problem of addressee detection is thus oversimplified. Our approach works with a specifically designed dataset comprising human–human and human–computer interactions of similar dialog complexity. Our proposed addressee detection faultiness framework actively communicates any uncertainty the system may have. In connection with a continuous learning framework, this enables a voice assistant to adapt itself to the users’ individual addressee behavior. This approach achieves a significantly improved classification rate of 85.77%, an absolute improvement of 32.22% in comparison to similar experiments employing human annotations as ground truth.
