Imagined, Intended, and Spoken Speech Envelope Synthesis from Neuromagnetic Signals.

Neural speech decoding retrieves speech information directly from the brain and holds promise for better communication assistance for patients with locked-in syndrome (e.g., due to amyotrophic lateral sclerosis, ALS). Currently, speech decoding research using non-invasive neural signals is limited to discrete classification of only a few speech units (e.g., words, syllables, or phrases). Considerable work remains to achieve the ultimate goal of decoding continuous speech sounds. One stepping stone towards this goal would be to reconstruct the inner speech envelope in real time from neural activity. Numerous studies have shown that the speech envelope can be tracked during speech perception, but this has not been demonstrated for speech production, imagination, or intention. Here, we attempted to reconstruct the intended, imagined, and spoken speech envelope by decoding the temporal information of speech directly from neural signals. Using magnetoencephalography (MEG), we collected neuromagnetic activity from 7 subjects imagining and speaking various cued phrases and from 7 different subjects speaking "yes" or "no" randomly without any cue. We used a bidirectional long short-term memory recurrent neural network (BLSTM-RNN) for single-trial regression of the speech envelope using all brainwaves (0.3–250 Hz). For the phrase stimuli, we obtained average correlation scores of 0.41 and 0.72 for reconstructing the imagined and spoken speech envelopes, respectively, both significantly higher than the chance level (\({<}0.1\)). For the word stimuli, the correlation scores of the reconstructed speech envelope were 0.77 and 0.82 for intended and spoken speech, respectively.
Furthermore, to evaluate the efficacy of low-frequency neural oscillations in reconstructing the spoken speech envelope, we used delta (0.3–4 Hz) and delta + theta (0.3–8 Hz) brainwaves. For word stimuli, performance was significantly lower than when brainwaves of all frequencies were used, whereas no such significant difference was observed for phrase stimuli. These findings provide a foundation for direct speech synthesis from non-invasive neural signals.
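To make the evaluation metric concrete, the following is a minimal sketch of how a correlation score between a target speech envelope and a reconstructed one might be computed. The abstract does not state how the envelope was extracted or which correlation coefficient was used, so everything here is an assumption for illustration: the envelope is approximated by full-wave rectification followed by a moving average, the score is taken to be Pearson's r, and the names `envelope` and `pearson_r` as well as the synthetic signals are hypothetical. The "decoded" trace stands in for the output of a regression model and is not produced by a BLSTM-RNN.

```python
import math


def envelope(signal, win):
    """Approximate amplitude envelope (assumed method, not the paper's):
    full-wave rectification, then a centered moving average spanning
    about `win` samples (shorter near the edges)."""
    rect = [abs(x) for x in signal]
    half = win // 2
    out = []
    for i in range(len(rect)):
        lo, hi = max(0, i - half), min(len(rect), i + half + 1)
        out.append(sum(rect[lo:hi]) / (hi - lo))
    return out


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("sequences must have equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)


# Synthetic stand-in for one second of audio sampled at 1 kHz (assumed rate):
# a 100 Hz carrier whose amplitude is modulated at 3 Hz.
fs = 1000
t = [n / fs for n in range(fs)]
audio = [(1.0 + 0.8 * math.sin(2 * math.pi * 3 * s)) * math.sin(2 * math.pi * 100 * s)
         for s in t]
target = envelope(audio, win=50)

# Hypothetical decoder output: the target plus a deterministic perturbation.
decoded = [v + 0.1 * math.sin(2 * math.pi * 11 * s) for v, s in zip(target, t)]

score = pearson_r(target, decoded)
print(f"correlation score: {score:.3f}")
```

Under these assumptions, a score would be computed per trial and then averaged across trials and subjects; a chance level could be estimated in the same way by correlating each reconstruction with the envelope of a mismatched trial.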