
Background Research: SSI

  • Writer: Lyu Theresa
  • Oct 13, 2019
  • 2 min read

Silent Speech Interface:

A silent speech interface is a device that enables communication without the sound produced when people vocalize their speech. Its main goal is to capture speech accurately without requiring vocalization; the end result is similar to “reading someone’s mind”. It is an exciting and growing technology, well suited to human-machine interaction.

Not all silent speech interfaces are created for one general purpose. The goal of a silent speech interface can be to generate actual sound (e.g. for larynx cancer patients), to generate text, or to serve as an interface between humans and computer systems. As the goal differs, the methods and even the tools differ as well.

Such devices are created as aids for those unable to produce the phonation needed for audible speech, such as after a laryngectomy. Another use is communication when speech is masked by background noise or distorted by a self-contained breathing apparatus. A further practical use arises where silent communication is needed, such as when privacy is required in a public place, or when hands-free, silent data transmission is needed during a military or security operation.

There have been several previous attempts at achieving silent speech communication. These systems can be categorized under two primary approaches: invasive and non-invasive systems.


Invasive Systems:


Brumberg et al. 2010 used direct brain implants in the speech motor cortex to achieve silent speech recognition, demonstrating reasonable accuracies on limited-vocabulary datasets. There have also been explorations of measuring the movement of internal speech articulators by placing sensors on those articulators. Hueber et al. 2008 used sensors placed on the tongue to measure tongue movements. Hofe et al. 2013 and Fagan et al. 2008 used permanent-magnet articulography (PMA) sensors to capture the movement of specific points on the muscles used in speech articulation; this approach requires invasively fixing magnetic beads in place, which does not scale well in a real-world setting. Florescu et al. 2010 proposed characterizing the vocal tract using ultrasound to achieve silent speech, but the system only achieves good results when combined with a video camera looking directly at the user’s mouth. The invasiveness, obtrusiveness, and immobility of the apparatus impede the scalability of these solutions in real-world settings beyond clinical scenarios.


Non-Invasive Systems:

There have been multiple approaches to detecting and recognizing silent speech in a non-invasive manner. Porbadnik et al. 2009 used EEG sensors for silent speech recognition but suffered from a signal-to-noise ratio too low to robustly detect speech formation, and thus achieved poor performance. Wand et al. 2016 used deep learning on video without acoustic vocalization, but this requires externally placed cameras to decode language from the movement of the lips. Hirahara et al. used a Non-Audible Murmur microphone to digitally transform signals. There have also been instances of decoding speech from facial muscle movements using surface electromyography (EMG).

Wand and Schultz 2011 demonstrated surface-EMG silent speech using a phoneme-based acoustic model, but the user has to explicitly mouth the words with pronounced facial movements. Jorgensen et al. used surface EMG to detect subvocal words, with accuracy fluctuating down to 33%; the system was also unable to recognize alveolar consonants with high accuracy, which is a significant obstruction to actual use as a speech interface.
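To make the surface-EMG approach above concrete, here is a minimal sketch of the kind of pipeline such systems build on: frame the sensor signal into overlapping windows, extract simple time-domain features, and classify each frame. Everything here is an illustrative assumption (synthetic toy signals, RMS/zero-crossing features, a nearest-centroid classifier), not the actual configuration of any of the published systems, which use far richer features and acoustic models.

```python
# Hypothetical sketch of an sEMG classification pipeline (NOT the
# published systems' methods): window the signal, compute per-frame
# features, and classify frames with a nearest-centroid model.
import numpy as np

def frame_signal(x, win=160, hop=80):
    """Split a 1-D signal into overlapping frames of length `win`."""
    n = 1 + max(0, (len(x) - win) // hop)
    return np.stack([x[i * hop : i * hop + win] for i in range(n)])

def features(frames):
    """Per-frame RMS energy and zero-crossing rate (toy feature set)."""
    rms = np.sqrt((frames ** 2).mean(axis=1))
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)
    return np.column_stack([rms, zcr])

class NearestCentroid:
    """Assign each frame to the class with the closest feature centroid."""
    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.labels_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[d.argmin(axis=1)]

# Toy data: two synthetic "articulation patterns" differing in amplitude
# and dominant frequency, standing in for two phoneme-like classes.
rng = np.random.default_rng(0)
t = np.arange(16000) / 1000.0
sig_a = 0.5 * np.sin(2 * np.pi * 30 * t) + 0.05 * rng.standard_normal(t.size)
sig_b = 1.5 * np.sin(2 * np.pi * 90 * t) + 0.05 * rng.standard_normal(t.size)

Xa, Xb = features(frame_signal(sig_a)), features(frame_signal(sig_b))
X = np.vstack([Xa, Xb])
y = np.array([0] * len(Xa) + [1] * len(Xb))

clf = NearestCentroid().fit(X, y)
acc = (clf.predict(X) == y).mean()
```

The design choice worth noting: because the two toy classes differ in both energy (RMS) and oscillation rate (zero-crossing rate), even this two-feature centroid model separates them cleanly. Real sEMG speech signals overlap far more across phonemes, which is exactly why systems like Wand and Schultz's need phoneme-based acoustic modeling rather than frame-wise centroids.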



