
Background Research: VUI

  • Writer: Lyu Theresa
  • Oct 10, 2019

Voice User Interfaces (VUI)


A voice user interface (VUI) makes spoken interaction between humans and computers possible. It uses speech recognition to understand spoken commands and answer questions, and typically text-to-speech to play back a reply.
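
The recognize, respond, and speak loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `VoiceInterface`, `fake_recognize`, `fake_synthesize`, and the keyword handlers are hypothetical names, and the two "fake" functions stand in for a real speech recognizer and text-to-speech engine by treating audio as UTF-8 text.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class VoiceInterface:
    """Hypothetical VUI loop: audio in, spoken reply out."""
    recognize: Callable[[bytes], str]           # speech recognition: audio -> transcript
    synthesize: Callable[[str], bytes]          # text-to-speech: reply text -> audio
    handlers: Dict[str, Callable[[str], str]]   # keyword -> function producing a reply

    def handle(self, audio: bytes) -> bytes:
        transcript = self.recognize(audio).lower().strip()
        # Pick the first handler whose keyword appears in the transcript.
        for keyword, handler in self.handlers.items():
            if keyword in transcript:
                return self.synthesize(handler(transcript))
        return self.synthesize("Sorry, I did not understand that.")


# Stand-ins (assumptions): a real system would call a speech recognizer and
# a text-to-speech engine here instead of encoding and decoding text.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


vui = VoiceInterface(
    recognize=fake_recognize,
    synthesize=fake_synthesize,
    handlers={
        "time": lambda _: "It is noon.",
        "weather": lambda _: "It is sunny.",
    },
)

reply_audio = vui.handle(b"What time is it?")
```

Keeping recognition, response logic, and synthesis as separate steps mirrors the description above: each stage could be swapped out without touching the others.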

Speech recognition and synthesis have facilitated the advent of ubiquitous natural voice interfaces, currently deployed in mobile computational devices as virtual assistants (e.g., Siri, Alexa, Cortana). These interfaces have also been embedded in other devices such as smart wearables, dedicated hardware speakers (e.g., Google Home, Amazon Echo), and social robots. Another broad category of voice interfaces is modern telecommunications devices for person-to-person communication (e.g., smartphones, Skype). Although all of the aforementioned platforms offer robust voice-based interaction, they share common limitations.


There are fundamental impediments to current speech interfaces that limit their adoption as a primary human-machine interface. A few of them are listed here:


Privacy of conversation: When communicating via these interfaces, the user's speech is broadcast to the surrounding environment, so user privacy is not maintained (e.g., a phone call with another person, or a conversation with Siri).


Eavesdropping: Voice interfaces are always listening in on conversations, even when this is not desired, and are only visibly activated later by a specific trigger word (e.g., 'Ok Google' activates Google Assistant, but the application is on nevertheless).


Impersonal devices: These devices are not tied to a single person, and any other user can intentionally or unintentionally send valid voice inputs to them.


Attention requiring: Current voice interaction devices have low usability. A user often cannot use a speech interface hands-free and on the go, as is the case with immobile dedicated speech devices (e.g., Amazon Echo). Moreover, the user must stay close to the device for optimal speech recognition (as with telecommunications devices).
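
The trigger-word behaviour described under Eavesdropping can be made concrete with a small sketch. It is not how any particular assistant is implemented: `gate_by_wake_word` and the sample utterances are hypothetical, and utterances are modelled as already-transcribed text. The point it illustrates is that every utterance has to be inspected in order to spot the trigger word, even though only the ones that follow it are acted upon.

```python
from typing import Iterable, List, Tuple

WAKE_WORD = "ok google"  # trigger phrase taken from the example in the text


def gate_by_wake_word(
    utterances: Iterable[str], wake_word: str = WAKE_WORD
) -> Tuple[List[str], int]:
    """Return (commands acted upon, number of utterances inspected)."""
    commands: List[str] = []
    inspected = 0
    for utterance in utterances:
        inspected += 1  # the device hears this utterance regardless of its content
        text = utterance.lower().strip()
        if text.startswith(wake_word):
            # Only speech that follows the trigger phrase becomes a command.
            command = text[len(wake_word):].strip(" ,")
            if command:
                commands.append(command)
    return commands, inspected


# Hypothetical household chatter; only one line is addressed to the device.
overheard = [
    "see you at the clinic at five",
    "Ok Google, set a timer",
    "my PIN is on the fridge",
]
commands, inspected = gate_by_wake_word(overheard)
```

In this sketch one command is acted upon while all three utterances are inspected, which is the gap between being visibly activated and being on that the text points to.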


