Talking Web - Voice Control using Semantic HTML and JavaScript

24 Sep 2018

Semantic HTML means elements and attributes that carry intrinsic meaning. When you compose these elements into a page or web application in a way that is semantic (and accessible), support for Voice control falls into place.

Searching for Meaning

Let’s look at an example using Search. Search is an easy choice, because it’s such a ubiquitous, consistent, and standardized feature of web pages. To demonstrate website search with Voice, I built a Chrome extension I’ve dubbed “Say11y”. It needs no added HTML or back-end APIs.

The extension uses the Speech Recognition API (part of the Web Speech API) to access the microphone and capture utterances. Once the recognizer detects a discernible phrase, we look up the command. In this case, if the phrase starts with “Search”, we find an HTML element on the page that matches one of these selector criteria:

If we find a matching element, we set that field’s value to the rest of the utterance following the command, and submit the form.
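Putting those steps together, a minimal sketch of the search command might look like the following. It assumes a content script running in Chrome, where the Speech Recognition API constructor is exposed as `webkitSpeechRecognition`. The selector list, the function names (`parseUtterance`, `findSearchField`, `submitSearch`, `handlePhrase`, `startListening`), and the "first word is the command" format are hypothetical illustrations, not Say11y’s actual code.

```javascript
// Hypothetical selectors, for illustration only (not Say11y's actual list).
// Checked in order, from most to least specific.
const SEARCH_SELECTORS = [
  '[role="search"] input[type="search"]',
  'input[type="search"]',
  '[role="search"] input[type="text"]',
  'input[name="q"]',
];

// Split a recognized phrase into a lowercase command word and the rest.
// Assumption: the first word of the utterance is the command.
function parseUtterance(transcript) {
  const words = transcript.trim().split(/\s+/);
  const command = (words.shift() || "").toLowerCase();
  return { command, query: words.join(" ") };
}

// Return the first element that matches any selector, or null.
// `root` is anything with querySelector(), e.g. `document` in a content script.
function findSearchField(root, selectors = SEARCH_SELECTORS) {
  for (const selector of selectors) {
    const element = root.querySelector(selector);
    if (element) return element;
  }
  return null;
}

// Put the query in the field and submit the field's owning form.
function submitSearch(field, query) {
  field.value = query;
  const form = field.form; // null if the input isn't inside a <form>
  if (!form) return false;
  if (typeof form.requestSubmit === "function") {
    form.requestSubmit(); // fires the submit event and runs validation, like a user submit
  } else {
    form.submit(); // fallback; skips submit event handlers and validation
  }
  return true;
}

// Look up the command and run it; only "search" is handled here.
function handlePhrase(transcript, root) {
  const { command, query } = parseUtterance(transcript);
  if (command !== "search" || query === "") return false;
  const field = findSearchField(root);
  return field !== null && submitSearch(field, query);
}

// Browser-only wiring (not called at load time). Chrome exposes the
// recognizer as webkitSpeechRecognition; other browsers may use SpeechRecognition.
function startListening(root) {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const recognizer = new Recognition();
  recognizer.lang = "en-US";
  recognizer.interimResults = false; // only deliver final phrases
  recognizer.onresult = (event) => {
    const result = event.results[event.results.length - 1];
    handlePhrase(result[0].transcript, root);
  };
  recognizer.start();
  return recognizer;
}
```

In a content script, `startListening(document)` would begin listening once Chrome has been granted microphone access. Submitting through the field’s own form, rather than constructing a search URL, is what keeps this independent of each site’s back end: the page’s markup already says where the query goes.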

Leaning on Standards

It took me only a couple of hours to make this work fairly consistently across a dozen different sites. That’s because when web content creators adhere to standards, making Voice control work is trivial.

The same approach can apply to many other features. Here are a couple off the top of my head:

Accessibility Innovations

Web standards advocates have long seen the benefits of semantic markup for accessibility and screen reader support. Now, with the advent of voice assistants, it should be clear that this is a universal principle. I’m excited to see what other lessons from accessible technology can set us up for the future.