First, watch this video and laugh:

Writing a Perl program with speech recognition

Cue discussion of how voice-activated interfaces are a terrible idea, how real computers will never work like the ones on Star Trek, how Microsoft’s UI designers were consumed with hubris if they ever thought this would work, etc. etc.

Too pessimistic! Thing is, there’s actually a very simple solution to this problem. It doesn’t even rely on any exotic technologies. We could do it today.

Anything a human says within a computer’s hearing falls into one of three categories:

  1. commands (“Do this”)
  2. content (“Put this text into my document”)
  3. noise (things not intended for the computer at all, such as conversations between humans)

The very same words can fall into any of these three categories, depending on the intent behind them. Human language is ambiguous! That’s why humans rely on so many non-verbal cues, like tone of voice and facial expression, to interpret what other humans are saying.

Computers can’t pick up on those cues. But they wouldn’t have to if we just had a microphone with a couple of buttons on it.

  • Hold button 1 and talk: The software interprets your speech as commands.
  • Hold button 2 and talk: The software interprets your speech as content.
  • Hold neither button: The software ignores anything you say.
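To make the idea concrete, here’s a minimal sketch of that dispatch logic in Python. Everything in it is hypothetical: the button states, the `current_mode` and `handle_utterance` functions, and the simulated recognizer output are stand-ins for whatever a real speech recognizer and application would provide.

```python
from enum import Enum, auto

class Mode(Enum):
    COMMAND = auto()  # button 1 held: speech is a command
    CONTENT = auto()  # button 2 held: speech is dictated text
    IGNORE = auto()   # neither button held: speech is not for the computer

def current_mode(button1_down: bool, button2_down: bool) -> Mode:
    """Map the microphone's button state to an interpretation mode."""
    if button1_down:
        return Mode.COMMAND
    if button2_down:
        return Mode.CONTENT
    return Mode.IGNORE

def handle_utterance(text: str, mode: Mode, document: list[str]) -> None:
    """Route one recognized utterance according to the active mode."""
    if mode is Mode.COMMAND:
        # Hand off to whatever command interpreter the application provides.
        print(f"executing command: {text!r}")
    elif mode is Mode.CONTENT:
        # Dictation goes straight into the document, with no interpretation.
        document.append(text)
    # Mode.IGNORE: the words were never meant for us, so drop them silently.

if __name__ == "__main__":
    doc: list[str] = []
    # Simulated session: (button 1 state, button 2 state, what the mic heard)
    session = [
        (True,  False, "open the report"),           # command
        (False, True,  "Dear Sir or Madam,"),        # content
        (False, False, "hey, want to grab lunch?"),  # noise: ignored
    ]
    for b1, b2, heard in session:
        handle_utterance(heard, current_mode(b1, b2), doc)
    print("document:", doc)
```

Note that the ambiguity never reaches the software: the user resolves it with their thumb before the words arrive, so the recognizer only has to transcribe, not guess intent.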

I’m gonna go out on a limb here and predict that we’ll see a decent voice-activated system within the next few years, one that relies on a non-verbal communication channel (such as a few buttons) to resolve the ambiguity of speech.

There will still be plenty of applications where people would rather type than talk (think of all the reasons people send text messages on cell phones instead of calling someone up and talking), but once the novelty wears off, I think speech-based interfaces will be seen as one more useful tool among many.