What would the web be like if you could tell it what you want to do as easily as you currently tell it where you want to go?

Mozilla Labs is starting to experiment with linguistic interfaces. That is, we’re playing around with interfaces where you type commands and stuff happens — in much the same way that you can type a location into the address bar in order to go somewhere.

I’m excited about this for two reasons. First, I think language-based interfaces are seriously under-explored compared to pointing-based interfaces. Second, I used to work on a project called Enso. Enso’s a language-based interface, where you type commands in and stuff happens. I think we got certain things right and certain things wrong in Enso’s UI design, so I want to take another crack at doing it better.

What makes a good linguistic UI?

Here’s my current theory.

  1. It’s easy to learn.
  2. It’s efficient.
  3. It’s expressive.

Those are the three “E”s. Let’s unpack ’em a little.

“Easy to learn” should be self-explanatory. Even if a tool is super-efficient and incredibly powerful, if it’s too hard to learn, it’ll be relegated to the “experts only” ghetto. Yeah, that’s right, I’m talking about you, Unix shell. (Ahem: Unix/Linux/POSIX/BSD/etc.) The shell is the original language-based UI, the oldest UI still in common use, and the one that has given the whole concept of “type what you want to do” a bad name for the last thirty years. That makes it an excellent counterexample. For a linguistic UI to be easy to learn, it should strive to avoid all of the following misfeatures of the shell:

  • Not discoverable: There’s no guidance given to a first-time user. You type some letters and nothing happens: it feels like shouting into a void. If you don’t already have the basic commands memorized, there’s no way to figure out what they are.
  • Cryptic names: Whether for historical reasons or for brevity, the standard commands, programs, and locations have names like ‘tar’ and ‘mkdir’ and ‘/usr/local/bin’. Because these names are unnatural and unfamiliar, they have to be learned by rote.
  • No feedback: I just entered a command and all I got back was a blank line! It worked, but what did I just do?
  • Options are hard to remember: Does the ln command take the source file first or the destination file first? What does the -z option on tar do again?
  • Really easy to make mistakes: One wrong character and your innocent command is transformed into a ruthless file murderer. And there’s no undo!

But the CLI isn’t all bad. Obviously, if it were all bad, there wouldn’t be so many people still using it! I’m a programmer. I live on the command line. I climbed the learning curve years ago, and now it’s second nature. I couldn’t live without it. So what are the good points?

The first good point is that you can get a whole lot done with just a few keystrokes, thanks to the very short names, the tab-based autocompletion, and the command history that lets you easily repeat or modify earlier commands. This makes it a very efficient interface. You can learn more about the precise (quantitative) definition of information-theoretic efficiency in Aza’s article, Know When to Stop Designing — Quantitatively. There are logarithms involved. If you don’t feel like doing math, all you need to know for now is this extremely simple concept: the fewer keys you have to hit to get the computer to understand what you want, the less wasted effort, the more efficient the interface.
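To make the keystroke-counting idea concrete, here’s a rough Python sketch. It follows the general shape of that definition (information the computer needs, divided by information the user actually supplies), but the model is a deliberately crude one I’m assuming for illustration: every command is equally likely, so picking one of N commands needs log2(N) bits, and every keystroke is a free choice among 27 keys (26 letters plus space), so it supplies log2(27) bits. An efficiency of 1.0 would mean no wasted effort at all.

```python
import math

def information_needed(num_commands: int) -> float:
    """Bits needed to pick one of num_commands commands, assuming all are equally likely."""
    return math.log2(num_commands)

def information_supplied(keystrokes: int, keys_available: int = 27) -> float:
    """Bits supplied by the user, treating each keystroke as a free choice among
    keys_available keys (assumed here: 26 letters plus space)."""
    return keystrokes * math.log2(keys_available)

def efficiency(num_commands: int, typed: str, keys_available: int = 27) -> float:
    """Information needed divided by information supplied; 1.0 means no wasted keystrokes."""
    if not typed:
        raise ValueError("typed must contain at least one keystroke")
    return information_needed(num_commands) / information_supplied(len(typed), keys_available)

# Assume a hypothetical set of 100 commands to choose from.
print(f"{efficiency(100, 'open notepad'):.2f}")  # 0.12
print(f"{efficiency(100, 'notepad'):.2f}")       # 0.20
```

Under this model, dropping the five characters of “open ” raises the score by a factor of 12/7, simply because fewer keystrokes carry the same information.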

The second good point is that it’s not just a set of commands, it’s a language. (Bash is Turing-complete.) Pipes, stdin/stdout redirection, backticks, environment variables, etc. form the grammar. Executables (“small programs that do one thing and do it well”) form the vocabulary. Every command line you write is a little one-time program. With shell scripting, you can make that one-time program into a reusable command. You’re not limited to a small set of commands: as with any programming language, or any human language, an infinite number of ideas can be expressed with a finite vocabulary. I call this quality “Expressiveness”.

So there are our three “E”s: easy, efficient, and expressive. Unix has… well, two out of three ain’t bad! It beats the Mac/Windows-style GUI hands down in both efficiency and expressiveness, but loses badly in ease of learning.

So here’s the riddle. This is what attracts me to the challenge of language-based UI:

How can we make a UI with the efficiency and expressiveness of the Unix command line, but that’s easy to learn and that won’t shoot you in the foot?

Enso was our attempt at a very easy-to-learn linguistic UI. You hold down the Caps Lock key and start typing a command. Enso displays an automatic completion of whatever command best matches your input, along with a description of what the command will do if executed. On the lines below the input, Enso shows suggestions for other commands similar to the input. You can use the arrow keys to highlight one of the suggestions, and release Caps Lock to execute it.
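The core loop described above (rank commands against partial input, show the best match with its description, list a few alternatives) fits in a few lines of Python. This is a hypothetical illustration, not Enso’s actual code: the command list is made up, and the ranking rule (prefix matches before substring matches) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Command:
    name: str
    description: str  # shown to the user before they execute the command

# Hypothetical command set, for illustration only.
COMMANDS = [
    Command("open notepad", "Launches the Notepad text editor."),
    Command("open firefox", "Launches the Firefox web browser."),
    Command("calculate", "Evaluates the selected arithmetic expression."),
    Command("define", "Looks up the dictionary definition of a word."),
]

def rank(user_input: str, commands: list[Command]) -> list[Command]:
    """Prefix matches first, then commands that merely contain the input somewhere."""
    text = user_input.lower()
    prefix = [c for c in commands if c.name.startswith(text)]
    contains = [c for c in commands if text in c.name and not c.name.startswith(text)]
    return prefix + contains

def complete(user_input: str, commands: list[Command], max_suggestions: int = 3
             ) -> tuple[Optional[Command], list[Command]]:
    """Return the best completion plus up to max_suggestions alternatives."""
    ranked = rank(user_input, commands)
    if not ranked:
        return None, []
    return ranked[0], ranked[1:1 + max_suggestions]

best, others = complete("open", COMMANDS)
print(f"{best.name}: {best.description}")  # open notepad: Launches the Notepad text editor.
print([c.name for c in others])            # ['open firefox']
```

Typing “fire” still finds “open firefox” through the substring rule, which hints at how noun-first input could work.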

It’s far from perfect, but I’m still proud of how easy Enso is to learn. Once someone grasps the basic idea of “Hold down caps lock and type a command”, they can almost always figure the rest out on their own. The suggestion list makes them passively aware of what other commands exist, while the description text teaches them what commands do. Seeing what Enso thinks you mean before you execute helps reduce errors, too.

However, this design for Enso had some basic limitations. Commands were always verb-noun, like “Open notepad”, and could only take a single argument apiece, so the expressiveness was limited. We had plans for how multiple commands could be chained together, but this still hasn’t been implemented.

At Humanized, one of the most common feature requests we got was for a way to abbreviate commands — especially the very frequently used “open” command. Another of the most common feature requests was for the ability to enter the noun before the verb — e.g. to open notepad by typing “notepad” first instead of “open notepad” (the way Quicksilver works). Both these requests were clues that people were being frustrated by the need to type “open” over and over again. The interface was inefficient because the first five characters of “open notepad” were wasted keystrokes.

We tried to improve the efficiency by allowing the tab key to be used to autocomplete the rest of the current word, but hitting tab while holding caps lock requires finger contortions, so this feature was seldom used. We wanted to stick with verb-noun because of the similarity to natural English word order, and we avoided abbreviations because we wanted the behavior to always be consistent, but I’m now convinced the users were right — the inefficiency was a major problem.

There are plenty of other linguistic UIs we could analyze using the three “E”s. For example: The Awesome Bar in Firefox 3 is sort of a linguistic interface by my definition (you type stuff in and stuff happens). It’s very easily learnable (most people can figure it out without being taught), and also very efficient (usually just a few keystrokes to get to the website you want), but it’s not expressive at all (all it does is open pages).

Are the three “E”s mutually contradictory? Do we have to settle for “Easy, efficient, expressive: choose two”? That would be pretty depressing. But I’m not ready to throw in the towel just yet.

How should the ideal linguistic UI behave?

Based on all of these experiences, here’s my current thinking about what the ideal linguistic UI should be able to do.

For ease of learning, it should:

  • Accept input in something very close to the human language I’m already familiar with.
  • Give me clues about what commands are available.
  • Give me clues about what I can type next.
  • Give me clues about what the current command will do if executed.
  • Give me suggestions about other commands it thinks I might be looking for.
  • Help me understand what ranges of arguments to a command are valid, and what the arguments mean.
  • Propose commands appropriate to my working context or to the type of data I have selected.
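Several of these points (saying what a command will do, explaining what each argument means, proposing commands that fit the selection) come down to commands carrying a little metadata about themselves. Here’s one hypothetical way to model that in Python; the CommandSpec and Arg classes and both example commands are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Arg:
    name: str
    kind: type   # the data type this argument must have
    hint: str    # plain-language description shown to the user

@dataclass
class CommandSpec:
    verb: str
    accepts: type   # type of selected data this command can act on
    template: str   # preview text, filled in with the argument values
    args: list[Arg] = field(default_factory=list)

    def preview(self, **values) -> str:
        """Describe what executing the command would do, or what it still needs."""
        for arg in self.args:
            if arg.name not in values:
                return f"{self.verb}: still needs {arg.name} ({arg.hint})"
            if not isinstance(values[arg.name], arg.kind):
                return f"{self.verb}: {arg.name} should be {arg.hint}"
        return self.template.format(**values)

def commands_for_selection(selection: object, specs: list[CommandSpec]) -> list[str]:
    """Propose only the commands that can act on the selected data's type."""
    return [s.verb for s in specs if isinstance(selection, s.accepts)]

# Hypothetical commands, for illustration only.
SPECS = [
    CommandSpec("email", str, "Email the selected text to {to}.",
                [Arg("to", str, "an address or contact name")]),
    CommandSpec("round", float, "Round the selected number to {digits} decimal places.",
                [Arg("digits", int, "a whole number of decimal places")]),
]

print(commands_for_selection(3.14159, SPECS))  # ['round']
print(SPECS[1].preview())                      # round: still needs digits (a whole number of decimal places)
print(SPECS[1].preview(digits=2))              # Round the selected number to 2 decimal places.
```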

For efficiency, it should:

  • Allow the user to start with the noun or to start with the verb.
  • Let me autocomplete a partial word with a keystroke.
  • Recognize words even if they’re super-abbreviated.
  • Remember what suggestions I’ve chosen in the past and pop them up next time I give the same input.
  • Let me partially enter something, see the suggestions, choose one as mostly-right, and edit that one some more before executing it.
  • Guess, from my context and my selection, what I want, and fill most of it in for me, while letting me easily override it if it’s wrong.
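Two of these, recognizing heavy abbreviations and remembering past choices, are easy to prototype. The Python sketch below assumes one particular matching rule: the typed letters must appear in the command name in order, so “ntpd” matches “open notepad”. That rule is an assumption for illustration, not a description of how any existing tool does it. A side effect is that typing just the noun, “notepad”, also matches “open notepad”.

```python
from collections import Counter, defaultdict

def is_abbreviation(typed: str, name: str) -> bool:
    """True if every character of typed appears in name, in order (a subsequence test)."""
    remaining = iter(name.lower())
    # `ch in remaining` advances the iterator, so each character must come after the previous one.
    return all(ch in remaining for ch in typed.lower())

class Suggester:
    def __init__(self, names: list[str]):
        self.names = list(names)
        # history[typed] counts how often each command was chosen after typing exactly `typed`.
        self.history = defaultdict(Counter)

    def suggest(self, typed: str) -> list[str]:
        """Matching commands, with the ones most often chosen for this input first."""
        counts = self.history.get(typed, Counter())
        matches = [n for n in self.names if is_abbreviation(typed, n)]
        # sorted() is stable, so ties keep their original order.
        return sorted(matches, key=lambda n: -counts[n])

    def record_choice(self, typed: str, name: str) -> None:
        self.history[typed][name] += 1

s = Suggester(["open notepad", "open firefox", "calculate"])
print(s.suggest("o"))     # ['open notepad', 'open firefox']
s.record_choice("o", "open firefox")
print(s.suggest("o"))     # ['open firefox', 'open notepad']
print(s.suggest("ntpd"))  # ['open notepad']
```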

For expressiveness, it should:

  • Handle commands with multiple arguments, including optional arguments, that can take various data types.
  • If I have data selected, let me use that selection as an input for any of the multiple arguments — or for none of them.
  • Let me chain commands together, with the output of one going to the input of the next, like Unix pipes.
  • If my input could mean more than one thing, give me a sensible way to resolve the ambiguity.
  • Let me compose a complex command out of small parts, in the flexible way that natural language does.
  • Let me save a complex command that I’ve created and give it a simple name so I can re-use it in the future.
  • Give me an easy way to create my own commands — and to share them with others.
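Chaining and saving are where the Unix pipe model carries over most directly. Here’s a minimal Python sketch under some simplifying assumptions of mine: every command is a function that takes the previous output (or the current selection) as its first argument plus named arguments, the registry and the three example commands are hypothetical, and a “saved” command is just a named pipeline.

```python
from typing import Callable

# Hypothetical registry: each command maps text (plus named arguments) to text.
REGISTRY: dict[str, Callable[..., str]] = {}

def command(name: str):
    """Decorator that registers a function under a user-facing command name."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        REGISTRY[name] = fn
        return fn
    return register

@command("replace")
def replace(text: str, old: str, new: str) -> str:
    return text.replace(old, new)

@command("uppercase")
def uppercase(text: str) -> str:
    return text.upper()

@command("count words")
def count_words(text: str) -> str:
    return str(len(text.split()))

Step = tuple[str, dict]  # (command name, named arguments)

def run_pipeline(selection: str, steps: list[Step]) -> str:
    """Feed the selection to the first command and each output to the next, like a Unix pipe."""
    data = selection
    for name, kwargs in steps:
        data = REGISTRY[name](data, **kwargs)
    return data

def save_as(name: str, steps: list[Step]) -> None:
    """Give a pipeline a simple name so it can be reused like any other command."""
    REGISTRY[name] = lambda text: run_pipeline(text, steps)

save_as("shout greeting", [("replace", {"old": "hello", "new": "hi"}), ("uppercase", {})])
print(run_pipeline("hello world", [("shout greeting", {})]))                      # HI WORLD
print(run_pipeline("hello world", [("shout greeting", {}), ("count words", {})]))  # 2
```

Because a saved pipeline lands in the same registry, it can itself be chained, which is the same “finite vocabulary, infinite ideas” property the shell has.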

An impressive list of demands? Yes!

Conflicting design goals? Probably!

Impossible? I don’t think so!

Tune in next time for the design workshop where we try to satisfy all of these constraints at once. I’ve got some ideas, but I’ll be looking for your input, too.