Language-Based Interfaces, Part 2: Where do we stand now?

Please note: everything in this article refers to the 0.1.1 release. The tip of the source tree is changing rapidly, so if you’re using Ubiquity from a source checkout, some of its behavior may have changed by the time you read this article.

Two months ago I wrote a post describing the properties of the ideal linguistic interface. Now that we’ve released Ubiquity 0.1.1, I want to look at how well the current state of the Ubiquity interface measures up to that ideal. Where does it provide the desired behavior and where does it still fall short? Where are there clear and obvious improvements that can be made, and where are we puzzled by design questions still murky and nebulous?

Is it easy to learn? How could it be easier? Is it efficient? How could it be more efficient? Is it expressive? How could it be more expressive?

Yes, it’s still in the public beta phase, so nobody should expect it to be perfect. All the more reason to correct UI design problems now, before they get baked-in and people start getting used to them.

The parser nitty-gritty

You can get the general idea of how the parser works by reading the new users’ tutorial and trying out Ubiquity for yourself.

However, there are some subtle-but-important parser behaviors that you might not notice at first glance. In order to understand some of the issues going on with the interface, we will have to dive into the implementation details of the parser. Ready? Here we go!

This is the parser’s idea of an English sentence: A verb, optionally followed by arguments. The verb is always the first word. Each word after the verb is either part of the direct object or part of a modifier. The direct object is a noun, and each modifier is a preposition followed by a noun. Nouns can be multiple words (“hello everyone” is a single noun), but verbs and prepositions are only single words.

The parser expects every sentence to be like that, but it understands that sometimes parts of the sentence will be abbreviated or left out entirely. It’s not fazed by sentences missing expected modifiers, missing direct objects, or even missing the verb. It will also try to guess a partially-typed verb or noun (from as little as one character).

On execution, the direct object noun and modifier nouns, if any, are extracted from the parsed sentence, and passed as arguments to the execute method of the verb object.
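To make that sentence shape concrete, here is a minimal sketch in Python. It is not Ubiquity's actual code: the names `ParsedSentence`, `execute`, and the stand-in `email` handler are all made up for illustration, and the only thing taken from the description above is the structure (one verb, an optional direct object, and modifiers keyed by preposition).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ParsedSentence:
    """One parsing of the input: a verb plus optional arguments."""
    verb: str                              # always a single word
    direct_object: Optional[str] = None    # a noun; may span several words
    modifiers: Dict[str, str] = field(default_factory=dict)  # preposition -> noun


def execute(sentence, verbs):
    """Extract the nouns from the parsed sentence and pass them to the verb."""
    handler = verbs[sentence.verb]
    return handler(sentence.direct_object, sentence.modifiers)


def email(direct_object, modifiers):
    # Hypothetical stand-in for a verb's execute method; it only reports
    # the arguments it was given.
    return "email %r to %r" % (direct_object, modifiers.get("to"))


parsed = ParsedSentence(verb="email",
                        direct_object="hello everyone",
                        modifiers={"to": "aza"})
result = execute(parsed, {"email": email})
```

Note that "hello everyone" travels as a single noun, while the verb and the preposition are each one word.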

Verb matching

The parser starts by splitting the input on spaces to get a list of words. It then looks at the first word to see if it matches any known verbs. It’s a “match” if a verb starts with the first word:

If the first word does match one or more verbs, then the parser attempts to do a “verb-first completion”:

If there’s no match, then the parser assumes the user has skipped the verb and gone straight for the arguments, so it attempts a “noun-first completion”:

Watch what happens if you type “calendar”:

The input “calendar” does not match any verbs, even though we have commands called “add-to-calendar” and “check-calendar”, because those verb names don’t start with “calendar”.

Because it doesn’t match a verb, the input “calendar” triggers a noun-first completion. That’s why you get suggestions you probably didn’t expect, like “wikipedia calendar” and “yahoo-search calendar”. It’s because the parser is treating the word “calendar” as a noun, and looking for verbs that could accept that noun as an argument.
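The matching rule can be sketched in a few lines of Python. This is an illustration, not the real parser: the `VERBS` list is a small hypothetical sample, and exact, case-sensitive prefix matching is an assumption.

```python
# Hypothetical sample of installed verbs; the real command list is longer.
VERBS = ["add-to-calendar", "check-calendar", "email",
         "google", "wikipedia", "yahoo-search"]


def match_verbs(user_input, verbs=VERBS):
    """Return every verb whose name starts with the first word of the input."""
    words = user_input.split(" ")
    first = words[0]
    if not first:
        return []
    return [verb for verb in verbs if verb.startswith(first)]


def completion_mode(user_input):
    """Verb-first if the first word matches a verb, otherwise noun-first."""
    return "verb-first" if match_verbs(user_input) else "noun-first"
```

With this rule, `match_verbs("add to calendar")` finds "add-to-calendar", while `match_verbs("calendar")` finds nothing, so `completion_mode("calendar")` falls through to noun-first.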

Verb-First Completions

So, the first word of the input matched a verb, did it? Great! Let’s do a verb-first completion. But wait a minute. Why do we only compare the first word? Can’t a verb ever be more than one word?

At the moment, no. We introduced the extremely lame restriction that verbs can only be one word, in order to make things easier for the parser. If you look at the command list, you’ll see that none of the command names have spaces in them. They have hyphens.

Watch what happens if you input “add to calendar”:

The first word, “add”, is matched to the “add-to-calendar” verb. The “to calendar” part is parsed as arguments.

Single-word commands may be the standard in Bash, but that’s not natural-language! Typing “add-to-calendar this” is very awkward. I would greatly prefer to be able to type “add this to calendar”.

I look forward to improving the parser to the point where command names no longer need to be restricted, and we can drop the hyphens.

Aside from that major problem, the verb-first parsing is pretty flexible. See how it lets me provide the arguments in any order, for commands with multiple arguments:

The declaration of the email verb specifies that it takes a direct object, which can be arbitrary text, and one modifier, identified by the preposition “to”, that must be a contact. This declaration informs the parser that it should look for the preposition “to”, and consider any words following “to” as candidates for the “to” argument. The word “aza” is given to the contact noun-type object, which looks through my contacts and produces two suggestions for completions; both of these possibilities end up in my suggestion list.

Notice what happens if I give email a “to” argument that’s not a match to anyone in my address book:

“your mom” is not a contact, and therefore “to your mom” is not a valid recipient. But the parser wants to use every word of the input. Since the “message” argument can accept anything, the parser treats “to your mom” as part of the message argument, and considers the “to” argument to still be blank.
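Here is a rough Python sketch of that behavior for the email verb. Everything in it is an assumption made for illustration: the `CONTACTS` entries are invented, contact matching is a plain prefix test, and the real parser may generate additional parsings that this simplified version leaves out.

```python
# Hypothetical address book; two entries start with "aza".
CONTACTS = ["aza@example.com", "azadeh@example.com", "jono@example.com"]


def suggest_contacts(text):
    """Toy contact noun type: suggest contacts that start with the text."""
    return [c for c in CONTACTS if text and c.startswith(text)]


def parse_email_args(words):
    """Return possible {message, to} parsings for the words after the verb."""
    parses = []
    for i, word in enumerate(words):
        if word != "to":
            continue
        # Try every run of words after "to" as the candidate recipient;
        # whatever is left over becomes the message.
        for j in range(i + 2, len(words) + 1):
            candidate = " ".join(words[i + 1:j])
            message = " ".join(words[:i] + words[j:])
            for contact in suggest_contacts(candidate):
                parses.append({"message": message, "to": contact})
    if not parses:
        # No valid recipient: every word goes to the message, "to" stays blank.
        parses.append({"message": " ".join(words), "to": None})
    return parses
```

Both "hello to aza" and "to aza hello" produce the same two parsings, one per matching contact. For "hello to your mom", no contact matches, so the single parsing has the whole phrase as the message and an empty recipient.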

Noun-First Completions

Sometimes it just makes more sense for the user to provide the noun first, instead of the verb. For instance, if the user selects some content on the page and then invokes Ubiquity, it makes sense to suggest commands that could act upon that selection.

It also makes sense that if, say, “check-calendar” is the only command that can take a date as an argument:

then if the user just types a date, we should be able to figure out that check-calendar must be the verb they want.

So, we implemented noun-first completion. If the first word of the input doesn’t match any verbs, then it will be passed through all of the noun types that the parser knows about, to figure out what noun type it could be. For instance, an input of “tomorrow” gets a match from the Date noun type and the Arbitrary Text noun type. (Because “arbitrary text” matches everything, of course.)

Next, the parser builds up a list of every verb that can take any of these noun types as an argument. Since the “check-calendar” command takes a Date as an argument (as declared in the command definition), it will appear as one of the suggestions if the input is “tomorrow”.

Hey, what gives? Why isn’t it suggesting “check-calendar tomorrow”?

Well, actually it is generating that suggestion. But since “tomorrow” matches the Arbitrary Text noun type, it’s also generating suggestions based on every command that can run on Arbitrary Text, such as google and wikipedia. And those are the suggestions we see first.
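A toy version of noun-first completion, in Python, shows how this happens. The noun-type detectors, the verb table, and its ordering are all hypothetical, and the sketch passes the whole input to each noun type for simplicity.

```python
def is_date(text):
    # Toy date detector; a real Date noun type would accept many more forms.
    return text.lower() in {"today", "tomorrow", "yesterday"}


def is_arbitrary_text(text):
    # Arbitrary text matches everything.
    return True


NOUN_TYPES = {"date": is_date, "arbitrary text": is_arbitrary_text}

# Hypothetical table: which noun type each verb accepts as its argument.
VERB_ARGUMENT_TYPES = {
    "google": "arbitrary text",
    "wikipedia": "arbitrary text",
    "yahoo-search": "arbitrary text",
    "check-calendar": "date",
}


def noun_first_completions(user_input):
    """Suggest every verb that accepts a noun type matching the input."""
    matching = [name for name, accepts in NOUN_TYPES.items()
                if accepts(user_input)]
    return [verb + " " + user_input
            for verb, noun_type in VERB_ARGUMENT_TYPES.items()
            if noun_type in matching]
```

For "tomorrow", both noun types match, so "check-calendar tomorrow" is generated, but so is one suggestion for every verb that takes arbitrary text. For "calendar", only the arbitrary-text verbs appear.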

This is not so good. We’d prefer to see the more specific suggestions first. Why don’t we?

The Suggestion List

The parser takes every possible valid completion and throws them all into the suggestion list. Only the top five suggestions are actually displayed; if we showed the entire suggestion list, it would often extend off the bottom of the browser window!

That’s because we’re taking every verb that matches the first word of the sentence — or, failing that, every verb that could use every noun type that matches the sentence — and producing multiple parsings based on it. Parsing is ambiguous (sometimes there are words that could either be part of a modifier or part of the direct object), and so every possible alternate parsing goes into the suggestion list.

This is the approach we use wherever and whenever we find ambiguity in the user input: produce every possible interpretation, and throw them all into the suggestion list. As another example, when the input contains the magic word “this”, we suggest both the literal sentence and the sentence with the selection substituted for “this”:

To expand “this” or not to expand “this” is an independent binary choice, so it doesn’t just add one extra item to the suggestion list — it doubles the length of the suggestion list. Each time we introduce more flexibility into the parser, the suggestion list grows geometrically!
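The doubling is easy to see in a small Python sketch. The function name `expand_this` is made up, and treating “this” as a whole-word match is an assumption.

```python
def expand_this(suggestions, selection):
    """For each suggestion containing "this", keep the literal sentence and
    add a second one with the selection substituted for "this"."""
    expanded = []
    for suggestion in suggestions:
        expanded.append(suggestion)
        words = suggestion.split(" ")
        if "this" in words:
            substituted = [selection if w == "this" else w for w in words]
            expanded.append(" ".join(substituted))
    return expanded


before = ["google this", "wikipedia this"]
after = expand_this(before, "ubiquity")
```

Two suggestions become four. Add a second independent either/or choice of the same kind and four become eight, which is where the geometric growth comes from.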

The suggestion list is not currently being sorted.

Let me say that again: The suggestion list is not currently being sorted. At all!

I assumed throughout development that the throw-everything-into-the-suggestion-list approach could only result in a usable interface if the suggestion list were sorted by quality, so that the best suggestions always appear at the top.

To my great surprise, I discovered once I implemented the suggestion list that Ubiquity seems to work fairly OK without any sorting. Not great, not humane, but usable. We had other higher priorities so we delayed sorting until after the 0.1.1 release. Now that 0.1.1 is out, the sorting algorithm is the main thing I’m working on.

But for now, remember — when you type something into Ubiquity and see a list of five suggestions, those aren’t the best five, they’re just the first five that the parser can come up with. That’s why you’ll see google and wikipedia and so on as the suggestions for any noun that you select. And that’s why noun-first completion isn’t very useful as it currently stands.
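In code terms, the display step amounts to something like the following Python sketch. The contents and order of `all_suggestions` are invented for illustration; the only details taken from the text above are the cutoff of five and the absence of sorting.

```python
MAX_DISPLAYED = 5


def displayed(suggestions):
    """Show the first five suggestions in generation order, with no sorting."""
    return suggestions[:MAX_DISPLAYED]


# Hypothetical generation order for the input "tomorrow".
all_suggestions = [
    "google tomorrow",
    "wikipedia tomorrow",
    "yahoo-search tomorrow",
    "map tomorrow",
    "translate tomorrow",
    "check-calendar tomorrow",
]

shown = displayed(all_suggestions)
```

In this made-up ordering, the most specific suggestion is generated sixth, so it never reaches the screen.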

Coming up next…

I’m going to grade Ubiquity 0.1.1 on all the criteria I introduced in part 1. Then I’m going to tackle the current lack of any naming standards, and what it means in a world of wide-open decentralized command development.

Aza Says:

September 5, 2008 at 3:22 am

This is a great summary of the current state of the parser. I understand it better now 🙂

Julien Couvreur Says:

September 5, 2008 at 4:49 am

Thanks for the great write-up.
I’m curious as to what algorithms can be used to rank the suggestions by relevance.
Do you have some ideas how to do that already?

I wonder if this has been solved before. It sounds like it’s somewhat related to a Hidden Markov Model, with different probabilities for the different interpretations/suggestions.

Gerv Says:

September 5, 2008 at 8:54 am

“I look forward to improving the parser to the point where command names no longer need to be restricted, and we can drop the hyphens.”

One thing I’ve noticed is that humans don’t seem particularly bad at adapting to odd or modified interaction patterns. For example, driving (pedals, gearstick, wheel) is an odd interface when considered in isolation. The restriction to 160 chars in texts or tweets is another. Email addresses which have full names that don’t contain spaces are another.

Also, we need to make sure that the mental model of a ubiquity interaction is simple enough that people can keep all of it in their head at once.

Given the above two points, is it sensible to allow multi-word verbs? The goal is surely not to make it “as like English as possible”, but to make the interaction as smooth and understandable as possible. If single word verbs makes the mental model easier, perhaps we should keep them as a requirement. Then command authors would design around them, and users would be able to rely on simple rules like “command names are one word”.

Having said that, there would be nothing wrong with auto-completing based on hyphen-separated parts, so typing cal would match add-to-calendar.

Noel Grandin Says:

September 5, 2008 at 10:14 am

Some of the examples sound a little contrived.
It would probably be worth your while to build auto-feedback early, so you could see quickly what people were actually trying to do with the tool (and where they are failing).

Zorkzero Says:

September 5, 2008 at 11:37 am

To see what a good parser for a language-based interface should look like, you can look at an interactive fiction game written in Inform http://www.inform-fiction.org/I7/Welcome.html .

I suggest looking at http://en.wikipedia.org/wiki/Inform and playing a game mentioned there. You just need the game itself and an interpreter for your OS. You can get them from http://mirror.ifarchive.org/ . For Windows I suggest WindowsFrotz.

jonoscript Says:

September 5, 2008 at 2:16 pm

Zorkzero: Actually, I don’t have to download anything. I’ll play them online with Atul’s “Parchment” interpreter: http://www.toolness.com/wp/?p=49

I LOVE interactive fiction games, but I haven’t played any in a while. I remember the Inform parser being quite sophisticated in some ways but frustrating in others. Playing one and writing about what’s right and what’s wrong with its parser would be good fodder for a blog post. Thanks for the suggestion.
