Now that I’ve blogged about Ubiquity, you should understand why I’ve been obsessing over the properties of a good linguistic UI. It’s not an academic problem: It’s one of the interfaces to the extension I’m working on right now!

Some commenters have asked me the question (if not in these exact words): Is a linguistic UI the right kind of UI for Ubiquity, and if so, why?

(“Because Jono is obsessed with linguistic UIs” isn’t a good enough reason.)

First, the really big picture of what Ubiquity is supposed to be all about: It’s a step towards a Web where verbs (i.e. functionality, i.e. commands, i.e. services) are first-class citizens. And that’s why I’m thinking it should be renamed from Ubiquity to something like “Mozilla Verbs”, maybe.

Creating and sharing nouns (i.e. web pages, i.e. content, i.e. data) on the Web has always been very easy. All you have to do is give someone a URL, and they can see your content. The Web was designed around this idea from the very beginning.

But the modern Web is not the relatively static library of information that was originally imagined. It’s full of pages that do stuff. Some of them do so much stuff that we don’t even call them “web pages” anymore; we call them “web applications”. The modern Web is full of sites that exist to provide a service rather than a list of facts. You can google something, you can digg something, you can slashdot something… The modern Web is full of verbs!

The next generation of web interfaces will need to make sharing, creating, interlinking, and combining these verbs as easy as the hypertext paradigm made it to share, create, interlink, and combine nouns. Aza wrote a great post about this, called Sharing Streamable Functionality.

So, keeping in mind that that’s the goal, there are a couple of reasons why a linguistic UI could be better than a point-and-click UI; not for every use case, but for many of them.

The first reason is that a point-and-click UI requires every verb to be graphically represented as an icon or menu item. As the namespace of commands grows, it becomes hard to find places to put all those icons and menu items; the advanced stage of this disease results in terrifyingly bloated GUI apps like Microsoft Word. On the other hand, having zillions and zillions of commands is not a problem when you can simply type the one you want. (Provided, of course, that you know the one you want, which is why I’m so concerned with learnability.) “Zillions and zillions of verbs” is where I think we’re going, because of how easy Ubiquity makes it to create verbs and share them.

The second reason is expressiveness, as I defined it in my last post. I want to be able to tell Firefox, for instance:

“Hey Firefox? Select this page, translate it to Spanish, encrypt it with my mom’s public key, email it to her, hit send, and oh yeah save this chain of commands as a new command so I can use it later. Let’s call the new command ‘garblify’.”

That’s a complex idea. I could do all that in Firefox as it stands now, but I would have to switch between lots of tabs and lots of web applications, copying and pasting and clicking on buttons and icons left and right, and it would take dozens of individual steps. That’s because it’s inherently hard to express a complex idea through the medium of pointing and clicking. It’s much, much easier to express a complex idea using language, as I did above. That’s what language is for. This is all provided, of course, that we have an input language which is sufficiently expressive to get the idea across, while not being insanely hard to learn.

On the subject of linguistic UI vs point-and-click, a commenter by the name of VioletJoker left the following comment on a previous post:

What a brilliant idea. Less GUI, more typing. In fact, the same thing applies to scripting languages – why all the clean abstractions, what the programmer really needs is more flexibility, so by extension, we should all develop in machine language. NOT

Despite the sarcasm, VioletJoker makes a really good point! Interfaces are bad if they ask you to make decisions on a level of detail you don’t care about.

For instance, when programming in assembly language, you have to think about the exact memory locations of data and instructions, the instruction set of your processor, and which register you’re loading stuff into. This is a drastically higher level of detail than you want for most problems. C lets you work on a higher level of abstraction, but you still have to think about memory allocation and deallocation. When you’re writing malloc() and free(), you are doing the computer’s chores for it instead of focusing on your problem domain. Java lets you work on a higher level of abstraction than C, and Python lets you work on an even higher level than Java. I’m a huge fan of Python because, compared to other languages, there are very few decisions I have to make when writing Python that aren’t relevant to my problem domain.

In user-interface design, it’s the same thing! The first GUI was a step forward, not because there’s something inherently bad about typing, but because it let users work on a higher level of abstraction and forget about irrelevant details like “what’s the exact filesystem path to the directory I want?”. (Well, that and the fact that it had superior discoverability.)

But the GUI is far from perfect. I could fill a book with examples of places where Windows Vista (or Leopard, or Ubuntu) forces us to make decisions that aren’t relevant to what we’re trying to do. Even the Firefox GUI makes us think about fiddly GUI bits unrelated to our web-surfing task. Fiddly bits like: Which text input field has keyboard focus? Where on the screen is that other tab that has the page I want? Am I currently logged in to my webmail or not? If I hit the Enter key right now, will it submit a form? Etc.

So, when I talk about a linguistic UI, I want something that lets me forget that stuff. I want it to let me work on an even higher level of abstraction than the Firefox GUI. The email verb should let me shoot off a message to somebody just by specifying who they are and what I want to say to them. I don’t want to have to think about navigating to the page for my webmail, or think about which webmail service I’m using or whether I’m logged into it already or not. The email verb should invisibly handle those details for me as much as possible; it should make smart guesses about what I want, while allowing me to easily override it when it guesses wrong, and it should attempt to improve the accuracy of its guesses over time.
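Here’s a minimal Python sketch of that “guess, allow override, learn” behavior. It’s only an illustration under my own assumptions: EmailVerb, guess_service() and the service names are all hypothetical, nothing is actually sent, and “learning” is reduced to counting which service was used most often for each recipient.

```python
from collections import Counter
from typing import Dict, Optional


class EmailVerb:
    """Toy model of a verb that fills in details the user did not specify."""

    def __init__(self, default_service: str) -> None:
        self.default_service = default_service
        # Per-recipient count of which service was actually used.
        self.history: Dict[str, Counter] = {}

    def guess_service(self, recipient: str) -> str:
        """Guess the service from past use, falling back to the default."""
        used = self.history.get(recipient)
        if used:
            return used.most_common(1)[0][0]
        return self.default_service

    def send(self, recipient: str, message: str,
             service: Optional[str] = None) -> str:
        """Pretend to send a message; an explicit service overrides the guess."""
        chosen = service or self.guess_service(recipient)
        # Remember the choice so that later guesses improve.
        self.history.setdefault(recipient, Counter())[chosen] += 1
        return f"sent {message!r} to {recipient} via {chosen}"


email = EmailVerb(default_service="webmail-a")  # hypothetical service name
print(email.send("mom", "hi"))                  # no history yet: uses the default
email.send("mom", "oops", service="webmail-b")  # the user overrides the guess
email.send("mom", "again", service="webmail-b")
print(email.guess_service("mom"))               # now guesses from the overrides
```

The user only ever names the recipient and the message; the service is an optional detail that the verb fills in and that the user can correct when the guess is wrong.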

So to my list of requirements for a good linguistic UI, I’ll add one more: it should abstract away details that are not relevant to the task at hand. In other words, the vocabulary — the command set — should be on the level of tasks that the user cares about.