Now that I’ve blogged about Ubiquity, you should understand why I’ve been obsessing over the properties of a good linguistic UI. It’s not an academic problem: It’s one of the interfaces to the extension I’m working on right now!
Some commenters have asked me the question (if not in these exact words): Is a linguistic UI the right kind of UI for Ubiquity, and if so, why?
(“Because Jono is obsessed with linguistic UIs” isn’t a good enough reason.)
First, the really big picture of what Ubiquity is supposed to be all about: It’s a step towards a Web where verbs (i.e. functionality, i.e. commands, i.e. services) are first-class citizens. And that’s why I’m thinking it should be renamed from Ubiquity to something like “Mozilla Verbs”, maybe.
Creating and sharing nouns — i.e., web pages, i.e. content, i.e. data — on the Web has always been very easy. All you have to do is give someone a link to a URL, and they can see your content. The Web was designed around this idea from the very beginning. But the modern Web is not the relatively static library of information that was originally imagined. It’s full of pages that do stuff. Some of them do so much stuff that we don’t even call them “web pages” anymore, we call them “web applications”. The modern web is full of sites that exist to provide a service rather than a list of facts. You can google something, you can digg something, you can slashdot something… The modern web is full of verbs! The next generation of web interfaces will need to make sharing, creating, interlinking, and combining these verbs as easy as the hypertext paradigm made it to share, create, interlink, and combine nouns. Aza wrote a great post about this, called Sharing Streamable Functionality.
So, keeping in mind that that’s the goal, there are a couple of reasons why a linguistic UI could be better than a point-and-click UI: not for every use case, but for many of them.
The first reason is that a point-and-click UI requires every verb to be graphically represented as an icon or menu item. As the namespace of commands grows, it becomes hard to find places to put all those icons and menu items; the advanced stage of this disease results in terrifyingly bloated GUI apps like Microsoft Word. On the other hand, having zillions and zillions of commands is not a problem when you can simply type the one you want. (Provided, of course, that you know the one you want, which is why I’m so concerned with learnability.) “Zillions and zillions of verbs” is where I think we’re going, because of how easy Ubiquity makes it to create verbs and share them.
The second reason is expressiveness, as I defined it in my last post. I want to be able to tell Firefox, for instance:
“Hey Firefox? Select this page, translate it to Spanish, encrypt it with my mom’s public key, email it to her, hit send, and oh yeah save this chain of commands as a new command so I can use it later. Let’s call the new command ‘garblify’.”
That’s a complex idea. I could do all that in Firefox as it stands now, but I would have to switch between lots of tabs, lots of web applications, copying and pasting and clicking on buttons and icons left and right, and it would take dozens of individual steps. That’s because it’s inherently hard to express a complex idea through the medium of pointing and clicking. It’s much much easier to express a complex idea using language, as I did above. That’s what language is for. This is all provided, of course, that we have an input language which is sufficiently expressive to get the idea across, while not being insanely hard to learn.
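To make the “garblify” example a little more concrete, here is a minimal Python sketch of saving a chain of verbs as a new verb. It is not Ubiquity’s actual API: the `VERBS` registry, `register`, `chain`, and the three stand-in verbs are all hypothetical, and each verb is simplified to a function from text to text.

```python
from typing import Callable, Dict, List

# A verb, for this sketch only, is any function from text to text.
Verb = Callable[[str], str]

# Hypothetical registry mapping verb names to implementations.
VERBS: Dict[str, Verb] = {}


def register(name: str, verb: Verb) -> None:
    """Make a verb available under the given name."""
    VERBS[name] = verb


def chain(names: List[str]) -> Verb:
    """Build a new verb that runs the named verbs in order."""
    steps = [VERBS[name] for name in names]  # look up now so typos fail early

    def combined(text: str) -> str:
        for step in steps:
            text = step(text)
        return text

    return combined


# Stand-in verbs; real ones would call a translation service, a crypto
# library, and a webmail account.
register("translate", lambda text: f"[es] {text}")
register("encrypt", lambda text: text[::-1])  # toy "encryption": reverse the text
register("email", lambda text: f"To: mom\n\n{text}")

# Save the chain of commands as a new command.
register("garblify", chain(["translate", "encrypt", "email"]))
```

With those stand-ins, `VERBS["garblify"]("hello")` runs all three steps in order. The point is only that once verbs are values that can be passed around, naming a new one is a single line.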
On the subject of linguistic UI vs point-and-click, a commenter by the name of VioletJoker left the following comment on a previous post:
What a brilliant idea. Less GUI, more typing. In fact, the same thing applies to scripting languages – why all the clean abstractions, what the programmer really needs is more flexibility, so by extension, we should all develop in machine language. NOT
Despite the sarcasm, VioletJoker makes a really good point! Interfaces are bad if they ask you to make decisions on a level of detail you don’t care about.
For instance, when programming in assembly language, you have to think about the exact memory locations of data and instructions, the instruction set of your processor, and what register you’re loading stuff into. This is a drastically higher level of detail than you want for most problems. C lets you work on a higher level of abstraction, but you still have to think about memory allocation and deallocation. When you’re writing malloc() and free() you are doing the computer’s chores for it instead of focusing on your problem domain. Java lets you work on a higher level of abstraction than C, and Python lets you work on an even higher level than Java. I’m a huge fan of Python because, compared to other languages, there are very few decisions I have to make when writing Python that aren’t relevant to my problem domain.
In user-interface design, it’s the same thing! The first GUI was a step forward, not because there’s something inherently bad about typing, but because it let users work on a higher level of abstraction and forget about irrelevant details like “what’s the exact filesystem path to the directory I want?”. (Well, that and the fact that it had superior discoverability.)
But the GUI is far from perfect. I could fill a book with examples of places where Windows Vista (or Leopard, or Ubuntu) forces us to make decisions that aren’t relevant to what we’re trying to do. Even the Firefox GUI makes us think about fiddly GUI bits unrelated to our web-surfing task. Fiddly bits like which text input field has keyboard focus? Where on the screen is that other tab that has the page I want? Am I currently logged in to my webmail or not? If I hit the Enter key right now, will it submit a form? Etc.
So, when I talk about a linguistic UI, I want something that lets me forget that stuff. I want it to let me work on an even higher level of abstraction than the Firefox GUI. The email verb should let me shoot off a message to somebody just by specifying who they are and what I want to say to them. I don’t want to have to think about navigating to the page for my webmail, or think about which webmail service I’m using or whether I’m logged into it already or not. The email verb should invisibly handle those details for me as much as possible; it should make smart guesses about what I want, while allowing me to easily override it when it guesses wrong, and it should attempt to improve the accuracy of its guesses over time.
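As a rough illustration of “guess, allow override, improve over time”, here is a small Python sketch. Everything in it is an assumption made for illustration (the `RecipientGuesser` class, its method names, and the example addresses); a real email verb would have far more context to draw on.

```python
from collections import Counter
from typing import Dict, Optional


class RecipientGuesser:
    """Guess an email address for a name, learning from what the user confirms."""

    def __init__(self) -> None:
        # name (lowercased) -> how often each address was actually used
        self._history: Dict[str, Counter] = {}

    def guess(self, name: str) -> Optional[str]:
        """Return the address used most often for this name, or None if unknown."""
        counts = self._history.get(name.lower())
        if not counts:
            return None
        return counts.most_common(1)[0][0]

    def confirm(self, name: str, address: str) -> None:
        """Record the address the user really sent to (the guess or an override)."""
        self._history.setdefault(name.lower(), Counter())[address] += 1


guesser = RecipientGuesser()
guesser.confirm("Mom", "mom@example.com")        # first send: user typed the address
suggestion = guesser.guess("mom")                # later: "mom@example.com" is suggested
guesser.confirm("mom", "mom@work.example.com")   # user overrides the suggestion...
guesser.confirm("mom", "mom@work.example.com")   # ...twice, so the guess changes
```

The override path and the learning path are the same call here: whatever the user ends up sending to is what gets counted, so a wrong guess corrects itself after a few overrides.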
So to my list of requirements for a good linguistic UI, I’ll add one more: it should abstract away details that are not relevant to the task at hand. In other words, the vocabulary — the command set — should be on the level of tasks that the user cares about.
July 28, 2008 at 1:10 pm
Who do you see as being a target audience or user of Mozilla Verbs? Firefox users in general, or more narrowly developer types?
July 28, 2008 at 4:51 pm
Ben: Good question. We’re hoping for cross-over appeal. The plan is that it will appeal to developers first because of how very, very easy it is to write a Ubiquity command (orders of magnitude easier than writing a Firefox extension; I’m going to do a full post about this soon).
Then, because there are so many useful commands being written for it (fingers crossed), it will appeal to the same group of Firefox users who install extensions. (Which is far from all Firefox users, by the way).
Then (in my world-domination fantasies), due to popular demand, it will be built into all future versions of Firefox, which will then go up to 50% market share, and millions of people will be using Verbs every day.
July 30, 2008 at 8:13 am
I think an easier way of fitting nouns and verbs into the web that exists now is comparing them to a static web and a dynamic web. With a static web there were web page links and text you could read: the nouns. The dynamic web is the action and interaction of the AJAX-driven, social, interactive web that has been labeled Web 2.0.
So I’d say that Verbs are a type of UI for interfacing with a dynamic web. I’m not convinced that they are the right UI for that task, but then again I haven’t tried it. (an XPI snapshot would be nice)
August 15, 2008 at 7:22 pm
wow so cool.
August 26, 2008 at 11:10 pm
[…] Why Verbs by Jono DiCarlo […]
August 27, 2008 at 11:11 am
Brilliant concept, I love the idea. Would it still, however, require a keyboard as a means of input? While I believe the keyboard will be around for a long time to come, I think there will also be major advances in “physical” based input (touch screens, manipulating devices akin to Wii, etc.). I think there will also be major advances in how mobile devices interact with the web & greater demand for content on the go.
So, that said, how will Verbs adapt/respond to the challenge of a keyboard-less environment / an environment where keystrokes are a pain?
August 27, 2008 at 2:33 pm
But how do you get around VioletJoker’s main complaint about just adding more typing?
For example, it seems to me that Ubiquity is doing things with the keyboard that I can already do with my mouse, using OS X Services, for example. Why should I type “define this” when I can click on it and choose the dictionary service?
Unless we’re talking about a spoken-language approach (which is what your example of “telling” Firefox really looks like), I don’t see this as a big step forward.
August 27, 2008 at 3:08 pm
So, in essence, will Ubiquity be what Quicksilver for the Mac tried/tries to be?
August 27, 2008 at 3:08 pm
Cool. Have you guys thought of incorporating social features, e.g., like those of yubnub.org (an older, arguably less ambitious effort which has some commonality with your project)? Also, have you thought about implementing the equivalent of *ix pipes on this?
August 28, 2008 at 12:12 am
To answer John:
I love Ubiquity already and I’ve only been using it today. For someone like me who is keyboard oriented, it’s great: I don’t have to leave the keyboard to find the stuff I want. I don’t have to “click on it”; I keep my fingers on the keyboard where they belong. It’s much more accurate and fast, for some of us, to type rather than mouse. Mice are great, don’t get me wrong. But there are times when the keyboard is just more efficient, and Ubiquity takes advantage of that.
August 28, 2008 at 10:47 am
As long as we can get rid of unnecessary “clicking” to get things done, I’m in… verbs are natural. The mouseclick is used to validate a choice, from a binary YES-NO alternative. In essence that’s what a mouseclick “knows” about language: YES & NO.
Complex situations require the user to go through large strings of “YES-NO” prior to getting to the final YES that we’re looking for.
Getting rid of such large strings of mouseclicks is an obvious idea.
You don’t want to rebuild your house brick by brick every time you want to go home. Yet that’s what we all do today when we want something specific on the Web: click by click, again and again.
August 29, 2008 at 9:16 am
I would like to add more about the subject of language as a barrier. If you check out the work of linguists such as Eleanor Rosch, George Lakoff and even Wittgenstein, it is easy to see how usability can be improved if more thought is given to the naming conventions we use.
The results of many years of linguistic study have shown that humankind, across many different cultures, names things most readily at the genus level; the mind must find it easier to interpret content if the genus is used when designing with nouns. With verbs, however, I think we find a move beyond this and towards bodily movement, which seems to me to be a more user-friendly approach.
August 30, 2008 at 5:50 am
[…] some prosumer, pursuing his or her own interests, takes the time to create the “garbilify” command and shares it with the world, I will not need to know about it in order to use it. I will only have […]
August 31, 2008 at 3:30 pm
[…] choice and get weather. The video shows some very impressive functionality and gives a taste of the natural language functionality that they seem to have in mind. The review over at Ars Technica has an excellent overview of […]
September 3, 2008 at 4:25 pm
Ubiquity can’t be a step backwards, because in an interface the mode of input is not as important as the language of input. Whether it’s a command line or a menus-and-buttons GUI or a mouse gesture system, the input methods have no inherent superiority. What matters is the sophistication of the dialogue*.
A gesture system is very powerful because it is fast, intuitive and abstract. However, it is unsophisticated; more complex actions require more complex gestures or longer chains of simple gestures, and as the complexity grows, the speed advantage is lost and it is harder to recall commands. It is useful only for very simple commands (but very, very good at that).
Mouse-driven GUIs are also powerful, but slow, and limited in the sense that commands must be located, their locations remembered, and long sequences repeated in their entirety. GUIs are controlled more by their designer than by the user. Many GUIs offer macro action systems that compensate for these shortcomings, but most people are unfamiliar with them.
Command lines are the most versatile of interfaces because of the flexibility and speed of language. What Ubiquity offers is a more sophisticated dialogue; if you can write English, you can use Ubiquity. If Ubiquity is fluent in English – in verbs – then you can use language in all its nuanced power.
It is strange to see people suggest that command lines are a step backwards – to see them commenting on blogs and forums to that effect. After all, they are using their keyboards to type into text fields, are they not? It is no surprise that instant messaging is more popular than voice chat, that text messaging is more popular than telephoning, that emoticons 🙂 have become a shorthand for emotion. Language is how you Get Things Done in the world; it’s what we use to navigate through conversations, business agreements, relationships. We should be able to use it to talk to our computers.
Of course, at this stage, Ubiquity doesn’t really show off the power of verbs. Many of its features are hardly faster than what Firefox can already do in the Awesomebar. However, as its ability to understand our commands grows, the sophistication of our dialogue will rise, and someday, typing, “Send Bob Jones map directions from his house to mine, tell him ‘Hope to see you at the kegger this Friday’, and attach a random ‘beer goggles’ photo” won’t just be a cool concept.
(*I say “dialogue” because that’s what it is: the user inputs, and the computer outputs, but the input/output is the same thing: new information added to a conversation.)
September 5, 2008 at 10:20 am
[…] discovered this project after reading Jono DiCarlo’s blog post about linguistic UIs, in it he was discussing the difference between using a noun as a connection for a user and using a […]
September 27, 2008 at 8:21 pm
After using Ubiquity for about a week, I’m completely sold on the idea of supplementing GUIs with CLIs. Any work that is done to create a functional, language-based CLI can be combined with progress that is made in speech-to-text software. We’re not far off from being able to literally tell our computers what we want. I, for one, am extremely excited! Excellent work!
October 1, 2008 at 4:09 am
[…] the vision of jono over at Mozilla Labs, and it sounds pretty good to me Although I couldn’t help but think that if I could access […]
November 9, 2008 at 2:28 pm
[…] the whole Internet and all of its information easily accessible with a word. Here’s what Mozilla Labs developer Jono has to say about the decision to use a linguistic UI for […]
November 10, 2008 at 11:00 pm
“The email verb should invisibly handle those details for me as much as possible; it should make smart guesses about what I want, while allowing me to easily override it when it guesses wrong, and it should attempt to improve the accuracy of its guesses over time.”
If I may steal a bit of zen from Python, “In the face of ambiguity, refuse the temptation to guess.” (try: “import this”)
The worst thing I find about GUIs is that it becomes really easy to inadvertently initiate an action which you had no intention of initiating. I don’t know how many times I’ve seen someone show me something on their computer and accidentally open the wrong file or start the wrong application. While our brains devote a large portion of “computing power” to visual processing, most of it is simulation: what we’re actually seeing is largely simulated by our brain based on experience and expectation. What I’m getting at is that we’re still clumsy; what we see and what we intend to do about it are largely disconnected.
What becomes difficult in NLP is trying to ascertain intent. A large portion of our natural speech is ambiguous and relies heavily on implied and contextual knowledge to parse. We as human beings are intuitive and generally quite good at “guessing” what someone is talking about when they start using heavily implied language. Even so, we get things wrong. For us, it’s usually not a problem, but if a computer guesses our intent wrong, it could have potentially catastrophic effects.
Ubiquity is a really cool tool as long as it doesn’t try to guess too much. I simply wonder whether a GUI is necessary at all. I have essentially implemented large portions of the functionality of Ubiquity in a collection of scripts that have no common interface. It would be nice to have a unified (if simplified) NLP parser that could deduce which programs to run and how to pipe their text streams together. Even as a library, this would allow new programs to be built around Ubiquity’s concepts rather than forcing it to live inside a “walled garden”, unique to a single program.
I enjoyed your article. Keep up the good work; can’t wait to see how Ubiquity will develop.
December 14, 2008 at 2:08 pm
while i don’t understand the app design part of this (my training is in linguistics, not cis), the idea of using verbs makes great sense to me — in human language, the verb is a highly abstract slot that provides at least 2 different kinds of meaning — the first is semantic/lexical (what the action is) — the second is relational (how the action involves the noun/s) — rather like an app, then, the verb is part semantics, part function — and in english, which features zero derivation morphology(much like a bit of lg. app!), virtually any lexical item can be dropped into the verb slot to create a new verb, which may or may not make it into the conventionalized dictionary, but is understood in context by the users — the meanings of these neo-verbs tend to be constrained, highly typical, making use of the most salient features associated with “action” verbs, those classically called “transitive” where the overall sentence proposition follows the most salient pattern “who does what to whom (with what result)” — this sort of user-friendliness is built into the natural “software” of human language, but doubtlessly uses the distributed processing power of the brain to create an incredibly fast and rich language system — can the “web” (or rather “does the web”) do this? is this what is meant by “the cloud”? the distributed data on the web together with a processor?
January 8, 2009 at 4:26 am
Not at all different than what i’m trying to do, hard to find the time as it is definitely not an easy endeavour, but we’ll get it eventually 🙂
January 16, 2009 at 2:33 pm
[…] And I love what this guy has to say about verbs… […]
January 20, 2009 at 10:36 pm
Just curious, is the plan for Ubiquity to support only English verbs or other languages as well?
April 7, 2009 at 7:31 pm
“Gilcatt Says:
August 28, 2008 at 10:47 am
As long as we can get rid of unnecessary “clicking” to get things done, I’m in… verbs are natural. The mouseclick is used to validate a choice, from a binary YES-NO alternative. In essence that’s what a mouseclick “knows” about language: YES & NO.
Complex situations require the user to go through large strings of “YES-NO” prior to getting to the final YES that we’re looking for.
Getting rid of such large strings of mouseclicks is an obvious idea.
You don’t want to rebuild your house brick by brick every time you want to go home. Yet that’s what we all do today when we want something specific on the Web: click by click, again and again.”
That’s true, but graphical interfaces that you interact with using your mouse do give you the ability to easily change certain parameters of verbs, like zooming in on a certain area of a map and resizing it to the required size.
When you consider that the size of the map itself will affect either the scale of the map, or the size of the area shown in the map, getting exactly the right map would be incredibly difficult with only a text-based system. A mouse, however, can be easily used to resize the frame, and would show the user that the scale of the map is changing as well. Then the user could zoom in/out a bit, drag the camera over the exact area they want, and hit enter to finish their work. What I like about Ubiquity is that it will give you an intelligent guess of scale, etc., based on your input, and then allow you to fine-tune that in a GUI.
It’s quick, and it uses the strengths of both the GUI and the command line.
August 16, 2009 at 9:25 pm
Let us not forget, this combined with improved voice recognition could prove to be a VERY powerful tool in the future…