“An undefined problem has an infinite number of solutions” – Robert A. Humphrey

I found that quote on the underside of a tea bottlecap. I think it’s germane to what we’re doing with Test Pilot.

In the months since we started the Test Pilot program, we’ve run three studies, which have collected massive piles of data.

But as the bottlecap warns, those massive piles won’t get us anywhere unless we know what problem we’re trying to solve with them.

There’s a temptation, when looking at a pile of data, to leap to design conclusions.

For example: Suppose studies show that almost nobody is using the profile manager. Great, that means we can get rid of the profile manager! Right?

Or, to use a non-hypothetical example:

[Graph: minimum, average, and maximum tabs open per session]

(Thanks to Blake Cutler for generating all the graphs used in this post).

The most common Firefox session is one with three tabs open? Well hot dog, we should optimize Firefox for the three-tab use case! Right?

Careful, there. This kind of assumption is dangerous. Maybe lots of people would start using the profile manager if they knew it existed. Maybe people would love to have sessions with more tabs open, if the interface were better for managing lots of tabs.



A great post from Boriss on using Amazon’s “Mechanical Turk” service for user research.

Her conclusions (about the European browser ballot) are interesting too, but personally it’s the research method that interests me most. Is it possible to get an accurate picture of your target audience from a group of self-selected respondents? The answer to this question is obviously very relevant to Test Pilot.

While you’re at it, you should read Boriss’ posts on improving the Firefox Add-Ons Manager here and here. In fact, if you’re at all interested in Mozilla usability stuff you should be reading her blog regularly.

How often do you recycle passwords? That is, use the same password for multiple sites? Even though you’ve probably been told this is a security no-no, it’s just too much strain on most people’s memory to come up with unique passwords every time.

Theoretically, the password manager feature of Firefox can help. Come up with a random string of characters and let Firefox remember it for you. This works great… as long as you have Weave, or you never need to log into the site from a different computer.

And the problem’s getting worse, because these days almost every new site you come across thinks it’s important enough to ask you to create a password. Meanwhile, phishing attempts are getting more sophisticated. These are some of the reasons Mozilla is starting to explore identity management in the browser.

It would help if we knew how much password recycling is actually going on. How many different passwords does the average user use? How many times do they recycle each password? Do they have a throwaway password that they use on lots of unimportant sites, while making a unique, secure password for their bank?

That’s where Test Pilot comes in.

[Pie chart: duplicate password use]

The above pie chart, generated by Test Pilot, shows a breakdown of the passwords that I have saved in the Firefox password manager. I was running it on a throwaway profile, so it only has five sites with stored passwords. (If it were my real profile, it would have dozens.)

We should be rolling this study out sometime this week. Of course, the study will not be collecting the actual passwords themselves! Instead, it compares passwords on the client side, so they never leave your machine, and only the count of duplicate passwords gets sent across the network to the Test Pilot server.
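To make the client-side comparison concrete, here is a minimal sketch of how duplicate counting can work without the passwords ever leaving the machine. This is an illustration in Python, not the study’s actual code (Test Pilot extensions are written in JavaScript): the stored passwords are compared locally, and only the distribution of duplicate counts would be transmitted.

```python
from collections import Counter

def duplicate_password_counts(passwords):
    """Count how many sites share each password, locally.
    Only the resulting counts would ever be sent to a server;
    the plaintext passwords stay on the client."""
    per_password = Counter(passwords)
    # Report only the distribution: {2: 1, 1: 3} means one password
    # is reused on two sites, and three passwords are used once each.
    return dict(Counter(per_password.values()))

# Five stored passwords (hypothetical values), one reused on two sites:
print(duplicate_password_counts(["hunter2", "s3cret", "hunter2", "qwerty", "letmein"]))
# → {2: 1, 1: 3}
```

Note that nothing in the transmitted result can be reversed into a password; the server learns only how much reuse occurred.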

I’ll post again when we have some findings to share from this study.

We’re trying something new with the “Week in the Life” Test Pilot study. Instead of running just once per user, it will automatically recur about once every two months (60 days, to be precise) and run for one week each time. The idea is to let it recur over the course of a year, and see whether we can detect any long-term trends in the data that would indicate user habits changing over time. For instance, a lot of us who work on web browsers have a hunch that our users are using bookmarks less, and tabs more, compared to a few years ago. But does the data actually support this?
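The recurrence logic described above (run for one week, then recur 60 days after the run started) can be sketched as a small state check. This is a hedged illustration, not the extension’s actual scheduler; the state names and the choice to measure from the start of the last run are my assumptions.

```python
from datetime import datetime, timedelta

RECUR_INTERVAL = timedelta(days=60)  # study recurs every 60 days
RUN_DURATION = timedelta(days=7)     # each run lasts one week

def study_state(last_run_start, now):
    """Decide what a recurring study should be doing right now,
    given when its last run started (None if it has never run)."""
    if last_run_start is None:
        return "due"                         # never run yet: start now
    if now < last_run_start + RUN_DURATION:
        return "running"                     # inside the one-week window
    if now >= last_run_start + RECUR_INTERVAL:
        return "due"                         # 60 days elapsed: recur
    return "idle"                            # waiting between runs

# A run that started March 1st is still collecting on March 4th,
# finished by March 10th, and due again by May 1st (61 days later):
print(study_state(datetime(2010, 3, 1), datetime(2010, 3, 4)))   # running
print(study_state(datetime(2010, 3, 1), datetime(2010, 3, 10)))  # idle
print(study_state(datetime(2010, 3, 1), datetime(2010, 5, 1)))   # due
```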

Because the “Week in the Life” study recurs, and because we never submit any data without the user’s explicit permission, we’ve got a potential user experience problem: Test Pilot is going to ask you whether you want to submit the data from the study every time it recurs. Do you want to submit the data? Do you want to submit the data? How about now? How about now, huh? How about that data, do you want to submit it?

Isn’t it annoying being asked the same question over and over again?

To mitigate this problem, I’ve added a new UI widget to the Test Pilot status page:

[Menu with three choices: “Ask me whether I want to submit my data”; “Always submit my data, and don't ask me about it”; “Never submit my data, and don't ask me about it”]

The principle of “Don’t pester the user” is important, but so is the principle of “Make sure you have the user’s permission before doing anything with their data”. These principles are natural enemies. Finding a compromise between them is not easy! I know that my little drop-down menu is not a perfect solution. What do you think? Is it self-explanatory enough? Too wordy? Is there a better approach to this problem?

We’ve had over 5,000 users submit data from the Test Pilot tabs study!

Considering that people had to first hear about Test Pilot, then opt in by installing the extension, then opt in again by choosing to submit the data, 5,000 is a really good number. Better than we had any right to expect, certainly.

Over the past couple weeks, I’ve been sifting and analyzing the data, and working with Blake Cutler from the Mozilla Metrics team to generate graphs of interesting statistics about tab usage. I’ve just put up a results page showcasing several of these graphs.

We’ve also posted samples of the aggregated data, which are free for anyone to download and use. There was some discussion on my previous post about how to aggregate the data in a way that was still useful to researchers. What we ended up doing was building files that include row-level data from a random subsample of the users that fit particular criteria. These files are stripped of any information about the language/locale, operating system, or installed extensions of any individual user in the sample.
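The sampling-and-stripping step above can be sketched roughly like this. It’s only an illustration of the idea, assuming hypothetical field names (`locale`, `os`, `extensions`) and a simple filter-then-sample pipeline; the actual files were built by the Metrics team, not by this code.

```python
import random

STRIPPED_FIELDS = ("locale", "os", "extensions")  # assumed field names

def build_sample(rows, criteria, k, seed=0):
    """Build a publishable subsample: pick k random rows matching
    `criteria`, then drop fields that could identify an individual."""
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    matching = [row for row in rows if criteria(row)]
    sample = rng.sample(matching, min(k, len(matching)))
    return [{key: val for key, val in row.items()
             if key not in STRIPPED_FIELDS}
            for row in sample]

# Hypothetical row-level data: keep heavy-tab users, publish 2 of them.
rows = [
    {"tabs": 3, "locale": "en-US", "os": "mac", "extensions": ["a"]},
    {"tabs": 500, "locale": "de", "os": "win", "extensions": []},
    {"tabs": 120, "locale": "fr", "os": "linux", "extensions": []},
]
published = build_sample(rows, lambda r: r["tabs"] >= 100, k=2)
```

The published rows keep the behavioral measurements (here, the tab count) while the identifying fields are gone.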

Third-party researchers have already begun using the data to do their own analysis! Andy at Surfmind.com has a post containing some very cool-looking visualizations and has proposed an interesting theory about there being two classes of heavy tab users.

I get a lot of people contacting me by email, IRC, forums, or blog comments to say that they’re “worried that if I join Test Pilot I’ll skew the data” because “I’m sure that my tab usage is atypical”.

People! Don’t worry about being an atypical user!

First of all, we have already had almost 5,000 Test Pilot data submissions. One outlier isn’t going to do much to “skew” a data set of that size.

But more importantly, you shouldn’t assume that you’re abnormal. We don’t know what “normal” tab usage is! That’s why we’re doing this experiment, to find that out. If we started out with an idea of what normal tab usage looked like, and threw out things that didn’t match our preconceived notions, that would be a clear case of experimenter bias. Then we’d really be skewing the data.

For instance, I was surprised to find out that there are users who have over 500 tabs open at a time. Over 500! They’re surely outliers, but they’re not abnormal users; they’re just users. That number isn’t skewing the data; it is the data. Thanks to those users’ participation, we now know that having 500 tabs open is something that people do with Firefox, something we might not have known otherwise.

As I said in a previous post, I do believe we have a major oversampling of the power-user / early-adopter demographic in our current Test Pilot user base, and that we need to work on fixing this by reaching out to a wider sample of users. But note that word: wider. Excluding yourself because you think you’re atypical isn’t helpful. If you really want to help our sample (and I’m touched that so many of you do want to help), the best thing you can do is to let your less-techie friends know about Test Pilot.

Since the fall quarter of last year, HCI student and Ubiquity community contributor Zac Lym has been doing good work testing Ubiquity on new users. Although the study included only a small number of users, it’s an important one since it’s the first and only rigorously derived usability data we have on Ubiquity. The users in this study had no prior exposure to or preconceptions about the Ubiquity interface, and the experimenter gave them no help using it; so the problems and frustrations they encounter give us insight about the specific areas where Ubiquity needs improvement in discoverability and learnability.

The primary material resulting from the research is a series of videos of users in action. Zac has made these videos available on the Mozilla wiki along with write-ups of his methodology and his findings. These findings are well worth reading in full for anyone interested in improving the Ubiquity user experience.