The results of the Test Pilot Mobile Heatmap study are now up in a post at the User Research blog. Read it and see how browsing patterns on mobile Firefox differ from those on the desktop!

Jeremy Singer-Vine at Slate magazine has posted his own analysis based on Test Pilot’s Week-in-the-Life dataset!

He looked at tab use, and while his general conclusions are similar to what we found from the earlier tab study, his chart of tab use by demographics shows something interesting I’ve never seen before:

Jeremy rightly points out that the data behind this graph may not be the most reliable, due to the severe undersampling of women in the Test Pilot user base, and the fact that most Test Pilot users who submitted data did not fill out the optional demographic survey. It should be taken with a large grain of salt. Still, by suggesting that tab use patterns may differ significantly by age and sex, it points to an interesting area for future experimentation.

This might be a good time to remind you that the deadline for the Open Data Competition is December 17. So there’s still time to do a visualization of your own and enter the contest!

Our first Open Data Competition is now, well, open! The goal of the competition is to produce the coolest and most informative visualization* using two new Test Pilot datasets that we’ve just published: the results of the Week in the Life study v2, and the Firefox 4 Beta Interface study v2. The deadline for submissions is December 17. Find out more about the datasets and how to enter on the contest website.

* – Not limited to “visualization”, actually; we’ve already had one person ask us about turning the data into sound, which we think is totally cool. We’re not sure there’s a word in English that covers both “visualization” and analogous renderings for the other senses.

I can always tell whether someone understands statistical research or not by describing Test Pilot to them. If their very first question is “What about the self-selection bias?” then they understand statistical research!

Self-selection bias is the bias that creeps into your results when the subjects of your study are people who choose to be subjects. It’s a bias because a group of people who choose to be subjects is not the same as a random sample of the population.

Amazon screenshot showing only five-star and one-star reviews

Think of product reviews on Amazon: they’re mostly 5-star reviews or 1-star reviews, because only the people who really love something or really hate it are motivated enough to write a review! If you randomly sampled the people who had read a given book, you might find that the majority of them found it mediocre – but indifferent people don’t take the time to write reviews. The self-selection of the reviewers skews the average rating.
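To make the effect concrete, here’s a toy simulation. Everything in it is invented (the “true” opinion weights, the review probabilities), so treat it as a sketch of the mechanism rather than a model of real Amazon data:

```python
import random
from collections import Counter

random.seed(42)

# Hypothetical "true" opinions: most readers find the book mediocre.
# (These weights are invented for illustration, not taken from real data.)
true_ratings = random.choices([1, 2, 3, 4, 5],
                              weights=[5, 15, 50, 20, 10], k=10_000)

# Assumed behavior: readers with strong opinions (1 or 5 stars) are far
# more likely to bother writing a review than indifferent readers are.
review_prob = {1: 0.30, 2: 0.05, 3: 0.02, 4: 0.05, 5: 0.30}
posted = [r for r in true_ratings if random.random() < review_prob[r]]

print("true mean rating:   %.2f" % (sum(true_ratings) / len(true_ratings)))
print("posted-review mean: %.2f" % (sum(posted) / len(posted)))
print("posted ratings:", Counter(posted).most_common())
```

Run it and the posted reviews come out dominated by 1- and 5-star ratings, with a mean pulled away from the true average, even though most readers were in the middle.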

This is the same reason why dialing random telephone numbers gives you better poll results than setting up a poll on a website – the telephone polling is (closer to) a random sample, while people who answer the website poll are self-selecting.

The relevance to Test Pilot is obvious. Only people who chose to install the Test Pilot extension or the Firefox 4 Beta get the studies; and only people who click “submit” send results back to Mozilla.

Therefore, it would be a mistake to rely only on Test Pilot submissions when redesigning Firefox UI. We’d be over-tailoring the UI to a particular subset of users, while potentially making it worse for the “silent majority” of users not represented in the sample.

Just how skewed is our sample, anyway?


The results of a survey in March 2010 (which was taken only by users of the Test Pilot extension) gave us a portrait of users who were:

  • More likely to be Linux users…
  • More likely to self-describe as “Tech-savvy”…
  • More likely to be in the 18-35 age range…
  • Much more likely to be male…
  • Likely to spend 4-8 hours a day using the Web…
  • More likely to be using Chrome in addition to Firefox…
  • Much more likely to have been using Firefox for 4 or more years…

While we were working on Test Pilot studies, Patrick Dubroy was doing his own research on Firefox tab usage patterns. He presented his findings in a paper at CHI 2010 last week. Now he’s put up an excellent blog post summarizing what he found out. Go read it right now!

Tomorrow’s Design Lunch will be about the results from the Test Pilot study on menu item usage. Jinghua, Blake, and I will present what we’ve found out so far about which menu items are most commonly used (and how this breaks down by operating system and by mouse clicks vs. keyboard shortcuts). We’ll have a brainstorming session about what this data might mean for future redesigns of the Firefox menu bar, and try to come up with questions for further investigation.

We’ll also present some findings about the demographics of the Test Pilot user base.

The design lunch is Thursday March 4, 12:30pm – 1:30pm PST. The details of how to watch or participate remotely are on the Design Lunch wiki page.

Blake (who generated all the cool graphs I’ve been using to present Test Pilot results) has used the session data from the Week-in-the-Life study to form an interesting hypothesis: that Firefox crashes per user follow a power-law distribution. If that’s true, then “mean crashes per user” and “the typical user’s experience” are two very different things.
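To see why a power law would drive those two numbers apart, here’s a toy simulation. The distribution and its shape parameter are invented for illustration; this is not Blake’s actual analysis:

```python
import random
import statistics

random.seed(0)

# Toy model: draw each user's weekly crash count from a heavy-tailed
# Pareto distribution, shifted and truncated so the minimum is zero.
# The shape parameter 1.5 is an arbitrary choice for illustration.
crashes = [int(random.paretovariate(1.5)) - 1 for _ in range(100_000)]

print("mean crashes per user:   %.2f" % statistics.mean(crashes))
print("median crashes per user: %d" % statistics.median(crashes))
print("users with zero crashes: %.0f%%" %
      (100.0 * sum(c == 0 for c in crashes) / len(crashes)))
```

In a run like this the mean comes out well above one crash per user, while the median user crashes zero times; a small tail of very crash-prone users drags the average up. That is exactly the sense in which a mean would mislead us about the typical experience.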

I should emphasize that the Week-in-the-Life study was not really designed to look at crashes, so I’d rather do a follow-up study specifically targeting this hypothesis before I support it with any confidence. But, as Blake says:

If our crash data follows a similar distribution, the average crash per user metric tells us little about the experience of a typical Firefox user.

Anecdotal evidence supports this hypothesis. While we all know people who swear by Firefox’s stability, we also know people who complain of frequent failures.

With this in mind, I suggest we use Test Pilot to run a longitudinal study of true Firefox crashes.

Agreed! And it’s great to see more hypotheses coming out of the Test Pilot data!

“An undefined problem has an infinite number of solutions” – Robert A. Humphrey

I found that quote on the underside of a tea bottlecap. I think it’s germane to what we’re doing with Test Pilot.

In the months since we started the Test Pilot program, we’ve run three studies, which have collected massive piles of data.

But as the bottlecap warns, those massive piles won’t get us anywhere unless we know what problem we’re trying to solve with them.

There’s a temptation, when looking at a pile of data, to leap to design conclusions.

For example: Suppose studies show that almost nobody is using the profile manager. Great, that means we can get rid of the profile manager! Right?

Or, to use a non-hypothetical example:

Minimum, average, maximum tabs open per session

(Thanks to Blake Cutler for generating all the graphs used in this post).

The most common Firefox session is one with three tabs open? Well hot dog, we should optimize Firefox for the three-tab use case! Right?

Careful, there. This kind of assumption is dangerous. Maybe lots of people would start using the profile manager if they knew it existed. Maybe people would love to have sessions with more tabs open, if the interface was better for managing lots of tabs.

A great post from Boriss on using Amazon’s Mechanical Turk service for user research.

Her conclusions (about the European browser ballot) are interesting too, but personally it’s the research method that interests me most. Is it possible to get an accurate picture of your target audience from a group of self-selected respondents? The answer to this question is obviously very relevant to Test Pilot.

While you’re at it, you should read Boriss’ posts on improving the Firefox Add-ons Manager here and here. In fact, if you’re at all interested in Mozilla usability stuff you should be reading her blog regularly.

How often do you recycle passwords? That is, use the same password for multiple sites? Even though you’ve probably been told this is a security no-no, it’s just too much strain on most people’s memory to come up with unique passwords every time.

Theoretically, the password manager feature of Firefox can help. Come up with a random string of characters and let Firefox remember it for you. This works great… as long as you have Weave, or you never need to log into the site from a different computer.

And the problem’s getting worse, because these days almost every new site you come across thinks it’s important enough to ask you to create a password. Meanwhile, phishing attempts are getting more sophisticated. These are some of the reasons Mozilla is starting to explore identity management in the browser.

It would help if we knew how much password recycling is actually going on. How many different passwords does the average user have? How many times do they recycle each password? Do they have a throwaway password that they use on lots of unimportant sites, while making a unique, secure password for their bank?

That’s where Test Pilot comes in.

Pie chart of duplicate password use

The above pie chart, generated by Test Pilot, shows a breakdown of the passwords I have saved in the Firefox password manager. I was running it on a throwaway profile, so it only has five sites with stored passwords. (If it were my real profile, it would have dozens.)

We should be rolling this study out sometime this week. Of course, the study will not be collecting the actual passwords themselves! Instead, it compares passwords on the client side, so they never leave your machine, and only the count of duplicate passwords gets sent across the network to the Test Pilot server.
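For the curious, here’s a minimal sketch of that client-side logic. The real study is a Firefox extension (written in JavaScript), and the sites and passwords below are made up; this just shows the shape of what gets computed locally versus what gets sent:

```python
from collections import Counter

# Invented stand-in for whatever the password manager would hand us;
# these sites and passwords are made up for illustration.
saved_passwords = {
    "bank.example.com":  "s3cret-A",
    "shop.example.com":  "hunter2",
    "forum.example.com": "hunter2",
    "mail.example.com":  "hunter2",
    "news.example.com":  "s3cret-B",
}

# Compare the stored passwords locally and keep only the counts;
# the passwords themselves never leave this dictionary.
duplicate_counts = sorted(Counter(saved_passwords.values()).values(),
                          reverse=True)

# Only this list of counts would cross the network: [3, 1, 1] means one
# password is reused on three sites and two others are unique.
print(duplicate_counts)
```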

I’ll post again when we have some findings to share from this study.