The results of the Test Pilot Mobile Heatmap study are now up in a post at the User Research blog. Read it and see how different browsing patterns are on mobile compared to desktop Firefox!

A user on the Test Pilot discussion forum wrote:

“I didn’t join the big brother “test-pilot” but had to comment on how invasive this research will really be to the average user. Everything you type, always, whether sent or saved, will be saved on a database. If you have a webcam and you agreed to the test pilot thing, mozilla can use your camera to see what is going on. It sounds absurd, but it is simple for a computer to identify certain things and bring the more important images to the front, however, your actions will be recorded whether they are flagged as a hazard or not. just put a bit of electrical tape over the lens when you aren’t using the camera.

I hate being watched….Peace”

It would be easy to dismiss this person as paranoid. (If we wanted to spy on you, why would we go to the trouble of announcing a data collection program and inviting people to voluntarily join it?) But actually, he is absolutely right to be concerned about his privacy, and absolutely right to be skeptical of the motives of the organizations writing his software.

No, Test Pilot does not collect video data or connect to your webcam in any way. (I don’t even know how I would connect a Test Pilot study to a webcam! The user is accusing me of being a much better programmer than I actually am.) It does not record any words that you type, either. Some studies have recorded certain very specific keystrokes, in order to tell, for example, whether a user is using keyboard shortcuts for Firefox menu items, or whether they’re using the Enter key in the URL bar. We published a privacy policy, and we have stuck to it. And each study gives you, before you agree to upload anything, the chance to review the collected data for yourself.
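To make that concrete, here is a sketch of the shape of the data a keystroke-related study records. This is an illustration in Python, not the actual study code (the real studies are Firefox extensions written in JavaScript); the point is what ends up in the stored record: a flag saying which shortcut or key event happened, and nothing about what you typed.

```python
# Illustration of the principle only -- this is not Mozilla's extension code.
# What matters is what gets stored, not how the event hook is wired up.
from dataclasses import dataclass
from time import time

@dataclass
class KeyEvent:
    timestamp: float
    event: str   # e.g. "enter_in_urlbar" or "shortcut:new_tab"
    # Note what is NOT here: no URL, no search term, no typed text.

log = []

def on_urlbar_enter():
    """Called when Enter is pressed in the URL bar; records only that fact."""
    log.append(KeyEvent(time(), "enter_in_urlbar"))

def on_menu_shortcut(shortcut_name):
    """Called when a menu keyboard shortcut fires, e.g. 'new_tab'."""
    log.append(KeyEvent(time(), "shortcut:" + shortcut_name))
```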

The user who posted the above message has already decided that he doesn’t trust us, so I doubt I can convince him that we’ve stuck to our privacy policy.

But, as LeVar Burton used to say on Reading Rainbow, you don’t have to take my word for it.

All of the Test Pilot studies are open-source, as is Firefox itself, as is the Test Pilot extension that bridges the two. Anyone who likes can examine the source code for themselves.

Now not every user will have the time, inclination, or ability to read through that source code. But not every user has to; all it takes is one whistleblower to look at the source code and tell everybody “hey, it looks like this study here is doing something fishy”.

So here’s the link to the source code of every single Test Pilot study. And here’s the link to the source code of the Test Pilot extension that runs them.

Go ahead! Read through that code. Look for the functions that secretly turn on your webcam and log the words you type. (There aren’t any.) Flag anything that looks fishy, wrong, or that looks like it might go outside the privacy policy. Please! We’ve got nothing to hide. I would welcome that sort of code review. I would consider it a personal favor from you to me to help us improve the quality and security of Test Pilot code.

This is an underappreciated benefit of open source. With closed-source software, you have to take a company’s word that they aren’t doing anything fishy with your machine. An open source project can’t hide that kind of thing from its users. Sharing the code keeps us honest.

Jeremy Singer-Vine at Slate magazine has posted his own analysis based on Test Pilot’s Week-in-the-Life dataset!

He looked at tab use, and while his general conclusions are similar to what we found from the earlier tab study, his chart of tab use by demographics shows something interesting I’ve never seen before:

[Chart: tab use broken down by the age and gender of Test Pilot users]

Jeremy rightly points out that the data behind this graph may not be the most reliable, due to the severe undersampling of women in the Test Pilot user base, as well as the fact that most Test Pilot users who submitted data did not fill out the optional demographic survey. It should be taken with a large grain of salt. Still, by suggesting that there might be significant differences in tab use patterns by age and sex, it points to what might be an interesting area for future experimentation.
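If you’d like to poke at the same question yourself, here is a minimal sketch of that kind of breakdown. It assumes a hypothetical flattened export with one row per submission and made-up column names (age, gender, max_tabs); the real dataset’s format may differ, so treat this as a starting point rather than a recipe.

```python
import pandas as pd

# Hypothetical flattened export of the Week in the Life data: one row per
# submission, with the optional survey answers already joined in.
# The file name and column names here are invented for illustration.
df = pd.read_csv("week_in_the_life_submissions.csv")

# Keep only submissions that include demographic survey answers, then see how
# the maximum number of concurrently open tabs breaks down by age and gender.
surveyed = df.dropna(subset=["age", "gender"])
breakdown = (surveyed
             .groupby(["age", "gender"])["max_tabs"]
             .agg(["median", "mean", "count"]))

print(breakdown)
```

The count column matters as much as the averages: with so few survey responses, some of the cells will be tiny, which is exactly why the graph should be taken with that grain of salt.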

This might be a good time to remind you that the deadline of the Open Data Competition is December 17. So there’s still time to do a visualization of your own and enter the contest!

Our first Open Data Competition is now, well, open! The goal of the competition is to produce the coolest and most informative visualization* using two new Test Pilot datasets that we’ve just published: the results of the Week in the Life study v2, and the Firefox 4 Beta Interface study v2. The deadline for submissions is December 17. Find out more about the datasets and how to enter on the contest website.

* – Not limited to “visualization”, actually; we’ve already had one person ask us about turning the data into sound, which we think is totally cool. We’re just not sure there’s a word in English that covers both “visualization” and analogous work based on the other senses.

I can always tell whether someone understands statistical research or not by describing Test Pilot to them. If their very first question is “What about the self-selection bias?” then they understand statistical research!

Self-selection bias is the bias that creeps into your results when the subjects of your study are people who choose to be subjects. It’s a bias because a group of people who choose to be subjects is not the same as a random sample of the population.

[Image: Amazon screenshot showing only five-star and one-star reviews]

Think of product reviews on Amazon: they’re mostly 5-star reviews or 1-star reviews, because only the people who really love something or really hate it are motivated enough to write a review! If you randomly sampled the people who had read a given book, you might find that the majority of them found it mediocre – but indifferent people don’t take the time to write reviews. The self-selection of the reviewers skews the average rating.

This is the same reason why dialing random telephone numbers gives you better poll results than setting up a poll on a website – the telephone polling is (closer to) a random sample, while people who answer the website poll are self-selecting.
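If you want to see the Amazon effect in numbers, here is a toy simulation (all of the probabilities are invented): most readers find the book mediocre, but readers with strong opinions are far more likely to bother posting a review.

```python
import random

random.seed(0)

# Invented population: most readers think the book is about a 3 out of 5.
population = [random.choice([1, 2, 3, 3, 3, 4, 5]) for _ in range(100_000)]

# Self-selection: readers with strong opinions are much more likely to write
# a review; lukewarm readers mostly stay silent.
review_probability = {1: 0.4, 2: 0.1, 3: 0.05, 4: 0.15, 5: 0.6}
reviews = [r for r in population if random.random() < review_probability[r]]

true_mean = sum(population) / len(population)
review_mean = sum(reviews) / len(reviews)
extreme_share = sum(r in (1, 5) for r in reviews) / len(reviews)

print(f"True average opinion:          {true_mean:.2f}")
print(f"Average of posted reviews:     {review_mean:.2f}")
print(f"Reviews that are 1 or 5 stars: {extreme_share:.0%}")
```

The true average comes out around 3.0, but the posted reviews skew higher and are dominated by 1- and 5-star ratings – no dishonesty required, just self-selection.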

The relevance to Test Pilot is obvious. Only people who chose to install the Test Pilot extension or the Firefox 4 Beta get the studies; and only people who click “submit” send results back to Mozilla.

Therefore, it would be a mistake to rely only on Test Pilot submissions when redesigning Firefox UI. We’d be over-tailoring the UI to a particular subset of users, while potentially making it worse for the “silent majority” of users not represented in the sample.

Just how skewed is our sample, anyway?


The results of a survey in March 2010 (which was taken only by users of the Test Pilot extension) gave us a portrait of users who were:

  • More likely to be Linux users…
  • More likely to self-describe as “Tech-savvy”…
  • More likely to be in the 18-35 age range…
  • Much more likely to be male…
  • Likely to spend 4-8 hours a day using the Web…
  • More likely to be using Chrome in addition to Firefox…
  • Much more likely to have been using Firefox for 4 or more years…

(more…)

In light of the idea that openness is hard work and that improving the discovery path for new contributors is a constant struggle, I’d like to do something to help make Test Pilot a little more discoverable, and possibly gain some useful feedback as well.

Why not publicly post the code for upcoming Test Pilot studies? The Mozilla community can check it out and verify for themselves that we’re upholding our promise not to collect any sensitive or personally identifiable data. They can also point out anything that’s potentially wrong with the code, things that we’re overlooking that might make the collected data less accurate, etc. That sort of feedback would be very useful for us developers.

Of course, it’s not like we’ve been keeping the Test Pilot study code secret up until now, or anything. The studies have always been, and will always be, available through our public Mercurial repository (look in the /testcases/ directory). But that’s not exactly easily discoverable.

So, I’m going to start announcing upcoming studies on this blog and linking directly to the code, to make it easier for interested parties to offer feedback.

The next study we’ll be releasing is called the Firefox 4 Beta Interface Study v2. It’s a revised version of the earlier Beta Interface study that we ran near the beginning of the Firefox 4 Beta program. It’s been updated to include instrumentation of the new features in the beta, such as Sync, Panorama, and App Tabs; we’d really like to know how many people are using these new features, and how they’re using them – e.g., how often do people sync? If they use app tabs, how many app tabs do they have? If they use Panorama, how many tab groups do they use? And so on. We’ll also continue tracking the same things that the first version tracked – frequency of use of menu items and toolbar buttons. That way, we can look at how people are using the toolbars and menus now, compared to how they used them earlier on in the beta cycle (summed up in this heatmap), and see if there are any trends there.

Of course, as always, we’re not collecting any URLs, search terms, names of sites, or names of bookmarks. You don’t have to take my word for it; take a look at the code and see for yourself!

You can read the latest Firefox 4 Beta Interface Study code on the web via the Mercurial repository.

Or you can take a look at the Bugzilla tracking bug for the study; the code is attached to the bug.

Finally, the code for the study is also mirrored on GitHub, so feel free to look at it there if you prefer to use GitHub’s code review tools.

So pick your method and take a look at the code; comments, questions, and criticism welcome.

Over the last couple of weeks I’ve been participating in the beta test of upcoming game Starcraft II. In an interview with Gamasutra magazine, the lead Starcraft II designer, Dustin Browder, had this to say about the data they have collected from the beta test:

The danger with a lot of this data is that you have to be very careful how you use it. With unit stats, I can tell you that, for example, in a Protoss versus Terran game, 12 percent of the time the Protoss build carriers. And when they build carriers, they win 70 percent of the time. You could say, “That must mean carriers are overpowered!”

That’s not really true, though. It could just be that as you get towards the end of the game, if the Protoss have the extra resources to waste on a bunch of carriers, they’re probably going to win anyway.

Of course, it doesn’t mean the carriers aren’t overpowered either. That stat alone actually tells you nothing. It’s a very dangerous stat. If you listen to that stat, you can make all kinds of mistakes.

If we look at the stats and we say, “This doesn’t actually back anything we’re experiencing online,” I’m very suspicious of that number. We get information from a lot of different sources, and then we use the other sources to refute or corroborate. We look at another source and say, “You know what? What they’re saying online matches my play experience, and it matches the stats. This seems real. Let’s talk about what some possible fixes can be.”

So very, very relevant to Test Pilot and Firefox!
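To see how such a stat can fool you, here is a toy simulation (the probabilities are entirely made up) in which carriers have zero effect on winning: players who are already ahead simply have the spare resources to build them, and being ahead is what wins games. It still produces numbers a lot like the ones Browder quotes.

```python
import random

random.seed(0)

games = 100_000
carrier_games = 0
carrier_wins = 0

for _ in range(games):
    # Confounder: is the Protoss player already ahead late in the game?
    ahead = random.random() < 0.5
    # Players who are ahead have resources to spare, so they build carriers more often.
    builds_carriers = random.random() < (0.20 if ahead else 0.04)
    # In this model, winning depends only on being ahead; carriers do nothing.
    wins = random.random() < (0.75 if ahead else 0.25)
    if builds_carriers:
        carrier_games += 1
        carrier_wins += wins

print(f"Games with carriers built: {carrier_games / games:.0%}")                 # ~12%
print(f"Win rate when carriers are built: {carrier_wins / carrier_games:.0%}")   # ~67%
```

Correlation alone can’t separate “carriers win games” from “winning positions produce carriers” – which is why Browder’s team cross-checks the stats against other sources before acting on them.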

A couple of months ago, the Test Pilot team sat down with six volunteer users, one at a time, and asked them to go through the steps of installing Test Pilot and submitting test results.

(Yes, that’s right: we were testing Test Pilot — feel free to make infinite recursion jokes.)

What we found was extremely valuable. The same problems happened again and again. Six users may not seem like enough to give you useful information, but believe me, after you’ve seen the fourth or fifth user in a row trip over the exact same usability problem, you’ll have a pretty good idea of how high a priority it is. (Statistics rule of thumb: if you have a problem that affects 1/3 of users, then you only need to interview 5 users to have an 85% chance of seeing it).
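That rule of thumb is just the complement of everyone missing the problem: if a problem bites one user in three, the chance that five users in a row all dodge it is (2/3)^5, so the chance of seeing it at least once is 1 − (2/3)^5 ≈ 87% – right around the figure quoted above. A two-line sanity check:

```python
p = 1 / 3   # fraction of users affected by the problem
n = 5       # number of users interviewed

# Probability that at least one of the n interviewed users hits the problem
print(f"{1 - (1 - p) ** n:.1%}")   # 86.8%
```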

(more…)

Alex Faaborg writes about how the Test Pilot menu item study is influencing the redesign of the Firefox menu bar.

Nothing much to add right now except to say that it’s very satisfying to see Test Pilot having a concrete impact, and that I’m proud to have been part of this work!

Tomorrow’s Design Lunch will be about the results from the Test Pilot study on menu item usage. Jinghua, Blake, and I will present what we’ve found out so far about which menu items are most commonly used (and how this breaks down by operating system and by mouse clicks vs. keyboard shortcuts). We’ll have a brainstorming session about what this data might mean for future redesigns of the Firefox menu bar, and try to come up with questions for further investigation.

We’ll also present some findings about the demographics of the Test Pilot user base.

The design lunch is Thursday March 4, 12:30pm – 1:30pm PST. The details of how to watch or participate remotely are on the Design Lunch wiki page.