Test Pilot is a delicate balancing act.
On one side is user privacy. I can’t overstate how committed we are to protecting the privacy and anonymity of our users, keeping them fully informed, and getting their consent before sending any data about them back to Mozilla. We practice full disclosure, collect nothing without an explicit opt-in, and let you review exactly what will be uploaded, in human-readable form, before you upload it. We never associate any of your uploaded data with your name, email address, or any other personal data about you. We’ve been significantly more conservative about data collection than most other organizations that do this kind of research.
On the other side are the needs of scientific research. We want to collect the most detailed and accurate data possible and share it with the researcher and UI designer communities, in order to try to improve the state of not just the Firefox interface but of web interface design in general. There are certain burning usability questions which would be much easier to answer if we were to collect certain kinds of personal information about our users.
For example, one of the questions that the Mozilla UX department is curious about is this: how often do people have duplicate tabs open? There’s anecdotal evidence that some web users end up opening so many tabs that they forget what they already have open. Instead of switching back to their Gmail tab, they might just open up a new Gmail tab. If this is widespread, it’s an indication of one way that the Firefox tabbed browsing interface is failing to meet users’ needs. Perhaps we ought to have some feature that detects when you’re about to open a duplicate tab, and redirects you to your already-open copy of the tab instead.
Do you know what would be a really easy way to do this research? We could include code in the (currently ongoing) Test Pilot tabs experiment that records the URL of each opened tab. We could look through that data on the server side and automatically identify duplicate tabs.
But whoa, whoa, whoa, hold on there! The URLs that somebody chooses to visit with their web browser are just about the most personal thing about them. It’s one of the most useful things for UI research, but it’s also one of the most private. I know I have URLs in my browser history that I would think twice about uploading to Mozilla, even if they were not associated with my name in any way.
I think that collecting every URL the user visits is out of the question. We made a decision that the current Tabs experiment would never collect URLs, and we repeat this statement over and over throughout our documentation in order to reassure people that we’re not trying to spy on them.
But that doesn’t mean we can’t detect duplicate tabs. We just need to be a little more clever about it. It’s the duplication that we care about, not the URLs, so let’s detect duplicates on the client side. When we send up the data, we’ll include a field that says whether or not an opened tab is a duplicate, but we won’t include the URLs.
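Here is a minimal Python sketch of client-side duplicate detection. The `TabEvent` record, its event kinds, and `build_upload_records` are hypothetical names for illustration, not the actual Test Pilot code. The sketch also assumes exact string comparison of URLs and ignores navigation inside an existing tab. The key property is that the URL is used only inside the function. The records it returns carry a timestamp and an `is_duplicate` flag, and nothing else.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TabEvent:
    """A raw tab event as seen inside the browser (hypothetical shape)."""
    kind: str         # "open" or "close"
    tab_id: int
    url: str          # stays on the client; never copied into an upload record
    timestamp: float


def build_upload_records(events: Iterable[TabEvent]) -> list:
    """Turn raw tab events into upload records that contain no URLs.

    Each "open" record carries only a boolean saying whether some tab that
    is still open already shows exactly the same URL.
    """
    open_tabs = {}  # tab_id -> url for currently open tabs (client-side only)
    records = []
    for ev in events:
        if ev.kind == "open":
            is_duplicate = ev.url in open_tabs.values()
            open_tabs[ev.tab_id] = ev.url
            records.append({"event": "open", "timestamp": ev.timestamp,
                            "is_duplicate": is_duplicate})
        elif ev.kind == "close":
            open_tabs.pop(ev.tab_id, None)
            records.append({"event": "close", "timestamp": ev.timestamp})
    return records


# Hypothetical session: the user reopens a mail tab they already had open.
events = [
    TabEvent("open", 1, "https://mail.example.com/", 0.0),
    TabEvent("open", 2, "https://news.example.com/", 1.0),
    TabEvent("open", 3, "https://mail.example.com/", 2.0),  # duplicates tab 1
    TabEvent("close", 1, "https://mail.example.com/", 3.0),
    TabEvent("open", 4, "https://mail.example.com/", 4.0),  # duplicates tab 3
]
for record in build_upload_records(events):
    print(record)
```

A real implementation would probably need to decide how to normalize URLs (for example, whether to ignore fragments) before comparing them. This sketch leaves that decision out.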
This points the way to a general guiding principle: Whenever there’s sensitive data, process it on the client side, and upload only non-sensitive calculations derived from the data. By following this principle, I think we can resolve many, if not most, of the conflicts between privacy and the quest for knowledge.
Here’s another example: Many users install extensions that make their tabs work differently. Addons.mozilla.org hosts over three hundred extensions that affect how tabs behave.
Therefore, we can’t just throw all the tab usage data into one big bucket and then start doing statistics on it. We’d reach incorrect conclusions because that data would be from a mix of Firefox-default tab behavior and customized tab behavior. We’ll have much better results if we separate user data according to what extensions they have installed.
(Even better, we can draw comparisons between the Firefox-default tab users and the users with Extension X. What if we find that on average the users of Extension X make fewer tab-related errors, or spend less time interacting with their tabs, than vanilla Firefox users? That would be very interesting, eh?)
So we’d like to see what extensions each user has installed. But again, there is a privacy problem. There are some extensions you might not want us knowing that you have, even though, again, that information would not be associated with your name or email address or anything else. For example, what if you’re running an extension which is not public knowledge? You could be developing something for your organization which is proprietary, or under NDA, and you don’t want the world to know it exists yet.
Here’s what we do: We take the ID of each extension that you have and put it through a one-way hash function. We upload these hashes. They’re not readable, but they are vulnerable to guess-and-check. So if we’re looking for a known extension, like say Tabs Mix Plus, we can put the ID of Tabs Mix Plus through the same hash function and then check for the presence of a matching hash. In this way we can divide data based on a limited set of known extensions, but your top-secret in-development extension that is not yet public knowledge is nothing but an opaque hash string to us.
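A minimal Python sketch of this scheme follows. The post doesn’t say which hash function Test Pilot uses, so SHA-256 from Python’s standard `hashlib` is an assumption here. The function names and extension IDs are also hypothetical. The last function shows how matched hashes could be used to separate users by installed extension, as described earlier in the post.

```python
import hashlib


def hash_extension_id(ext_id: str) -> str:
    """One-way hash of an add-on ID. SHA-256 is an assumed choice of hash."""
    return hashlib.sha256(ext_id.encode("utf-8")).hexdigest()


# Client side: only the hashes go into the upload, never the raw IDs.
def extension_hashes_for_upload(installed_ids):
    return sorted(hash_extension_id(ext_id) for ext_id in installed_ids)


# Server side: guess-and-check against a list of known, public extension IDs.
def identify_known_extensions(uploaded_hashes, known_ids):
    lookup = {hash_extension_id(ext_id): ext_id for ext_id in known_ids}
    return {lookup[h] for h in uploaded_hashes if h in lookup}


# Server side: separate users who have one known extension from those who don't.
def split_by_extension(uploads, known_id):
    target = hash_extension_id(known_id)
    with_ext = [u for u in uploads if target in u["extension_hashes"]]
    without_ext = [u for u in uploads if target not in u["extension_hashes"]]
    return with_ext, without_ext


# Hypothetical IDs: one public tab extension and one unreleased in-house one.
PUBLIC_TAB_ADDON = "public-tab-addon@example.org"
SECRET_ADDON = "unreleased-project@example.com"

uploads = [
    {"user": "a", "extension_hashes":
        extension_hashes_for_upload([PUBLIC_TAB_ADDON, SECRET_ADDON])},
    {"user": "b", "extension_hashes": extension_hashes_for_upload([])},
]
print(identify_known_extensions(uploads[0]["extension_hashes"], [PUBLIC_TAB_ADDON]))
with_ext, without_ext = split_by_extension(uploads, PUBLIC_TAB_ADDON)
print([u["user"] for u in with_ext], [u["user"] for u in without_ext])
```

The hash is deliberately unsalted because the server has to recompute the hash of a known extension ID in order to match it. The flip side is that this protects an extension only as long as its ID is hard to guess, which holds for an unpublished extension whose ID nobody outside its developers knows.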
The final question has to do with how we make the data available to the research and UI design community. As I mentioned in a previous post, this is an important part of the plan for Test Pilot. What we’ve said is that we will make the data available, but only in an "anonymous, aggregated" form, one that does not contain information about any individual user from any study.
In other words, when we publish data, we want to be even stricter about it than we are with the data we collect. We want to publish only the numbers that describe the Test Pilot user group as a whole, without including any numbers that describe any one particular Test Pilot user.
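One possible reading of "numbers that describe the Test Pilot user group as a whole" can be sketched in Python. Everything here is an assumption rather than a decided policy, since the exact form is still being worked out: the function name, the choice of mean and median as the published statistics, and the minimum group size below which nothing is published.

```python
from statistics import mean, median

# Hypothetical threshold; the post does not set one.
MIN_GROUP_SIZE = 20


def aggregate_for_publication(per_user_values, min_group_size=MIN_GROUP_SIZE):
    """Reduce one number per user to numbers describing the group as a whole.

    Returns None (publish nothing) when the group is smaller than
    min_group_size, because the "average" of one or two users effectively
    describes those individuals.
    """
    values = list(per_user_values)
    if len(values) < min_group_size:
        return None
    return {
        "users": len(values),
        "mean": mean(values),
        "median": median(values),
    }


# Hypothetical per-user counts of duplicate tabs opened during a study.
duplicate_tab_counts = [0, 3, 1, 0, 7, 2, 0, 1]
print(aggregate_for_publication(duplicate_tab_counts, min_group_size=5))
print(aggregate_for_publication(duplicate_tab_counts[:2], min_group_size=5))
```

Statistics such as the minimum or maximum are left out on purpose: each of those is one particular user’s value.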
What exactly that means, we are still figuring out. What will this anonymous, aggregated form be? How will we generate it, how will we make it available, and how can we make sure it’s still useful for researchers (who may be working on questions that we haven’t even thought about yet)?
As always, I would love to hear your thoughts on these matters.
September 12, 2009 at 6:14 am
I believe your approach to UX research strikes a good balance and am heartened by your responsible use of private data.
I feel the need to clarify one point, but I’m sure you already realise this… A user who has opened duplicate tabs may not represent a tab-related error. Personally, I regularly open duplicate tabs for one reason or another. For example, I may want to perform a quick visual comparison of two distinct renders of the same page (to compare changes due to randomised content or time).
Keep up the good work!
September 12, 2009 at 10:48 am
My thoughts? Your post was so reassuring I signed up for the Test Pilot program.
I’d like to encourage people with disabilities to sign up but I’m concerned about skewing your data pool – should I worry?
September 12, 2009 at 11:03 am
Hear, hear on the requirement to keep URLs private and on the discretion you describe for addon identities.
That said, not sharing row-level data with researchers dramatically reduces the external community’s ability to contribute.
It also reduces motivation, since row-level data is typically needed for the statistical validation that scientific publication requires.
I applaud your approach to the problem, and even the notion that privacy filters should be stronger for external distribution, but a blanket restriction on individual data makes Test Pilot not very interesting for research in many situations.
September 13, 2009 at 1:07 am
[...] Collecting usability data, without crossing the line into spying « Not The User’s Fault (tags: data privacy ui) [...]
September 14, 2009 at 5:43 am
While I agree that you’ve taken the right approach when it comes to protecting privacy, there are some use cases where you might be missing useful data. Let me give you a case in point. The biggest tab-duplication problem I have is goosh.org. When I find I have dozens of tabs open, and two-thirds of them are duplicates, it’s usually that half of them are goosh tabs. Without knowing the url, it may not be possible to develop an optimal strategy for dealing with that situation. What I would really like Firefox to be able to do is show me a list of the last search term for each of the open goosh tabs when I attempt to open a new one. While I don’t want to open twenty different goosh instances, I may want three or four as I work through a complex research problem. Perhaps the same one-way encryption solution you use for extensions makes sense for some URLs as well. …maybe coupled with an “it’s ok to relax privacy restrictions for this session” checkbox I can check when I’m doing things that I’d like to make sure get into the data collection hopper.
September 17, 2009 at 10:10 pm
Hi Ricky,
My point of view is that the data pool would be skewed if it didn’t include people with disabilities. They’re part of the web user base, so they should be part of any representative study. We can always include metadata in the upload that tells whether or not the user has certain accessibility features turned on, which would enable us to divide people up into different user groups for the sake of scientific methodology. So yes, please do encourage everyone to join!
Thanks!
September 17, 2009 at 10:14 pm
Hi Andy,
Thanks for your feedback. That is a very good point that you raise. After considering it, I think what we need to do is to design a query API that allows researchers access to the row-level data without singling out a particular user, if such a thing is possible, which I think it is. I’ll do another blog post about that, but I need to do a little more design work first.
September 18, 2009 at 3:19 am
Something I thought of today… I have no idea if this is desirable or feasible, but letting a user annotate data is something that I’d like.
At the moment, for example, I apparently have 7 tabs open all in the same window and with the same URL (and nothing else in that window). If I could put in an annotation saying the duplicate tabs are a stupid Flash site and these tabs are actually not duplicates, I’d feel like I was giving you better data.
September 24, 2009 at 12:45 pm
[...] Jono at Mozilla Labs discusses the balance between collecting usability data and spying on users. [...]