30
Sep 09

Getting to Know the TestPilot Tab Usage Data

The Mozilla Labs TestPilot project has just released its first round of data on tab usage. Getting started with a dataset begins with exploration, confirming basic hypotheses before getting fancy. Here’s an exploratory look at the “vanilla 30” Test Pilot dataset, built with GGobi and color-coded by average tabs across a day.


Moving through this we can observe:

  • A long tail of average tabs, with an inflection around 20 tabs. 75% of users average under 8 tabs, and the 50% mark is at 4.1 tabs. Note that the first cell explains the color coding.
  • In the top right cell, we see a bimodal distribution: most users with high maxTabs also have high average tabs, but a subset of “tab bingers” only occasionally use large numbers of tabs. More below…
  • One puzzling aspect is the presence of 22 distinct values for day. Though the timespan of the study is one week, apparently the start and end times are not the same.
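The percentile figures above (the 50% and 75% marks) are the kind of summary that is easy to recompute. Here is a minimal sketch in Python rather than the R used for the original analysis; the function name and the sample values are made up for illustration and are not taken from the Test Pilot data.

```python
from statistics import quantiles


def tab_quartiles(avg_tabs):
    """Return the 25th, 50th and 75th percentiles of per-user average tab counts."""
    # method="inclusive" treats the input as the full population, not a sample.
    q1, q2, q3 = quantiles(avg_tabs, n=4, method="inclusive")
    return q1, q2, q3


# Invented values, NOT from the Test Pilot dataset.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 8.0, 12.0, 20.0, 35.0]
q1, median_tabs, q3 = tab_quartiles(sample)
print(f"25%: {q1}, 50%: {median_tabs}, 75%: {q3}")
```

A long right tail shows up as a 75% mark that sits much closer to the median than to the maximum.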

This dataset is pretty large, with 7749 points when analyzed by day and by user. I imported the data into MySQL and am using a hybrid of SQL and R to conduct exploration (see the code).
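To give a flavor of the per-day, per-user rollup, here is a hedged sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table name, column names, and rows are hypothetical; the actual schema is in the linked code, not reproduced here.

```python
import sqlite3


def daily_tab_stats(rows):
    """Aggregate (user_id, day, num_tabs) rows into per-user, per-day avg and max."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, chosen only for this illustration.
    conn.execute(
        "CREATE TABLE tab_events (user_id INTEGER, day TEXT, num_tabs INTEGER)"
    )
    conn.executemany("INSERT INTO tab_events VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT user_id, day, AVG(num_tabs), MAX(num_tabs) "
        "FROM tab_events GROUP BY user_id, day ORDER BY user_id, day"
    ).fetchall()
    conn.close()
    return result


# Invented rows, NOT from the Test Pilot dataset.
rows = [
    (1, "2009-09-01", 2),
    (1, "2009-09-01", 4),
    (1, "2009-09-02", 3),
    (2, "2009-09-01", 10),
    (2, "2009-09-01", 30),
]
for user_id, day, avg_tabs, max_tabs in daily_tab_stats(rows):
    print(user_id, day, avg_tabs, max_tabs)
```

The same GROUP BY should carry over to MySQL largely unchanged, with the aggregated rows then handed to R or GGobi for plotting.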

I then moved to a more granular analysis of hourly averageTabs and maxTabs (the first 25,000 rows). Here we see the bimodal distribution more strongly. A minority of users break with the general relationship between average tabs and max tabs. These are the “tab bingers” or, alternatively speaking, the clean uppers, who go from few tabs to many and back to a few. Further analyses will have to be done to identify the pattern here.
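One rough way to separate the two groups is to compare each user's max tabs to their average tabs. The sketch below is only an illustration of that idea: the function name, the 10-tab cutoff, and the 4x ratio are assumptions picked for the example, not thresholds derived from the data.

```python
def classify_user(avg_tabs, max_tabs, heavy_threshold=10.0, spike_ratio=4.0):
    """Label a user as 'habitual', 'spiky' or 'light' from avg and max tab counts.

    heavy_threshold and spike_ratio are hypothetical cutoffs, not fitted values.
    """
    if avg_tabs >= heavy_threshold:
        # Consistently many tabs open: the "tab addict" pattern.
        return "habitual"
    if (
        avg_tabs > 0
        and max_tabs >= heavy_threshold
        and max_tabs / avg_tabs >= spike_ratio
    ):
        # Low average but an occasional large peak: the "tab binger" pattern.
        return "spiky"
    return "light"


print(classify_user(15.0, 30))  # high average
print(classify_user(3.0, 25))   # low average, large peak
print(classify_user(3.0, 6))    # few tabs throughout
```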

In fact, looking at the speed of change of open tabs for the 50% of heavy users with more than 4 average tabs open is one of the more intriguing opportunities discovered so far (see the # tabs per navigation action by Dubroy). This is possibly just an artifact of using average, as MySQL doesn’t have a median function, but it does suggest there may be spiky versus constant tab “junkies”.
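The concern about averages is easy to demonstrate: a single binge hour pulls the mean well above what the user typically has open, while the median barely moves. A small sketch with invented hourly counts (not from the dataset):

```python
from statistics import mean, median


def summarize(hourly_tabs):
    """Return the mean and median of one user's hourly open-tab counts."""
    return {"mean": mean(hourly_tabs), "median": median(hourly_tabs)}


# Invented counts for one hypothetical user with a single spike.
hourly = [2, 3, 2, 3, 40, 2, 3]
stats = summarize(hourly)
print(stats)
```

Here the median stays at 3 tabs while the mean lands near 8, so a user who usually keeps a few tabs open can look like a moderately heavy user when only averages are available.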

My first tweet on this dataset said that 50% of users never go beyond 13 tabs. In fact, the number looks lower than that in this 50% subsample. Seven ± 2 likely holds for the number of tabs open for most users. There are two interesting exceptions to this rule, called out in the “projection pursuit” GGobi video below:

In the video, spiky tabbers are in grey and tab addicts are plotted with hollow squares:

It’s challenging to derive meaningful and concise conclusions from complex datasets like this one. To that end, I’m attempting with this analysis, like I did with the Places Stats project, to share my analysis code for open source collaboration. I’ve even done a video on using R + GGobi.

This first round of analysis barely scratches the surface of this data set and suggests two tentative areas for further inquiry. First, average users currently use only a handful of tabs; can we make life easier for them? Second, two types of users venture into the >10 tab world, habitual tabbers and spiky tabbers (i.e. addicts and bingers in my unPC terminology). Why?

Further work with this dataset is going to require more sequential analyses, walking the data to generate more granular metrics on growth and constriction of # of tabs, as well as looking at spawning methods and the role of windows. That doesn’t even begin to get into the folks who optimize tabs by using extensions! Avoiding averages is also critical in general and to connect to Dubroy’s thesis study. Does MySQL do UDFs?
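As a starting point for that kind of sequential analysis, here is a minimal sketch that walks one user's ordered tab counts and totals how much the count grew and shrank. The function names and sample values are hypothetical, and it assumes the counts are already sorted by time.

```python
def tab_deltas(counts):
    """Return the change in open tabs between consecutive observations."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


def growth_and_constriction(counts):
    """Return (total tabs added, total tabs closed, largest single jump)."""
    deltas = tab_deltas(counts)
    growth = sum(d for d in deltas if d > 0)
    constriction = -sum(d for d in deltas if d < 0)
    largest_jump = max(deltas, default=0)
    return growth, constriction, largest_jump


# Invented sequence of open-tab counts for one hypothetical user.
counts = [2, 3, 10, 9, 4, 4, 6]
print(growth_and_constriction(counts))
```

Metrics like the largest single jump might help tell bingers from habitual heavy users without relying on averages.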


by andyed | About the author:


Comments


Posted on Wednesday, September 30th, 2009 at 9:51 pm and filed under Blogging, HCI, Mozilla, Visualization, datamining.
8 Comments so far

  1. 1 anon on September 30, 2009

    Someone didn’t get the open video memo.

  2. 2 Alexander Limi on October 1, 2009

    This is very cool stuff, and definitely helps with getting a new angle on the data we have so far. :)

    One interesting thing that is hard to read from the data is whether the people that only use a handful of tabs do so because they feel they lose control with more, and therefore tend to “clean up”.

    Often, when there is a ceiling in this type of data, there’s a limitation in the UI that people fall within the limits of.

    Then again, 7±2 is a pretty well-established guideline, so I’m not surprised that a lot of data is within this range.

    Nice work!

    — Alexander Limi, Firefox User Experience

  3. 3 Daniel Einspanjer on October 1, 2009

    Regarding your question about medians in MySQL:

    A friend of mine has written a few blog posts about doing percentiles and medians in MySQL:
    http://rpbouman.blogspot.com/2008/07/fast-single-pass-method-to-calculate.html
    http://rpbouman.blogspot.com/2009/09/mysql-another-ranking-trick.html

    hth

  4. 4 Test Pilot Results! « Not The User’s Fault on October 1, 2009

    [...] researchers have already begun using the data to do their own analysis! Andy at Surfmind.com has a post containing some very cool-looking visualizations and has proposed an interesting theory about there being two classes of heavy tab users. [...]

  5. 5 Scott Fitchet on October 1, 2009

    Also, I use a mixture of new tabs and new windows (which depends on which machine I’m on). Haven’t thought about why yet.

  6. 6 Mozilla Labs » Test Pilot » Blog Archive » The first Test Pilot data visualizations on October 1, 2009

    [...] have already begun using the aggregated data samples that we published to do their own analysis. Andy Edmonds at Surfmind.com has created some very cool-looking visualizations. He looks at the relationship between average [...]

  7. 7 How Math Teachers Can Help Improve the Web « Shiny Pebbles… on October 6, 2009

    [...] Mozilla takes openness one step further. Unlike data collected by that other browser company, the data collected by all of the Test Pilot plug-ins on the planet is freely available for download. This means that if you’re a Math or Statistics teacher, you can build lessons around Test Pilot data-sets from the real world that your students helped create by having installed the plug-in. …or you can just look at the interesting ways that others interpret the data-sets. [...]

  8. 8 Orienting the TestPilot Tab Data | Surf*Mind*Musings on October 18, 2009

    [...]       « Getting to Know the TestPilot Tab Usage Data 18 Oct [...]
