The Mozilla Labs Test Pilot project has just released its first round of data on tab usage. Getting started with a dataset begins with exploration: confirming basic hypotheses before getting fancy. Here’s an exploratory look at the “vanilla 30” Test Pilot dataset, produced with GGobi and color-coded by average tabs across a day.
high res version
Moving through this we can observe:
- A long tail of average tabs, with an inflection around 20 tabs. 75% of users average under 8 tabs, and the 50% mark is at 4.1 tabs. Note that this first cell explains the color coding.
- In the top right cell, we see a bimodal distribution: most users with high maxTabs also have high average tabs, but a subset of “tab bingers” only occasionally use large numbers of tabs. More below…
- One puzzling aspect is the presence of 22 distinct values for day. Though the timespan of the study is one week, apparently the start and end times are not the same.
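The percentile figures in the first observation are simple quartiles of the per-user averages. Here is a minimal Python sketch of that summary (not the R/SQL code used for this post); the function name `tab_quartiles` and the sample values are made up for illustration and are not taken from the Test Pilot data.

```python
from statistics import quantiles

def tab_quartiles(avg_tabs):
    """Return (Q1, median, Q3) for a list of per-user average tab counts."""
    # n=4 splits the data into quartiles; "inclusive" treats the list
    # as the whole population rather than a sample of it.
    q1, q2, q3 = quantiles(avg_tabs, n=4, method="inclusive")
    return q1, q2, q3

# Hypothetical per-user averages with a long tail (illustrative only).
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 8.0, 12.0, 20.0, 45.0]
q1, median_tabs, q3 = tab_quartiles(sample)
print(f"25%: {q1}, 50%: {median_tabs}, 75%: {q3}")
```

On a long-tailed list like this one, the 75% mark sits far below the maximum, which is the same shape described above.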
This dataset is pretty large, with 7749 points analyzed by day by user. I imported the data into MySQL and am using a hybrid of SQL and R to conduct exploration (see the code).
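To give a feel for the "by day by user" rollup, here is a rough sketch of that kind of aggregation. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table name `tab_events`, its columns, and the rows are all assumptions made up for this example, not the actual Test Pilot schema.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL import.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab_events (user_id INTEGER, day TEXT, num_tabs INTEGER)"
)

# Hypothetical observations: (user, day label, tabs open at that moment).
rows = [
    (1, "day-1", 2), (1, "day-1", 4), (1, "day-2", 3),
    (2, "day-1", 10), (2, "day-1", 30),
]
conn.executemany("INSERT INTO tab_events VALUES (?, ?, ?)", rows)

# One summary row per user per day: average and maximum open tabs.
daily = conn.execute(
    """
    SELECT user_id, day, AVG(num_tabs) AS avg_tabs, MAX(num_tabs) AS max_tabs
    FROM tab_events
    GROUP BY user_id, day
    ORDER BY user_id, day
    """
).fetchall()

for user_id, day, avg_tabs, max_tabs in daily:
    print(user_id, day, avg_tabs, max_tabs)
```

The resulting rows are the sort of per-user, per-day points that can then be handed to R or GGobi for plotting.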
I moved to a more granular analysis of hourly averageTabs and maxTabs (using the first 25,000 rows). Here we see the bimodal distribution more strongly. A minority of users break with the general relationship between average tabs and max tabs. These are the “tab bingers” or, put another way, the clean-uppers, who go from a few tabs to many and back to a few. Further analyses will have to be done to identify the pattern here.
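One possible way to flag these users is to compare each user's peak against their average. The sketch below is only an illustration: the thresholds (`heavy_avg`, `spike_ratio`) and the labels are assumptions chosen for the example, not cutoffs derived from the data.

```python
def classify_user(avg_tabs, max_tabs, heavy_avg=10.0, spike_ratio=4.0):
    """Label a user from their average and maximum open-tab counts.

    Both thresholds are illustrative assumptions, not fitted values.
    """
    if avg_tabs >= heavy_avg:
        # Consistently many tabs open: max and average are both high.
        return "habitual"
    if max_tabs >= heavy_avg and max_tabs >= spike_ratio * avg_tabs:
        # Low average but an occasional large peak: a "tab binger".
        return "binger"
    return "light"

# Hypothetical (averageTabs, maxTabs) pairs.
for avg_tabs, max_tabs in [(15.0, 20), (3.0, 30), (3.0, 6)]:
    print(avg_tabs, max_tabs, classify_user(avg_tabs, max_tabs))
```

A ratio test like this only separates the two groups crudely; it says nothing about how quickly the tab count rises and falls.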
In fact, looking at the speed of change of open tabs for the 50% of heavier users, those with more than 4 tabs open on average, is one of the more intriguing opportunities discovered so far (see the # tabs per navigation action by Dubroy). This is possibly just an artifact of using averages, as MySQL doesn’t have a median function, but it does suggest there may be spiky versus constant tab “junkies”.
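To see why the average can mislead here, compare the mean and median of two users' hourly tab counts. This is a small Python illustration with invented numbers; the helper name `spike_gap` is hypothetical.

```python
from statistics import mean, median

def spike_gap(tab_counts):
    """Return mean minus median; a large gap hints at occasional spikes."""
    return mean(tab_counts) - median(tab_counts)

# Two made-up users with the same mean (7 tabs) but different habits.
spiky = [2, 2, 3, 2, 40, 3, 2, 2]    # mostly few tabs, one binge
steady = [7, 7, 8, 7, 6, 7, 7, 7]    # consistently around 7 tabs

print("spiky:", mean(spiky), median(spiky), spike_gap(spiky))
print("steady:", mean(steady), median(steady), spike_gap(steady))
```

Both users average the same number of tabs, but only the median shows that the spiky user normally keeps just a couple open.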
My first tweet on this dataset said that 50% of users never go beyond 13 tabs. In fact, the number looks lower than that in this 50% subsample. Seven ± 2 likely holds for the number of tabs open for most users. There are two interesting exceptions to this rule, called out in the “projection pursuit” GGobi video below:
In the video, spiky tabbers are in grey and tab addicts are plotted with hollow squares:
It’s challenging to derive meaningful and concise conclusions from complex datasets like this one. To that end, I’m attempting with this analysis, like I did with the Places Stats project, to share my analysis code for open source collaboration. I’ve even done a video on using R + GGobi.
This first round of analysis barely scratches the surface of the interesting aspects of this dataset, and suggests two tentative areas for further inquiry:
- Average users currently use only a handful of tabs. Can we make life easier for them?
- Two types of users venture into the >10 tab world: habitual tabbers and spiky tabbers (i.e. addicts and bingers in my unPC terminology). Why?
Further work with this dataset is going to require more sequential analyses, walking the data to generate more granular metrics on growth and constriction of the number of tabs, as well as looking at spawning methods and the role of windows. That doesn’t even begin to get into the folks who optimize tabs by using extensions! Avoiding averages is also critical, both in general and for connecting these results to Dubroy’s thesis study. Does MySQL do UDFs?
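As a starting point for that kind of sequential walk, here is a minimal sketch that steps through one user's time-ordered tab counts and accumulates growth and constriction. The function name, the returned keys, and the sample series are all hypothetical; it measures net change between consecutive observations, not individual tab open or close events.

```python
def growth_and_constriction(tab_counts):
    """Walk a time-ordered series of open-tab counts for one user.

    Returns total net growth, total net constriction, and the largest
    single increase between two consecutive observations.
    """
    growth = 0
    constriction = 0
    largest_jump = 0
    # Pair each observation with the one that follows it.
    for prev, curr in zip(tab_counts, tab_counts[1:]):
        delta = curr - prev
        if delta > 0:
            growth += delta
            largest_jump = max(largest_jump, delta)
        elif delta < 0:
            constriction += -delta
    return {
        "growth": growth,
        "constriction": constriction,
        "largest_jump": largest_jump,
    }

# Hypothetical series: a binge up to 12 tabs followed by a clean-up.
print(growth_and_constriction([2, 3, 12, 12, 4, 5]))
```

Metrics like these avoid averaging entirely, and a large `largest_jump` paired with a similar amount of constriction would be one signature of a spiky tabber.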