June 15, 2009

Understanding Bookmarks & Browsing with Places Stats

The Firefox Places team created an opt-in method for users to submit a quantitative profile of their Places database (history + bookmarks + tags, stored in SQLite) by pasting a script into the error console. Drew (adw) did an analysis a few months back and recently gave “last call” for the first round.

I’ve been doing some exploratory data analysis on the data to date, now over 600 submissions. Some top-level observations:

  • The number of distinct visits, not total visits, is the best predictor of bookmarking activity
  • 30% of the sample use tags extensively while 50% have at least 1 tag (Note: Amended)
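
One way to check a claim like the first observation is to fit one single-predictor model per candidate and compare the variance explained. The following is a minimal sketch on synthetic data, not the analysis behind the post: the column names (`distinct_visits`, `total_visits`, `bookmarks`) are hypothetical, and the data is generated so that bookmarks depend on distinct visits, which means the outcome is built in.

```r
# Synthetic stand-in for the Places Stats submissions; all column names are hypothetical.
set.seed(42)
n <- 600
distinct_visits <- round(rlnorm(n, meanlog = 8, sdlog = 1))
# Total visits = distinct visits inflated by a noisy per-profile revisit factor.
total_visits <- round(distinct_visits * runif(n, 1.2, 4))
# Bookmarks are generated from distinct visits, so the result below is by construction.
bookmarks <- rpois(n, lambda = 0.01 * distinct_visits)

profiles <- data.frame(distinct_visits, total_visits, bookmarks)

# Fit one model per candidate predictor on log scales, since the counts are heavily skewed.
fit_distinct <- lm(log1p(bookmarks) ~ log1p(distinct_visits), data = profiles)
fit_total    <- lm(log1p(bookmarks) ~ log1p(total_visits), data = profiles)

# Compare the share of variance each predictor explains.
r2 <- c(distinct = summary(fit_distinct)$r.squared,
        total    = summary(fit_total)$r.squared)
print(round(r2, 3))
```

On real submissions the two predictors are strongly correlated, so the gap in R-squared may be small and worth checking against a model that includes both.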

The sample is far from random and is probably skewed toward power users of both the internet and Firefox. The 30% tagging rate is likely an upper bound on what we’d find in a more representative sample.

The rise of AJAX has made the “number of unique pages visited” much harder to compute. Computing this from a recent pull of the places stats dataset (n=594), we see a 38% revisit rate, or a 62% new visit rate, as coded below:

> describe(places$percent_visits_new)

      n missing unique   Mean    .05    .10    .25    .50    .75    .90    .95
    594       6    589 0.6224 0.3913 0.4726 0.5485 0.6275 0.7064 0.7779 0.8182
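
The post does not spell out how `percent_visits_new` is derived. A minimal sketch, assuming it is the ratio of distinct URLs visited to total visits per profile, and using made-up counts and hypothetical column names, shows how the percentile columns in the summary above can be reproduced with base R's `quantile()`:

```r
# Hypothetical per-profile counts; the real Places Stats column names may differ.
places <- data.frame(
  total_visits    = c(1000, 2500, 400, 12000, NA),
  distinct_visits = c(620, 1400, 330, 6600, NA)
)

# Assumed definition: share of visits that went to a URL not visited before.
places$percent_visits_new     <- places$distinct_visits / places$total_visits
places$percent_visits_revisit <- 1 - places$percent_visits_new

# Same percentiles as the columns in the summary above; missing profiles are dropped.
probs <- c(.05, .10, .25, .50, .75, .90, .95)
summary_row <- quantile(places$percent_visits_new, probs = probs, na.rm = TRUE)
mean_new <- mean(places$percent_visits_new, na.rm = TRUE)

print(round(summary_row, 4))
print(round(mean_new, 4))
```

Under this definition the revisit rate is simply one minus the new visit rate, which is how a mean of 0.62 corresponds to a 38% revisit rate.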

This isn’t actually far from the prior research, quoting from an earlier post:

A WWW’06 paper by Harald Weinreich et al. updated these stats, showing across studies that 61% of pages visited were repeats in ‘94, 58% in ‘96, and only 45.6% in their 2006 study.

Similarly, pages per day are likely confounded by background requests:

      n missing unique Mean  .05  .10   .25   .50   .75   .90    .95
    592       6    591  523 51.6 90.2 167.5 287.6 522.4 832.6 1579.8

A key motivator for the Places team’s efforts to collect these stats was to create test profiles. Additionally, this kind of sampling of how data accumulates in Places should provide insight into the nature of the problem of organizing & recalling web resources. We’re gearing up to do cluster analysis on the dataset to identify profile candidates.
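
As an illustration of what such a cluster analysis could look like, here is a minimal sketch using base R's `kmeans()` on synthetic data. The feature names (`places`, `bookmarks`, `tags`), the choice of k-means, and k = 2 are all assumptions made for the example, not the method the team settled on. The profile nearest each cluster center is treated as a candidate test profile.

```r
set.seed(1)
# Synthetic profiles in two loose groups; the columns are hypothetical, not the real schema.
light <- data.frame(places    = rlnorm(100, 7, 0.4),
                    bookmarks = rlnorm(100, 3, 0.5),
                    tags      = rlnorm(100, 0.5, 0.5))
heavy <- data.frame(places    = rlnorm(100, 10, 0.4),
                    bookmarks = rlnorm(100, 6, 0.5),
                    tags      = rlnorm(100, 4, 0.5))
profiles <- rbind(light, heavy)

# Log-transform and standardize so no single skewed count dominates the distances.
features <- scale(log1p(as.matrix(profiles)))

# k = 2 is used only because the synthetic data was built with two groups.
fit <- kmeans(features, centers = 2, nstart = 25)

# For each cluster center, find the row index of the closest profile.
nearest <- unname(sapply(seq_len(nrow(fit$centers)), function(k) {
  center <- matrix(fit$centers[k, ], nrow(features), ncol(features), byrow = TRUE)
  which.min(rowSums((features - center)^2))
}))

print(table(fit$cluster))
print(profiles[nearest, ])
```

With real submissions the number of clusters is not known in advance, so k would need to be chosen by inspection or a criterion such as within-cluster sum of squares.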

I’m using R (stats) & GGobi (visualization) and versioning the analysis code on the Mozilla wiki. The goal is repeatable analysis, both across users and as new data accumulates, while ensuring the process is transparent, both for user confidence and for methodological feedback.

More conclusions are coming. For now, we also have lots of pretty pictures via GGobi’s scatterplot matrix. Thanks to Ed Borasky for R & GGobi tutelage while I was in Portland. I’ve published an intro to R & GGobi screencast.


by andyed

Posted on Monday, June 15th, 2009 at 4:51 am. Filed under Academic, HCI, Mozilla, Visualization.
2 Comments so far

  1. 1 Places Stats - Analysis in the Open < Blog of Metrics on June 16, 2009

    [...] was excited to see Andy Edmonds’ post yesterday about Places Stats. It’s not every day that we see someone in the Mozilla [...]

  2. 2 Getting to Know the TestPilot Tab Usage Data | Surf*Mind*Musings on September 30, 2009

    [...] datasets like this one. To that end, I’m attempting with this analysis, like I did with the Places Stats project, to share my analysis code for open source collaboration. This first round of analysis barely [...]
