Internet Politics

Archive for the Data Category

Data & Resources

AOL accidentally releases search data - powered by Google

Posted on Tue, Aug 08, 2006 at 12:52 PM by Andrew Chadwick

A couple of weeks ago, AOL's research department released this dataset:

"500k User Queries Sampled Over 3 Months. This collection consists of ~20M web queries collected from ~500k users over three months. Where the data is sorted by anonymized user id... The goal of this collection is to provide a real query log based on users. It could be used for personalization, query reformulation or other type of search research."

It was made available as a free download for non-commercial use only. It was quickly withdrawn, but it is now widely available as a BitTorrent download.

The data are reasonably anonymous. I say 'reasonably' because they contain no direct personal identifiers, but users often type personal details such as Social Security numbers, phone numbers and addresses into their search requests, and there are plenty of those in here. The data are also uncensored.

This is going to send huge ripples through the regulatory debate, not least because AOL's search technology is provided by Google, the world's number one search engine. These queries are a very good guide to the kind of searches that run through Google, and Google has kept very tight wraps on this kind of data in the past.

As an academic, I'm torn: these data would provide a wonderful snapshot of search activity. But is it ethical to use them if the users have not consented? They were released for an academic audience, but are these 'public' data? They will undoubtedly remain publicly available for many years to come, and it's highly likely that both law enforcement and market research companies are already working with them. Eszter Hargittai, an expert on the sociology of search, points out some of the problems.
