Tuesday, November 25, 2008

Verifying that analytics data collection is happening

I just finished adding the analytics tracking JavaScript to my blog and wanted to do a quick check that data collection is actually happening. To verify, I made use of the excellent HttpWatch Professional, a very neat little tool for inspecting and debugging the HTTP traffic from your browser.

A quick gander over to my blog after starting "Record" in the tool immediately verified that data collection was happening - the proof was the calls to the http://analytics.live.com domain to download the tracking JavaScript, and the requests to http://analytics.r.msn.com/x.gif, which are issued for every event tracked on the website.

So far so good, everything working as expected.
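If you don't have an HTTP debugger handy, a small script can at least confirm that the tracking script tag made it into the served page. This is a minimal sketch of that idea - it only checks the HTML for a script reference (the beacon requests that HttpWatch shows are a separate, browser-side thing), and the page source below is made up for illustration:

```python
# Minimal sketch: confirm a tracking <script> reference is present in a
# page's HTML. This only proves the tag was emitted, not that the browser
# actually fires the tracking beacons (HttpWatch shows that part).

def has_tracking_snippet(html: str, script_host: str) -> bool:
    """Return True if any <script> tag in the HTML references script_host."""
    lowered = html.lower()
    pos = 0
    while True:
        start = lowered.find("<script", pos)
        if start == -1:
            return False
        end = lowered.find(">", start)
        if end == -1:
            return False
        if script_host.lower() in lowered[start:end]:
            return True
        pos = end + 1

# Hypothetical page source, standing in for the real blog HTML:
page = ('<html><head><script src="http://analytics.live.com/tracker.js">'
        '</script></head><body>my blog</body></html>')

print(has_tracking_snippet(page, "analytics.live.com"))  # True
```

In practice you would fetch the live blog page and run the same check over its source.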

Instrumenting my blog with adCenter Analytics

Step 1. Get an adCenter Analytics account. Check. I already have that.

Step 2. Create a new profile. Easy enough, took all of 10 seconds.

Step 3. Copy the javascript.

Step 4. Figure out how to add this to my blog on blogspot.com. How do I do this? Ok, let me search and see what turns up.

I did a quick search on variations of "how to add adcenter analytics tracking code to your blog". Unfortunately nothing promising turned up, except for the top result, an adcentercommunity blog post outlining the major steps, which was relevant but had nothing specific to Blogspot.

A quick look in the adCenter Analytics help also revealed nothing specific to Blogger.

Since the blog is hosted on Blogger, my next bet was to look at the Blogger help for any insights. Hmm, nothing directly useful, except that it did point to Google Analytics.

All right, next step is to search in Google Analytics help. I headed over there and soon enough (searching for blogspot in the help) I found what I needed.

Step 5. Edit the HTML for the blog template and insert the tracking code. Easy!
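Step 5 boils down to pasting the snippet into the template just before the closing </body> tag. Sketched as a string operation (the SNIPPET value below is a placeholder, not the real adCenter code, and the template is a toy example):

```python
# Sketch: insert a tracking snippet just before </body> in a blog template.
# SNIPPET is a placeholder; the real code comes from the analytics tool.

SNIPPET = '<script src="http://analytics.live.com/tracker.js"></script>'

def add_snippet(template: str, snippet: str) -> str:
    """Insert the snippet immediately before the closing </body> tag."""
    marker = "</body>"
    idx = template.rfind(marker)
    if idx == -1:
        raise ValueError("template has no </body> tag")
    return template[:idx] + snippet + "\n" + template[idx:]

template = "<html><head></head><body><h1>My blog</h1></body></html>"
print(add_snippet(template, SNIPPET))
```

Putting the snippet at the end of the body is the usual advice, so the tracking script doesn't delay the rest of the page from loading.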

Now all I have to do is wait for the data to show up in my reports. Let's see how it looks tomorrow.

Wednesday, November 12, 2008

Psephologists and data fitting

Psephologist: That's a real word. It means someone who studies elections and polls. It was the word I was reminded of when I listened to Weekday on kuow.org this morning as I drove to work. Today's show had two guests, both professors at the University of Washington if I recall correctly, talking about the recent US presidential election.

I first came across the word psephologist back in India in the late eighties, while watching election coverage on TV. There were pundits who were called in to give expert opinion on election polls and trends, and I learnt that they are called psephologists. Doordarshan had the excellent Vinod Dua and Prannoy Roy covering elections, and later NDTV (if I recall correctly, the first private news channel launched in India, by Prannoy Roy) did a great job with election coverage as well.

The guests on Weekday today had interesting takes on how the US presidential campaigns affected the polls. One argued that actions by one camp were mostly canceled out by actions from the other, that Sen. Obama maintained a lead over Sen. McCain throughout the campaign, and that the vote could have been held at any time from July with the same result. The other guest argued that this election was different, and that trends amongst young voters and African Americans, along with larger voter registration among Democrats, were crucial to Sen. Obama's victory. At some level both were in agreement, but the opinion that the campaigning was somewhat irrelevant, as posited by one of the guests, was curious.

My own takeaway was that the same data could be interpreted in many different ways - so despite both professors having access to exactly the same data (or the same ocean of data), the interpretations varied depending on their existing ideas, biases and hypotheses.

I imagine that the election poll data is a data miner's dream - plenty of data points to use to segment the population, slice and dice the voting public into blocs and look at correlations and trends, and compare with previous elections. What fun!

However, post-facto analysis of data without knowledge of cause and effect lends itself to a lot of problems in interpretation. Today's radio discussion reminded me of how TV and radio anchors often talk about how a certain state or town has reliably picked the presidential winner, or even how the winner of a football game predicts the winner of the election (I need to dig up the reference for this one). This kind of data fitting can be downright misleading: such correlations are often accidental and mostly just plain irrelevant.

For a cautionary tale in data mining giving silly results, read about how butter production in Bangladesh was a great predictor of S&P performance for two decades: David Leinweber went searching for random data and found something that fit the bill. Clearly, however, there is no cause-and-effect relationship there, so the correlation is meaningless.
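The effect is easy to reproduce: any two series that merely trend in the same direction will show a strong Pearson correlation, causal link or not. A small sketch, with both series entirely made up for illustration:

```python
# Illustrative sketch: two unrelated upward-trending series show a strong
# Pearson correlation even though there is no causal link between them.
# Both series below are invented numbers, not real data.

from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical yearly figures: "butter production" and "index level".
butter = [10, 12, 11, 14, 15, 17, 18, 20]
index  = [100, 108, 112, 121, 133, 140, 152, 160]

r = pearson(butter, index)
print(round(r, 2))  # strongly positive, despite no cause and effect
```

The shared trend alone drives the correlation, which is exactly why mining enough random series is guaranteed to turn up a "great predictor".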