Disturbingly, Quantitative Analysis May Have Methodological Shortcomings – Blogosphere Edition

After reading some academic studies about the blogosphere, it looks like – deep breath – quantitative analysis might have some problems getting at what’s really going on. The problem has to do with studies that use links to evaluate trust and communication – because, well, you’ve got to count something, right? You could use shallow natural language processing, but you still need to figure out who’s being referenced – and that gets screwed up by nicknames, authors’ names substituted for blog names, references embedded within URLs, etc. To solve that you need to know something in advance about the domain you’re studying – and that’s precisely what scholars are supposed to pretend not to know.

Even if you could overcome all that, it’s not hard to figure out why natural language processing of the blogosphere might run into problems. We suppose you could hand-code all the content. But then instead of studying the blogosphere you’re studying how apathetic 19-year-olds screwed up your data because they didn’t understand political content or sense ideological valence or get sarcasm.

The methodological problem is that linking and blogrolling are no longer “naive” – what linking signifies has now become reflexive. When Fred Thompson’s campaign blog went live, Hot Air’s Allahpundit sardonically noted that while the content was pedestrian thus far, the blogroll was impeccable – apart from the absence of MichelleMalkin.com, the home blog of A-list blogger and Hot Air owner Michelle Malkin. That topic provided more grist for the reflexive joke mill when her site was added a few days later. This rhetoric wouldn’t present any real methodological problem as long as bloggers still used links as if they were naive. But – of course – that’s not how the world works.

Two quick examples from Ace of Spades. The first has to do with how bloggers have often stopped linking to blogs they find distasteful. Blogrolls were always a way to show ideological affinity, and even the early days of the blogosphere saw groups of bloggers announce to readers when they were angry enough to remove links to other bloggers. But as blog traffic has begun to translate into money for individual bloggers, bloggers have taken to explicitly refusing to link to opposing sites even when they reference them. While this (obviously) doesn’t do anything to link-based measures of ideological affinity, it’s potentially devastating when studies use links to gauge communication across the left and right blogospheres (and in every other respect, that particular study was really good).

Of course, you could measure exactly how badly explicit refusal to link corrupts the data – if you could get a clean measure of the actual amount of left-right referencing. But that would require you to… know something about the data beforehand. At the very least, declining to link to opponents undermines the optimists who always said that bloggers linking to what they criticize would prevent the blogosphere from becoming an echo chamber.
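To make the undercounting concrete, here is a minimal sketch of the comparison described above. Everything in it is an assumption made up for illustration: the blog names, the ideological labels, the handful of references, and the `cross_share` helper. The point is only that a crawler sees the hyperlinked rows, while the "clean measure" would need all of them.

```python
# Toy data: every blog name, label, and reference below is hypothetical.
leaning = {
    "blog_a": "left",
    "blog_b": "left",
    "blog_c": "right",
    "blog_d": "right",
}

# Each reference is (source, target, was_hyperlinked).
# A False flag means the blogger discussed the target but refused to link.
references = [
    ("blog_a", "blog_b", True),
    ("blog_a", "blog_c", False),
    ("blog_c", "blog_d", True),
    ("blog_c", "blog_a", False),
    ("blog_d", "blog_b", True),
    ("blog_b", "blog_a", True),
]


def cross_share(refs):
    """Fraction of references that cross the left-right divide."""
    refs = list(refs)
    if not refs:
        return 0.0
    crossing = sum(1 for src, dst, _ in refs if leaning[src] != leaning[dst])
    return crossing / len(refs)


# What a link crawler can see: hyperlinked references only.
observed = cross_share(r for r in references if r[2])

# What actually happened: all references, linked or not.
actual = cross_share(references)

print(f"cross-ideology share from links alone: {observed:.2f}")
print(f"cross-ideology share from all references: {actual:.2f}")
```

In this made-up case the links alone suggest half as much left-right referencing as really took place. Filling in the unlinked rows for real data is exactly the part that requires knowing something about the blogs beforehand.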

In another reflexive turn, bloggers have sometimes taken to not linking even to ideologically sympathetic blogs – in Ace’s case, because those bloggers don’t adhere to standard linking etiquette. Both the action and the reaction here are, to say the least, unhelpful to people who want to use embedded links to analyze the density of blogging communities – or to write whole theses about same.
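The same thing happens to density. As a rough sketch, assume density means the usual directed-graph ratio of distinct links to possible links, and assume a hypothetical four-blog community in which three bloggers keep discussing a fourth without linking to it. The names, edges, and the `density` helper are all invented for illustration and are not taken from the studies cited here.

```python
def density(nodes, edges):
    """Directed density: distinct edges divided by n * (n - 1) possible edges."""
    n = len(nodes)
    if n < 2:
        return 0.0
    distinct = {(src, dst) for src, dst in edges if src != dst}
    return len(distinct) / (n * (n - 1))


# Hypothetical community of ideologically sympathetic blogs.
blogs = {"blog_p", "blog_q", "blog_r", "blog_s"}

# References that show up as embedded hyperlinks.
linked = [
    ("blog_p", "blog_q"),
    ("blog_q", "blog_p"),
    ("blog_r", "blog_p"),
]

# References made without a link, here because blog_s is assumed
# to ignore linking etiquette and the others respond in kind.
unlinked = [
    ("blog_p", "blog_s"),
    ("blog_q", "blog_s"),
    ("blog_r", "blog_s"),
]

density_from_links = density(blogs, linked)
density_from_all = density(blogs, linked + unlinked)

print(f"density measured from links: {density_from_links:.2f}")
print(f"density counting unlinked references: {density_from_all:.2f}")
```

Here the link graph makes the community look half as dense as the conversation actually is, and blog_s looks like an outsider rather than the blog everyone is talking about.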

On the other hand, analyzing the structure and measuring the density of the blogosphere is actually something that you need quantitative methods to do. There are a lot of objects of analysis that are profoundly ill-suited for quantitative analysis: literature, interpersonal dynamics, or – our favorite stats irony – Foucault’s conception of power. But networks are actually things that you have to use quantitative analysis to get at. The problem is that the blogosphere is a profoundly rhetorical network, where people constantly twist and play with language. It’s not a pomo issue – quite the opposite – it’s just a mundane case of people communicating, in multiple ways and with multiple tones.

Except – ya know – there aren’t numbers to prove that.

References:
* snark [Wiktionary]
* Fred’s blog debuts [Hot Air]
* Michelle Malkin
* Crisis averted: MM.com now featured on Fred’s blogroll [Hot Air]
* Ken Layne Needs To Drum Up Weak Traffic At Wonkette, So He Endorses Strong-Form of Tillman Assassination Theory [Ace of Spades]
* Adamic, L. A., and N. Glance. “The Political Blogosphere and the 2004 U.S. Election: Divided They Blog.” Proceedings of the 3rd International Workshop on Link Discovery (2005): 36-43.
* Kale, Anubhav. “Modeling Trust and Influence on Blogosphere Using Link Polarity.” University of Maryland, 2007.