The blogosphere at a glance

This is a brief description of a research project funded by the Swedish Internet Infrastructure Foundation (.SE).

Over the last couple of years, we have seen a remarkable increase in the number of blogs on the web. Blogs and blog communities differ from traditional web pages in that they are highly dynamic. It is therefore difficult for an individual reader or blog writer, or anyone else interested in exploring or monitoring the blogosphere, to keep track of what is interesting and relevant. This calls for novel techniques and tools that facilitate blog navigation and monitoring.

For this reason, we have developed a method that provides a global overview of the blogosphere based on the content of individual blogs. The blogosphere is represented as a network in which nodes are blogs and two blogs are connected if they have similar semantic content. Since the representation relies on the unprocessed content alone, rather than on, for instance, explicit web links, it may reveal relations between blogs that would otherwise remain hidden.
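The idea of linking blogs by content similarity alone can be illustrated with a small sketch. This is not the project's actual method: it assumes, for illustration only, that each blog is a TF-IDF weighted bag of words and that two blogs are linked when the cosine similarity of their vectors reaches a threshold. The function names (`tfidf_vectors`, `cosine`, `build_similarity_network`), the weighting scheme, the threshold of 0.2 and the toy blog texts are all hypothetical choices. Note that no hyperlinks are used anywhere.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split into word tokens; \w is Unicode-aware, so å, ä, ö are kept.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    # Assumption (not from the source): each blog is a bag of words weighted by TF-IDF.
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    df = Counter()
    for c in counts.values():
        df.update(c.keys())  # document frequency: number of blogs containing each term
    # Terms that occur in every blog get weight log(1) = 0 and so do not link blogs.
    return {
        name: {term: tf * math.log(n / df[term]) for term, tf in c.items()}
        for name, c in counts.items()
    }


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a blog with no informative terms is similar to nothing
    return dot / (norm_u * norm_v)


def build_similarity_network(docs: dict[str, str], threshold: float = 0.2) -> dict[str, set[str]]:
    # Nodes are blogs; an undirected edge joins two blogs whose content similarity
    # reaches the (hypothetical) threshold. Returned as an adjacency dict.
    vecs = tfidf_vectors(docs)
    names = sorted(vecs)
    edges: dict[str, set[str]] = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vecs[a], vecs[b]) >= threshold:
                edges[a].add(b)
                edges[b].add(a)
    return edges


# Toy, made-up blog texts for illustration.
blogs = {
    "cooking_a": "recipe pasta tomato sauce basil recipe",
    "cooking_b": "pasta recipe garlic tomato dinner",
    "football_a": "match goal league striker match",
    "football_b": "league goal referee match season",
}
net = build_similarity_network(blogs)
# The two cooking blogs link to each other, as do the two football blogs; there are no cross links.
print(net)
```

The all-pairs comparison is quadratic in the number of blogs, so it only suits small examples; a network covering a national blogosphere would call for a more scalable similarity search.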

As seen in the figure, which shows a portion of the Swedish blogosphere, the network has a distinct clustered structure in which blogs that treat similar topics are grouped together. This facilitates navigation and also provides a means to identify spam blogs, since these form sub-networks with characteristic properties.

The network representation of the blogosphere is also found to be hierarchically structured, and may therefore pave the way for applications that operate at different levels of resolution: from individual blogs, to groups of similar blogs, to groups of groups of blogs, and so forth. This intrinsic organization may facilitate blog navigation further. In other words, although the blogosphere can seem overwhelming at times, it appears to be naturally structured in a way that enables effective navigation and monitoring.

Görnerup, Olof and Boman, Magnus (2009). A baseline for content-based blog classification. e-print 0909.4416.