The Web as a graph: How far we are

IRIS

In this article we present an experimental study of the properties of webgraphs. We study a large crawl from 2001 of 200M pages and about 1.4 billion edges, made available by the WebBase project at Stanford, as well as several synthetic ones generated according to various models proposed recently. We investigate several topological properties of such graphs, including the number of bipartite cores and strongly connected components, the distribution of degrees and PageRank values and some correlations; we present a comparison study of the models against these measures. Our findings are that (i) the WebBase sample differs slightly from the (older) samples studied in the literature, and (ii) despite the fact that these models do not catch all of its properties, they do exhibit some peculiar behaviors not found, for example, in the models from classical random graph theory. Moreover we developed a software library able to generate and measure massive graphs in secondary memory; this library is publicy available under the GPL licence. We discuss its implementation and some computational issues related to secondary memory graph algorithms.

The Web as a graph: How far we are

Donato D;Laura L;Leonardi S;Millozzi S

2007-01-01

Abstract

In this article we present an experimental study of the properties of webgraphs. We study a large crawl from 2001 of 200M pages and about 1.4 billion edges, made available by the WebBase project at Stanford, as well as several synthetic ones generated according to various models proposed recently. We investigate several topological properties of such graphs, including the number of bipartite cores and strongly connected components, the distribution of degrees and PageRank values and some correlations; we present a comparison study of the models against these measures. Our findings are that (i) the WebBase sample differs slightly from the (older) samples studied in the literature, and (ii) despite the fact that these models do not catch all of its properties, they do exhibit some peculiar behaviors not found, for example, in the models from classical random graph theory. Moreover we developed a software library able to generate and measure massive graphs in secondary memory; this library is publicy available under the GPL licence. We discuss its implementation and some computational issues related to secondary memory graph algorithms.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2007
			
	Parole chiave
	
				experimentation
measurement
theory
verification
graph structure
models
world-wide-web
			
	Appare nelle tipologie:
	
				1.1 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
R06TOIT.pdf non disponibili Dimensione 565.72 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	565.72 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14086/332

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

social impact