Posted on May 30, 2008 @ 03:31:42 PM by Paul Meagher
Bernardo Huberman formulated a few "laws of the web". One of these laws concerns the distribution of clickstream lengths.
According to the "Law of Surfing", the probability density function of L, the length of a clickstream, follows an inverse Gaussian distribution:
f(L, v, λ) = sqrt(λ/(2π)) * L^{-3/2} * e^{-(λ/(2*v^{2}*L)) * (L-v)^{2}}
What do the symbols L, v, λ mean?
L is session length measured in page clicks.
v is the expected value or mean. A fitted distribution of path length probabilities for Boston University students had a value of v=51.19, which the study referred to as "mean visits".
λ is a shape parameter related to the expected value and the variance: λ = v^{3}/Var(L). The fitted value for λ was 3.53.
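As a quick check on what these fitted values imply, the relation λ = v^{3}/Var(L) can be rearranged to recover the variance. This is only an illustrative sketch: the function name implied_variance is made up for this example and is not part of the original script.

```php
<?php
/**
 * Rearranges lambda = v^3 / Var(L) to get Var(L) = v^3 / lambda.
 * The function name is hypothetical, chosen for this illustration.
 */
function implied_variance($v, $lambda) {
  return pow($v, 3) / $lambda;
}

$v      = 51.19; // fitted mean quoted in the post
$lambda = 3.53;  // fitted lambda quoted in the post

$variance = implied_variance($v, $lambda);
$std_dev  = sqrt($variance);

echo "Var(L) = ".$variance."<br />";
echo "SD(L) = ".$std_dev."<br />";
```

With the fitted values, the standard deviation comes out several times larger than the mean, which is what you would expect from a distribution with a long right tail.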
Here is a PHP script that computes the probability density for each clickstream length from 1 to 19:
<?php
/**
 * Raises a number to a floating point power.
 *
 * a^b = e^(b log a), where log is the natural log (aka "ln"), not log10.
 * So instead of pow($a, 0.6) use something like exp(0.6 * log($a)).
 *
 * @see http://ca3.php.net/manual/en/function.pow.php#47297
 */
function fpow($base, $fexp) {
  return exp($fexp * log($base));
}

/**
 * Used to compute probability density associated with each clickstream
 * length.
 */
function inverse_guassian($L, $v, $lambda) {
  return sqrt($lambda / (2 * M_PI))
    * fpow($L, -3/2)
    * exp(-1.0 * ($lambda / (2 * pow($v, 2) * $L)) * pow($L - $v, 2));
}

$v = 51.19;
$lambda = 3.53;

for ($L = 1; $L < 20; $L++) {
  $p = inverse_guassian($L, $v, $lambda);
  echo "f(".$L.", ".$v.", ".$lambda.") = ".$p."<br />";
}
?>
And these are the probability densities the program generates:
f(1, 51.19, 3.53) = 0.13738001323331
f(2, 51.19, 3.53) = 0.11731429052801
f(3, 51.19, 3.53) = 0.08563996238308
f(4, 51.19, 3.53) = 0.064395173972196
f(5, 51.19, 3.53) = 0.050294705235689
f(6, 51.19, 3.53) = 0.040551681574111
f(7, 51.19, 3.53) = 0.033538749822867
f(8, 51.19, 3.53) = 0.028310951045661
f(9, 51.19, 3.53) = 0.024298496373469
f(10, 51.19, 3.53) = 0.021143050105347
f(11, 51.19, 3.53) = 0.018610350025871
f(12, 51.19, 3.53) = 0.016541931634992
f(13, 51.19, 3.53) = 0.014827372300491
f(14, 51.19, 3.53) = 0.013387711569616
f(15, 51.19, 3.53) = 0.012165196593551
f(16, 51.19, 3.53) = 0.0111167387119
f(17, 51.19, 3.53) = 0.0102096203282
f(18, 51.19, 3.53) = 0.0094186073802048
f(19, 51.19, 3.53) = 0.0087239635554072
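The densities fall steadily from L = 1 onward, and one way to sanity-check that is to compute the mode of the distribution. The sketch below assumes the standard closed-form expression for the mode of an inverse Gaussian, v * (sqrt(1 + 9v^{2}/(4λ^{2})) - 3v/(2λ)), which is not taken from the study itself. The function name inverse_gaussian_mode is likewise made up for this example.

```php
<?php
/**
 * Mode (location of the peak density) of an inverse Gaussian
 * distribution with mean $v and shape $lambda, using the standard
 * closed-form expression. The function name is hypothetical.
 */
function inverse_gaussian_mode($v, $lambda) {
  $r = (3 * $v) / (2 * $lambda);
  return $v * (sqrt(1 + $r * $r) - $r);
}

$v      = 51.19; // fitted mean quoted in the post
$lambda = 3.53;  // fitted lambda quoted in the post

$mode = inverse_gaussian_mode($v, $lambda);
echo "mode = ".$mode."<br />";
```

For the fitted values the mode lands a little above 1, which agrees with the listing: the density is already past its peak by L = 2 and decreases from there.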
