php/Math   
Recreational Mathematics   

Law of Surfing [Formulas]
Posted on May 30, 2008 @ 03:31:42 PM by Paul Meagher

Bernardo Huberman formulated a few "laws of the web". One of these laws concerns the distribution of clickstream lengths.

According to the "Law of Surfing", the length L of a clickstream follows an inverse Gaussian distribution, with probability density function:

f(L, v, λ) = sqrt(λ/(2π)) * L^(-3/2) * e^(-λ*(L-v)^2 / (2*v^2*L))

What do the symbols L, v, λ mean?

L is session length measured in page clicks.

v is the expected value or mean. A fitted distribution of path length probabilities for Boston University students had a value of v=51.19 which the study referred to as "mean visits".

λ is related to the expected value and variance by λ = v^3/Var(L). The fitted value for λ was 3.53.
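The relation between λ, the mean, and the variance can be turned around to recover the variance implied by the fitted parameters, since Var(L) = v^3/λ. The following is a minimal sketch; the function name inverse_gaussian_variance is made up for this illustration and does not come from the study.

```php
<?php
/**
 * Variance of an inverse Gaussian distribution with mean $v and
 * parameter $lambda, obtained by rearranging lambda = v^3 / Var(L).
 * (Illustrative helper; the name is an assumption.)
 */
function inverse_gaussian_variance(float $v, float $lambda): float {
    return pow($v, 3) / $lambda;
}

$v = 51.19;      // fitted mean quoted in the text
$lambda = 3.53;  // fitted lambda quoted in the text

$variance = inverse_gaussian_variance($v, $lambda);
printf("Var(L) = %.2f, standard deviation = %.2f\n", $variance, sqrt($variance));
```

With the fitted values this gives a variance of roughly 38,000 (a standard deviation near 195 clicks), which reflects the long right tail of the distribution.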

Here is a PHP script to compute the probability density for each clickstream length:

<?php
/**
* Raises a number to a floating point power.
*
* a^b = e^(b * ln a). Note that log() in PHP is the natural logarithm
* ("ln"), not the base-10 logarithm. So instead of pow($a, 0.6)
* use something like exp(0.6 * log($a)).
*
* @see http://ca3.php.net/manual/en/function.pow.php#47297
*/
function fpow($base, $fexp) {
  return exp($fexp * log($base));
}

/**
* Computes the inverse Gaussian probability density associated with
* a clickstream of length $L, given the mean $v and parameter $lambda.
*/
function inverse_gaussian($L, $v, $lambda) {
  return sqrt($lambda / (2 * M_PI))
         * fpow($L, -3/2)
         * exp(-1.0 * ($lambda / (2 * pow($v, 2) * $L)) * pow($L - $v, 2));
}

$v = 51.19;
$lambda = 3.53;

// Print the density for clickstream lengths 1 through 19.
for ($L = 1; $L < 20; $L++) {
  $p = inverse_gaussian($L, $v, $lambda);
  echo "f(".$L.", ".$v.", ".$lambda.") = ".$p."<br />";
}

?>

And these are the probability densities the program generates:

f(1, 51.19, 3.53) = 0.13738001323331
f(2, 51.19, 3.53) = 0.11731429052801
f(3, 51.19, 3.53) = 0.08563996238308
f(4, 51.19, 3.53) = 0.064395173972196
f(5, 51.19, 3.53) = 0.050294705235689
f(6, 51.19, 3.53) = 0.040551681574111
f(7, 51.19, 3.53) = 0.033538749822867
f(8, 51.19, 3.53) = 0.028310951045661
f(9, 51.19, 3.53) = 0.024298496373469
f(10, 51.19, 3.53) = 0.021143050105347
f(11, 51.19, 3.53) = 0.018610350025871
f(12, 51.19, 3.53) = 0.016541931634992
f(13, 51.19, 3.53) = 0.014827372300491
f(14, 51.19, 3.53) = 0.013387711569616
f(15, 51.19, 3.53) = 0.012165196593551
f(16, 51.19, 3.53) = 0.0111167387119
f(17, 51.19, 3.53) = 0.0102096203282
f(18, 51.19, 3.53) = 0.0094186073802048
f(19, 51.19, 3.53) = 0.0087239635554072


Basic Formulas for Optimal Foraging Theory [Formulas]
Posted on May 23, 2008 @ 02:03:11 AM by Paul Meagher

I am currently reading the book "Information Foraging Theory" (Oxford: 2007) by Peter Pirolli.

Peter Pirolli lays the foundation for Optimal Foraging Theory (OFT) with this formula:

R = g(tw) / (tb + tw)

This is a rate equation. It tells us that the overall rate of gain R of food or information equals the gain function evaluated at the time spent foraging within the current patch, g(tw), divided by the sum of the time it takes to move between patches, tb, and the time spent foraging within the current patch, tw.
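To make the rate equation concrete, here is a small sketch that evaluates R for a few within-patch times. The gain function 1 - e^(-tw) and the between-patch time tb = 2 are assumptions chosen only to illustrate diminishing returns; neither comes from the book.

```php
<?php
/**
 * Hypothetical diminishing-returns gain function g(tw) = 1 - e^(-tw).
 * This particular form is an assumption made for illustration.
 */
function gain(float $tw): float {
    return 1.0 - exp(-$tw);
}

/**
 * Overall rate of gain R = g(tw) / (tb + tw).
 */
function rate_of_gain(callable $g, float $tb, float $tw): float {
    return $g($tw) / ($tb + $tw);
}

$tb = 2.0; // assumed time to travel between patches
foreach ([0.5, 1.0, 2.0, 4.0] as $tw) {
    printf("tw = %.1f  R = %.4f\n", $tw, rate_of_gain('gain', $tb, $tw));
}
```

Under these assumptions R first rises and then falls as tw grows, which is why there is a best time to leave the patch.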

Charnov's Marginal Value Theorem can be used to find the optimal amount of time t* to spend in a patch.

To apply the theorem we first need to rewrite the equation above as an optimal rate R* equation:

R* = g(t*) / (tb + t*)

An optimal forager obeys the two rules below (see p. 8) to find an optimal t* value:

  • if (g'(tw) >= R*) then continue foraging in the patch.
  • if (g'(tw) < R*) then start looking for a new patch

Note that g'(tw) refers to the derivative of the gain function for within-patch foraging. This means we need to compute the slope of the gain function for various values of tw. If the slope at tw is greater than or equal to the optimal rate R*, we stay in the patch; otherwise, we move to a new patch.
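The stay-or-leave rules can be sketched numerically. Because R* is not known in advance, the sketch below steps tw forward and leaves the patch the first time the slope g'(tw) drops below the current overall rate g(tw) / (tb + tw); at that crossing the overall rate is at its maximum, so it serves as R*. The gain function, the value tb = 2, the step size, and all function names are assumptions made for illustration.

```php
<?php
/** Hypothetical gain function g(tw) = 1 - e^(-tw) (an assumption). */
function patch_gain(float $tw): float {
    return 1.0 - exp(-$tw);
}

/** Slope g'(tw), estimated with a central difference. */
function marginal_gain(callable $g, float $tw, float $h = 1e-6): float {
    return ($g($tw + $h) - $g($tw - $h)) / (2.0 * $h);
}

/**
 * Steps tw forward and returns the first time at which the slope of
 * the gain function falls below the overall rate g(tw) / (tb + tw).
 * Returns null if no such time is found before $tMax.
 */
function optimal_patch_time(callable $g, float $tb, float $step = 0.001, float $tMax = 100.0): ?float {
    for ($tw = $step; $tw <= $tMax; $tw += $step) {
        $rate = $g($tw) / ($tb + $tw);
        if (marginal_gain($g, $tw) < $rate) {
            return $tw; // slope is now below the overall rate: leave the patch
        }
    }
    return null;
}

$tb = 2.0; // assumed between-patch travel time
$tStar = optimal_patch_time('patch_gain', $tb);
if ($tStar !== null) {
    $rStar = patch_gain($tStar) / ($tb + $tStar);
    printf("t* = %.3f, R* = %.4f\n", $tStar, $rStar);
}
```

A fixed step is the simplest choice here; a root finder applied to g'(t) - g(t) / (tb + t) would locate t* more precisely.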

This is an abstract formulation of the optimization problem that Optimal Foraging Theory is concerned with.

Applications to particular problems in web design require operationalizing the concept of a "patch" and what "foraging within a patch" and "moving between patches" consist of.

The book offers ideas about how to operationalize these concepts in particular domains, including web design and web usability. To learn how to apply optimal foraging theory from a usability guru, I recommend reading a recent article by Jakob Nielsen (Nov. 2007: Long versus Short Content).


Cubic Polynomial Approximation [Formulas]
Posted on February 24, 2007 @ 02:18:34 PM by Paul Meagher

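Polynomials can be used to approximate the shape of a wide range of curves. In fact, by the Weierstrass approximation theorem, for any function f(x) that is continuous (and so, in particular, any differentiable function) on a closed, bounded interval I, you can always find a polynomial P(x) that approximates f(x) to within an arbitrary degree of accuracy e > 0.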

|f(x)-P(x)| <= e for all x in the interval I.

A polynomial can approximate such a wide range of curves because each higher-order term captures a different type of trend that might appear in the function f(x) you want to approximate.

The cubic polynomial approximator can be viewed as superimposing four terms that capture the constant, linear, quadratic, and cubic trends in f(x) over the interval.

The constant component:

P0(x) = a0

The linear component:

P1(x) = a0 + a1*x

The quadratic component:

P2(x) = a0 + a1*x + a2*x^2

The cubic component:

P3(x) = a0 + a1*x + a2*x^2 + a3*x^3
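The effect of adding each higher-order term can be checked numerically. The sketch below approximates e^x on the interval [0, 1] using the first four Taylor coefficients 1, 1, 1/2, 1/6; the target function, the interval, and the choice of Taylor coefficients are all assumptions made for illustration (a fitted cubic could do better), and the helper names are hypothetical.

```php
<?php
/**
 * Evaluates a0 + a1*x + a2*x^2 + ... using Horner's rule.
 * $coeffs[0] is a0, $coeffs[1] is a1, and so on.
 */
function poly_eval(array $coeffs, float $x): float {
    $result = 0.0;
    for ($i = count($coeffs) - 1; $i >= 0; $i--) {
        $result = $result * $x + $coeffs[$i];
    }
    return $result;
}

/**
 * Largest |f(x) - P(x)| over $n + 1 evenly spaced sample points
 * in [$lo, $hi].
 */
function max_abs_error(callable $f, array $coeffs, float $lo, float $hi, int $n = 100): float {
    $max = 0.0;
    for ($i = 0; $i <= $n; $i++) {
        $x = $lo + ($hi - $lo) * $i / $n;
        $max = max($max, abs($f($x) - poly_eval($coeffs, $x)));
    }
    return $max;
}

// Taylor coefficients of e^x around 0 (an illustrative choice).
$a = [1.0, 1.0, 1.0 / 2.0, 1.0 / 6.0];

for ($degree = 0; $degree <= 3; $degree++) {
    $coeffs = array_slice($a, 0, $degree + 1);
    printf("P%d: max |f(x) - P(x)| on [0, 1] = %.4f\n",
           $degree, max_abs_error('exp', $coeffs, 0.0, 1.0));
}
```

Each added term shrinks the worst-case error on the interval, which is the sense in which the constant, linear, quadratic, and cubic trends are superimposed.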


Parametric equations [Formulas]
Posted on January 10, 2007 @ 04:57:58 PM by Paul Meagher

Instead of writing an equation where y is a function of x:

y = f(x)

We can instead introduce a third, shared variable t (so named because time is often the shared variable) on which the x and y values jointly depend:

y = f(t)
x = g(t)

As a concrete example we can generate coupled x and y values using these two parametric linear equations:

x = 2t
y = t

This results in the following table of values:

t   x   y
0   0   0
1   2   1
2   4   2
3   6   3
4   8   4
5   10  5

The table of values can be graphed using just the x and y values, and the resulting graph illustrates how the x and y components evolve as a function of time.
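A table like this is easy to generate programmatically. Here is a minimal sketch for the linear example x = 2t, y = t; the helper name parametric_table is hypothetical.

```php
<?php
/**
 * Builds a table of (t, x, y) rows from two parametric functions,
 * x = g(t) and y = f(t). (Illustrative helper; the name is an assumption.)
 */
function parametric_table(callable $g, callable $f, array $tValues): array {
    $rows = [];
    foreach ($tValues as $t) {
        $rows[] = ['t' => $t, 'x' => $g($t), 'y' => $f($t)];
    }
    return $rows;
}

$rows = parametric_table(
    function ($t) { return 2 * $t; },  // x = 2t
    function ($t) { return $t; },      // y = t
    range(0, 5)
);

echo "t\tx\ty\n";
foreach ($rows as $row) {
    printf("%d\t%d\t%d\n", $row['t'], $row['x'], $row['y']);
}
```

Passing the two equations as callables means the same helper can tabulate other parametric curves by swapping in different functions and t values.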

Parametric equations arise frequently in physics, where the horizontal and vertical influences on an object can be modelled by two independent force equations. They also arise frequently in control and robotics contexts, where we might independently control the x and y directions of motion. Finally, parametric equations provide an alternative way to generate many shapes, offering both mathematical insight into those shapes and a practical, efficient way to generate them on a computer (e.g., in computer graphics and animation applications).

Exercise

Create a table of values and a graph of the following parametric equations:

x = t * cos(t)
y = t * sin(t)

For 0 <= t <= 4π

What shape do these parametric equations generate?


2D and 3D geometry formulas [Formulas]
Posted on December 6, 2006 @ 03:24:59 AM by Paul Meagher

Perimeter of a triangle
Formula              Usage
P = a + b + c        General formula for all triangles
P = 3s               Equilateral triangle of side s

Area of a triangle
Formula                  Usage
A = 1/2*b*h              General formula for all triangles (base b, height h)
A = 1/2*a*b              Right triangle whose legs are a and b
A = sqrt(3)/4*s^2        Equilateral triangle of side s
A = 1/2*a*b*sin(θ)       If two sides a, b and the included angle θ are known
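As a quick numerical cross-check, the standard base-height, equilateral, and two-sides-and-included-angle area formulas should all agree for an equilateral triangle. The sketch below uses side s = 2, for which the height is s*sqrt(3)/2 and every interior angle is π/3; the function names are hypothetical.

```php
<?php
/** A = 1/2 * b * h (general formula). */
function area_base_height(float $b, float $h): float {
    return 0.5 * $b * $h;
}

/** A = sqrt(3)/4 * s^2 (equilateral triangle of side s). */
function area_equilateral(float $s): float {
    return sqrt(3) / 4 * pow($s, 2);
}

/** A = 1/2 * a * b * sin(theta), with theta in radians. */
function area_two_sides_angle(float $a, float $b, float $theta): float {
    return 0.5 * $a * $b * sin($theta);
}

$s = 2.0;
$h = $s * sqrt(3) / 2; // height of an equilateral triangle of side s

printf("base-height:       %.6f\n", area_base_height($s, $h));
printf("equilateral:       %.6f\n", area_equilateral($s));
printf("two sides + angle: %.6f\n", area_two_sides_angle($s, $s, M_PI / 3));
```

All three lines should print the same area, sqrt(3) for s = 2, up to floating point rounding.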

php/Math Project
© 2011. All rights reserved.