php/Math   
Recreational Mathematics   
   home  |  library  |  contact
 Math Notes
 Math Programming [25]
 Regression [3]
 Data Mining [17]
 Notation [6]
 Linear Algebra [9]
 Stats & Prob [15]
 Math Cognition [5]
 Space & Physics [6]
 Formulas [5]
 Fun & Games [2]
 Haskell [1]
 Bayes Theory [1]
 Site News [0]
 Math Projects [5]
 Polynomials [1]
 Calculus [9]
 Number Theory [3]
 Optimization [2]
 Financial [1]

 Math Links
 PHP/ir
 Andrew Gelman
 Chance Wiki
 Daniel Lemire
 KD Knuggets
 Social Stats
 MySQL Performance
 Hunch.net
 Matthew Hurst
 JMLR
 JSS
 Hal Daume III
 Math Notes >> Recent

Updates to LognormalDistribution.php [Stats & Prob
Posted on December 23, 2013 @ 01:15:18 PM by Paul Meagher

Božo Cvetkovski, who is working on a Master degree and doing some work in PHP, contacted me about a couple of errors he found in the LognormalDistribution.php class.

These are Božo's findings and fixes:

LognormalDistribution.php class gives correct result for values mu = 0, sigma = 1 that are used in example file which I found in test folder.

http://www.phpmath.com/build02/PDL/docs/tests.php?src=lognormal.php#source

But when you change values for mu and sigma it does not give correct results.

I did a little bit of research and found that you made a mistake by using the variance instead of standard deviation.

You need to change LognormalDistribution.php like this:


Find lines 29-31:

function LognormalDistribution($mu=0.0,$sigma=1.0) { $this->normal = new NormalDistribution($mu, $sigma * $sigma); }

Replace with this:

function LognormalDistribution($mu=0.0,$sigma=1.0) { $this->normal = new NormalDistribution($mu, $sigma) }


Find lines 43-45:

function getSigmaParameter() { return sqrt($this->normal->getVariance()); }

Replace with this:

function getSigmaParameter() { return $this->normal->getStandardDeviation(); }


Before change for mu= 4.15 and sigma= 0.37 it would return this:

LognormalDistribution(4.15, 0.37)
Methods Output
getMuParameter() 4.15
getSigmaParameter() 0.37
PDF(.2) 0.00025050224546219
CDF(.50) 1.9101344312285E-274
ICDF(0.95) 79.454176680834

After the change it returns these values:

LognormalDistribution(4.15, 0.37)
Methods Output
getMuParameter() 4.15
getSigmaParameter() 0.37
PDF(.2) 1.307667356577E-52
CDF(.50) 1.8882790329553E-39
ICDF(0.95) 116.58211120768


I tested the new class by comparing its results with results I get in Excel and Matlab. New, modified class works as expected.

I'd like to thank Božo for his work on finding and fixing these errors and have acknowledged Božo as another author of the LognormalDistribution.php class.

If anyone discovers any other errors in the Probability Distributions Library (PDL) please let me know. Also, if anyone has any new probability distribution classes, or wants to suggest changes the parameterization of a class (more parameters, different parameters), then shoot me an email with your thoughts or code. I've not had much time for awhile now to do php-based math programming but I'd love to see some multivariate distributions added to the library. The php port of the JAMA linear algebra library might come in handy here.

Permalink 

PHP functions to perform linear algebra decompositons [Linear Algebra
Posted on February 11, 2012 @ 09:16:36 PM by Paul Meagher

Here is a session in Octave demonstrating how to compute the QR decomposition of a matrix:

octave-3.2.4.exe:1> m = [1, 2; 3, 4]
m =

   1   2
   3   4

octave-3.2.4.exe:2> [Q R] = qr(m)
Q =

  -0.31623  -0.94868
  -0.94868   0.31623

R =

  -3.16228  -4.42719
   0.00000  -0.63246

To emulate this functional approach (versus object-oriented approach) to decomposing a matrix in PHP, we need to wrap calls to the JAMA Matrix.php object inside a qr( ) function as follows:

<?php

// Set path to JAMA Matrix object.
require_once "../Matrix.php";

// Wrap JAMA matrix functionality inside a qr function.
function qr($m) {
  
$A  = new Matrix($m);
  
$QR $A->qr();
  
$ans['R']  = $QR->getR();
  
$ans['Q']  = $QR->getQ();
  return 
$ans;
}

// Create matrix.
$m = array(array(12),array(34));

// Qet QR decompostion of a matrix.
$ans qr($m);

// Output Q and R components of answer object.
?>

<pre>
Q = 
<?php print_r($ans['Q']); ?>
</pre>

<pre>
R = 
<?php print_r($ans['R']); ?>
</pre>

The output of this script looks like this:

Q = 
Matrix Object
(
    [A] => Array
        (
            [0] => Array
                (
                    [1] => 0.94868329805051
                    [0] => -0.31622776601684
                )

            [1] => Array
                (
                    [1] => -0.31622776601684
                    [0] => -0.94868329805051
                )

        )

    [m] => 2
    [n] => 2
)

R = 
Matrix Object
(
    [A] => Array
        (
            [0] => Array
                (
                    [0] => -3.1622776601684
                    [1] => -4.4271887242357
                )

            [1] => Array
                (
                    [0] => 0
                    [1] => 0.63245553203368
                )

        )

    [m] => 2
    [n] => 2
)

Interestingly, the answers produced by the Octave and PHP scripts differ in the signs associated with a few of the answer values. Does this matter? Well, if we take the values of Q and R output by PHP, enter them into Octave, and then multiply Q and R together, we get the original matrix back. This reconstruction of the original matrix demonstrates the correctness of the decomposition and that the decomposition of a matrix into Q and R components is not unique.

octave-3.2.4.exe:1> Q = [-0.31622, 0.94868; -0.94868, -0.31622]
Q =

  -0.31622   0.94868
  -0.94868  -0.31622

octave-3.2.4.exe:2> R = [-3.1622, -4.42718; 0, 0.63245]
R =

  -3.16220  -4.42718
   0.00000   0.63245

octave-3.2.4.exe:3> Q * R
ans =

   0.99995   1.99996
   2.99992   3.99998

Note that if I didn't reduce the precision of the Q and R matrix values, the output would be [1 2; 3 4] or the exact version of the original matrix.

This proof-of-concept demonstrates how we can develop a matlab/octave-like functional API for linear algrebra decompositions in PHP by incorporating calls to the JAMA library methods inside qr(), svd(), lu(), eig(), and chol() functions. A functional API for linear algebra would make doing linear algrebra in PHP much more direct, interactive and enjoyable. Object-oriented linear algebra libraries like JAMA play a supporting role, hidden behind a set of linear algebra functions like qr(), svd(), lu(), eig() and chol() for operating on matrices.

Permalink 

Flexible computation of a sum [Math Programming
Posted on February 6, 2012 @ 11:49:32 AM by Paul Meagher

I re-implemented Matlab/Octave's sum function in PHP. The Matlab/Octave sum function is more flexible than PHP's array_sum function as it can handle scalar, vector, or matrix input. In the case of matrices, the sum function can also be directed to sum by column or row. The default is to sum a matrix by column which is handy for getting a bunch of column sums from a data matrix.

The way that the sum function works is similar to the way an ave or stdev function would work in Matlab/Octave. Eventually it would be nice to have all these data summarization functions available and working in a Matlab/Octave like way.

Without futher ado, here is the code that implements the sum function. Note that it reuses the get_type() function mentioned in previous blogs:

<?php
/**
* Partial implementation of matlab sum function.  Does not 
* handle arrays with dimension greater than 2.
*
* @see http://www.mathworks.com/help/techdoc/ref/sum.html
*/

/**
* Determines whether a variable is a scalar, vector, or matrix.

* @params mixed $var Scalar, vector, or matrix data.
* @return string scalar, vector, matrix or false 
*/
function get_type($var) {  
  if (
is_numeric($var)) {
    return 
"scalar";
  } elseif (
is_array($var)) {
    if (
is_array($var[0])) 
      return 
"matrix";
    else 
      return 
"vector";
  } else 
    return 
false;
}

/**
* Computes the row or column sums.  The default is to compute
* column sums and to store each column sum into a row vector.
*
* @param array $a A one to two dimensional array of numeric values.
* @param integer $dim The dimension to sum (1=column sums, 2=row sums).
* @return array/string $sum A row ($dim=1) or column ($dim=2) vector of sums.
*/
function sum($a$dim=1) {  
  
$type get_type($a);  
  if (
$type=="scalar")
    return array(
$a);
  elseif (
$type=="vector"
    return array(
array_sum($a));
  else {
    
// zero the sums array
    
$sums = array();
    
// num rows
    
$nrows count($a);        
    
// num cols
    
$ncols count($a[0]);         
    if (
$dim == 1) {
      
// compute sum of matrix columns
      
for ($j=0$j $ncols$j++) {
        
$sums[$j] = 0.0;
        for(
$i=0$i $nrows$i++) 
          
$sums[$j] += $a[$i][$j];
      }
      return 
$sums;
    } else {
      
// compute sum of matrix rows
      
for ($i=0$i $nrows$i++) 
        
$sums[$i][0] += array_sum($a[$i]);
      return 
$sums;
    }      
  }
}


$s 5;
$v = array(1,2,3);
$m = array(array(1,2,3),array(1,2,3),array(1,2,3));

// scalar case
$ans =  sum($s);
echo 
"<pre>";
print_r($ans);
echo 
"</pre>";
echo 
"<br />";

// row vector case
$ans =  sum($v);
echo 
"<pre>";
print_r($ans);
echo 
"</pre>";
echo 
"<br />";

// matrix case - sum cols (default)
$ans =  sum($m);
echo 
"<pre>";
print_r($ans);
echo 
"</pre>";
echo 
"<br />";

// matrix case - sum rows
$ans =  sum($m2);
echo 
"<pre>";
print_r($ans);
echo 
"</pre>";
echo 
"<br />";

// compute sum of all elements
$ans =  sum(sum($m));
echo 
"<pre>";
print_r($ans);
echo 
"</pre>";
echo 
"<br />";

?>

The output of this script looks like this:

Array
(
    [0] => 5
)


Array
(
    [0] => 6
)


Array
(
    [0] => 3
    [1] => 6
    [2] => 9
)


Array
(
    [0] => Array
        (
            [0] => 6
        )

    [1] => Array
        (
            [0] => 6
        )

    [2] => Array
        (
            [0] => 6
        )

)


Array
(
    [0] => 18
)

One important point to note is that when we sum a single scalar value like sum(5) the result is an array and not a scalar value. I suspect that it is of fundamental importance to matrix-oriented languages like Matlab/Octave that results be wrapped in an array structure so that we can string together matrix operations in a conistent manner.

Permalink 

Elementwise operations [Math Programming
Posted on January 28, 2012 @ 06:33:46 AM by Paul Meagher

In my last post, I discussed how basic arithmetic operations might be vectorized using PHP. I wasn't very satisfied with the implemention. One issue was that I was duplicating alot of code and felt that the vectorization code could be reused across vectorized arithmetic operations better. Another issue was that I didn't study how matlab/octave elementwise operations worked in detail so didn't know what the allowable argument types were (scalar, vector, matrix) and what combination of argument types was allowed on both sides of the arithmetic operation. I'm happy to report that I've addressed both of these issues in the new implementation. While I could probably add more error handling, I think the code below could be used to easily vectorize many operations in PHP such as exp, sin, cos, pow, etc...

<?php
/**
* elementwise_operations.php
*
* Functions that mimic matlab/octave's elementwise operations (e.g., .+, .-, .*, ./):
*
*  .+  =  madd  =  element-by-element addition
*  .-  =  msub  =  element-by-element subtraction
*  .*  =  mmul  =  element-by-element multiplication
*  ./  =  mdiv  =  element-by-element division
*/

/**
* Determines whether a variable is a scalar, vector, or matrix
*/
function get_type($var) {  
  if (
is_numeric($var)) {
    return 
"scalar";
  } elseif (
is_array($var)) {
    if (
is_array($var[0])) 
      return 
"matrix";
    else 
      return 
"vector";
  } else 
    return 
false;
}

/**
* Function that vectorizes the elementary operations.
*/
function vectorize($a$b$op) {
  
  
$atype get_type($a);  
  
$btype get_type($b);

  if ((
$atype != false) AND ($btype != false))
    
$arguments $atype."-".$btype;
  else 
    die(
"Argument is not a scalar, vector or matrix.");
  
  switch (
$arguments) {
    
    case 
"scalar-scalar"
      return 
$op($a$b);
      break;

    case 
"vector-scalar"
      
$asize count($a);
      
$b     array_fill(0$asize$b);
      return 
array_map($op$a$b);      
      break;

    case 
"vector-vector":
      
$asize count($a);
      
$bsize count($b);      
      if (
$asize == $bsize
        return 
array_map($op$a$b);              
      else 
        die(
"Nonconformant arguments.");
      break; 
      
    case 
"matrix-scalar":        
      
$nrows count($a);
      
$ncols count($a[0]);      
      for(
$i=0$i<$nrows$i++) {
        
$bvec  array_fill(0$ncols$b);        
        
$c[$i] = array_map($op$a[$i], $bvec);           
      }  
      return 
$c;
      break;
      
    case 
"matrix-matrix":        
      
$arows count($a);
      
$acols count($a[0]); 
      
$brows count($b);
      
$bcols count($b[0]);       
      if ((
$arows == $brows) AND ($acols == $bcols)) {
        for(
$i=0$i<$arows$i++) 
          
$c[$i] = array_map($op$a[$i], $b[$i]);   
        return 
$c;                        
      } else 
        die(
"Nonconformant arguments.");      
      break;             
  }    
  
}


// Elementary arithmetic operations.

function _add($addend1$addend2) {
  return (
$addend1 $addend2);
}

function 
_sub($minuend$subtrahend) {
  return (
$minuend $subtrahend);
}

function 
_mul($factor1$factor2) {
  return (
$factor1 $factor2);
}

function 
_div($numerator$divisor) {
  return (
$numerator $divisor);
}


// Elementwise operators.

function madd($m$addend) {
  return 
vectorize($m$addend"_add");
}

function 
msub($m$subtrahend) {
  return 
vectorize($m$subtrahend"_sub");
}

function 
mmul($m$factor) {
  return 
vectorize($m$factor"_mul");
}

function 
mdiv($m$divisor) {
  return 
vectorize($m$divisor"_div");
}


// Test elementwise addition.

$s 5;
$v = array(123);
$m = array(array(123), array(456));

// scalar-scalar case
$ans madd($s$s);
print_r($ans);
echo 
"<br />";

// vector-scalar case
$ans madd($v$s);
print_r($ans);
echo 
"<br />";

// vector-vector case
$ans madd($v$v);
print_r($ans);
echo 
"<br />";

// matrix-scalar case
$ans madd($m$s);
print_r($ans);
echo 
"<br />";

// matrix-matrix case
$ans madd($m$m);
print_r($ans);

// What the output looks like:

// 10
// Array ( [0] => 6 [1] => 7 [2] => 8 )
// Array ( [0] => 2 [1] => 4 [2] => 6 )
// Array ( [0] => Array ( [0] => 6 [1] => 7 [2] => 8 ) [1] => Array ( [0] => 9 [1] => 10 [2] => 11 ) )
// Array ( [0] => Array ( [0] => 2 [1] => 4 [2] => 6 ) [1] => Array ( [0] => 8 [1] => 10 [2] => 12 ) ) 

?>
  

Permalink 

Vectorizing basic arithmetic operations [Math Programming
Posted on January 23, 2012 @ 12:45:37 AM by Paul Meagher

Continuing on with my research into adding better matrix/vector capabilities to PHP, I decided to implement the basic arithmetic functions (addition, subtraction, division, multiplication) as vectorized functions. To make these functions concise and easy to remember, I used a system of function naming that involves prefixing the function call with a "v" (for "vector") and then using the first three letters of the arithmetic operation name (add, sub, div, mul) as the suffix:

  1. vadd($v, $addend): vector addition
  2. vsub($v, $subtrahend): vector subtaction
  3. vdiv($v, $divisor): vector division
  4. vmul($v, $factor): vector multiplication

In all cases, the first argument is a vector of numbers, followed by the value to apply to each element of the vector. The code below is proof-of-concept of how to vectorize these functions and is not meant as production-level code:

<?php
/**
* Set of functions that vectorize the basic arithmetic functions:
* addition, subtraction, division, multiplication.  No error 
* checking.
*/

// function for vectorized addition

function _add($addend1$addend2) {
  return (
$addend1 $addend2);
}

function 
vadd($v$addend) {
  
$size      count($v);
  
$add_array array_fill(0$size$addend);
  return 
array_map("_add"$v$add_array);
}

// function for vectorized subtraction

function _sub($minuend$subtrahend) {
  return (
$minuend $subtrahend);
}

function 
vsub($v$subtrahend) {
  
$size      count($v);
  
$sub_array array_fill(0$size$subtrahend);
  return 
array_map("_sub"$v$sub_array);
}

// function for vectorized multiplication

function _mul($factor1$factor2) {
  return (
$factor1 $factor2);
}

function 
vmul($v$factor) {
  
$size      count($v);
  
$mul_array array_fill(0$size$factor);
  return 
array_map("_mul"$v$mul_array);
}

// function for vectorized division

function _div($numerator$divisor) {
  return (
$numerator $divisor);
}

function 
vdiv($v$divisor) {
  
$size      count($v);
  
$div_array array_fill(0$size$divisor);
  return 
array_map("_div"$v$div_array);
}

$v   = array(51015);
$arg 5;

$b vadd($v$arg);
print_r($b);
echo 
"<br />";

$b vsub($v$arg);
print_r($b);
echo 
"<br />";

$b vmul($v$arg);
print_r($b);
echo 
"<br />";

$b vdiv($v$arg);
print_r($b);
echo 
"<br />";

// Output:

// Array ( [0] => 10 [1] => 15 [2] => 20 )
// Array ( [0] => 0 [1] => 5 [2] => 10 )
// Array ( [0] => 25 [1] => 50 [2] => 75 )
// Array ( [0] => 1 [1] => 2 [2] => 3 ) 

?>

The alternative to vectorized functions is the ad-hoc creation of loops to perform similar vectorized operations in your code. Think how much programming time and effort has been wasted coding loops to do vectorized arithmetic when there could be vectorized arithmetic functions available to do it all in one simple command.

This only touches the surface of the number of mathematical functions that should be vectorized in PHP (e.g., vsin, vcos, vexp, etc...) to make PHP a much more efficient language to program numeric-based algorithms. We could easily add a "v" suffix to all these function to indicate that they are all vector-based.

Another option to consider would be to go even further and have these functions handle matrix operations, with vectors being just the special case of a one-dimensional matrix. In that case, we might want to instead prefix all the functions with an "m" instead of a "v". I had considered this, but was too lazy to implement functions to handle the matrix case. This is a good exercise for the reader.

At the end of the day we should look at languages like Matlab/Octave to give us guidance on how to proceed with respect to the range of functions to vectorize and whether we should handle vector or matrix arguments to these functions. While we may not be able to use Matlab/Octave's concise syntax to code vectorized algorithms, we can develop slightly more verbose functions like vadd, vsub, vdiv, and vmul to efficiently code numeric algorithms.

Permalink 

Matrix manipulation functions for PHP: Part 3 [Math Programming
Posted on January 14, 2012 @ 05:28:18 PM by Paul Meagher

Another useful function that matlab/octave supports is the linspace(start, end, num_points) function. The linspace function returns an array of n numbers that are equally spaced between the start and end points. A function like this is very useful when graphing an axis and when testing vectorized functions. You will see it used quite frequently in matlab/octave examples.

Below is an implementation that captures the behavior of Octave's linspace function. It also includes some code to test the output of the linspace function.

<?php

define
("PRECISION"4);

function 
linspace($start$end$num_points) {

  
$format "%01.".PRECISION."f";

  
$start      = (float)$start;
  
$end        = (float)$end;  
  
$num_points = (int)$num_points 1;
  
  
$difference abs($end-$start);
 
  if (
$num_points <= 1)
    return array(
$end);
    
  if (
$num_points == 2
    return array(
$start$end);
  
  if (
$num_points >= 3) {

    if (
$difference 0) {

      
      
$increment sprintf($format$difference/$num_points);  
      
$increment $difference/$num_points;        
      
      
$A[0] = sprintf($format$start);
     
      for (
$i=1$i $num_points$i++) {
        
$A[$i] = sprintf($format, ($A[$i-1] + $increment));
        
$next_num $A[$i];
      }
        
      
$A[] = sprintf($format$end);

      return 
$A;

    } else {
    
      return 
array_fill(0$num_points+1$start);
      
    }  

  }

}

// Testing the output of the linspace function.

echo "<pre>";
echo 
"linspace(1, 5, 4) \n";
print_r(linspace(154));
echo 
"\n";
echo 
"linspace(1, 1.8, 4) \n";
print_r(linspace(11.84));
echo 
"\n";
echo 
"linspace(1, 1, 4) \n";
print_r(linspace(114));
echo 
"\n";
echo 
"linspace(5, 7, 0)\n";
print_r(linspace(570));
echo 
"</pre>";
?>

The output of this script looks like this:

linspace(1, 5, 4) 
Array
(
    [0] => 1.0000
    [1] => 2.3333
    [2] => 3.6666
    [3] => 5.0000
)

linspace(1, 1.8, 4) 
Array
(
    [0] => 1.0000
    [1] => 1.2667
    [2] => 1.5334
    [3] => 1.8000
)

linspace(1, 1, 4) 
Array
(
    [0] => 1
    [1] => 1
    [2] => 1
    [3] => 1
)

linspace(5, 7, 0)
Array
(
    [0] => 7
)

An interesting aspect of the behavior of the linspace function is that it handles odd cases in ways that one might think should throw an error. Instead of throwing an error when you enter, for example, linspace(5, 7, 0) - because you can't return 0 points - it instead tries to do something sensible and returns the end point (i.e., 7). Whether this is sensible or not I will leave you to decide, however, the larger point is that Octave seems biased towards returning a numeric result of some sort rather than an error result. If you don't know the detailed behavior of these functions you could end up with some nasty surprises in terms of the result returned; perhaps it would be better if an error gets returned rather than a numeric result in some cases. Nevertheless, when doing mathematical programming you should consider whether it is better to throw an error result in response to odd inputs or try to generate a sensible numeric answer by, for example, anticipating the type of mistakes a user might make when passing arguments to a function and handling this mistakes appropriately (e.g., the low and hi values might be in reverse order from what is expected, so reverse them, and give the answer based upon the corrected ordering of arguments). A good way to master Octave is to port some of it's matrix manipulation functions to PHP. To do so you will need to observe how the Octave function behaves with boundary cases; a type of study you probably would not have engaged in otherwise.

On another note, it is my hope that I can begin to revive this blog again in 2012. I don't have time to blog as frequently as I did in the past, however, I would like to try for at least one blog post per week - morely like on weekends when I have more time for recreational math, than on weekdays. This is one of my resolutions in 2012 so stay tuned to see if this pans out :-)

Permalink 

Matrix manipulation functions for PHP: Part 2 [Math Programming
Posted on January 7, 2012 @ 01:41:26 PM by Paul Meagher

In the last post, I bemoaned the fact that PHP lacks some useful matrix manipulation functions that matlab/octave support. I decided to see how difficult it would be to create a script that would 1) parse a matlab/octave-like recipe for creating a matrix, and 2) generate a matrix based upon that parse. I initially thought such work would require some tricky regular expressions for parsing the recipe, but it appears that the language for specifying matrices was designed to make parsing relatively easy and that major parts of the parsing could be handled by exploding on various control characters. I'm not claiming that the code below implements all of the matlab/octave language for specifying matrices, however, it does support much of functionality you will see discussed in matlab/octave books.

<?php
/**
* @script matrix.php
*
* Script for creating matrices using matlab/octave like recipes.
*
* @author Paul Meagher
* @version 0.1
* @modified Jan 6, 2012
*/


/**
* Returns a parse of the recipe for a desired matrix.
* The parse that is returned contains a "valid" flag 
* that is set to false (i.e., 0) if the number of 
* columns in each row is unequal.
*/
function parse_matrix_recipe($recipe) {
  
  
// Initialize parse validity status to 1. Parse validity 
  // status will be set to 0 if expression is invalid.    
  
$parse['valid'] = 1;       
  
  
$num_rows 0;
  
$num_cols 0;

  
// Get rid of excess whitespace.
  
$recipe preg_replace('/\s\s+/'' '$recipe);
      
  
// Get all the rows.
  
$rows explode(';'$recipe);
  
  
$num_rows count($rows);
  
  
// Parse all the rows  
  
for($i=0$i<$num_rows$i++) {
  
    
// Check first if the row has the : range delimiter.
    
if (substr_count($rows[$i], ":")) {
  
      list(
$start$limit) = explode(":"$rows[$i]);
        
      if (
settype($start"int") AND settype($limit"int")) {
    
        
$parse[$i]['start'] = $start;
        
$parse[$i]['limit'] = $limit;        
              
        
// Counting number of elements generated by range function
        // is an inefficient way of checking that all rows have
        // same number of columns, but it work.  Optimize later :-)
        
if ($i == 0
          
$init_num_cols $row_num_cols count(range($start$limit));
        else
          
$row_num_cols count(range($start,$limit));
                        
      } else {        
        
        
// Cannot convert range values to integers.
        
$parse['valid'] = 0;
        break;
        
      }
    
    } else {          
            
      
$parse[$i]['array'] = explode(" "$rows[$i]);
            
      if (
$i == 0
        
$init_num_cols $row_num_cols count($parse[$i]['array']);
      else
        
$row_num_cols count($parse[$i]['array']);
            
    }

    
// Exit loop and set parse to false if one row has more columns 
    // than another.
    
if ($init_num_cols != $row_num_cols) {
      
$parse['valid'] = 0;       
      break;
    }    
    
  }
  
  
// If parse is valid, set matrix dimensions.  
  
if ($parse['valid'] == 1) {
    
$parse['num_rows'] = $num_rows;
    
$parse['num_cols'] = $init_num_cols;  
  }
  
  return 
$parse;

}

/**
* Uses parse elements to generate and return a matrix
*/
function generate_matrix($parse) {
 
  
$mat = array();
 
  
$num_rows $parse['num_rows'];
  
  if (
$num_rows == 1) {    

    
// Assign supplied array to vector.
    
if (isset($parse[0]['array']))
      
$vec $parse[0]['array'];
    
// Assign specified range of values to vector.  
    
else 
      
$vec range($parse[0]['start'], $parse[0]['limit']);
      
    return 
$vec;
    
  } else {    
    
    for(
$i=0$i<$num_rows$i++) {        

      
// Assign supplied array to row. 
      
if (isset($parse[$i]['array']))      
        
$mat[$i] = $parse[$i]['array'];
      
// Assign range of values to row.  
      
else
        
$mat[$i] = range($parse[$i]['start'], $parse[$i]['limit']);
    }
    
    return 
$mat;
    
  }

}

/**
* Main function for creating matrices.
*/
function m($recipe) {  
  
  
$parse parse_matrix_recipe($recipe);
  
  if (
$parse['valid'] == 1
    return 
generate_matrix($parse); 
  else
    return array();
    
}

?>

As you can see above, the main function used to specify a matrix is the m( ) function. I envision a complementary submatrix operator, sm( ), that would be used to access a submatrix of a supplied matrix (i.e., sm($recipe, $A)). Matlab/octaves language for specifying a submatrix is more complex than the language for specifying a matrix, so coding the sm( ) function will be a bit more challenging. You don't really know how challenging, however, until you try ...

Below is a test script that shows that the matrix.php script outputs the appropriate matrices given matlab/octave like recipes.

<?php

include "matrix.php";

$recipe1 '-1:1; 3:1';
$recipe2 '1;2;3;4';
$recipe3 '1 2  3;4 5   6';
$recipe4 '1';
$recipe5 '1 2 3;4:6';

$A m($recipe1);
$B m($recipe2);
$C m($recipe3);
$D m($recipe4);
$E m($recipe5);

echo 
"<pre>";
echo 
"\$A = m('".$recipe1."') \n";
echo 
"\$A = \n";
print_r($A);
echo 
"\n";

echo 
"\$B = m('".$recipe2."') \n";
echo 
"\$B = \n";
print_r($B);
echo 
"\n";

echo 
"\$C = m('".$recipe3."') \n";
echo 
"\$C = \n";
print_r($C);
echo 
"\n";

echo 
"\$D = m('".$recipe4."') \n";
echo 
"\$D = \n";
print_r($D);
echo 
"\n";

echo 
"\$E = m('".$recipe5."') \n";
echo 
"\$E = \n";
print_r($E);
echo 
"</pre>";

?>

Here is what the output of this script looks like:

$A = m('-1:1; 3:1') 
$A = 
Array
(
    [0] => Array
        (
            [0] => -1
            [1] => 0
            [2] => 1
        )
    [1] => Array
        (
            [0] => 3
            [1] => 2
            [2] => 1
        )
)

$B = m('1;2;3;4') 
$B = 
Array
(
    [0] => Array
        (
            [0] => 1
        )

    [1] => Array
        (
            [0] => 2
        )

    [2] => Array
        (
            [0] => 3
        )

    [3] => Array
        (
            [0] => 4
        )
)

$C = m('1 2  3;4 5   6') 
$C = 
Array
(
    [0] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )
    [1] => Array
        (
            [0] => 4
            [1] => 5
            [2] => 6
        )
)

$D = m('1') 
$D = 
Array
(
    [0] => 1
)

$E = m('1 2 3;4:6') 
$E = 
Array
(
    [0] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )
    [1] => Array
        (
            [0] => 4
            [1] => 5
            [2] => 6
        )
)

Originally I suggested that we use a mat( ) function to both create and access a matrix. I decided that if we are going to shorten "matrix" to "mat" we might as well go all the way and use an "m" function as the function name. While I think that we could still use one m( ) function for both constructing and accessing matrices, I think it would be better to develop a separate sm( ) function to make initial development of a matrix accessor easier. Finally, using m( ) as the function used to create/access a matrix reminds me of JQuery's $() operator so I don't think there is anything inherently wrong with using a one letter function name if the importance of the operator supports reserving a letter for it. This is arguably the case for an operator designed to create and access matrices.

Permalink 

Matrix manipulation functions for PHP: Part 1 [Math Programming
Posted on January 4, 2012 @ 01:12:29 PM by Paul Meagher

Currently reading some books on Matlab (so that I can program in it's opensource version, Octave, better). In Matlab/Octave you can construct a matrix like so:

mat = [1:3; 4:6]

which returns:

mat =

   1   2   3
   4   5   6

Perhaps in PHP we should be able to do something similiar like this:

$A = mat('1:3; 4:6');

Where mat(' replaces [ and '); replaces ]. Implementing a mat function like this would require implementing 1) a parser for the expression between single quotes (or double quotes), and 2) a code generator that uses the parsed expression to generate the appropriate array or nested arrays.

In the matrix constructor for the JAMA package, there are methods to "fill" the arrays with values but they are very primative compared to matlab/octaves notation for constructing filled matrices. The proposed "mat" function could work in a very complementary way with JAMA's matrix constructor.

A mat function is only one matrix manipulation function that php might implement to make matrix manipulation easier. Other functions that would be useful are linspace, reshape, and repmat. Matlab's/Octave's method for transposing a matrix involves using a single quote operator ' outside the right bracket (e.g., mat = [1:3; 4:6]') that might be functionalized as a transpose or trans operator. Finally, Matlab/Octave also supports an expressive language for accessing the contents of a matrix that would also be useful to have. One possibility is that the mat function might take a second argument, a matrix, that would be accessed via an expression between the single quotes of the first argument like so:

$b = mat('end, 1', $A);

Which equates to:

$b = 4;

I would argue that for PHP to be a usable language for matrix manipulation it should offer matrix manipulation functions like the Matlab/Octave ones proposed here. A few functions like these, inspired by the way Matlab/Octive implements these functions, would take PHP a long way towards being a useful matrix manipulation language. They would be complementary with the JAMA Linear Algebra Package and would make it easier to port Matlab/Octave-based matrix algorithms to PHP. Many vectorized algorithms require the ability to efficiently slice and dice matrices as the algorithm proceeds towards a solution which is what these methods provide.

Permalink 

Information Retrieval Class [Data Mining
Posted on June 5, 2008 @ 02:21:07 PM by Paul Meagher

As I reflected on how I would begin computing summary statistics over an inverted index (which I discussed and implemented in my last post), I realized that I would have to immediately turn my script into a class in order to easily and efficiently pass around the inverted index to the statistical functions. Here is what I have developed so far (see also):

<?php

/**
* Information Retrievel

* Class used to explore information retrieval theory and concepts.
*/

define("DOC_ID"0);
define("TERM_POSITION"1);

class 
IR {

  public 
$num_docs 0;
  
  public 
$corpus_terms = array();

  
/* 
  * Show Documents
  *
  * Helper function that shows the contents of your corpus documents.
  *
  * @param array $D document corpus as array of strings
  */ 
  
function show_docs($D) {
    
$ndocs count($D);
    for(
$doc_num=0$doc_num $ndocs$doc_num++) {
      
?>
      <p>
      Document #<?php echo ($doc_num+1); ?>:<br />
      <?php echo $D[$doc_num]; ?>
      </p>
      <?php
    
}
  }

  
/* 
  * Create Index
  *
  * Creates an inverted index from the supplied corpus documents.
  * Inverted index stored in corpus_terms array.
  *
  * @param array $D document corpus as array of strings
  */   
  
function create_index($D) {  
    
$this->num_docs count($D);    
    for(
$doc_num=0$doc_num $this->num_docs$doc_num++) {      
      
// zero array containing document terms
      
$doc_terms = array();      
      
// simplified word tokenization process
      
$doc_terms explode(" "$D[$doc_num]);      
      
// here is where the indexing of terms to document locations happens
      
$num_terms count($doc_terms);
      for(
$term_position=0$term_position $num_terms$term_position++) {
        
$term strtolower($doc_terms[$term_position]);
        
$this->corpus_terms[$term][]=array($doc_num$term_position);
      }      
    }   
  }

  
/* 
  * Show Index
  *
  * Helper function that outputs inverted index in a standard format.
  */         
  
function show_index() {
    
// sort by key for alphabetically ordered output
    
ksort($this->corpus_terms);
    
// output a representation of the inverted index
    
foreach($this->corpus_terms AS $term => $doc_locations) {
      echo 
"<b>$term:</b> ";
      foreach(
$doc_locations AS $doc_location
        echo 
"{".$doc_location[DOC_ID].", ".$doc_location[TERM_POSITION]."} ";
      echo 
"<br />";  
    }    
  }   

  
/*
  * Term Frequency
  *
  * @param string $term
  * @return frequency of term in corpus
  */
  
function tf($term) {
    
$term strtolower($term);
    return 
count($this->corpus_terms[$term]);
  }
   
  
/*
  * Number Documents With
  * 
  * @param string $term
  * @return number of documents with term
  */
  
function ndw($term) {
    
$term strtolower($term);   
    
$doc_locations $this->corpus_terms[$term];
    
$num_locations count($doc_locations);
    
$docs_with_term = array();
    for(
$doc_location=0$doc_location $num_locations$doc_location++) 
      
$docs_with_term[$i]++;
    return 
count($docs_with_term);     
  }
   
  
/*
  * Inverse Document Frequency
  *
  * @param string $term
  * @return inverse document frequency of term
  */
   
function idf($term) {
     return 
log($this->num_docs)/$this->ndw($term);    
   }       

}

?>

Here is a script I developed to test the methods of the IR class:

<?php

include "IR.php";

$D[0] = "Shipment of gold delivered in a fire";
$D[1] = "Delivery of silver arrived in a silver truck";
$D[2] = "Shipment of gold arrived in a truck";

$ir = new IR();

echo 
"<p><b>Corpus:</b></p>";
$ir->show_docs($D);

$ir->create_index($D);

echo 
"<p><b>Inverted Index:</b></p>";
$ir->show_index();

$term "silver"
$tf  $ir->tf($term);
$ndw $ir->ndw($term);
$idf $ir->idf($term);
echo 
"<p>";
echo 
"Term Frequency of '$term' is $tf<br />";
echo 
"Number Of Documents with $term is $ndw<br />";
echo 
"Inverse Document Frequency of $term is $idf";
echo 
"</p>";  

?>

Here is the output that the script generates:

Corpus:

Document #1:
Shipment of gold delivered in a fire

Document #2:
Delivery of silver arrived in a silver truck

Document #3:
Shipment of gold arrived in a truck

Inverted Index:

a: {0, 5} {1, 5} {2, 5}
arrived: {1, 3} {2, 3}
delivered: {0, 3}
delivery: {1, 0}
fire: {0, 6}
gold: {0, 2} {2, 2}
in: {0, 4} {1, 4} {2, 4}
of: {0, 1} {1, 1} {2, 1}
shipment: {0, 0} {2, 0}
silver: {1, 2} {1, 6}
truck: {1, 7} {2, 6}

Term Frequency of 'silver' is 2
Number Of Documents with silver is 1
Inverse Document Frequency of silver is 1.0986122886681

The next step in this exploration of Information Retrieval Theory will be to develop a table that better summarizes term frequency and inverse term frequencies. I will use this table to verify that the results I'm generating match with tf-idf calculations found in the literature for the sample documents used in my calculations.

Permalink 

Information Retrieval Theory: Implementing an Inverted Index [Data Mining
Posted on June 3, 2008 @ 01:29:29 PM by Paul Meagher

Last night I downloaded a local copy of this book:

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2008.

Most introductions to Information Retrieval begin by talking about the "Inverted Index" data structure. I had come accross this concept before, however, I haven't explored it in php code because I got hung up on the issues of what corpus to use and, more importantly, what word tokenization approach I should use for the documents. Through my surfing around on the topic of IR, I observed that you can bypass these issues by simplifying the corpus selection and word tokenization issues while retaining the essentials of the representational and computational issues. The code below illustrates a common simplification of the corpus concept and word tokenization concept that might similarly allow you to get started exploring some IR concepts.

<?php

/**
* Creates an inverted index and shows what it looks like.  Useful for
* explorating TF-IDF calculations in Information Retrieval theory.
*
* @see http://en.wikipedia.org/wiki/Inverted_index
*/

define("DOC_ID"0);
define("TERM_POSITION"1);

$D[0] = "Shipment of gold delivered in a fire";
$D[1] = "Delivery of silver arrived in a silver truck";
$D[2] = "Shipment of gold arrived in a truck";

$corpus_terms = array();

$num_docs count($D);

for(
$doc_num=0$doc_num $num_docs$doc_num++) {
  
  
// zero array containing document terms
  
$doc_terms = array();
  
  
// simplified word tokenization process
  
$doc_terms explode(" "$D[$doc_num]);
  
  
// here is where the indexing of terms to document locations happens
  
$num_terms count($doc_terms);
  for(
$term_position=0$term_position $num_terms$term_position++) 
    
$corpus_terms[strtolower($doc_terms[$term_position])][]=array($doc_num$term_position);

}

// sort by key for alphabetically ordered output
ksort($corpus_terms);

$num_terms count($corpus_terms);

// output a representation of the inverted index
foreach($corpus_terms AS $term => $doc_locations) {
  echo 
"<b>$term:</b> ";
  foreach(
$doc_locations AS $doc_location
    echo 
"{".$doc_location[DOC_ID].", ".$doc_location[TERM_POSITION]."} ";
  echo 
"<br />";  
}    
      
?>  

The output of this code looks like this:

a: {0, 5} {1, 5} {2, 5}
arrived: {1, 3} {2, 3}
delivered: {0, 3}
delivery: {1, 0}
fire: {0, 6}
gold: {0, 2} {2, 2}
in: {0, 4} {1, 4} {2, 4}
of: {0, 1} {1, 1} {2, 1}
shipment: {0, 0} {2, 0}
silver: {1, 2} {1, 6}
truck: {1, 7} {2, 6}

Sometimes it is useful to jump ahead in your exploration of certain topics and avoid, for example, the gritty details of document tokenization. What may seem like an avoidance of the problem, might end up informing you about important subtleties in the requirements for your information retrieval system. Exploratory inverted index code can help you to see the computational consequences of representing an inverted index in a certain manner.

For me, the next step in exploring IR theory is to explore in code the Term Frequency–Inverse Document Frequency calculations, generally abbreviated as TF-IDF. Tim Munroe illustrates some of the calculations involved, however, I will need to get my calculations working with the inverted index data structure I've developed so far.

Permalink 

 Archive 
 


php/Math Project
© 2011. All rights reserved.