|
|
Build your own tetrahedron |
[Math Projects] |
|
Posted on October 22, 2012 @ 04:30:35 PM by Paul Meagher
Credits: http://geoite.com/msu/albums/userpics/10002/Tetrahedron-cutout.jpg
To render a 3D tetrahedron, you can download the latest version of Processing (2.0 beta 4 as of this writing) and load this code into it:
# Jennie Meyer # http://faculty.purchase.edu/jeanine.meyer/processing/tetrahedron/applet/tetrahedron.pde
float side = 100; float rad60=60*PI/180;
float rotx = PI/4; float roty = PI/4;
void setup() { size(800,600,P3D); } void draw() { background(0); translate(width/2.0, height/2.0, -100); rotateX(rotx); rotateY(roty); drawtetrahedron(); } void drawtetrahedron() { beginShape(TRIANGLES); vertex(-side/2,0,0); vertex(0,sin(rad60)*(-side),0); vertex(side/2,0,0); endShape(); beginShape(TRIANGLES); vertex(-side/2,0,0); vertex(0,sin(rad60)*(-side)*.5,sin(rad60)*(side)); vertex(side/2,0,0); endShape(); beginShape(TRIANGLES); vertex(-side/2,0,0); vertex(0,sin(rad60)*(-side)*.5,sin(rad60)*(side)); vertex(0,sin(rad60)*(-side),0); endShape(); beginShape(TRIANGLES); vertex(0,sin(rad60)*(-side),0); vertex(0,sin(rad60)*(-side)*.5,sin(rad60)*(side)); vertex(side/2,0,0); endShape(); } void mouseDragged() { float rate = 0.01; rotx += (pmouseY-mouseY) * rate; roty += (mouseX-pmouseX) * rate; //println("rot x is "+rotx*180/PI+"degrees and roty is "+roty*180/PI); }
There is a javascript-based implementation of the Processing language started by John Resig (JQuery founder) and now carried by other developers. The Java version is ahead of the Javascript version of the language in terms of featureset right now. Would be interesting to see the javascript version move more towards parity with the Java version.
My inspiration for learning more about tetrahedrons was a picture of Alexander Graham Bell reclining in an outdoor shack in the shape of a tetrahedron, a shape he considered to be structural perfection.
Credits: Maria McGowan, http://therightcoastnovascotia.blogspot.ca/2009/08/visit-to-alexander-graham-bell-national.html
|
|
Permalink |
|
PHP for doing income tax |
[Financial] |
|
Posted on February 27, 2012 @ 01:54:08 PM by Paul Meagher
I've slipped on my once-a-week blogging schedule because I've been busy the last couple of weeks migrating sites from an old server to a new server. In the last couple of days I've been busy getting ready for income tax season. Today I thought I would share with you a simple program I hacked together to date sort and total my gas receipts. The program could be used to date sort and total other types of receipts provided the text file it reads in is structured with the amount and date on each line with a colon separating them. Read the program comments for more details. The main feature of this program that is noteworthy is how to use of the array_multisort function to solve a real world problem. I haven't used this function in the past, however, it looks like a useful function to be aware of.
<?php
/**
* @script tax-receipts.php
* @author Paul Meagher
* @modified Feb 27/2012
* @version 1.0
*
* Reads in a list of receipts with each line structured like this:
*
* amount : date
*
* Where "amount" has no dollar symbol, comma, or other extraneous symbols
* and "date" is structured in month-day format with zero padding (e.g., 02-01
* for feb 1). Here is an excerpt of my gas-receipts.txt file:
*
* 20.00:05-06
* 32.03:07-13
* 23.87:07-14
* 45.08:09-11
* 23.73:08-15
* 34.00:09-16
* 18.00:06-30
* 63.10:04-21
*
* etc..
*
* So the idea is you enter all your unsorted receipts, for a given category of
* expenses, into a text file in this format then run this program on the receipts
* so it will sort them for you and calculate a total amount. Save the output into
* your tax files so you have a nice record of your expenses for the tax man if he
* comes to audit you.
*
* The reason I created this program was because I had alot of gas receipts all
* thrown together into a pile and wanted to sort them by date to make it easier
* to potentially cross-check with my bank and credit card statements. Using a
* program like this makes sense where you have a large number of unordered
* receipts in need of sorting.
*/
// open receipts file (create text file for each category of expenses)
$fp = fopen("gas-receipts.txt", "r");
// declare and zero variables
$receipt_dates = array();
$receipt_amounts = array();
$receipt_data = array();
// read files items into array
while (!feof($fp)) {
$line=fgets($fp, 512);
list($receipt_amount, $receipt_date) = explode(":", $line);
$receipt_dates[] = trim($receipt_date);
$receipt_amounts[] = trim($receipt_amount);
$receipt_data[] = array('receipt_date'=>$receipt_date, 'receipt_amount'=>$receipt_amount);
}
// close file pointer
fclose($fp);
// sort the receipts by date then amount
array_multisort($receipt_dates, SORT_ASC, $receipt_amounts, SORT_ASC, $receipt_data);
// print out date sorted receipts and compute total amount
$num_receipts = count($receipt_data);
$total_amount = 0.0;
for ($i=0; $i < $num_receipts; $i++) {
echo $receipt_data[$i]['receipt_date']." : $".$receipt_data[$i]['receipt_amount']."<br />";
$total_amount += $receipt_data[$i]['receipt_amount'];
}
// and the total amount is
echo "Total Amount: $$total_amount";
?>
|
|
Permalink |
|
PHP functions to perform linear algebra decompositons |
[Linear Algebra] |
|
Posted on February 11, 2012 @ 09:16:36 PM by Paul Meagher
Here is a session in Octave demonstrating how to compute the QR decomposition of a matrix:
octave-3.2.4.exe:1> m = [1, 2; 3, 4]
m =
1 2
3 4
octave-3.2.4.exe:2> [Q R] = qr(m)
Q =
-0.31623 -0.94868
-0.94868 0.31623
R =
-3.16228 -4.42719
0.00000 -0.63246
To emulate this functional approach (versus object-oriented approach) to decomposing a matrix in PHP, we need to wrap calls to the JAMA Matrix.php object inside a qr( ) function as follows:
<?php
// Set path to JAMA Matrix object. require_once "../Matrix.php";
// Wrap JAMA matrix functionality inside a qr function. function qr($m) { $A = new Matrix($m); $QR = $A->qr(); $ans['R'] = $QR->getR(); $ans['Q'] = $QR->getQ(); return $ans; }
// Create matrix. $m = array(array(1, 2),array(3, 4));
// Qet QR decompostion of a matrix. $ans = qr($m);
// Output Q and R components of answer object. ?>
<pre> Q = <?php print_r($ans['Q']); ?> </pre>
<pre> R = <?php print_r($ans['R']); ?> </pre>
The output of this script looks like this:
Q =
Matrix Object
(
[A] => Array
(
[0] => Array
(
[1] => 0.94868329805051
[0] => -0.31622776601684
)
[1] => Array
(
[1] => -0.31622776601684
[0] => -0.94868329805051
)
)
[m] => 2
[n] => 2
)
R =
Matrix Object
(
[A] => Array
(
[0] => Array
(
[0] => -3.1622776601684
[1] => -4.4271887242357
)
[1] => Array
(
[0] => 0
[1] => 0.63245553203368
)
)
[m] => 2
[n] => 2
)
Interestingly, the answers produced by the Octave and PHP scripts differ in the signs associated with a few of the answer values. Does this matter? Well, if we take the values of Q and R output by PHP, enter them into Octave, and then multiply Q and R together, we get the original matrix back. This reconstruction of the original matrix demonstrates the correctness of the decomposition and that the decomposition of a matrix into Q and R components is not unique.
octave-3.2.4.exe:1> Q = [-0.31622, 0.94868; -0.94868, -0.31622]
Q =
-0.31622 0.94868
-0.94868 -0.31622
octave-3.2.4.exe:2> R = [-3.1622, -4.42718; 0, 0.63245]
R =
-3.16220 -4.42718
0.00000 0.63245
octave-3.2.4.exe:3> Q * R
ans =
0.99995 1.99996
2.99992 3.99998
Note that if I didn't reduce the precision of the Q and R matrix values, the output would be [1 2; 3 4] or the exact version of the original matrix.
This proof-of-concept demonstrates how we can develop a matlab/octave-like functional API for linear algrebra decompositions in PHP by incorporating calls to the JAMA library methods inside qr(), svd(), lu(), eig(), and chol() functions. A functional API for linear algebra would make doing linear algrebra in PHP much more direct, interactive and enjoyable. Object-oriented linear algebra libraries like JAMA play a supporting role, hidden behind a set of linear algebra functions like qr(), svd(), lu(), eig() and chol() for operating on matrices.
|
|
Permalink |
|
Flexible computation of a sum |
[Math Programming] |
|
Posted on February 6, 2012 @ 11:49:32 AM by Paul Meagher
I re-implemented Matlab/Octave's sum function in PHP. The Matlab/Octave sum function is more flexible than PHP's array_sum function as it can handle scalar, vector, or matrix input. In the case of matrices, the sum function can also be directed to sum by column or row. The default is to sum a matrix by column which is handy for getting a bunch of column sums from a data matrix.
The way that the sum function works is similar to the way an ave or stdev function would work in Matlab/Octave. Eventually it would be nice to have all these data summarization functions available and working in a Matlab/Octave like way.
Without futher ado, here is the code that implements the sum function. Note that it reuses the get_type() function mentioned in previous blogs:
<?php /** * Partial implementation of matlab sum function. Does not * handle arrays with dimension greater than 2. * * @see http://www.mathworks.com/help/techdoc/ref/sum.html */
/** * Determines whether a variable is a scalar, vector, or matrix. * * @params mixed $var Scalar, vector, or matrix data. * @return string scalar, vector, matrix or false */ function get_type($var) { if (is_numeric($var)) { return "scalar"; } elseif (is_array($var)) { if (is_array($var[0])) return "matrix"; else return "vector"; } else return false; }
/** * Computes the row or column sums. The default is to compute * column sums and to store each column sum into a row vector. * * @param array $a A one to two dimensional array of numeric values. * @param integer $dim The dimension to sum (1=column sums, 2=row sums). * @return array/string $sum A row ($dim=1) or column ($dim=2) vector of sums. */ function sum($a, $dim=1) { $type = get_type($a); if ($type=="scalar") return array($a); elseif ($type=="vector") return array(array_sum($a)); else { // zero the sums array $sums = array(); // num rows $nrows = count($a); // num cols $ncols = count($a[0]); if ($dim == 1) { // compute sum of matrix columns for ($j=0; $j < $ncols; $j++) { $sums[$j] = 0.0; for($i=0; $i < $nrows; $i++) $sums[$j] += $a[$i][$j]; } return $sums; } else { // compute sum of matrix rows for ($i=0; $i < $nrows; $i++) $sums[$i][0] += array_sum($a[$i]); return $sums; } } }
$s = 5; $v = array(1,2,3); $m = array(array(1,2,3),array(1,2,3),array(1,2,3));
// scalar case $ans = sum($s); echo "<pre>"; print_r($ans); echo "</pre>"; echo "<br />";
// row vector case $ans = sum($v); echo "<pre>"; print_r($ans); echo "</pre>"; echo "<br />";
// matrix case - sum cols (default) $ans = sum($m); echo "<pre>"; print_r($ans); echo "</pre>"; echo "<br />";
// matrix case - sum rows $ans = sum($m, 2); echo "<pre>"; print_r($ans); echo "</pre>"; echo "<br />";
// compute sum of all elements $ans = sum(sum($m)); echo "<pre>"; print_r($ans); echo "</pre>"; echo "<br />";
?>
The output of this script looks like this:
Array
(
[0] => 5
)
Array
(
[0] => 6
)
Array
(
[0] => 3
[1] => 6
[2] => 9
)
Array
(
[0] => Array
(
[0] => 6
)
[1] => Array
(
[0] => 6
)
[2] => Array
(
[0] => 6
)
)
Array
(
[0] => 18
)
One important point to note is that when we sum a single scalar value like sum(5) the result is an array and not a scalar value. I suspect that it is of fundamental importance to matrix-oriented languages like Matlab/Octave that results be wrapped in an array structure so that we can string together matrix operations in a conistent manner.
|
|
Permalink |
|
Elementwise operations |
[Math Programming] |
|
Posted on January 28, 2012 @ 06:33:46 AM by Paul Meagher
In my last post, I discussed how basic arithmetic operations might be vectorized using PHP. I wasn't very satisfied with the implemention. One issue was that I was duplicating alot of code and felt that the vectorization code could be reused across vectorized arithmetic operations better. Another issue was that I didn't study how matlab/octave elementwise operations worked in detail so didn't know what the allowable argument types were (scalar, vector, matrix) and what combination of argument types was allowed on both sides of the arithmetic operation. I'm happy to report that I've addressed both of these issues in the new implementation. While I could probably add more error handling, I think the code below could be used to easily vectorize many operations in PHP such as exp, sin, cos, pow, etc...
<?php /** * elementwise_operations.php * * Functions that mimic matlab/octave's elementwise operations (e.g., .+, .-, .*, ./): * * .+ = madd = element-by-element addition * .- = msub = element-by-element subtraction * .* = mmul = element-by-element multiplication * ./ = mdiv = element-by-element division */
/** * Determines whether a variable is a scalar, vector, or matrix */ function get_type($var) { if (is_numeric($var)) { return "scalar"; } elseif (is_array($var)) { if (is_array($var[0])) return "matrix"; else return "vector"; } else return false; }
/** * Function that vectorizes the elementary operations. */ function vectorize($a, $b, $op) { $atype = get_type($a); $btype = get_type($b);
if (($atype != false) AND ($btype != false)) $arguments = $atype."-".$btype; else die("Argument is not a scalar, vector or matrix."); switch ($arguments) { case "scalar-scalar": return $op($a, $b); break;
case "vector-scalar": $asize = count($a); $b = array_fill(0, $asize, $b); return array_map($op, $a, $b); break;
case "vector-vector": $asize = count($a); $bsize = count($b); if ($asize == $bsize) return array_map($op, $a, $b); else die("Nonconformant arguments."); break; case "matrix-scalar": $nrows = count($a); $ncols = count($a[0]); for($i=0; $i<$nrows; $i++) { $bvec = array_fill(0, $ncols, $b); $c[$i] = array_map($op, $a[$i], $bvec); } return $c; break; case "matrix-matrix": $arows = count($a); $acols = count($a[0]); $brows = count($b); $bcols = count($b[0]); if (($arows == $brows) AND ($acols == $bcols)) { for($i=0; $i<$arows; $i++) $c[$i] = array_map($op, $a[$i], $b[$i]); return $c; } else die("Nonconformant arguments."); break; } }
// Elementary arithmetic operations.
function _add($addend1, $addend2) { return ($addend1 + $addend2); }
function _sub($minuend, $subtrahend) { return ($minuend - $subtrahend); }
function _mul($factor1, $factor2) { return ($factor1 * $factor2); }
function _div($numerator, $divisor) { return ($numerator / $divisor); }
// Elementwise operators.
function madd($m, $addend) { return vectorize($m, $addend, "_add"); }
function msub($m, $subtrahend) { return vectorize($m, $subtrahend, "_sub"); }
function mmul($m, $factor) { return vectorize($m, $factor, "_mul"); }
function mdiv($m, $divisor) { return vectorize($m, $divisor, "_div"); }
// Test elementwise addition.
$s = 5; $v = array(1, 2, 3); $m = array(array(1, 2, 3), array(4, 5, 6));
// scalar-scalar case $ans = madd($s, $s); print_r($ans); echo "<br />";
// vector-scalar case $ans = madd($v, $s); print_r($ans); echo "<br />";
// vector-vector case $ans = madd($v, $v); print_r($ans); echo "<br />";
// matrix-scalar case $ans = madd($m, $s); print_r($ans); echo "<br />";
// matrix-matrix case $ans = madd($m, $m); print_r($ans);
// What the output looks like:
// 10 // Array ( [0] => 6 [1] => 7 [2] => 8 ) // Array ( [0] => 2 [1] => 4 [2] => 6 ) // Array ( [0] => Array ( [0] => 6 [1] => 7 [2] => 8 ) [1] => Array ( [0] => 9 [1] => 10 [2] => 11 ) ) // Array ( [0] => Array ( [0] => 2 [1] => 4 [2] => 6 ) [1] => Array ( [0] => 8 [1] => 10 [2] => 12 ) )
?>
|
|
Permalink |
|
Vectorizing basic arithmetic operations |
[Math Programming] |
|
Posted on January 23, 2012 @ 12:45:37 AM by Paul Meagher
Continuing on with my research into adding better matrix/vector capabilities to PHP, I decided to implement the basic arithmetic functions (addition, subtraction, division, multiplication) as vectorized functions. To make these functions concise and easy to remember, I used a system of function naming that involves prefixing the function call with a "v" (for "vector") and then using the first three letters of the arithmetic operation name (add, sub, div, mul) as the suffix:
- vadd($v, $addend): vector addition
- vsub($v, $subtrahend): vector subtaction
- vdiv($v, $divisor): vector division
- vmul($v, $factor): vector multiplication
In all cases, the first argument is a vector of numbers, followed by the value to apply to each element of the vector. The code below is proof-of-concept of how to vectorize these functions and is not meant as production-level code:
<?php /** * Set of functions that vectorize the basic arithmetic functions: * addition, subtraction, division, multiplication. No error * checking. */
// function for vectorized addition
function _add($addend1, $addend2) { return ($addend1 + $addend2); }
function vadd($v, $addend) { $size = count($v); $add_array = array_fill(0, $size, $addend); return array_map("_add", $v, $add_array); }
// function for vectorized subtraction
function _sub($minuend, $subtrahend) { return ($minuend - $subtrahend); }
function vsub($v, $subtrahend) { $size = count($v); $sub_array = array_fill(0, $size, $subtrahend); return array_map("_sub", $v, $sub_array); }
// function for vectorized multiplication
function _mul($factor1, $factor2) { return ($factor1 * $factor2); }
function vmul($v, $factor) { $size = count($v); $mul_array = array_fill(0, $size, $factor); return array_map("_mul", $v, $mul_array); }
// function for vectorized division
function _div($numerator, $divisor) { return ($numerator / $divisor); }
function vdiv($v, $divisor) { $size = count($v); $div_array = array_fill(0, $size, $divisor); return array_map("_div", $v, $div_array); }
$v = array(5, 10, 15); $arg = 5;
$b = vadd($v, $arg); print_r($b); echo "<br />";
$b = vsub($v, $arg); print_r($b); echo "<br />";
$b = vmul($v, $arg); print_r($b); echo "<br />";
$b = vdiv($v, $arg); print_r($b); echo "<br />";
// Output:
// Array ( [0] => 10 [1] => 15 [2] => 20 ) // Array ( [0] => 0 [1] => 5 [2] => 10 ) // Array ( [0] => 25 [1] => 50 [2] => 75 ) // Array ( [0] => 1 [1] => 2 [2] => 3 )
?>
The alternative to vectorized functions is the ad-hoc creation of loops to perform similar vectorized operations in your code. Think how much programming time and effort has been wasted coding loops to do vectorized arithmetic when there could be vectorized arithmetic functions available to do it all in one simple command.
This only touches the surface of the number of mathematical functions that should be vectorized in PHP (e.g., vsin, vcos, vexp, etc...) to make PHP a much more efficient language to program numeric-based algorithms. We could easily add a "v" suffix to all these function to indicate that they are all vector-based.
Another option to consider would be to go even further and have these functions handle matrix operations, with vectors being just the special case of a one-dimensional matrix. In that case, we might want to instead prefix all the functions with an "m" instead of a "v". I had considered this, but was too lazy to implement functions to handle the matrix case. This is a good exercise for the reader.
At the end of the day we should look at languages like Matlab/Octave to give us guidance on how to proceed with respect to the range of functions to vectorize and whether we should handle vector or matrix arguments to these functions. While we may not be able to use Matlab/Octave's concise syntax to code vectorized algorithms, we can develop slightly more verbose functions like vadd, vsub, vdiv, and vmul to efficiently code numeric algorithms.
|
|
Permalink |
|
Matrix manipulation functions for PHP: Part 3 |
[Math Programming] |
|
Posted on January 14, 2012 @ 05:28:18 PM by Paul Meagher
Another useful function that matlab/octave supports is the linspace(start, end, num_points) function. The linspace function returns an array of n numbers that are equally spaced between the start and end points. A function like this is very useful when graphing an axis and when testing vectorized functions. You will see it used quite frequently in matlab/octave examples.
Below is an implementation that captures the behavior of Octave's linspace function. It also includes some code to test the output of the linspace function.
<?php
define("PRECISION", 4);
function linspace($start, $end, $num_points) {
$format = "%01.".PRECISION."f";
$start = (float)$start; $end = (float)$end; $num_points = (int)$num_points - 1; $difference = abs($end-$start); if ($num_points <= 1) return array($end); if ($num_points == 2) return array($start, $end); if ($num_points >= 3) {
if ($difference > 0) {
$increment = sprintf($format, $difference/$num_points); $increment = $difference/$num_points; $A[0] = sprintf($format, $start); for ($i=1; $i < $num_points; $i++) { $A[$i] = sprintf($format, ($A[$i-1] + $increment)); $next_num = $A[$i]; } $A[] = sprintf($format, $end);
return $A;
} else { return array_fill(0, $num_points+1, $start); }
}
}
// Testing the output of the linspace function.
echo "<pre>"; echo "linspace(1, 5, 4) \n"; print_r(linspace(1, 5, 4)); echo "\n"; echo "linspace(1, 1.8, 4) \n"; print_r(linspace(1, 1.8, 4)); echo "\n"; echo "linspace(1, 1, 4) \n"; print_r(linspace(1, 1, 4)); echo "\n"; echo "linspace(5, 7, 0)\n"; print_r(linspace(5, 7, 0)); echo "</pre>"; ?>
The output of this script looks like this:
linspace(1, 5, 4)
Array
(
[0] => 1.0000
[1] => 2.3333
[2] => 3.6666
[3] => 5.0000
)
linspace(1, 1.8, 4)
Array
(
[0] => 1.0000
[1] => 1.2667
[2] => 1.5334
[3] => 1.8000
)
linspace(1, 1, 4)
Array
(
[0] => 1
[1] => 1
[2] => 1
[3] => 1
)
linspace(5, 7, 0)
Array
(
[0] => 7
)
An interesting aspect of the behavior of the linspace function is that it handles odd cases in ways that one might think should throw an error. Instead of throwing an error when you enter, for example, linspace(5, 7, 0) - because you can't return 0 points - it instead tries to do something sensible and returns the end point (i.e., 7). Whether this is sensible or not I will leave you to decide, however, the larger point is that Octave seems biased towards returning a numeric result of some sort rather than an error result. If you don't know the detailed behavior of these functions you could end up with some nasty surprises in terms of the result returned; perhaps it would be better if an error gets returned rather than a numeric result in some cases. Nevertheless, when doing mathematical programming you should consider whether it is better to throw an error result in response to odd inputs or try to generate a sensible numeric answer by, for example, anticipating the type of mistakes a user might make when passing arguments to a function and handling this mistakes appropriately (e.g., the low and hi values might be in reverse order from what is expected, so reverse them, and give the answer based upon the corrected ordering of arguments). A good way to master Octave is to port some of it's matrix manipulation functions to PHP. To do so you will need to observe how the Octave function behaves with boundary cases; a type of study you probably would not have engaged in otherwise.
On another note, it is my hope that I can begin to revive this blog again in 2012. I don't have time to blog as frequently as I did in the past, however, I would like to try for at least one blog post per week - morely like on weekends when I have more time for recreational math, than on weekdays. This is one of my resolutions in 2012 so stay tuned to see if this pans out :-)
|
|
Permalink |
|
Matrix manipulation functions for PHP: Part 2 |
[Math Programming] |
|
Posted on January 7, 2012 @ 01:41:26 PM by Paul Meagher
In the last post, I bemoaned the fact that PHP lacks some useful matrix manipulation functions that matlab/octave support. I decided to see how difficult it would be to create a script that would 1) parse a matlab/octave-like recipe for creating a matrix, and 2) generate a matrix based upon that parse. I initially thought such work would require some tricky regular expressions for parsing the recipe, but it appears that the language for specifying matrices was designed to make parsing relatively easy and that major parts of the parsing could be handled by exploding on various control characters. I'm not claiming that the code below implements all of the matlab/octave language for specifying matrices, however, it does support much of functionality you will see discussed in matlab/octave books.
<?php /** * @script matrix.php * * Script for creating matrices using matlab/octave like recipes. * * @author Paul Meagher * @version 0.1 * @modified Jan 6, 2012 */
/** * Returns a parse of the recipe for a desired matrix. * The parse that is returned contains a "valid" flag * that is set to false (i.e., 0) if the number of * columns in each row is unequal. */ function parse_matrix_recipe($recipe) { // Initialize parse validity status to 1. Parse validity // status will be set to 0 if expression is invalid. $parse['valid'] = 1; $num_rows = 0; $num_cols = 0;
// Get rid of excess whitespace. $recipe = preg_replace('/\s\s+/', ' ', $recipe); // Get all the rows. $rows = explode(';', $recipe); $num_rows = count($rows); // Parse all the rows for($i=0; $i<$num_rows; $i++) { // Check first if the row has the : range delimiter. if (substr_count($rows[$i], ":")) { list($start, $limit) = explode(":", $rows[$i]); if (settype($start, "int") AND settype($limit, "int")) { $parse[$i]['start'] = $start; $parse[$i]['limit'] = $limit; // Counting number of elements generated by range function // is an inefficient way of checking that all rows have // same number of columns, but it work. Optimize later :-) if ($i == 0) $init_num_cols = $row_num_cols = count(range($start, $limit)); else $row_num_cols = count(range($start,$limit)); } else { // Cannot convert range values to integers. $parse['valid'] = 0; break; } } else { $parse[$i]['array'] = explode(" ", $rows[$i]); if ($i == 0) $init_num_cols = $row_num_cols = count($parse[$i]['array']); else $row_num_cols = count($parse[$i]['array']); }
// Exit loop and set parse to false if one row has more columns // than another. if ($init_num_cols != $row_num_cols) { $parse['valid'] = 0; break; } } // If parse is valid, set matrix dimensions. if ($parse['valid'] == 1) { $parse['num_rows'] = $num_rows; $parse['num_cols'] = $init_num_cols; } return $parse;
}
/** * Uses parse elements to generate and return a matrix */ function generate_matrix($parse) { $mat = array(); $num_rows = $parse['num_rows']; if ($num_rows == 1) {
// Assign supplied array to vector. if (isset($parse[0]['array'])) $vec = $parse[0]['array']; // Assign specified range of values to vector. else $vec = range($parse[0]['start'], $parse[0]['limit']); return $vec; } else { for($i=0; $i<$num_rows; $i++) {
// Assign supplied array to row. if (isset($parse[$i]['array'])) $mat[$i] = $parse[$i]['array']; // Assign range of values to row. else $mat[$i] = range($parse[$i]['start'], $parse[$i]['limit']); } return $mat; }
}
/** * Main function for creating matrices. */ function m($recipe) { $parse = parse_matrix_recipe($recipe); if ($parse['valid'] == 1) return generate_matrix($parse); else return array(); }
?>
As you can see above, the main function used to specify a matrix is the m( ) function. I envision a complementary submatrix operator, sm( ), that would be used to access a submatrix of a supplied matrix (i.e., sm($recipe, $A)). Matlab/octaves language for specifying a submatrix is more complex than the language for specifying a matrix, so coding the sm( ) function will be a bit more challenging. You don't really know how challenging, however, until you try ...
Below is a test script that shows that the matrix.php script outputs the appropriate matrices given matlab/octave like recipes.
<?php
include "matrix.php";
$recipe1 = '-1:1; 3:1'; $recipe2 = '1;2;3;4'; $recipe3 = '1 2 3;4 5 6'; $recipe4 = '1'; $recipe5 = '1 2 3;4:6';
$A = m($recipe1); $B = m($recipe2); $C = m($recipe3); $D = m($recipe4); $E = m($recipe5);
echo "<pre>"; echo "\$A = m('".$recipe1."') \n"; echo "\$A = \n"; print_r($A); echo "\n";
echo "\$B = m('".$recipe2."') \n"; echo "\$B = \n"; print_r($B); echo "\n";
echo "\$C = m('".$recipe3."') \n"; echo "\$C = \n"; print_r($C); echo "\n";
echo "\$D = m('".$recipe4."') \n"; echo "\$D = \n"; print_r($D); echo "\n";
echo "\$E = m('".$recipe5."') \n"; echo "\$E = \n"; print_r($E); echo "</pre>";
?>
Here is what the output of this script looks like:
$A = m('-1:1; 3:1') $A = Array ( [0] => Array ( [0] => -1 [1] => 0 [2] => 1 ) [1] => Array ( [0] => 3 [1] => 2 [2] => 1 ) )
$B = m('1;2;3;4') $B = Array ( [0] => Array ( [0] => 1 )
[1] => Array ( [0] => 2 )
[2] => Array ( [0] => 3 )
[3] => Array ( [0] => 4 ) )
$C = m('1 2 3;4 5 6') $C = Array ( [0] => Array ( [0] => 1 [1] => 2 [2] => 3 ) [1] => Array ( [0] => 4 [1] => 5 [2] => 6 ) )
$D = m('1') $D = Array ( [0] => 1 )
$E = m('1 2 3;4:6') $E = Array ( [0] => Array ( [0] => 1 [1] => 2 [2] => 3 ) [1] => Array ( [0] => 4 [1] => 5 [2] => 6 ) )
Originally I suggested that we use a mat( ) function to both create and access a matrix. I decided that if we are going to shorten "matrix" to "mat" we might as well go all the way and use an "m" function as the function name. While I think that we could still use one m( ) function for both constructing and accessing matrices, I think it would be better to develop a separate sm( ) function to make initial development of a matrix accessor easier. Finally, using m( ) as the function used to create/access a matrix reminds me of JQuery's $() operator so I don't think there is anything inherently wrong with using a one letter function name if the importance of the operator supports reserving a letter for it. This is arguably the case for an operator designed to create and access matrices.
|
|
Permalink |
|
Matrix manipulation functions for PHP: Part 1 |
[Math Programming] |
|
Posted on January 4, 2012 @ 01:12:29 PM by Paul Meagher
Currently reading some books on Matlab (so that I can program in it's opensource version, Octave, better). In Matlab/Octave you can construct a matrix like so:
mat = [1:3; 4:6]
which returns:
mat =
1 2 3
4 5 6
Perhaps in PHP we should be able to do something similiar like this:
$A = mat('1:3; 4:6');
Where mat(' replaces [ and '); replaces ]. Implementing a mat function like this would require implementing 1) a parser for the expression between single quotes (or double quotes), and 2) a code generator that uses the parsed expression to generate the appropriate array or nested arrays.
In the matrix constructor for the JAMA package, there are methods to "fill" the arrays with values but they are very primative compared to matlab/octaves notation for constructing filled matrices. The proposed "mat" function could work in a very complementary way with JAMA's matrix constructor.
A mat function is only one matrix manipulation function that php might implement to make matrix manipulation easier. Other functions that would be useful are linspace, reshape, and repmat. Matlab's/Octave's method for transposing a matrix involves using a single quote operator ' outside the right bracket (e.g., mat = [1:3; 4:6]') that might be functionalized as a transpose or trans operator. Finally, Matlab/Octave also supports an expressive language for accessing the contents of a matrix that would also be useful to have. One possibility is that the mat function might take a second argument, a matrix, that would be accessed via an expression between the single quotes of the first argument like so:
$b = mat('end, 1', $A);
Which equates to:
$b = 4;
I would argue that for PHP to be a usable language for matrix manipulation it should offer matrix manipulation functions like the Matlab/Octave ones proposed here. A few functions like these, inspired by the way Matlab/Octive implements these functions, would take PHP a long way towards being a useful matrix manipulation language. They would be complementary with the JAMA Linear Algebra Package and would make it easier to port Matlab/Octave-based matrix algorithms to PHP. Many vectorized algorithms require the ability to efficiently slice and dice matrices as the algorithm proceeds towards a solution which is what these methods provide.
|
|
Permalink |
|
Information Retrieval Class |
[Data Mining] |
|
Posted on June 5, 2008 @ 02:21:07 PM by Paul Meagher
As I reflected on how I would begin computing summary statistics over an inverted index (which I discussed and implemented in my last post), I realized that I would have to immediately turn my script into a class in order to easily and efficiently pass around the inverted index to the statistical functions. Here is what I have developed so far (see also):
<?php
/** * Information Retrievel * * Class used to explore information retrieval theory and concepts. */
define("DOC_ID", 0); define("TERM_POSITION", 1);
class IR {
public $num_docs = 0; public $corpus_terms = array();
/* * Show Documents * * Helper function that shows the contents of your corpus documents. * * @param array $D document corpus as array of strings */ function show_docs($D) { $ndocs = count($D); for($doc_num=0; $doc_num < $ndocs; $doc_num++) { ?> <p> Document #<?php echo ($doc_num+1); ?>:<br /> <?php echo $D[$doc_num]; ?> </p> <?php } }
/* * Create Index * * Creates an inverted index from the supplied corpus documents. * Inverted index stored in corpus_terms array. * * @param array $D document corpus as array of strings */ function create_index($D) { $this->num_docs = count($D); for($doc_num=0; $doc_num < $this->num_docs; $doc_num++) { // zero array containing document terms $doc_terms = array(); // simplified word tokenization process $doc_terms = explode(" ", $D[$doc_num]); // here is where the indexing of terms to document locations happens $num_terms = count($doc_terms); for($term_position=0; $term_position < $num_terms; $term_position++) { $term = strtolower($doc_terms[$term_position]); $this->corpus_terms[$term][]=array($doc_num, $term_position); } } }
/* * Show Index * * Helper function that outputs inverted index in a standard format. */ function show_index() { // sort by key for alphabetically ordered output ksort($this->corpus_terms); // output a representation of the inverted index foreach($this->corpus_terms AS $term => $doc_locations) { echo "<b>$term:</b> "; foreach($doc_locations AS $doc_location) echo "{".$doc_location[DOC_ID].", ".$doc_location[TERM_POSITION]."} "; echo "<br />"; } }
/* * Term Frequency * * @param string $term * @return frequency of term in corpus */ function tf($term) { $term = strtolower($term); return count($this->corpus_terms[$term]); } /* * Number Documents With * * @param string $term * @return number of documents with term */ function ndw($term) { $term = strtolower($term); $doc_locations = $this->corpus_terms[$term]; $num_locations = count($doc_locations); $docs_with_term = array(); for($doc_location=0; $doc_location < $num_locations; $doc_location++) $docs_with_term[$i]++; return count($docs_with_term); } /* * Inverse Document Frequency * * @param string $term * @return inverse document frequency of term */ function idf($term) { return log($this->num_docs)/$this->ndw($term); }
}
?>
Here is a script I developed to test the methods of the IR class:
<?php
include "IR.php";
$D[0] = "Shipment of gold delivered in a fire"; $D[1] = "Delivery of silver arrived in a silver truck"; $D[2] = "Shipment of gold arrived in a truck";
$ir = new IR();
echo "<p><b>Corpus:</b></p>"; $ir->show_docs($D);
$ir->create_index($D);
echo "<p><b>Inverted Index:</b></p>"; $ir->show_index();
$term = "silver"; $tf = $ir->tf($term); $ndw = $ir->ndw($term); $idf = $ir->idf($term); echo "<p>"; echo "Term Frequency of '$term' is $tf<br />"; echo "Number Of Documents with $term is $ndw<br />"; echo "Inverse Document Frequency of $term is $idf"; echo "</p>";
?>
Here is the output that the script generates:
Corpus:
Document #1:
Shipment of gold delivered in a fire
Document #2:
Delivery of silver arrived in a silver truck
Document #3:
Shipment of gold arrived in a truck
Inverted Index: a: {0, 5} {1, 5} {2, 5} arrived: {1, 3} {2, 3} delivered: {0, 3} delivery: {1, 0} fire: {0, 6} gold: {0, 2} {2, 2} in: {0, 4} {1, 4} {2, 4} of: {0, 1} {1, 1} {2, 1} shipment: {0, 0} {2, 0} silver: {1, 2} {1, 6} truck: {1, 7} {2, 6}
Term Frequency of 'silver' is 2 Number Of Documents with silver is 1 Inverse Document Frequency of silver is 1.0986122886681
The next step in this exploration of Information Retrieval Theory will be to develop a table that better summarizes term frequency and inverse term frequencies. I will use this table to verify that the results I'm generating match with tf-idf calculations found in the literature for the sample documents used in my calculations.
|
|
Permalink |
|
Information Retrieval Theory: Implementing an Inverted Index |
[Data Mining] |
|
Posted on June 3, 2008 @ 01:29:29 PM by Paul Meagher
Last night I downloaded a local copy of this book:
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2008.
Most introductions to Information Retrieval begin by talking about the "Inverted Index" data structure. I had come accross this concept before, however, I haven't explored it in php code because I got hung up on the issues of what corpus to use and, more importantly, what word tokenization approach I should use for the documents. Through my surfing around on the topic of IR, I observed that you can bypass these issues by simplifying the corpus selection and word tokenization issues while retaining the essentials of the representational and computational issues. The code below illustrates a common simplification of the corpus concept and word tokenization concept that might similarly allow you to get started exploring some IR concepts.
<?php
/** * Creates an inverted index and shows what it looks like. Useful for * explorating TF-IDF calculations in Information Retrieval theory. * * @see http://en.wikipedia.org/wiki/Inverted_index */
define("DOC_ID", 0); define("TERM_POSITION", 1);
$D[0] = "Shipment of gold delivered in a fire"; $D[1] = "Delivery of silver arrived in a silver truck"; $D[2] = "Shipment of gold arrived in a truck";
$corpus_terms = array();
$num_docs = count($D);
for($doc_num=0; $doc_num < $num_docs; $doc_num++) { // zero array containing document terms $doc_terms = array(); // simplified word tokenization process $doc_terms = explode(" ", $D[$doc_num]); // here is where the indexing of terms to document locations happens $num_terms = count($doc_terms); for($term_position=0; $term_position < $num_terms; $term_position++) $corpus_terms[strtolower($doc_terms[$term_position])][]=array($doc_num, $term_position);
}
// sort by key for alphabetically ordered output ksort($corpus_terms);
$num_terms = count($corpus_terms);
// output a representation of the inverted index foreach($corpus_terms AS $term => $doc_locations) { echo "<b>$term:</b> "; foreach($doc_locations AS $doc_location) echo "{".$doc_location[DOC_ID].", ".$doc_location[TERM_POSITION]."} "; echo "<br />"; } ?>
The output of this code looks like this:
a: {0, 5} {1, 5} {2, 5} arrived: {1, 3} {2, 3} delivered: {0, 3} delivery: {1, 0} fire: {0, 6} gold: {0, 2} {2, 2} in: {0, 4} {1, 4} {2, 4} of: {0, 1} {1, 1} {2, 1} shipment: {0, 0} {2, 0} silver: {1, 2} {1, 6} truck: {1, 7} {2, 6}
Sometimes it is useful to jump ahead in your exploration of certain topics and avoid, for example, the gritty details of document tokenization. What may seem like an avoidance of the problem, might end up informing you about important subtleties in the requirements for your information retrieval system. Exploratory inverted index code can help you to see the computational consequences of representing an inverted index in a certain manner.
For me, the next step in exploring IR theory is to explore in code the
Term Frequency–Inverse Document Frequency calculations, generally abbreviated as TF-IDF. Tim Munroe illustrates some of the calculations involved, however, I will need to get my calculations working with the inverted index data structure I've developed so far.
|
|
Permalink |
|
php/Math Project
© 2011. All rights reserved.
|