Recreational Mathematics   

New regression transformations [Regression]
Posted on December 1, 2006 @ 04:45:32 PM by Paul Meagher

Added three new normalizing methods to the REGRESS_Transform class (which is part of the REGRESS package):

  • logit($vector)
  • fisherz($vector)
  • boxcox($vector, $exponent)

You can see their implementation below:

<?php
/**
 * Class for performing vector transformations used in regression analysis.
 *
 * Paul Meagher
 * @file.license PHP v3.0
 * @file.version 0.2
 * @file.modified Dec 1, 2006
 */
class REGRESS_Transform {

  /**
   * Log transformation
   * Generally useful for normalizing observations
   */
  function log($vector) {
    $log_vector = array();
    foreach ($vector as $value)
      $log_vector[] = log($value);
    return $log_vector;
  }

  /**
   * Square root transformation
   * Useful for normalizing counts
   */
  function sqrt($vector) {
    $sqrt_vector = array();
    foreach ($vector as $count)
      $sqrt_vector[] = sqrt($count);
    return $sqrt_vector;
  }

  /**
   * Logit transformation
   * Useful for normalizing proportions p
   * Note: scaled by 0.5; the more common definition is log(p/(1-p))
   */
  function logit($vector) {
    $logit_vector = array();
    foreach ($vector as $p)
      $logit_vector[] = 0.5*log($p/(1-$p));
    return $logit_vector;
  }

  /**
   * Fisher z-transformation
   * Useful for normalizing correlations r
   */
  function fisherz($vector) {
    $fisherz_vector = array();
    foreach ($vector as $r)
      $fisherz_vector[] = 0.5*log((1+$r)/(1-$r));
    return $fisherz_vector;
  }

  /**
   * Box-Cox power transformation
   * Slightly modified power transformation
   */
  function boxcox($vector, $exponent) {
    $boxcox_vector = array();
    if ($exponent != 0) {
      foreach ($vector as $value)
        $boxcox_vector[] = ((pow($value, $exponent) - 1) / $exponent);
    } else {
      foreach ($vector as $value)
        $boxcox_vector[] = log($value);
    }
    return $boxcox_vector;
  }

}
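Predictions made on a transformed scale usually have to be mapped back to the original scale. The sketch below gives inverse functions for three of the transforms in REGRESS_Transform. The function names (inv_logit_half, inv_fisherz, inv_boxcox) are hypothetical and are not part of the REGRESS package, and the logit inverse assumes the 0.5-scaled form used in the class.

```php
<?php
// Hypothetical inverse transforms; these names are not part of REGRESS.

// Inverse of z = 0.5 * log(p / (1 - p)): p = 1 / (1 + exp(-2z)).
function inv_logit_half($z) {
    return 1.0 / (1.0 + exp(-2.0 * $z));
}

// Inverse of the Fisher z-transformation: r = tanh(z).
function inv_fisherz($z) {
    return tanh($z);
}

// Inverse of the Box-Cox transformation for a given exponent.
function inv_boxcox($y, $exponent) {
    if ($exponent != 0) {
        return pow($exponent * $y + 1.0, 1.0 / $exponent);
    }
    return exp($y);
}

// Round-trip checks on single values: transform, then invert.
$p = 0.8;
$z = 0.5 * log($p / (1 - $p));
printf("logit round trip:   %.6f\n", inv_logit_half($z));

$r = -0.35;
$z = 0.5 * log((1 + $r) / (1 - $r));
printf("fisherz round trip: %.6f\n", inv_fisherz($z));

$x = 7.0;
$y = (pow($x, 0.5) - 1) / 0.5;
printf("boxcox round trip:  %.6f\n", inv_boxcox($y, 0.5));
```

Each printed value should match the starting value up to floating-point rounding.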


The Box-Cox transformation can be improved upon by using the following formula to estimate the best exponent value e to use for the boxcox transformation:

likelihood(e) = -n/2 * log(1/n * sum(pow(boxcox(x, e) - mean(x, e), 2))) + (e - 1) * sum(log(x))

where n is the number of observations, boxcox(x, e) is the transformed vector, and

mean(x, e) = 1/n * sum(boxcox(x, e))


Develop a PHP program that estimates the optimal value of e to use in the boxcox() transformation by selecting the value of e that maximizes the likelihood expression above.
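One possible approach is a simple grid search. The sketch below assumes the likelihood is the usual Box-Cox profile log-likelihood, with the variance taken over the transformed values, and that every observation is positive. The function names, the search range of -2 to 2, the step size, and the sample data are all assumptions made for illustration; none of this is part of the REGRESS package.

```php
<?php
// Box-Cox transform of a vector (same formula as the boxcox() method).
function boxcox_vector($x, $e) {
    $out = array();
    foreach ($x as $value) {
        $out[] = ($e != 0) ? (pow($value, $e) - 1) / $e : log($value);
    }
    return $out;
}

// Profile log-likelihood of exponent $e; all values of $x must be positive.
function boxcox_loglik($x, $e) {
    $n = count($x);
    $t = boxcox_vector($x, $e);
    $mean = array_sum($t) / $n;
    $ss = 0.0;
    foreach ($t as $value) {
        $ss += pow($value - $mean, 2);
    }
    $sum_log = 0.0;
    foreach ($x as $value) {
        $sum_log += log($value);
    }
    return -$n / 2 * log($ss / $n) + ($e - 1) * $sum_log;
}

// Hypothetical helper: scan exponents from $lo to $hi and keep the best one.
function boxcox_best_exponent($x, $lo = -2.0, $hi = 2.0, $step = 0.01) {
    $best_e = null;
    $best_ll = -INF;
    $steps = (int) round(($hi - $lo) / $step);
    for ($i = 0; $i <= $steps; $i++) {
        $e = $lo + $i * $step;
        $ll = boxcox_loglik($x, $e);
        if ($ll > $best_ll) {
            $best_ll = $ll;
            $best_e = $e;
        }
    }
    return $best_e;
}

// Made-up, right-skewed sample data for illustration only.
$x = array(1.2, 1.9, 2.4, 3.1, 4.8, 7.5, 12.0, 20.3, 41.7);
printf("estimated exponent: %.2f\n", boxcox_best_exponent($x));
```

A grid search is crude but easy to check by eye. A one-dimensional optimizer such as golden-section search could replace the loop if more precision is needed.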


Regression programming with PHP [Regression]
Posted on April 18, 2006 @ 12:48:40 PM by Paul Meagher

The following script illustrates how I would like to be able to interact with the REGRESS Package. The REGRESS Package has not been refactored to work in this manner, so the included classes are mostly vaporware at the moment. What the code illustrates is the minimal set of classes that could be used to perform a basic data mining cycle: retrieve two columns of data from a database, perform a statistical analysis on the two columns, and display the results in tabular and graphical formats.

<?php
/**
 * @package REGRESS
 * fire_damage_analysis.php
 * @file.type example
 *
 * Example of code for extracting two columns of
 * data from a database table, applying a transform
 * to the data, submitting the data to simple regression
 * analysis, generating data tables, generating data
 * graphs.
 *
 * Paul Meagher
 * @file.license LGPL
 * @file.modified Apr 18, 2006 @ 9:15 am AST
 * @file.version 0.1
 */

require_once "Math/REGRESS/Simple.php";

// Data Source Name
$DSN = "mysql::Insurance.FireDamage";

$data = new REGRESS_DB_Table($DSN);

// supply field names of data columns you want to extract
$X = $data->col("distance");
$Y = $data->col("damage");

// PHP does not offer vectorized functions so we create a
// class to do it
$TR = new REGRESS_Transform;

$LOG_X = $TR->log($X);

// Confidence Interval (95 = 95% confidence interval)
$CI = 95;
$SR = new REGRESS_Simple($LOG_X, $Y, $CI);

// set labels here so we do not have to repeat ourselves
// for tabular and graphic output

echo "<h1>Fire Damage Study</h1>";

$table = new REGRESS_HTML_Table($SR);

// show observed, predicted, and residual values

// show Analysis Of Variance table

// show Maximum Likelihood Estimates of the parameters

// show formula

// show R values

// REGRESS_Graph encapsulates a graphing package.
$graph = new REGRESS_Graph($SR);

// show scatter plot with line of best fit

// graph the size of the residual error

Things to note about this code:

  1. The REGRESS_ prefix makes this code base more PEAR::Math friendly.
  2. The instance variables that are populated by the regression analysis are passed directly to the HTML_Table and Graph objects. The scheme used to pass analytic results between the Regression objects and the HTML_Table and Graph objects will need to anticipate their use in multiple regression contexts.
  3. The list of tabular and graphical reports currently generated is not meant to be exhaustive. There are many other diagnostic reports and analyses that could be generated under the "Simple Regression" rubric. The package will eventually need to support the generation of navigable multi-page output rather than one long page of tables and graphs.

My current objective is to refactor the code base for simple regression so that it consists of the above classes. At the same time, I want to turn the current MultipleRegression.php script into a class (i.e., REGRESS_Multiple or REGRESS_Multivariate) that also interacts with the DB_Table, HTML_Table and Graph objects to make it easy to engage in data mining cycles using PHP and a web browser.
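Because the classes in the script are still vaporware, the regression step itself can be sketched in plain PHP. The function below is a hypothetical stand-in, not the REGRESS_Simple API: it fits a straight line by ordinary least squares and returns the slope, intercept, and correlation. The distance and damage values are made up for illustration.

```php
<?php
// Hypothetical stand-in for the regression step; not the REGRESS_Simple API.
// Fits y = intercept + slope * x by ordinary least squares.
function simple_regression($x, $y) {
    $n = count($x);
    $mean_x = array_sum($x) / $n;
    $mean_y = array_sum($y) / $n;
    $sxy = 0.0;
    $sxx = 0.0;
    $syy = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $dx = $x[$i] - $mean_x;
        $dy = $y[$i] - $mean_y;
        $sxy += $dx * $dy;
        $sxx += $dx * $dx;
        $syy += $dy * $dy;
    }
    $slope = $sxy / $sxx;
    $intercept = $mean_y - $slope * $mean_x;
    $r = $sxy / sqrt($sxx * $syy);
    return array('slope' => $slope, 'intercept' => $intercept, 'r' => $r);
}

// Made-up distance/damage values for illustration only.
$distance = array(0.7, 1.1, 1.8, 2.3, 3.0, 3.8, 4.6, 5.5);
$damage   = array(14.1, 17.3, 17.8, 23.1, 22.3, 26.1, 31.3, 36.0);

// Log-transform the predictor, as the script does with $TR->log($X).
$log_distance = array_map('log', $distance);

$fit = simple_regression($log_distance, $damage);
printf("damage = %.3f + %.3f * log(distance), r = %.3f\n",
    $fit['intercept'], $fit['slope'], $fit['r']);
```

Confidence intervals, the ANOVA table, and the residual reports mentioned in the script would build on the same sums of squares.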


Towards a REGRESS Package [Regression]
Posted on March 27, 2006 @ 02:19:21 PM by Paul Meagher

Spent some time this weekend re-acquainting myself with some work on multiple regression I did about 10 months ago. I intend to begin moving the "Regression Project" forward again.

The MultipleRegression.php class is not yet ready for public consumption. It currently exists as a linear script that reads in a multivariate data array, performs the standard multiple regression computation (using the JAMA linear algebra library), and outputs some standard diagnostic tables (using the PDL probability distributions library to evaluate the results). The MultipleRegression.php script now has to be turned into a class with methods.

The MultipleRegression.php class will need to exist within a larger framework that has good analytic coverage and congruency with how we might want to think about various types of regression problems.
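For readers who want to see what a multiple regression computation can look like, here is a minimal sketch that solves the normal equations (X'X) b = X'y by Gaussian elimination. It uses plain PHP arrays rather than the JAMA library and is not the MultipleRegression.php script; the function name and the sample data are assumptions. It also assumes the predictors are not collinear, since a singular X'X would cause a division by zero.

```php
<?php
// Sketch of least-squares multiple regression via the normal equations
// (X'X) b = X'y, solved by Gaussian elimination with partial pivoting.
// Plain PHP arrays are used here instead of the JAMA library.
function multiple_regression($rows, $y) {
    $n = count($rows);
    $k = count($rows[0]) + 1;          // predictors plus intercept

    // Build the augmented matrix [X'X | X'y].
    $a = array_fill(0, $k, array_fill(0, $k + 1, 0.0));
    for ($i = 0; $i < $n; $i++) {
        $xi = array_merge(array(1.0), $rows[$i]);   // leading 1 = intercept
        for ($p = 0; $p < $k; $p++) {
            for ($q = 0; $q < $k; $q++) {
                $a[$p][$q] += $xi[$p] * $xi[$q];
            }
            $a[$p][$k] += $xi[$p] * $y[$i];
        }
    }

    // Forward elimination with partial pivoting.
    for ($col = 0; $col < $k; $col++) {
        $pivot = $col;
        for ($r = $col + 1; $r < $k; $r++) {
            if (abs($a[$r][$col]) > abs($a[$pivot][$col])) {
                $pivot = $r;
            }
        }
        $tmp = $a[$col];
        $a[$col] = $a[$pivot];
        $a[$pivot] = $tmp;
        for ($r = $col + 1; $r < $k; $r++) {
            $factor = $a[$r][$col] / $a[$col][$col];
            for ($c = $col; $c <= $k; $c++) {
                $a[$r][$c] -= $factor * $a[$col][$c];
            }
        }
    }

    // Back substitution.
    $b = array_fill(0, $k, 0.0);
    for ($r = $k - 1; $r >= 0; $r--) {
        $sum = $a[$r][$k];
        for ($c = $r + 1; $c < $k; $c++) {
            $sum -= $a[$r][$c] * $b[$c];
        }
        $b[$r] = $sum / $a[$r][$r];
    }
    return $b;   // $b[0] is the intercept, the rest are the slopes
}

// Made-up data generated from y = 1 + 2*x1 + 3*x2 (no noise).
$rows = array(array(1, 2), array(2, 1), array(3, 4), array(4, 3), array(5, 6));
$y    = array(9, 8, 19, 18, 29);
print_r(multiple_regression($rows, $y));
```

Solving the normal equations directly is the simplest route but can lose accuracy on ill-conditioned data, which is one reason to prefer a decomposition from a linear algebra library in a real package.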

What I propose is a REGRESS Package with these initial classes:

include "Math/REGRESS/Regression.php";
include "Math/REGRESS/SimpleRegression.php";
include "Math/REGRESS/MultipleRegression.php";
include "Math/REGRESS/lib/Transformation.php";

These classes would be installed when you installed the REGRESS package.

In addition, there would be other non-installable directories containing "examples", "tests", "docs" ("EN", "FR", etc.), and "downloads" (two types of downloads: "full" and "install").

Longer term plans

The next iteration of the library might include these classes:

include "Math/REGRESS/LogisticRegression.php";
include "Math/REGRESS/PolynomialRegression.php";

There are also other more arcane types of regression which might be developed on future iterations such as:

include "Math/REGRESS/RidgeRegression.php";
include "Math/REGRESS/ProjectionPursuitRegression.php";
include "Math/REGRESS/GeneralLinearModel.php";

Bayesian mirror classes

What about the increasingly influential forms of Bayesian regression?

Mirror classes might be eventually added to a BAYES package.

include "Math/BAYES/NaiveBayes.php";
include "Math/BAYES/Regression.php";
include "Math/BAYES/SimpleRegression.php";
include "Math/BAYES/MultipleRegression.php";



php/Math Project
© 2011. All rights reserved.