The kappa statistic is commonly used as a measure of interrater reliability. The kappa functions below compute a kappa score given a 2×2 table of observed counts. These interrater “confusion” counts can be represented using either a one-dimensional array (use the kappa1 function below):
array(count(no/no), count(no/yes), count(yes/no), count(yes/yes));
or a two-dimensional array (use the kappa2 function below):
array(
    array(count(no/no), count(no/yes)),
    array(count(yes/no), count(yes/yes))
);
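Both functions implement the same calculation: if pObserved is the proportion of cases on which the raters agree (the diagonal counts divided by the total count n) and pExpected is the agreement expected by chance alone (computed from the row and column totals), then

kappa = (pObserved − pExpected) / (1 − pExpected)

where pExpected = ((col1 × row1) + (col2 × row2)) / n².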
<?php
/**
 * Compute kappa for a 2x2 table.
 *
 * An index which compares the agreement against that which might be
 * expected by chance. Possible values range from +1 (perfect agreement)
 * to 0 (no agreement above that expected by chance) to -1 (complete
 * disagreement).
 *
 * @see http://www.dmi.columbia.edu/homepages/chuangj/kappa/
 *
 * @param array $table2x2 2x2 array of interrater no/yes agreements/disagreements
 *
 * @return float the kappa statistic
 */
function kappa2($table2x2) {
    $row1 = $table2x2[0][0] + $table2x2[0][1];
    $row2 = $table2x2[1][0] + $table2x2[1][1];
    $col1 = $table2x2[0][0] + $table2x2[1][0];
    $col2 = $table2x2[0][1] + $table2x2[1][1];
    $n    = $col1 + $col2;

    $pObserved = ($table2x2[0][0] + $table2x2[1][1]) / $n;
    $pExpected = (($col1 * $row1) + ($col2 * $row2)) / ($n * $n);

    $kappa = ($pObserved - $pExpected) / (1 - $pExpected);
    return $kappa;
}

/**
 * Compute kappa for a 2x2 table stored as a flat one-dimensional array
 * in no/no, no/yes, yes/no, yes/yes order.
 *
 * @see http://bij.isi.uu.nl/ImageJ/api/index.html {BIJStats.java}
 *
 * @param array $table flat array of interrater no/yes counts
 *
 * @return float the kappa statistic
 */
function kappa1($table) {
    $g1 = $table[0] + $table[1];  // row totals
    $g2 = $table[2] + $table[3];
    $f1 = $table[0] + $table[2];  // column totals
    $f2 = $table[1] + $table[3];
    $n  = $f1 + $f2;

    $pObserved = ($table[0] + $table[3]) / $n;
    $pExpected = (($f1 * $g1) + ($f2 * $g2)) / ($n * $n);

    $kappa = ($pObserved - $pExpected) / (1 - $pExpected);
    return $kappa;
}

$table2x2 = array(
    array(10, 7),
    array(0, 12)
);
echo kappa2($table2x2);
?>
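As a quick check (not part of the original example), kappa1 should return the same value when handed the same counts flattened row by row; for the table above, both functions yield 240/443 ≈ 0.5418:

<?php
// Same counts as $table2x2 above, flattened in
// no/no, no/yes, yes/no, yes/yes order.
echo kappa1(array(10, 7, 0, 12));  // ≈ 0.5418, matching kappa2($table2x2)
?>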
The example values input and output by this script were taken from this discussion of the kappa statistic. There are two reasons why I am particularly interested in the kappa statistic:

1. The kappa statistic is a close cousin of the chi-square statistic, which I have written about on numerous occasions and developed math packages for. I have seen the kappa statistic used alongside the chi-square statistic to provide additional insight into tables of categorical count data.

2. An important aspect of the concept of “Collaborative Filtering” is the idea of “interrater reliability” – developing operational definitions of what it means to agree or disagree in your assessment of some object attribute. The kappa statistic is a good place to begin one’s exploration of the mathematics of collaborative filtering.

Exercise

Generalize the algorithm for computing a “kappa” score so that it can be used on 3×3 tables and n×n tables.
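For readers who want to check their answer against something, here is a sketch of one possible generalization (my own, not from the sources cited above; kappaN is a hypothetical name). It treats the diagonal of the n×n table as the agreement cells and builds the expected agreement from the row and column marginals:

<?php
/**
 * Sketch: kappa for an arbitrary n x n table of interrater counts.
 * Assumes $table is a square two-dimensional array of non-negative counts.
 */
function kappaN($table) {
    $size = count($table);
    $rowTotals = array_fill(0, $size, 0);
    $colTotals = array_fill(0, $size, 0);
    $n = 0;         // grand total of all counts
    $agreement = 0; // sum of the diagonal cells (raters agree)

    for ($i = 0; $i < $size; $i++) {
        for ($j = 0; $j < $size; $j++) {
            $rowTotals[$i] += $table[$i][$j];
            $colTotals[$j] += $table[$i][$j];
            $n += $table[$i][$j];
        }
        $agreement += $table[$i][$i];
    }

    $pObserved = $agreement / $n;
    $pExpected = 0;
    for ($i = 0; $i < $size; $i++) {
        $pExpected += ($rowTotals[$i] * $colTotals[$i]) / ($n * $n);
    }
    return ($pObserved - $pExpected) / (1 - $pExpected);
}

// For n = 2 this reproduces the example above (≈ 0.5418):
echo kappaN(array(array(10, 7), array(0, 12)));
?>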