REGRESSION Linear Regression Datasets

REGRESSION is a dataset directory which contains test data for linear regression.

The simplest kind of linear regression takes a set of data points (xi,yi) and tries to determine the "best" linear relationship:

y = a * x + b

Commonly, we look at the vector of errors:

ei = yi - a * xi - b

and look for values (a,b) that minimize the L1, L2 or L-infinity norm of the errors. For problems involving multivariate sets of data, the number a becomes a matrix, and b a vector, but the idea is similar.
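
As a minimal sketch of the idea above, the following Python code fits a line by ordinary least squares (the L2 case only) and then evaluates all three norms of the error vector. The function names fit_line_l2 and error_norms and the sample data are illustrative assumptions and do not come from the dataset directory. Minimizing the L1 or L-infinity norm generally requires a different algorithm (for example, linear programming) and is not shown here.

```python
import math


def fit_line_l2(x, y):
    """Return (a, b) minimizing the L2 norm of e_i = y_i - a*x_i - b."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def error_norms(x, y, a, b):
    """Return the (L1, L2, L-infinity) norms of the error vector."""
    e = [yi - a * xi - b for xi, yi in zip(x, y)]
    l1 = sum(abs(ei) for ei in e)
    l2 = math.sqrt(sum(ei * ei for ei in e))
    linf = max(abs(ei) for ei in e)
    return l1, l2, linf


# Made-up sample data lying exactly on the line y = 2*x + 1.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line_l2(x, y)
print("a =", a, "b =", b)
print("norms (L1, L2, Linf) =", error_norms(x, y, a, b))
```

Because the sample points lie exactly on a line, the fitted (a,b) reproduces that line and all three norms are zero; with real data the three norms usually differ, and so do the (a,b) values that minimize each of them.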

The data files have a simple format:

• initial comment lines, each beginning with a "#";
• the number of columns of data;
• the number of rows of data;
• for each column of data, a line containing a column label; the first column is always "Index" and counts the rows; if there is a column labeled "A0" it usually contains the value 1.0;
• each row of data, on a separate line, with data separated by spaces.
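
The layout described above can be read with a short parser. The following Python code is a minimal sketch under stated assumptions: the function name parse_regression_text is hypothetical, blank lines are skipped, and each count line is assumed to begin with an integer (any trailing text on that line is ignored). The sample text is made up for illustration and is not one of the x*.txt files.

```python
def parse_regression_text(text):
    """Parse the described layout and return (labels, rows).

    labels is a list of column label strings; rows is a list of
    lists of floats, one inner list per row of data.
    """
    # Drop comment lines; skipping blank lines is an assumption.
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    if len(lines) < 2:
        raise ValueError("missing the column and row counts")

    # Assumption: the count is the first token on its line.
    n_cols = int(lines[0].split()[0])
    n_rows = int(lines[1].split()[0])

    # One label per line; a label may contain spaces.
    labels = lines[2:2 + n_cols]
    data_lines = lines[2 + n_cols:2 + n_cols + n_rows]
    if len(labels) != n_cols or len(data_lines) != n_rows:
        raise ValueError("text is shorter than its declared size")

    rows = []
    for ln in data_lines:
        values = [float(tok) for tok in ln.split()]
        if len(values) != n_cols:
            raise ValueError("row does not have %d values: %r" % (n_cols, ln))
        rows.append(values)
    return labels, rows


# Made-up sample in the described layout.
sample = """# made-up example, not one of the x*.txt files
3
2
Index
X
Y
1 0.5 2.0
2 1.5 4.0
"""
labels, rows = parse_regression_text(sample)
print(labels)
print(rows)
```

The first entry of each parsed row is the "Index" value, so a caller that only wants the regression variables would typically discard column 0.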

There are also some extended examples, each of which involves an M by N linear system, a set of linear constraints to be solved exactly, and a set of linear inequalities. For each such example, a master file lists the sizes of the three sets of data and the name of the first file, which contains the linear system.

Related Data and Programs:

HARTIGAN, a dataset directory which contains datasets for testing clustering algorithms;

ISWR, a dataset directory which contains datasets used for statistical analysis, particularly with the R language;

MARTINEZ, a dataset directory which contains datasets for computational statistics, including cluster analysis;

MDS, a dataset directory which contains datasets for M-dimensional scaling;

SOKAL_ROHLF, a dataset directory which contains biological datasets considered by Sokal and Rohlf;

STATS, a dataset directory which contains datasets for computational statistics.

References:

1. I Barrodale, F Roberts,
Algorithm 552: Solution of the Constrained L1 Approximation Problem,
ACM Transactions on Mathematical Software,
Volume 6, Number 2, pages 231-235, 1980.
2. Richard Gunst, Robert Mason,
Regression Analysis and Its Applications: a data-oriented approach,
Dekker, 1980,
ISBN: 0824769937,
LC: QA278.2.G85.
3. David Kahaner, Cleve Moler, Steven Nash,
Numerical Methods and Software,
Prentice Hall, 1989,
ISBN: 0-13-627258-4,
LC: TA345.K34.
4. Helmuth Spaeth,
Mathematical Algorithms for Linear Regression,
ISBN: 0-12-656460-4.

Datasets:

• x01.txt, brain and body weight, 62 rows, 3 columns;
• x02.txt, height, weight, catheter length, 12 rows, 4 columns;
• x03.txt, age, blood pressure, 30 rows, 4 columns;
• x04.txt, catalog print run versus orders, 38 rows, 4 columns;
• x05.txt, catalog print run versus orders, 38 rows, 5 columns;
• x06.txt, age, water temperature, length of fish, 44 rows, 4 columns;
• x07.txt, retardation, doctor distrust, degree of illness, 53 rows, 4 columns;
• x08.txt, poverty, unemployment, murder rate, 20 rows, 5 columns;
• x09.txt, age, weight, blood fat, 25 rows, 5 columns;
• x10.txt, factory operation parameters, percent of unprocessed ammonia, 21 rows, 5 columns;
• x11.txt, pasturage properties and price, 67 rows, 5 columns;
• x12.txt, electrical utility data, 16 rows, 6 columns;
• x13.txt, production, imports, and consumption data, 18 rows, 7 columns;
• x14.txt, gas tank temperature and pressure, 32 rows, 6 columns;
• x15.txt, gas consumption versus local conditions, 48 rows, 6 columns;
• x16.txt, gas consumption versus local conditions, 48 rows, 7 columns;
• x17.txt, octane rating versus raw materials, 82 rows, 6 columns;
• x18.txt, octane rating versus raw materials, 82 rows, 7 columns;
• x19.txt, livestock market expenses, 19 rows, 7 columns;
• x20.txt, population and drinking data, 46 rows, 7 columns;
• x21.txt, economic and employment data, 16 rows, 8 columns;
• x22.txt, economic and employment data, 16 rows, 9 columns;
• x23.txt, office worker satisfaction, 30 rows, 8 columns;
• x24.txt, office worker satisfaction, 30 rows, 9 columns;
• x25.txt, ground evaporation versus conditions, 25 rows, 9 columns;
• x26.txt, selling price of houses, 28 rows, 13 columns;
• x27.txt, selling price of houses, 28 rows, 14 columns;
• x28.txt, the death rate as a function of other variables, 60 rows, 17 columns;

More data files you may copy, involving overdetermined linear systems, include:

• x29.txt, points in the plane, 9 rows, 4 columns;
• x30.txt, a linear system, 4 rows, 5 columns;
• x31.txt, a linear system, 10 rows, 5 columns;
• x32.txt, a linear system, 13 rows, 5 columns;
• x33.txt, a linear system, 96 rows, 6 columns;
• x34.txt, a linear system, 20 rows, 7 columns;
• x35.txt, a linear system, 30 rows, 7 columns;
• x36.txt, a linear system, 6 rows, 7 columns;
• x37.txt, a linear system, 6 rows, 7 columns;
• x38.txt, a linear system, 6 rows, 7 columns;
• x39.txt, a linear system, 6 rows, 7 columns;
• x40.txt, a linear system, 6 rows, 7 columns;
• x41.txt, a linear system, 6 rows, 7 columns;
• x42.txt, a linear system, 16 rows, 11 columns;
• x60.txt, a linear system, 3 rows, 5 columns;

More data files you may copy, involving overdetermined linear systems with equality and inequality constraints, include:

• x43.txt, a linear system made up of 3 subsystems, 12 rows, 2 columns;
• x43_01.txt, subsystem #1, 4 rows, 2 columns;
• x43_02.txt, subsystem #2, 5 rows, 2 columns;
• x43_03.txt, subsystem #3, 3 rows, 2 columns;
• x44.txt, a linear system made up of 3 subsystems, 12 rows, 2 columns;
• x44_01.txt, subsystem #1, 4 rows, 2 columns;
• x44_02.txt, subsystem #2, 5 rows, 2 columns;
• x44_03.txt, subsystem #3, 3 rows, 2 columns;
• x45.txt, a linear system made up of 3 subsystems, 12 rows, 2 columns;
• x45_01.txt, subsystem #1, 4 rows, 2 columns;
• x45_02.txt, subsystem #2, 5 rows, 2 columns;
• x45_03.txt, subsystem #3, 3 rows, 2 columns;
• x46.txt, a linear system, 3 rows, 2 columns;
• x47.txt, a system made up of a linear system, equality and inequality constraints, 5 rows, 2 columns;
• x47_01.txt, the linear system, 3 rows, 2 columns;
• x47_02.txt, the equality constraints, 0 rows, 2 columns;
• x47_03.txt, the inequality constraints, 2 rows, 2 columns;
• x48.txt, a system made up of a linear system, equality and inequality constraints, 5 rows, 2 columns;
• x48_01.txt, the linear system, 3 rows, 2 columns;
• x48_02.txt, the equality constraints, 0 rows, 2 columns;
• x48_03.txt, the inequality constraints, 2 rows, 2 columns;
• x49.txt, a system made up of a linear system, equality and inequality constraints, 4 rows, 2 columns;
• x49_01.txt, the linear system, 3 rows, 2 columns;
• x49_02.txt, the equality constraints, 1 row, 2 columns;
• x49_03.txt, the inequality constraints, 0 rows, 2 columns;
• x50.txt, a system made up of a linear system, equality and inequality constraints, 4 rows, 2 columns;
• x50_01.txt, the linear system, 3 rows, 2 columns;
• x50_02.txt, the equality constraints, 1 row, 2 columns;
• x50_03.txt, the inequality constraints, 0 rows, 2 columns;
• x51.txt, a system made up of a linear system, equality and inequality constraints, 6 rows, 2 columns;
• x51_01.txt, the linear system, 3 rows, 2 columns;
• x51_02.txt, the equality constraints, 1 row, 2 columns;
• x51_03.txt, the inequality constraints, 2 rows, 2 columns;
• x52.txt, a system made up of a linear system, equality and inequality constraints, 6 rows, 2 columns;
• x52_01.txt, the linear system, 3 rows, 2 columns;
• x52_02.txt, the equality constraints, 1 row, 2 columns;
• x52_03.txt, the inequality constraints, 2 rows, 2 columns;
• x53.txt, a system made up of a linear system, equality and inequality constraints, 6 rows, 2 columns;
• x53_01.txt, the linear system, 3 rows, 2 columns;
• x53_02.txt, the equality constraints, 1 row, 2 columns;
• x53_03.txt, the inequality constraints, 2 rows, 2 columns;
• x54.txt, a system made up of a linear system, equality and inequality constraints, 13 rows, 5 columns;
• x54_01.txt, the linear system, 8 rows, 5 columns;
• x54_02.txt, the equality constraints, 3 rows, 5 columns;
• x54_03.txt, the inequality constraints, 2 rows, 5 columns;
• x55.txt, a system made up of a linear system, equality and inequality constraints, 14 rows, 7 columns;
• x55_01.txt, the linear system, 9 rows, 7 columns;
• x55_02.txt, the equality constraints, 0 rows, 7 columns;
• x55_03.txt, the inequality constraints, 5 rows, 7 columns;
• x56.txt, a system made up of a linear system, equality and inequality constraints, 11 rows, 5 columns;
• x56_01.txt, the linear system, 6 rows, 5 columns;
• x56_02.txt, the equality constraints, 0 rows, 5 columns;
• x56_03.txt, the inequality constraints, 5 rows, 5 columns;
• x57.txt, a system made up of a linear system, equality and inequality constraints, 11 rows, 5 columns;
• x57_01.txt, the linear system, 6 rows, 5 columns;
• x57_02.txt, the equality constraints, 0 rows, 5 columns;
• x57_03.txt, the inequality constraints, 5 rows, 5 columns;
• x58.txt, a system made up of a linear system, equality and inequality constraints, 5 rows, 3 columns;
• x58_01.txt, the linear system, 3 rows, 3 columns;
• x58_02.txt, the equality constraints, 0 rows, 3 columns;
• x58_03.txt, the inequality constraints, 2 rows, 3 columns;
• x59.txt, a system made up of a linear system, equality and inequality constraints, 6 rows, 2 columns;
• x59_01.txt, the linear system, 3 rows, 2 columns;
• x59_02.txt, the equality constraints, 1 row, 2 columns;
• x59_03.txt, the inequality constraints, 2 rows, 2 columns;
• x61.txt, a system made up of a linear system, equality and inequality constraints, 13 rows, 7 columns;
• x61_01.txt, the linear system, 8 rows, 7 columns;
• x61_02.txt, the equality constraints, 3 rows, 7 columns;
• x61_03.txt, the inequality constraints, 2 rows, 7 columns;

Miscellaneous data files:

• x62.txt, 12 measurements of dye concentration in a liquid over time, 12 rows, 3 columns;

Last revised on 15 July 2011.