STAR_DISCREPANCY
The Star Discrepancy of a Pointset

STAR_DISCREPANCY is a C program which computes bounds on the star discrepancy of an M-dimensional set of N points contained in the unit hypercube.

The star discrepancy is a commonly cited statistic for determining how uniformly a pointset is distributed over a region. For convenience, this region is usually taken as the unit hypercube; STAR_DISCREPANCY will assume that datasets under investigation are meant to fill up the unit hypercube.

If the pointset to be investigated actually lies in some other hypercube, a simple translation and rescaling may be enough to transform the data. This will probably NOT be satisfactory if the original region is rectangular but has sides of different lengths, or if the region is not rectangular.
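
As a rough illustration of the translation and rescaling mentioned above, the following C sketch maps points from a hypercube [lo, lo+side]^m onto the unit hypercube. The function name rescale_to_unit, its parameters, and the row-by-row storage layout are assumptions made for this example; they are not part of STAR_DISCREPANCY.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of STAR_DISCREPANCY.
   Maps n points in m dimensions from the hypercube [lo, lo+side]^m
   to the unit hypercube [0,1]^m, in place.
   x is stored row by row: x[i*m+j] is coordinate j of point i. */
void rescale_to_unit(double *x, size_t n, size_t m, double lo, double side)
{
    assert(side > 0.0); /* a hypercube must have a positive side length */
    for (size_t i = 0; i < n * m; i++) {
        x[i] = (x[i] - lo) / side; /* translate, then rescale */
    }
}
```

Because a single side length is used for every coordinate, this sketch only covers the hypercube case; a region with sides of different lengths is not handled.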

The discrepancy measures the worst error that would be made in estimating the area of a subregion of the hypercube by simply noting the fraction of the pointset contained in the subregion. If arbitrary subregions were allowed, then it would always be possible to make this error equal to 1 (just take the region consisting of the hypercube minus the pointset). Since any "reasonable" area can be approximated arbitrarily well by rectangles, the star discrepancy calculation uses only rectangular subregions whose sides are aligned with the coordinate directions and one of whose corners is at the origin.

Formally, the star discrepancy of a pointset of n points is symbolized by D_n^* and defined as

D_n^* = supremum ( P in I* ) | ( A(P,x) / n ) - lambda ( P ) |

Here, I* is the set of all M-dimensional subintervals of the form [0,p1] x [0,p2] x ... x [0,pM] where every p is between 0 and 1; P is any such subinterval; lambda(P) is the volume of the subinterval; A(P,x) is the number of points of the point set x that occur in P; and n is the number of points in x.

Clearly, the star discrepancy is measuring how badly the pointset estimates the volume of a subinterval. This worst error is somewhere between 0 (absolutely no error ever) and 1 (totally missing the volume of the unit hypercube). A value of 0.25, for instance, means that there is a subinterval in the unit hypercube for which the difference between its true and estimated volumes is 0.25. (It might have a volume of 0.80, and be estimated at 0.55, for instance, or a volume of 0.05 that is estimated at 0.30.)
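
For very small pointsets, the definition can be checked directly. The sketch below is a brute-force calculation in 2D; it is not the algorithm used by STAR_DISCREPANCY, which computes bounds. It relies on the standard observation that the supremum is approached at corners whose coordinates are taken from the point coordinates or are equal to 1, with the points counted once in the closed box and once in the half-open box. The function name star_discrepancy_2d_brute and the storage layout are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical brute-force routine, not the algorithm used by STAR_DISCREPANCY.
   x holds n points in 2D, stored as x[2*i], x[2*i+1], all in [0,1].
   Candidate corners (p1,p2) take p1 from the first coordinates of the
   points or 1, and p2 from the second coordinates of the points or 1. */
double star_discrepancy_2d_brute(const double *x, size_t n)
{
    assert(n > 0);
    double worst = 0.0;
    for (size_t a = 0; a <= n; a++) {
        double p1 = (a < n) ? x[2 * a] : 1.0;
        for (size_t b = 0; b <= n; b++) {
            double p2 = (b < n) ? x[2 * b + 1] : 1.0;
            size_t closed = 0; /* points in [0,p1] x [0,p2] */
            size_t open = 0;   /* points in [0,p1) x [0,p2) */
            for (size_t i = 0; i < n; i++) {
                double u = x[2 * i], v = x[2 * i + 1];
                if (u <= p1 && v <= p2) closed++;
                if (u < p1 && v < p2) open++;
            }
            double volume = p1 * p2;
            /* the box holds too many points for its volume */
            double over = (double)closed / (double)n - volume;
            /* a slightly smaller box holds too few points */
            double under = volume - (double)open / (double)n;
            if (over > worst) worst = over;
            if (under > worst) worst = under;
        }
    }
    return worst;
}
```

For the single point (0.5, 0.5), the box [0,0.5] x [0,0.5] has volume 0.25 but contains the whole pointset, so the result is 0.75. The cost grows like n^3, so this approach is only practical for a handful of points.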

File Format:

Two file formats are allowed to specify the point set x={x(1),...,x(n)} in I^m:

1) As real numbers, x(i)=(x(i,1),...,x(i,m)). The file must contain the following n+1 lines:

      m n reals
      x(1,1) x(1,2) ... x(1,m)
      ...
      x(n,1) x(n,2) ... x(n,m)

2) As fractions, x(i)=(num(i,1)/den(i),...,num(i,m)/den(i)). The file must contain the following n+1 lines:

      m n fractions
      num(1,1) num(1,2) ... num(1,m) den(1)
      ...
      num(n,1) num(n,2) ... num(n,m) den(n)
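
To show how the two formats relate, the following C sketch converts one data line of the "fractions" format into the coordinates that the "reals" format would list. The function name fraction_row_to_reals and the use of long integers are assumptions made for this example; they are not taken from the STAR_DISCREPANCY source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of STAR_DISCREPANCY.
   Converts one data line of the "fractions" format,
       num[0] num[1] ... num[m-1] den
   into m real coordinates x[j] = num[j] / den. */
void fraction_row_to_reals(const long *num, long den, size_t m, double *x)
{
    assert(den > 0); /* each point shares one positive denominator */
    for (size_t j = 0; j < m; j++) {
        x[j] = (double)num[j] / (double)den;
    }
}
```

Since each line carries a single denominator, a point such as (1/2, 1/3) has to be written over a common denominator, for instance as the line "3 2 6".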

Usage:

star_discrepancy m epsilon n data

m is the spatial dimension;
epsilon is an error tolerance between 0 and 1, and indicates the allowable error in the estimate;
n is the number of points to be read from the file;
data is a file containing the data.

star_discrepancy m epsilon n data num den

where the two extra arguments are:

num is the optional balance numerator;
den is the optional balance denominator.

Languages:

STAR_DISCREPANCY is available in a C version and a C++ version.

Related Data and Programs:

DIAPHONY, a C program which reads a file of N points in M dimensions and computes its diaphony, a measure of point dispersion.

TABLE_LATINIZE, a C++ program which can read a TABLE file and write out a "latinized" version.

TABLE_QUALITY, a C++ program which can read a TABLE file and print out measures of the quality of dispersion of the points.

Author:

Eric Thiemard

Reference:

http://rosowww.epfl.ch/papers/discrbound2/, the source code web site.
Eric Thiemard,
An Algorithm to Compute Bounds for the Star Discrepancy,
Journal of Complexity,
Volume 17, pages 850-880, 2001.

Source Code:

star_discrepancy.c, the source code.

Examples and Tests:

halton_reals.txt, 10 Halton points in 2D, using the "reals" format.
halton_reals_output.txt, the result of the command
star_discrepancy 2 0.001 10 halton_reals.txt

halton_fractions.txt, 10 Halton points in 2D, using the "fractions" format.
halton_fractions_output.txt, the result of the command
star_discrepancy 2 0.001 10 halton_fractions.txt

List of Routines:

MAIN is the main program for the star discrepancy bound computation.
DECOMPOSITION carries out the decomposition of a subinterval.
EXPLORE ???
FASTEXPLORE ???
FILEFORMAT reports the legal input file formats.
FREETREE frees storage associated with a tree.
INITLEX initializes the lexicon.
INITPARAMETERS sets program parameters based on user input and defaults.
INSERTLEX inserts an item into the lexicon.
LOWBOUND computes the lower bound.
MEMORY prints a message and terminates on memory allocation errors.
QUICKSORT uses Quicksort to sort an array.
READFILE reads the user's input data file.
SUBTREE ???
SUPERTREE ???
TRAITER ???
USAGE prints a usage message.


Last revised on 23 January 2012.
