HISTOGRAM_DATA_2D_SAMPLE
Create and Sample a PDF Based on Sample Data

HISTOGRAM_DATA_2D_SAMPLE is a C++ program which demonstrates how to construct a Probability Density Function (PDF) from a table of sample data over a 2D region, and then how to use that PDF to create new samples.

The program presented here is hard-wired to handle a specific problem. However, the ideas used in the program are easily extended to other regions and other dimensions.

For the problem given here, we assume we have sample values of a function F(X,Y) for each subregion of a region. These values might actually represent population counts, a density, the integral of some function over the subregion, or simply an abstract function. We implicitly assume that all the values are positive.

The particular region studied here is the unit square, which has been broken down into a 20x20 array of equal subsquares.

If we normalize by the sum of the data values, the result is a PDF value associated with each subregion. By assigning an arbitrary order to the subregions, we can add the PDF values up to and including a given subregion to get a CDF (cumulative distribution function) value for that subregion. Now, given a random value U drawn uniformly from [0,1], we can locate the first subregion whose CDF value exceeds U. Choosing a random point uniformly within this subregion gives us the sample point. If we choose many such sample points, the statistics of the sample will tend to the discrete PDF that we defined from the data we were given.
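The procedure just described can be sketched in a few lines of C++. This is only an illustration, not the code in histogram_data_2d_sample.cpp: the helper names make_cdf, find_cell and sample_point are hypothetical, the cells are assumed to be stored in row-major order (one possible choice of the "arbitrary order"), and the standard <random> generator stands in for whatever uniform generator the program actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: normalize the cell values into a PDF and
// accumulate them into a CDF, one entry per cell.
std::vector<double> make_cdf(const std::vector<double>& weights) {
    const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    assert(total > 0.0);  // the data must not sum to zero
    std::vector<double> cdf(weights.size());
    double running = 0.0;
    for (std::size_t k = 0; k < weights.size(); ++k) {
        running += weights[k] / total;  // weights[k] / total is the PDF of cell k
        cdf[k] = running;
    }
    cdf.back() = 1.0;  // guard against round-off in the last entry
    return cdf;
}

// Hypothetical helper: index of the first cell whose CDF value exceeds u.
std::size_t find_cell(const std::vector<double>& cdf, double u) {
    auto it = std::upper_bound(cdf.begin(), cdf.end(), u);
    if (it == cdf.end()) --it;  // only reachable if u >= 1
    return static_cast<std::size_t>(it - cdf.begin());
}

// Hypothetical helper: draw one point in the unit square, assuming an
// nx-by-ny grid of equal cells stored in row-major order (k = j * nx + i).
std::pair<double, double> sample_point(const std::vector<double>& cdf,
                                       std::size_t nx, std::size_t ny,
                                       std::mt19937& gen) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    const std::size_t k = find_cell(cdf, unif(gen));
    const std::size_t i = k % nx;  // column index of the chosen cell
    const std::size_t j = k / nx;  // row index of the chosen cell
    // Pick a uniformly random point inside cell (i, j).
    const double x = (static_cast<double>(i) + unif(gen)) / static_cast<double>(nx);
    const double y = (static_cast<double>(j) + unif(gen)) / static_cast<double>(ny);
    return {x, y};
}
```

To use the sketch, one would build the CDF once from the table of data values, seed a std::mt19937 generator, and call sample_point repeatedly. A binary search is used to locate the cell; for a table as small as 20x20 a linear scan would serve equally well.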

Licensing:

The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.

Languages:

HISTOGRAM_DATA_2D_SAMPLE is available in a C version, a C++ version, a FORTRAN90 version, and a MATLAB version.

Related Data and Programs:

FEM1D_SAMPLE, a C++ program which samples a scalar or vector finite element function of one variable, defined by FEM files, returning interpolated values at the sample points.

FEM2D_SAMPLE, a C++ program which evaluates a finite element function defined on an order 3 or order 6 triangulation.

PROB, a C++ library which evaluates and inverts a number of probabilistic distributions.

RANDOM_DATA, a C++ library which generates sample points for various probability distributions, spatial dimensions, and geometries.

WALKER_SAMPLE, a C++ library which efficiently samples a discrete probability vector using Walker sampling.

Source Code:

histogram_data_2d_sample.cpp, the source code.

Examples and Tests:

TEST01 computes 1000 samples, based on a 20x20 PDF table that is heavily biased toward the northwest corner.

test01.txt, 1000 sample points generated by the program.
test01.png, a PNG image of the data.

TEST02 computes 1000 samples, based on a 12x8 PDF table that is loosely based on the population of counties in Iowa.

test02.txt, 1000 sample points generated by the program.
test02.png, a PNG image of the data.

List of Routines:

MAIN is the main program for HISTOGRAM_DATA_2D_SAMPLE.
DISCRETE_CDF_TO_XY finds XY points corresponding to discrete CDF values.
GET_DISCRETE_PDF returns the value of the discrete PDF function in each cell.
R8MAT_COPY_NEW copies one R8MAT to a "new" R8MAT.
R8MAT_SCALE multiplies an R8MAT by a scalar.
R8MAT_SUM returns the sum of an R8MAT.
R8MAT_WRITE writes an R8MAT file.
R8VEC_UNIFORM_01_NEW returns a new unit pseudorandom R8VEC.
SET_DISCRETE_CDF sets a CDF from a discrete PDF.
TIMESTAMP prints the current YMDHMS date as a time stamp.


Last modified on 03 January 2012.