WALKER_SAMPLE
Efficient Probability Vector Sampling


WALKER_SAMPLE, a MATLAB library which efficiently samples a discrete probability vector.

For outcomes labeled 1, 2, 3, ..., N, a discrete probability vector X is an array of N non-negative values which sum to 1, such that X[i] is the probability of outcome i.

To sample the probability vector is to produce a sequence of outcomes i1, i2, i3, ..., which are each drawn with probability corresponding to X. For a general discrete probability vector X, a single sample operation might be expected to take a time that is proportional to O(N), the number of outcomes. Walker showed that, by constructing a new data structure, it was possible to carry out a sample in time of order O(1), that is, independent of the number of possible outcomes.

Licensing:

The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.

Languages:

WALKER_SAMPLE is available in a C version and a C++ version and a FORTRAN90 version and a Matlab version and a Python version.

Related Data and Programs:

HISTOGRAM_DATA_2D_SAMPLE, a MATLAB program which demonstrates how to construct a Probability Density Function (PDF) from a frequency table over a 2D domain, and then to use that PDF to create new samples.

HISTOGRAM_PDF_SAMPLE, a MATLAB library which demonstrates how sampling can be done by starting with the formula for a PDF, creating a histogram, constructing a histogram for the CDF, and then sampling.

HISTOGRAM_PDF_2D_SAMPLE, a MATLAB library which demonstrates how uniform sampling of a 2D region with respect to some known Probability Density Function (PDF) can be approximated by decomposing the region into rectangles, approximating the PDF by a piecewise constant function, constructing a histogram for the CDF, and then sampling.

PDFLIB, a MATLAB library which evaluates Probability Density Functions (PDF's) and produces random samples from them, including beta, binomial, chi, exponential, gamma, inverse chi, inverse gamma, multinomial, normal, scaled inverse chi, and uniform.

PROB, a MATLAB library which evaluates, samples, inverts, and characterizes a number of Probability Density Functions (PDF's) and Cumulative Density Functions (CDF's), including anglit, arcsin, benford, birthday, bernoulli, beta_binomial, beta, binomial, bradford, burr, cardiod, cauchy, chi, chi squared, circular, cosine, deranged, dipole, dirichlet mixture, discrete, empirical, english sentence and word length, error, exponential, extreme values, f, fisk, folded normal, frechet, gamma, generalized logistic, geometric, gompertz, gumbel, half normal, hypergeometric, inverse gaussian, laplace, levy, logistic, log normal, log series, log uniform, lorentz, maxwell, multinomial, nakagami, negative binomial, normal, pareto, planck, poisson, power, quasigeometric, rayleigh, reciprocal, runs, sech, semicircular, student t, triangle, uniform, von mises, weibull, zipf.

RANLIB, a MATLAB library which produces random samples from Probability Density Functions (PDF's), including Beta, Chi-square Exponential, F, Gamma, Multivariate normal, Noncentral chi-square, Noncentral F, Univariate normal, random permutations, Real uniform, Binomial, Negative Binomial, Multinomial, Poisson and Integer uniform, by Barry Brown and James Lovato.

REJECTION_SAMPLE, a MATLAB library which demonstrates acceptance/rejection sampling.

walker_sample_test

Reference:

  1. Donald Knuth,
    Seminumerical algorithms,
    Addison-Wesley-Longman, 1999.
  2. Warren Smith,
    How to sample from a probability distribution,
    April 18, 2002.
  3. Alastair Walker,
    An efficient method for generating discrete random variables with general distributions,
    ACM Transactions on Mathematical Software,
    Volume 3, Number 3, September 1977, pages 253-256.

Source Code:


Last revised on 10 February 2019.