CLUSTER_ENERGY
Clusterings with Minimal Energy


CLUSTER_ENERGY is a FORTRAN90 program which seeks to organize data into a given number of clusters, in a way which minimizes the cluster energy.

Specifically, suppose we are given a set of data points in NUM_DIM-dimensional space, and are told to use C_NUM clusters. Each cluster is to be represented by a CENTER point, and each data point is to be assigned to a cluster. The energy of a cluster is the sum of the squared distances from each data point in the cluster to the cluster's center point, and the total energy is the sum of the cluster energies.
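
As an illustration of this definition, the total energy can be written as E = sum over clusters j of the sum over points x assigned to cluster j of ||x - center(j)||^2. The following is a minimal Python sketch of that formula, not the FORTRAN90 source; the names cluster_energy, points, centers and assignment are hypothetical and chosen only for this example.

```python
def cluster_energy(points, centers, assignment):
    """Total energy: for every data point, add the squared Euclidean
    distance to the center of the cluster it is assigned to.

    points     : list of coordinate tuples (hypothetical layout)
    centers    : list of coordinate tuples, one per cluster
    assignment : assignment[i] is the cluster index of points[i]
    """
    total = 0.0
    for point, c in zip(points, assignment):
        center = centers[c]
        total += sum((x - m) ** 2 for x, m in zip(point, center))
    return total


# Four points on a line, split into two clusters of two points each.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
centers = [(1.0, 0.0), (11.0, 0.0)]
assignment = [0, 0, 1, 1]

energy = cluster_energy(points, centers, assignment)
print(energy)  # each point is at distance 1 from its center, so this prints 4.0
```

Summing over points rather than over clusters gives the same total, since every point contributes to exactly the one cluster named in assignment.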

The program allows the user to specify the spatial dimension, the number of data points, the range of the data, a range of cluster counts to try, and the number of clustering iterations to carry out. For each number of clusters in that range, it then tries to compute the minimal cluster energy for the given data.
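
The workflow described above might be sketched as follows. This is an assumption-laden illustration, not the program's actual method: the page does not say which clustering iteration CLUSTER_ENERGY uses, so the sketch substitutes the standard K-means (Lloyd) iteration, draws uniform random data, and starts from randomly chosen data points. All names (kmeans_energy, energy_sweep, c_lo, c_hi, it_num) are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_energy(points, c_num, it_num, rng):
    """Run it_num Lloyd iterations with c_num clusters; return the energy.

    Assumption: the starting centers are c_num data points picked at random.
    """
    centers = [list(p) for p in rng.sample(points, c_num)]
    for _ in range(it_num):
        # Assign every point to its nearest center.
        assignment = [
            min(range(c_num), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Move each center to the mean of its points; an empty cluster
        # keeps its previous center.
        for j in range(c_num):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Energy with every point charged to its nearest final center.
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def energy_sweep(points, c_lo, c_hi, it_num, seed=0):
    """Estimate the energy for every cluster count from c_lo to c_hi."""
    rng = random.Random(seed)
    return {c: kmeans_energy(points, c, it_num, rng) for c in range(c_lo, c_hi + 1)}


# Hypothetical input: 50 points in 2 dimensions, coordinates in [0, 10].
data_rng = random.Random(1)
data = [(data_rng.uniform(0.0, 10.0), data_rng.uniform(0.0, 10.0)) for _ in range(50)]

energies = energy_sweep(data, c_lo=1, c_hi=5, it_num=20)
for c_num, energy in sorted(energies.items()):
    print(c_num, round(energy, 3))
```

Because the iteration only finds a local minimum, the value reported for a given number of clusters is an estimate of the minimal energy; repeating the run from several starting centers and keeping the smallest result is one common way to tighten it.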

Licensing:

The computer code and data files made available on this web page are distributed under the GNU LGPL license.

Languages:

CLUSTER_ENERGY is available in a FORTRAN90 version.

Related Data and Programs:

ASA058, a FORTRAN90 library which implements the K-means algorithm of Sparks.

ASA136, a FORTRAN77 file which contains the original text of the Hartigan and Wong clustering algorithm ASA136.

CVT_BASIS, a FORTRAN90 program which uses the CVT algorithm to cluster data.

CVT_BASIS_FLOW, a FORTRAN90 program which uses the CVT algorithm to cluster data related to fluid flow.

KMEANS, a FORTRAN90 library which uses the K-Means algorithm to cluster data.

LAU_NP, a FORTRAN90 library which contains heuristic algorithms for the K-center and K-median problems.

POD_BASIS_FLOW, a FORTRAN90 program which uses the POD algorithm to cluster data related to fluid flow.

POINT_MERGE, a FORTRAN90 library which considers N points in M dimensional space, and counts or indexes the unique or "tolerably unique" items.

SPAETH, a FORTRAN90 library which can cluster data according to various principles.

SPAETH, a dataset directory which contains a set of test data.

SPAETH2, a FORTRAN90 library which can cluster data according to various principles.

SPAETH2, a dataset directory which contains a set of test data.

SVD_BASIS, a FORTRAN90 program which uses the Singular Value Decomposition to cluster data.

Reference:

  1. John Hartigan, Manchek Wong,
    Algorithm AS 136: A K-Means Clustering Algorithm,
    Applied Statistics,
    Volume 28, Number 1, 1979, pages 100-108.
  2. Wendy Martinez, Angel Martinez,
    Computational Statistics Handbook with MATLAB,
    Chapman and Hall / CRC, 2002.
  3. David Sparks,
    Algorithm AS 58: Euclidean Cluster Analysis,
    Applied Statistics,
    Volume 22, Number 1, 1973, pages 126-130.


Last revised on 05 January 2006.