SPAETH2
Cluster Analysis Tools

SPAETH2 is a FORTRAN90 library which analyzes data by grouping it into clusters.
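As a rough illustration of what grouping data into clusters involves, the Fortran 90 sketch below does two things. It assigns each point to its nearest cluster center, and it totals the within-cluster sum of squared distances, which is the quantity that K-Means-style methods try to reduce. Several details are assumptions made for illustration and are not SPAETH2's interface: the module name, the routine ASSIGN_NEAREST, and the storage of points as columns of X(DIM_NUM,POINT_NUM). Other routines in the library use other criteria.

```fortran
module nearest_center_sketch
  ! Hypothetical module for illustration only; not part of SPAETH2.
  implicit none
  integer, parameter :: rk = kind ( 1.0D+00 )
contains
  subroutine assign_nearest ( dim_num, point_num, cluster_num, x, center, &
    cluster, energy )
    ! Assign each point to the center at the smallest squared Euclidean
    ! distance (ties go to the lower-numbered center), and accumulate the
    ! total within-cluster sum of squares in ENERGY.
    ! Assumes CLUSTER_NUM >= 1 and points stored as columns of X.
    integer, intent(in) :: dim_num, point_num, cluster_num
    real ( kind = rk ), intent(in) :: x(dim_num,point_num)
    real ( kind = rk ), intent(in) :: center(dim_num,cluster_num)
    integer, intent(out) :: cluster(point_num)
    real ( kind = rk ), intent(out) :: energy
    integer :: i, j
    real ( kind = rk ) :: d, d_min

    energy = 0.0_rk
    do i = 1, point_num
      ! Start with the first center, then look for a strictly closer one.
      cluster(i) = 1
      d_min = sum ( ( x(:,i) - center(:,1) )**2 )
      do j = 2, cluster_num
        d = sum ( ( x(:,i) - center(:,j) )**2 )
        if ( d < d_min ) then
          d_min = d
          cluster(i) = j
        end if
      end do
      energy = energy + d_min
    end do
  end subroutine assign_nearest
end module nearest_center_sketch
```

A K-Means-style loop would alternate this assignment step with recomputing each center as the mean of the points assigned to it. The loop stops when no point changes cluster.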

The code is still under development: some routines work, and some do not.

Licensing:

The computer code and data files made available on this web page are distributed under the GNU LGPL license.

Languages:

SPAETH2 is available in a FORTRAN90 version.

Related Data and Programs:

ASA058, a FORTRAN90 library which implements the K-means algorithm of Sparks.

ASA136, a FORTRAN90 library which implements the Hartigan and Wong clustering algorithm.

CITIES, a dataset directory which contains sets of information about cities and the distances between them.

CITIES, a FORTRAN90 library which handles various problems associated with a set of "cities" on a map.

KMEANS, a FORTRAN90 library which contains several different algorithms for the K-Means problem.

LAU_NP, a FORTRAN90 library which implements heuristic algorithms for various NP-hard combinatorial problems.

POINT_MERGE, a FORTRAN90 library which considers N points in M-dimensional space, and counts or indexes the unique or "tolerably unique" items.

SPAETH, a FORTRAN90 library which can cluster data according to various principles.

SPAETH, a dataset directory which contains datasets for cluster analysis.

SPAETH2, a dataset directory which contains datasets for cluster analysis.

Reference:

Helmuth Spaeth,
Cluster Analysis Algorithms for Data Reduction and Classification of Objects,
Ellis Horwood, 1980,
QA278 S6813.

Helmuth Spaeth,
Cluster Dissection and Analysis: Theory, FORTRAN Programs, Examples,
Ellis Horwood, 1985,
QA278 S68213.

Source Code:

spaeth2.f90, the source code.

Examples and Tests:

spaeth2_test.f90, a sample problem.
spaeth2_test.txt, the output file.

List of Routines:

CH_CAP capitalizes a single character.
CH_EQI is a case insensitive comparison of two characters for equality.
CH_TO_DIGIT returns the integer value of a base 10 digit.
CLUDIA clusters data for which a distance matrix has been supplied.
CLUSTA solves the multiple location problem in N dimensions.
CLUSTER_CENTROIDS determines the centroids of a clustering.
CLUSTER_MEDIANS determines the medians of a clustering.
CLUSTER_MEDIAN_DISTANCE finds the cluster median distance.
CLUSTER_POPULATION sets the cluster populations from the assignment array.
CLUSTER_VARIANCE determines the variances associated with a clustering.
COLPER seeks a column permutation which maximizes the "bond energy".
DATA_D_READ reads a real data set stored in a file.
DATA_D_PRINT prints a real data set.
DATA_D_SHOW makes a typewriter plot of a real data set.
DATA_SIZE counts the size of a data set stored in a file.
DIF_INVERSE returns the inverse of the second difference matrix.
DISMEA constructs a set of hierarchical clusters.
DIVGOW constructs a set of hierarchical clusters by doubling.
EMEANS clusters data using a variant of the K-Means algorithm for L1 norms.
GET_UNIT returns a free FORTRAN unit number.
HIERCL implements seven agglomerative hierarchical clustering methods.
HMEANS clusters data using the H-Means algorithm.
I4_FACTORIAL computes the factorial N!
I4_SWAP swaps two integer values.
I4VEC_INDICATOR sets an integer vector to the indicator vector A(I)=I.
I4VEC_PERML generates permutations of a vector in lexicographic order.
I4VEC_PERMS generates permutations of a vector in lexicographic order.
JOINER uses a very simple cluster assignment algorithm.
KMEANS clusters data using the K-Means algorithm.
LEADER uses a very simple cluster assignment algorithm.
LINKER constructs a minimal tree for a symmetric distance matrix.
ORDERED clusters one-dimensional ordered data into NC clusters.
PROFILE seeks an optimal variable ordering for a set of data.
R8_SWAP swaps two R8 values.
R8MAT_DET computes the determinant of an R8MAT.
R8MAT_PRINT prints an R8MAT.
R8VEC_ASCENDS determines if a double precision vector is (weakly) ascending.
R8VEC_SORT_BUBBLE_A sorts an R8VEC into ascending order using bubble sort.
RANDP randomly partitions a set of M items into N clusters.
S_TO_R8 reads an R8 from a string.
S_WORD_COUNT counts the number of "words" in a string.
STANDN solves the single location problem in N dimensions.
TIMESTAMP prints the current YMDHMS date as a time stamp.
TRANSF transforms a data set to have zero mean and unit variance.
URAND returns a pseudo-random number uniformly distributed in [0,1].
WMEANS clusters data using the determinant criterion.
ZWEIGO organizes a set of data into two clusters.
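Several of the routines above work from an assignment array that records the cluster holding each point. The Fortran 90 sketch below shows how cluster populations and centroids follow from such an array, in the spirit of CLUSTER_POPULATION and CLUSTER_CENTROIDS. It is a sketch under stated assumptions only. The routine name POPULATION_AND_CENTROIDS, its argument order, and the column-wise storage X(DIM_NUM,POINT_NUM) are hypothetical and may not match SPAETH2's actual argument lists.

```fortran
module centroid_sketch
  ! Hypothetical module for illustration only; not SPAETH2's
  ! CLUSTER_POPULATION or CLUSTER_CENTROIDS, whose interfaces may differ.
  implicit none
  integer, parameter :: rk = kind ( 1.0D+00 )
contains
  subroutine population_and_centroids ( dim_num, point_num, cluster_num, &
    x, cluster, population, centroid )
    integer, intent(in) :: dim_num, point_num, cluster_num
    ! Points are assumed to be stored as the columns of X.
    real ( kind = rk ), intent(in) :: x(dim_num,point_num)
    ! CLUSTER(I) is the index, 1 to CLUSTER_NUM, of the cluster holding point I.
    integer, intent(in) :: cluster(point_num)
    integer, intent(out) :: population(cluster_num)
    real ( kind = rk ), intent(out) :: centroid(dim_num,cluster_num)
    integer :: i, j

    population = 0
    centroid = 0.0_rk
    ! Count the members of each cluster and sum their coordinates.
    do i = 1, point_num
      j = cluster(i)
      population(j) = population(j) + 1
      centroid(:,j) = centroid(:,j) + x(:,i)
    end do
    ! Divide each coordinate sum by the population.
    ! An empty cluster is left with a zero centroid.
    do j = 1, cluster_num
      if ( 0 < population(j) ) then
        centroid(:,j) = centroid(:,j) / real ( population(j), kind = rk )
      end if
    end do
  end subroutine population_and_centroids
end module centroid_sketch
```

Giving an empty cluster a zero centroid is an arbitrary choice made here. A production code might instead reseed that center or report an error.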


Last revised on 13 November 2006.
