SS_GD_ALIGN
Sequence/sequence Global Distance Alignment
SS_GD_ALIGN
is a FORTRAN90 library which
implements some of the string
matching algorithms described in the reference [Chao].
A global
alignment is attempted between the strings, and similarity is measured by
a sort of distance. These algorithms carry out the computation in
linear space, and compute not just the optimal alignment
score, but the corresponding optimal alignment.
It's important to be able to compute alignments using "linear space",
that is, just a few vectors whose length N is equal to that
of a typical string. A quadratic algorithm would require a two
dimensional array of total dimension N*N. Realistic alignment
problems can involve strings of N=100,000 elements or more,
so a quadratic algorithm would be expensive or impossible to use.
The score for the actual best matching is determined without
constructing the best matching, and in fact it is a matter of some
difficulty to recover the matching, particularly if the algorithm is a
linear space one, which discards a great deal of intermediate information.
However, a linear space algorithm included here can also compute the
optimal matching, based on the idea of a recursive subdivision of the
problem.
Routines that use quadratic space are included as well, so the algorithms
can be compared for storage, speed, and correctness.
Licensing:
The computer code and data files made available on this web page
are distributed under
the GNU LGPL license.
Languages:
SS_GD_ALIGN is available in
a FORTRAN90 version.
Related Data and Programs:
PS_GG_ALIGN,
a FORTRAN90 library which
implements a profile/sequence global alignment using an affine gap penalty.
PS_LG_ALIGN,
a FORTRAN90 library which
implements a profile/sequence local alignment using an affine gap penalty.
PS_QG_ALIGN,
a FORTRAN90 library which
implements a profile/sequence quasiglobal alignment using an affine gap penalty.
SS_GG_ALIGN,
a FORTRAN90 library which
globally aligns two sequences using an affine gap penalty.
SS_LG_ALIGN,
a FORTRAN90 library which
locally aligns two sequences using an affine gap penalty.
SS_QG_ALIGN,
a FORTRAN90 library which
quasi-globally aligns two sequences using an affine gap penalty.
Reference:
-
Kun-Mao Chao, Ross Hardison, Webb Miller,
Recent Developments in Linear-Space Alignment Methods: A Survey,
Journal of Computational Biology,
Volume 1, Number 4, 1994, pages 271-291.
-
Eugene Myers, Webb Miller,
Optimal Alignments in Linear Space,
CABIOS,
Volume 4, number 1, 1988, pages 11-17.
-
Michael Waterman,
Introduction to Computational Biology,
Chapman and Hall, 1995,
ISBN: 0412993910,
LC: QH438.4.M33.W38.
Source Code:
Examples and Tests:
List of Routines:
-
A_INDEX sets up a reverse index for the amino acid codes.
-
A_TO_I4 returns the index of an alphabetic character.
-
CH_CAP capitalizes a single character.
-
CHVEC2_PRINT prints two vectors of characters.
-
I4_SWAP switches two integer values.
-
I4_TO_A returns the I-th alphabetic character.
-
I4VEC2_COMPARE compares pairs of integers stored in two vectors.
-
I4VEC2_PRINT prints a pair of integer vectors, with an optional title.
-
I4VEC2_SORT_A ascending sorts a vector of pairs of integers.
-
I4VEC_REVERSE reverses the elements of an integer vector.
-
PAM120 returns the PAM 120 substitution matrix.
-
PAM120_SCORE computes a single entry sequence/sequence matching score.
-
PAM200 returns the PAM 200 substitution matrix.
-
PAM200_SCORE computes a single entry sequence/sequence matching score.
-
RVEC2_SUM_IMAX returns the index of the maximum sum of two real vectors.
-
S_EQI is a case insensitive comparison of two strings for equality.
-
S_TO_CVEC converts a string to a character vector.
-
S_TO_I4 reads an integer value from a string.
-
SIMPLE_SCORE computes a single entry sequence/sequence matching score.
-
SORT_HEAP_EXTERNAL externally sorts a list of items into linear order.
-
SS_GD_BPQ does a global distance backward alignment path in quadratic space.
-
SS_GD_BSL does a global distance backward alignment score in linear space.
-
SS_GD_BSQ does a global distance backward alignment score in quadratic space.
-
SS_GD_FPQ does a global distance forward alignment path in quadratic space.
-
SS_GD_FSL does a global distance forward alignment score in linear space.
-
SS_GD_FSQ does a global distance forward alignment score in quadratic space.
-
SS_GD_MATCH_PRINT prints a global distance alignment.
-
SS_GD_RPL does a global distance recursive alignment path in linear space.
-
TIMESTAMP prints the current YMDHMS date as a time stamp.
-
WORD_LAST_READ returns the last word from a string.
-
WORD_NEXT_READ "reads" words from a string, one at a time.
You can go up one level to
the FORTRAN90 source codes.
Last revised on 20 January 2011.