SS_LG_ALIGN
Sequence/sequence Local Gap Alignment
SS_LG_ALIGN is a FORTRAN90 program
which implements some of the string
matching algorithms described in the reference [Chao].
These algorithms look for optimal local alignments of two strings
using linear space. (Compare the global alignment routines in
SS_GG_ALIGN.)
It's important to be able to compute alignments using "linear space",
that is, just a few vectors whose length N is equal to that
of a typical string. A quadratic-space algorithm would require a
two-dimensional array with N*N entries. Realistic alignment
problems can involve strings of N=100,000 elements or more, which
means 10^10 array entries, so a quadratic-space algorithm would be
expensive or impossible to use.
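To make the storage argument concrete, the following Python sketch compares the two memory footprints. It is only an illustration: the function name storage_bytes, the 4-byte entry size, and the count of three working vectors are assumptions made for this example, not values taken from the FORTRAN90 code.

```python
def storage_bytes(n, bytes_per_entry=4, linear_vectors=3):
    """Return (quadratic_bytes, linear_bytes) for two strings of length n.

    bytes_per_entry and linear_vectors are assumed values for illustration.
    """
    # A full dynamic programming table has one entry per pair of positions.
    quadratic = n * n * bytes_per_entry
    # A linear space method keeps only a few vectors of length n + 1.
    linear = linear_vectors * (n + 1) * bytes_per_entry
    return quadratic, linear


quad, lin = storage_bytes(100_000)
print(f"quadratic table: {quad / 1e9:.1f} GB")
print(f"linear vectors:  {lin / 1e6:.1f} MB")
```

Under these assumptions the full table needs tens of gigabytes, while the working vectors need about a megabyte.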
The "matching" being considered does not actually require that every
element of string A match an identical element of string B.
Instead, the matching algorithm is essentially looking for the highest
scoring way of associating a portion of one string with a portion of the
other, allowing the operations of "mutation" (change one letter to
another), "deletion" (drop a string of consecutive letters) and
"insertion" (insert a string of consecutive letters). Thus, at the same
time, we are measuring a sort of evolutionary distance between two
strings and a formal "editing distance" between them.
This set of routines assumes that an insertion or deletion of length
K is penalized using an "affine gap penalty formula" of the form:
Penalty = Gap_Open + K * Gap_Extend
This choice of penalty function has a major effect on the form
of the matching algorithms, particularly in the linear space case.
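As a small illustration of the formula above, the Python sketch below evaluates the affine penalty for a few gap lengths. The function name gap_penalty and the sample values Gap_Open = 2.0 and Gap_Extend = 0.5 are assumptions chosen for this example; the library's own defaults may differ.

```python
def gap_penalty(k, gap_open, gap_extend):
    """Affine penalty for one insertion or deletion of length k (k >= 1)."""
    if k < 1:
        raise ValueError("gap length k must be at least 1")
    return gap_open + k * gap_extend


# Assumed sample parameters, for illustration only.
GAP_OPEN = 2.0
GAP_EXTEND = 0.5

one_long_gap = gap_penalty(4, GAP_OPEN, GAP_EXTEND)
four_short_gaps = 4 * gap_penalty(1, GAP_OPEN, GAP_EXTEND)

print("one gap of length 4:  ", one_long_gap)
print("four gaps of length 1:", four_short_gaps)
```

Because Gap_Open is charged once per gap, a single long gap costs less than several short gaps of the same total length. This is why the alignment recurrences must track whether a gap is already open.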
The score for the actual best matching is determined without explicitly
constructing the best matching. It is a matter of some
difficulty to recover the matching corresponding to the best score.
This is particularly true if the algorithm is a linear space one, which
discards a great deal of intermediate information. However, it is
possible to set up a recursive algorithm which determines the best
alignment, using only linear space.
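The idea of computing the best score without building the matching can be sketched in Python as follows. This is a minimal forward, score-only recurrence for local alignment with the affine penalty Gap_Open + K * Gap_Extend, keeping only two rows of the table at a time. It is not the library's FORTRAN90 code: the function name local_affine_score, the simple match/mismatch scoring (used here in place of a PAM matrix), and all default parameter values are assumptions for illustration.

```python
def local_affine_score(a, b, match=2.0, mismatch=-1.0,
                       gap_open=2.0, gap_extend=0.5):
    """Best local alignment score of a and b, using O(len(b)) storage."""
    n = len(b)
    neg_inf = float("-inf")
    s_prev = [0.0] * (n + 1)      # best scores for the previous row
    f_prev = [neg_inf] * (n + 1)  # previous row, alignment ends in a gap in b
    best = 0.0
    for ch in a:
        s_cur = [0.0] * (n + 1)
        f_cur = [neg_inf] * (n + 1)
        e = neg_inf               # current row, alignment ends in a gap in a
        for j in range(1, n + 1):
            # Either open a new gap of length 1 or extend the open one.
            e = max(s_cur[j - 1] - gap_open - gap_extend, e - gap_extend)
            f_cur[j] = max(s_prev[j] - gap_open - gap_extend,
                           f_prev[j] - gap_extend)
            # Pair the two letters (a match or a "mutation").
            diag = s_prev[j - 1] + (match if ch == b[j - 1] else mismatch)
            # A local alignment may start anywhere, hence the floor at 0.
            s_cur[j] = max(0.0, diag, e, f_cur[j])
            best = max(best, s_cur[j])
        # Only the latest row is kept; older rows are discarded.
        s_prev, f_prev = s_cur, f_cur
    return best


print(local_affine_score("ACGTACGT", "ACGTTTACGT"))
```

Because each row overwrites the previous one, the function can report the score but has no record of which pairings produced it. That discarded information is what the recursive linear space method has to reconstruct.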
Routines that use quadratic space are included as well, so the algorithms
can be compared for storage, speed, and correctness.
The names of the scoring and path routines include information
about whether they use a forward, backward, or recursive algorithm,
whether they compute the score or the path, and whether they use
linear or quadratic space. Thus, the routine
SS_LG_FSQ uses the forward algorithm to compute the score,
with quadratic space requirements.
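The naming scheme can be summarized with a short Python sketch that decodes a routine name into its parts. The letter meanings are inferred from the routine descriptions in the list of routines (F/B/R for forward, backward, recursive; S/P for score, path; L/Q for linear, quadratic); the helper name decode_routine_name is hypothetical and is not part of the library.

```python
import re

PROBLEM = {"LG": "local gap", "GG": "global gap"}
DIRECTION = {"F": "forward", "B": "backward", "R": "recursive"}
RESULT = {"S": "score", "P": "path"}
SPACE = {"L": "linear", "Q": "quadratic"}

_PATTERN = re.compile(r"SS_(LG|GG)_([FBR])([SP])([LQ])")


def decode_routine_name(name):
    """Return (problem, direction, result, space), or None for other names."""
    m = _PATTERN.fullmatch(name.strip().upper())
    if m is None:
        return None
    problem, direction, result, space = m.groups()
    return (PROBLEM[problem], DIRECTION[direction],
            RESULT[result], SPACE[space])


for name in ("SS_LG_FSQ", "SS_LG_RPL", "SS_GG_BSL", "SS_LG_CORNERS"):
    print(name, "->", decode_routine_name(name))
```

Names that do not follow the three-letter suffix pattern, such as SS_LG_CORNERS, are reported as None rather than guessed at.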
Licensing:
The computer code and data files described and made available on this web page
are distributed under
the GNU LGPL license.
Languages:
SS_LG_ALIGN is available in
a FORTRAN90 version.
Related Data and Programs:
SS_GD_ALIGN,
a FORTRAN90 library which
globally aligns two sequences using a distance matrix.
SS_GG_ALIGN,
a FORTRAN90 library which
globally aligns two sequences using an affine gap penalty.
SS_QG_ALIGN,
a FORTRAN90 library which
quasi-globally aligns two sequences using an affine gap penalty.
Reference:
-
Kun-Mao Chao, Ross Hardison, Webb Miller,
Recent Developments in Linear-Space Alignment Methods: A Survey,
Journal of Computational Biology,
Volume 1, Number 4, 1994, pages 271-291.
-
Eugene Myers, Webb Miller,
Optimal Alignments in Linear Space,
CABIOS,
Volume 4, Number 1, 1988, pages 11-17.
-
Michael Waterman,
Introduction to Computational Biology,
Chapman and Hall, 1995,
ISBN: 0412993910,
LC: QH438.4.M33.W38.
Source Code:
Examples and Tests:
List of Routines:
-
A_INDEX sets up a reverse index for the amino acid codes.
-
A_TO_I4 returns the index of an alphabetic character.
-
CH_CAP capitalizes a single character.
-
CHVEC2_PRINT prints two vectors of characters.
-
CHVEC_PRINT prints a vector of characters.
-
GET_SEED returns a seed for the random number generator.
-
I4_RANDOM returns a random integer in a given range.
-
I4_SWAP switches two integer values.
-
I4_TO_A returns the I-th alphabetic character.
-
I4_TO_AMINO_CODE converts an integer to an amino code.
-
I4VEC2_COMPARE compares pairs of integers stored in two vectors.
-
I4VEC2_PRINT prints a pair of integer vectors.
-
I4VEC2_SORT_A sorts a vector of pairs of integers into ascending order.
-
I4VEC_REVERSE reverses the elements of an integer vector.
-
MUTATE applies a few mutations to a sequence.
-
PAM120 returns the PAM 120 substitution matrix.
-
PAM120_SCORE computes a single entry sequence/sequence matching score.
-
PAM200 returns the PAM 200 substitution matrix.
-
PAM200_SCORE computes a single entry sequence/sequence matching score.
-
R4MAT_IMAX returns the location of the maximum of a real M by N matrix.
-
R4VEC2_SUM_IMAX returns the index of the maximum sum of two real vectors.
-
S_EQI is a case insensitive comparison of two strings for equality.
-
S_TO_CHVEC converts a string to a character vector.
-
S_TO_I4 reads an integer value from a string.
-
SIMPLE_SCORE computes a single entry sequence/sequence matching score.
-
SORT_HEAP_EXTERNAL externally sorts a list of items into linear order.
-
SS_GG_BSL determines a global gap backward alignment score in linear space.
-
SS_GG_FSL determines a global gap forward alignment score in linear space.
-
SS_LG_BPQ determines a local gap backward alignment path in quadratic space.
-
SS_LG_BSL determines a local gap backward alignment score in linear space.
-
SS_LG_BSQ determines a local gap backward alignment score in quadratic space.
-
SS_LG_CORNERS determines the "corners" of an optimal local alignment.
-
SS_LG_FPQ determines a local gap forward alignment path in quadratic space.
-
SS_LG_FSL determines a local gap forward alignment score in linear space.
-
SS_LG_FSQ determines a local gap forward alignment score in quadratic space.
-
SS_LG_MATCH_PRINT prints a local gap alignment.
-
SS_LG_MATCH_SCORE scores a local gap alignment.
-
SS_LG_RPL determines a local gap recursive alignment path in linear space.
-
SS_LG_RPL_POP pops the data describing a subproblem off of the stack.
-
SS_LG_RPL_PUSH pushes the data describing a subproblem onto the stack.
-
TIMESTAMP prints the current YMDHMS date as a time stamp.
-
UNIFORM_01_SAMPLE is a portable random number generator.
-
WORD_LAST_READ returns the last word from a string.
-
WORD_NEXT_READ "reads" words from a string, one at a time.
Last revised on 28 December 2007.