PS_GG_ALIGN
Profile/Sequence Global Gap Alignment


PS_GG_ALIGN is a FORTRAN90 library which implements some of the string alignment algorithms described in the reference [Chao].

These algorithms carry out the computation in linear space, and compute not just the optimal alignment score, but also the corresponding optimal alignment. The only alignments considered are global, that is, the entirety of both strings are compared. Gaps in the alignment are assigned an affine gap penalty.

It's important to be able to compute alignments using "linear space", that is, just a few vectors whose length N is equal to that of a typical string. A quadratic algorithm would require a two dimensional array of total dimension N*N. Realistic alignment problems can involve strings of N=100,000 elements or more, so a quadratic algorithm would be expensive or impossible to use.

The optimal alignment score is computed without explicitly constructing the corresponding alignment. So a major feature of the algorithms is how to backtrack from the score to retrieve the alignment. It is a matter of some difficulty to recover the matching, particularly if the best score was calculated with a linear space algorithm, which discards a great deal of intermediate information. However, the linear space algorithm implemented here can also compute the optimal matching, based on the idea of a recursive subdivision of the problem.

This set of algorithms does not actually match a pair of sequences, but rather matches a sequence to a "profile". A profile is constructed based on information from many sequences, and can be thought of as a "generalized sequence", or a set of indices, where for each index we specify the likelihood that each possible nucleic acid will occur. These likelihoods can then be used to score the alignments we consider with the new candidate sequence.

This set of routines assumes that an insertion or deletion of length K is penalized using an "affine gap penalty formula" of the form:

Penalty = Gap_Open + K * Gap_Extend
This choice of penalty function has a major effect on the form of the matching algorithms, particularly in the linear space case. For the profile problems covered by these algorithms, the gap penalties are further adjusted using profile-position percentages specified by the user.

Routines that use quadratic space are included as well, so the algorithms can be compared for storage, speed, and correctness.

Licensing:

The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.

Languages:

PS_GG_ALIGN is available in a FORTRAN90 version.

Related Data and Programs:

PS_LG_ALIGN, a FORTRAN90 library which implements a profile/sequence local alignment using an affine gap penalty.

PS_QG_ALIGN, a FORTRAN90 library which implements a profile/sequence quasiglobal alignment using an affine gap penalty.

SS_GD_ALIGN, a FORTRAN90 library which globally aligns two sequences using a distance matrix.

SS_GG_ALIGN, a FORTRAN90 library which globally aligns two sequences using an affine gap penalty.

SS_LG_ALIGN, a FORTRAN90 library which locally aligns two sequences using an affine gap penalty.

SS_QG_ALIGN, a FORTRAN90 library which quasi-globally aligns two sequences using an affine gap penalty.

Reference:

  1. Kun-Mao Chao, Ross Hardison, Webb Miller,
    Recent Developments in Linear-Space Alignment Methods: A Survey,
    Journal of Computational Biology,
    Volume 1, Number 4, 1994, pages 271-291.
  2. Eugene Myers, Webb Miller,
    Optimal Alignments in Linear Space,
    CABIOS,
    Volume 4, number 1, 1988, pages 11-17.
  3. Albert Nijenhuis, Herbert Wilf,
    Combinatorial Algorithms for Computers and Calculators,
    Second Edition,
    Academic Press, 1978,
    ISBN: 0-12-519260-6,
    LC: QA164.N54.
  4. Michael Waterman,
    Introduction to Computational Biology,
    Chapman and Hall, 1995,
    ISBN: 0412993910,
    LC: QH438.4.M33.W38.

Source Code:

Examples and Tests:

List of routines:

You can go up one level to the FORTRAN90 source codes.


Last revised on 17 December 2007.