FASTA

FASTA is a dataset directory containing examples of FASTA files. A FASTA file represents nucleotide or peptide sequences, with each nucleotide or amino acid written as a single-letter code.

A FASTA file contains a list of sequences. Each sequence is represented by a pair of records. The first record is a single line beginning with the character '>', followed by a comment or description. The second record is a series of characters representing the amino acids or nucleotides of the sequence, using the IUB/IUPAC codes; in many files this record is wrapped across several lines.
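The two-record layout maps directly onto a small line-oriented parser. The sketch below is a minimal Python version, assuming that blank lines may be ignored and that a sequence continues over as many lines as needed until the next '>' line. `parse_fasta` is a hypothetical helper, not a library function; full-featured readers exist elsewhere, for example in Biopython's `SeqIO` module.

```python
from typing import Iterable, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (description, sequence) pairs.

    Assumption: each record starts with a '>' line, and the sequence
    that follows may be wrapped across several lines.
    """
    records: List[Tuple[str, str]] = []
    description: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        else:
            if description is None:
                raise ValueError("sequence data found before the first '>' line")
            chunks.append(line)  # join wrapped sequence lines
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


if __name__ == "__main__":
    text = """>seq1 first example
ACGT
ACGT
>seq2 second example
MKV*
"""
    for desc, seq in parse_fasta(text.splitlines()):
        print(f"{desc}: {seq}")
    # prints "seq1 first example: ACGTACGT"
    # then   "seq2 second example: MKV*"
```

Since a file object iterates over its lines, one of the files listed under Datasets could be read with `with open("two.fa") as f: records = parse_fasta(f)`, and `len(records)` then gives the number of sequences.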

For a nucleic acid, the codes are:

        A --> adenosine
        C --> cytidine
        G --> guanosine
        T --> thymidine
        U --> uridine
        R --> G A (purine)
        Y --> T C (pyrimidine)
        K --> G T (keto)
        M --> A C (amino)
        S --> G C (strong)
        W --> A T (weak)
        B --> G T C
        D --> G A T
        H --> A C T
        V --> G C A
        N --> A G C T (any)
        - --> gap of indeterminate length
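Each ambiguity code in the table stands for a set of concrete bases, so matching and expansion reduce to set lookups. A minimal Python sketch, assuming DNA-style codes (the multi-base codes use T, as in the table, and U is kept as its own single code) and that the gap symbol '-' is handled separately; `IUPAC_NUCLEOTIDES`, `expand` and `matches` are hypothetical names, not part of any library.

```python
from itertools import product
from typing import Dict, FrozenSet, List

# Sets of concrete bases for each code in the nucleic-acid table.
# Assumption: '-' (gap) is deliberately absent; looking it up raises KeyError.
IUPAC_NUCLEOTIDES: Dict[str, FrozenSet[str]] = {
    "A": frozenset("A"),
    "C": frozenset("C"),
    "G": frozenset("G"),
    "T": frozenset("T"),
    "U": frozenset("U"),
    "R": frozenset("GA"),    # purine
    "Y": frozenset("TC"),    # pyrimidine
    "K": frozenset("GT"),    # keto
    "M": frozenset("AC"),    # amino
    "S": frozenset("GC"),    # strong
    "W": frozenset("AT"),    # weak
    "B": frozenset("GTC"),
    "D": frozenset("GAT"),
    "H": frozenset("ACT"),
    "V": frozenset("GCA"),
    "N": frozenset("AGCT"),  # any
}


def expand(pattern: str) -> List[str]:
    """Return every concrete sequence an ambiguous pattern stands for."""
    choices = [sorted(IUPAC_NUCLEOTIDES[code]) for code in pattern.upper()]
    return ["".join(p) for p in product(*choices)]


def matches(pattern: str, seq: str) -> bool:
    """True if the concrete sequence seq is consistent with pattern."""
    if len(pattern) != len(seq):
        return False
    return all(base in IUPAC_NUCLEOTIDES[code]
               for code, base in zip(pattern.upper(), seq.upper()))


if __name__ == "__main__":
    print(expand("AR"))           # prints ['AA', 'AG']
    print(matches("ANT", "AGT"))  # prints True
    print(matches("RY", "AA"))    # prints False
```

The number of expansions is the product of the set sizes (`NN` already gives 16), so `expand` suits only short motifs; `matches` avoids that blow-up by checking one position at a time.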

For an amino acid, the codes are:

        A ALA alanine
        B ASX aspartate or asparagine
        C CYS cysteine
        D ASP aspartate
        E GLU glutamate
        F PHE phenylalanine
        G GLY glycine
        H HIS histidine
        I ILE isoleucine
        K LYS lysine
        L LEU leucine
        M MET methionine
        N ASN asparagine
        P PRO proline
        Q GLN glutamine
        R ARG arginine
        S SER serine
        T THR threonine
        U     selenocysteine
        V VAL valine
        W TRP tryptophan
        Y TYR tyrosine
        Z GLX glutamate or glutamine
        X     any
        *     translation stop
        -     gap of indeterminate length
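Because every residue is a single character, checking a protein sequence against this alphabet is a set lookup per position. A minimal Python sketch; `AMINO_ACIDS` and `invalid_residues` are hypothetical names, and accepting lowercase letters is an assumption. Note that A, C, G, T and U are also valid amino-acid codes, so this check alone cannot tell a nucleotide sequence from a protein sequence.

```python
from typing import Dict, List, Tuple

# One-letter amino-acid codes listed above, with their names.
AMINO_ACIDS: Dict[str, str] = {
    "A": "alanine", "B": "aspartate or asparagine", "C": "cysteine",
    "D": "aspartate", "E": "glutamate", "F": "phenylalanine",
    "G": "glycine", "H": "histidine", "I": "isoleucine",
    "K": "lysine", "L": "leucine", "M": "methionine",
    "N": "asparagine", "P": "proline", "Q": "glutamine",
    "R": "arginine", "S": "serine", "T": "threonine",
    "U": "selenocysteine", "V": "valine", "W": "tryptophan",
    "Y": "tyrosine", "Z": "glutamate or glutamine", "X": "any",
    "*": "translation stop", "-": "gap of indeterminate length",
}


def invalid_residues(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based position, character) for each symbol not in the table.

    Assumption: lowercase letters are accepted by upper-casing them first.
    """
    return [(i, ch) for i, ch in enumerate(seq) if ch.upper() not in AMINO_ACIDS]


if __name__ == "__main__":
    print(invalid_residues("MKVJ*"))  # prints [(3, 'J')]
    print(invalid_residues("mkv-"))   # prints []
```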

Licensing:

The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.

Datasets:

two.fa, two sequences.
three.fa, 22 sequences.


Last revised on 21 May 2017.