FASTA is a dataset directory containing examples of FASTA files, which represent nucleotide or peptide sequences, with each nucleotide or amino acid represented by a single-letter code.
A FASTA file contains a list of sequence data. Each sequence is represented by a pair of records. The first record begins with the character '>' followed by a comment or description. The second record is a series of characters representing amino acids or nucleic acids, using the IUB/IUPAC codes.
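The record layout described above can be read with a few lines of Python. The following is only a minimal sketch, not part of the dataset: the function name read_fasta and the sample text are hypothetical, and the reader additionally assumes that the sequence record may be wrapped over several lines, which it joins into one string.

```python
import io


def read_fasta(handle):
    """Yield (description, sequence) pairs from an open FASTA text stream.

    Assumption: a sequence may be split over several lines; all lines up to
    the next '>' record are concatenated into one sequence string.
    """
    description = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description record closes the previous sequence, if any.
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            chunks.append(line)
    # Emit the final sequence once the input is exhausted.
    if description is not None:
        yield description, "".join(chunks)


# Hypothetical sample data, used only to exercise the reader.
sample = io.StringIO(
    ">seq1 sample nucleotide sequence\n"
    "ACGTACGT\n"
    "NNRY\n"
    ">seq2 sample peptide\n"
    "MKV*\n"
)
records = list(read_fasta(sample))
```

Reading lazily with a generator keeps memory use low for large files, since only one sequence is held at a time.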
For a nucleic acid, the codes are:
  A --> adenosine
  C --> cytidine
  G --> guanine
  T --> thymidine
  U --> uridine
  R --> G A (purine)
  Y --> T C (pyrimidine)
  K --> G T (keto)
  M --> A C (amino)
  S --> G C (strong)
  W --> A T (weak)
  B --> G T C
  D --> G A T
  H --> A C T
  V --> G C A
  N --> A G C T (any)
  -     gap of indeterminate length
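The nucleic acid codes above can be stored as a lookup table, which makes it easy to expand an ambiguity code or to check a sequence for unexpected characters. This is a sketch under stated assumptions: the names IUPAC_NUCLEOTIDES, expand_code and is_valid_nucleotide_sequence are hypothetical, and lowercase letters are assumed to be equivalent to uppercase.

```python
# Each code maps to the bases it can stand for, following the table above.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}


def expand_code(code):
    """Return the set of bases represented by a single nucleic acid code."""
    return set(IUPAC_NUCLEOTIDES[code.upper()])


def is_valid_nucleotide_sequence(seq):
    """Check that every character is a listed code or the gap symbol '-'."""
    allowed = set(IUPAC_NUCLEOTIDES) | {"-"}
    return all(ch in allowed for ch in seq.upper())
```

For example, expand_code("R") gives the two purine bases G and A, and a sequence containing a letter outside the table, such as "J", is reported as invalid.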
For an amino acid, the codes are:
  A  ALA  alanine
  B  ASX  aspartate or asparagine
  C  CYS  cysteine
  D  ASP  aspartate
  E  GLU  glutamate
  F  PHE  phenylalanine
  G  GLY  glycine
  H  HIS  histidine
  I  ILE  isoleucine
  K  LYS  lysine
  L  LEU  leucine
  M  MET  methionine
  N  ASN  asparagine
  P  PRO  proline
  Q  GLN  glutamine
  R  ARG  arginine
  S  SER  serine
  T  THR  threonine
  U       selenocysteine
  V  VAL  valine
  W  TRP  tryptophan
  Y  TYR  tyrosine
  Z  GLX  glutamate or glutamine
  X       any
  *       translation stop
  -       gap of indeterminate length
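A peptide sequence can be checked against the amino acid codes above in the same way. The sketch below is illustrative only: the names AMINO_ACIDS, OTHER_SYMBOLS and invalid_positions are hypothetical, only the symbols listed in the table are accepted, and lowercase letters are assumed to be equivalent to uppercase.

```python
# One-letter codes that have a three-letter code in the table above.
AMINO_ACIDS = {
    "A": "ALA", "B": "ASX", "C": "CYS", "D": "ASP", "E": "GLU",
    "F": "PHE", "G": "GLY", "H": "HIS", "I": "ILE", "K": "LYS",
    "L": "LEU", "M": "MET", "N": "ASN", "P": "PRO", "Q": "GLN",
    "R": "ARG", "S": "SER", "T": "THR", "V": "VAL", "W": "TRP",
    "Y": "TYR", "Z": "GLX",
}

# Symbols listed in the table without a three-letter code:
# selenocysteine, any, translation stop, gap.
OTHER_SYMBOLS = {"U", "X", "*", "-"}


def invalid_positions(seq):
    """Return (index, character) pairs for symbols not found in the table."""
    allowed = set(AMINO_ACIDS) | OTHER_SYMBOLS
    return [(i, ch) for i, ch in enumerate(seq) if ch.upper() not in allowed]
```

An empty result means every character of the sequence appears in the table; otherwise the returned indices point at the offending characters.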
The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.