There are 7 different input/output file types for this software. I will refer
to them by their default names: tree.txt, tree.bin, dctc.in, matrix.txt,
dist.txt, nhdist.txt, seq.txt. For examples of these files, check the samples.
Below I briefly describe the syntax of the text files.
matrix.txt:
- First line: "#number_of_letters_in_the_ABC:"
- Second line: the number of letters in the alphabet from which the sequences
  are built.
- Third line: "#sequence_length:"
- Fourth line: the length of the sequences you intend to use.
- Fifth line: "#number_of_matrices:"
- Sixth line: the number of transition matrices. At each edge the program
  chooses one of them uniformly at random.
- Seventh line: empty.
- The next line contains the keyword: "sequence".
- From the next line follow the elements of the matrices, given row by row.
  The elements of a matrix are separated by spaces, and the rows of a matrix
  are separated by the sign "|". The matrices are separated by an empty line.
  After the last matrix there must be an empty line again.
- The next line holds the starting sequence. This sequence is at the root, and
  it is the common ancestor of the other sequences on the tree. The elements
  of the sequence are separated by spaces; they may be spread over several
  lines.
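The two regular parts of matrix.txt, the "#key:" / value header and the matrix block, can be read with a few lines of Python. This is only a sketch: the function names read_header and parse_matrices are made up for illustration, the header values are assumed to be integers, and the matrix reader accepts rows on one line or on several lines because the exact line layout is not fixed by the description above. Check the samples before relying on it.

```python
import re


def read_header(lines, count):
    """Read `count` pairs of '#key:' line and value line from the top of a file.

    Hypothetical helper: assumes every value is an integer.
    """
    header = {}
    for i in range(0, 2 * count, 2):
        key = lines[i].strip().lstrip("#").rstrip(":")
        header[key] = int(lines[i + 1].strip())
    return header


def parse_matrices(text):
    """Split a block of text into matrices.

    Empty lines separate matrices, '|' separates rows, and whitespace
    separates elements. A trailing '|' after the last row is tolerated.
    """
    matrices = []
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = [
            [float(x) for x in row.split()]
            for row in block.split("|")
            if row.strip()
        ]
        if rows:
            matrices.append(rows)
    return matrices
```

Calling read_header(lines, 3) on the first six lines gives the alphabet size, the sequence length and the number of matrices; the last value can then be compared with len(parse_matrices(...)) as a sanity check.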
dctc.in:
- First line: the number of leaves on the tree.
- Second line: empty.
- From the next line come the quartet splits, each on a separate line. A split
  is given by a quartet in the following way: the four elements are separated
  by spaces; the first two elements are on one side of the split and the last
  two are on the other side. The two sides may be separated by the sign "|".
- The following line is empty.
- The last line contains a single element: the digit zero, "0".
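The dctc.in layout above is simple enough to write and read back mechanically. The following is a minimal sketch, not the program's own code: format_dctc and parse_dctc are hypothetical names, and the leaf labels are treated as plain whitespace-free tokens, which is an assumption.

```python
def format_dctc(num_leaves, splits, use_bar=True):
    """Build the text of a dctc.in file.

    `splits` is a list of ((a, b), (c, d)) pairs: a and b lie on one side
    of the quartet split, c and d on the other. The '|' between the two
    sides is optional in the format, so it can be switched off.
    """
    sep = " | " if use_bar else " "
    lines = [str(num_leaves), ""]
    for (a, b), (c, d) in splits:
        lines.append(f"{a} {b}{sep}{c} {d}")
    lines += ["", "0"]  # empty line, then the closing "0"
    return "\n".join(lines) + "\n"


def parse_dctc(text):
    """Read a dctc.in text back into (num_leaves, splits).

    Labels are returned as strings. Lines that do not hold exactly four
    elements (the empty lines and the final "0") are skipped.
    """
    lines = text.splitlines()
    num_leaves = int(lines[0])
    splits = []
    for line in lines[2:]:
        fields = line.replace("|", " ").split()
        if len(fields) == 4:
            splits.append(((fields[0], fields[1]), (fields[2], fields[3])))
    return num_leaves, splits
```

For example, format_dctc(5, [((1, 2), (3, 4))]) produces the five lines "5", an empty line, "1 2 | 3 4", an empty line, and "0".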
dist.txt, nhdist.txt:
- First line: the number of leaves. It is the same as the number of rows or
  columns in these matrices.
- From the second line come the elements of the matrix, given row by row. The
  elements are separated by spaces. The rows are separated by the sign "|".
seq.txt:
- First line: "#number_of_leaves:"
- Second line: the number of leaves of the tree. This is the same as the
  number of sequences.
- Third line: "#number_of_letters_in_the_ABC:"
- Fourth line: the number of elements from which the sequences are built. This
  is 2 in the case of a Neyman model, 4 for DNA.
- Fifth line: "#sequence_length:"
- Sixth line: the length of the sequences. Longer sequences tend to increase
  the probability of a successful tree reconstruction.
- Seventh line: "#rooted:"
- Eighth line: "true" or "false". If the tree is rooted, the answer is "true",
  otherwise the answer is "false".
- Ninth line: empty.
- From the next line start the sequences. Each sequence is written on a
  separate line. The elements of a sequence are separated by spaces. The end
  of a sequence is marked by the sign "|".
tree.txt:
- First line: "Number of leaves:" followed by the number of leaves of the
  tree.
- Second line: "Number of cuts:" followed by the number of internal edges.
- Third line: empty.
- From the next line come the internal edges. Every edge is described with the
  same syntax. The edge descriptions are separated by an empty line, which may
  also contain "-" signs. The syntax of an edge description is as follows:
  - First line: the label of the internal edge followed by ". Internal
    edge:".
  - Second line: empty.
  - Third line: "Left:" followed by the labels of the leaves on one side of
    the edge.
  - Fourth line: "Right:" followed by the labels of the leaves on the other
    side of the internal edge.
- After the last edge description there is an empty line that may contain "-"
  signs.
- The last line is one of two kinds:
  - "Unrooted tree" if the tree does not have a root. Otherwise it is:
  - "Rooted tree: root = " followed by the label of the edge where the root
    is and ". Internal edge".
©1999
All Rights Reserved