There are 7 different input/output file types for this software. I will refer to them by their default names: tree.txt, tree.bin, dctc.in, matrix.txt, dist.txt, nhdist.txt, seq.txt. For examples of these files, see the samples.
Below I briefly describe the syntax of the text files.

matrix.txt:

First line: "#number_of_letters_in_the_ABC:"
Second line: the number of the letters in the alphabet from which the sequences are built.
Third line: "#sequence_length:"
Fourth line: the length of the sequences you intend to use.
Fifth line: "#number_of_matrices:"
Sixth line: the number of transition matrices. At each edge the program will choose one of them at random, with uniform distribution.
Seventh line: empty.
The next line contains the keyword: "sequence".
From the next line follow the elements of the matrices, given row by row. The elements of a matrix are separated by spaces, the rows of a matrix are separated by the sign "|". The matrices are separated by an empty line. After the last matrix there must be an empty line again.
The next line holds the starting sequence. This sequence is at the root, and it is the common ancestor of the other sequences on the tree. The elements of the sequence are separated by spaces and may be spread over several lines.
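
As an illustration of the matrix block only (not the whole file), the following is a minimal Python sketch that splits such a block into matrices. The function name parse_matrices and the sample values are made up for this example, and it assumes the elements are plain decimal numbers.

```python
import re

def parse_matrices(block):
    """Split a block of text into matrices (lists of rows of floats)."""
    matrices = []
    # Matrices are separated by an empty line.
    for paragraph in re.split(r"\n\s*\n", block.strip()):
        # Rows are separated by "|", elements by spaces.
        rows = [
            [float(x) for x in chunk.split()]
            for chunk in paragraph.split("|")
            if chunk.strip()
        ]
        if rows:
            matrices.append(rows)
    return matrices

# Illustrative values only: two 2x2 matrices for a two-letter alphabet.
sample_block = "0.9 0.1 |\n0.1 0.9 |\n\n0.8 0.2 |\n0.2 0.8 |\n\n"
matrices = parse_matrices(sample_block)
print(len(matrices), matrices[0])
```

Splitting on blank lines first and on "|" second keeps the sketch indifferent to whether each row sits on its own line.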

dctc.in:

First line: the number of leaves on the tree.
Second line: empty.
From the next line come the quartet splits, one per line. Each split is given by a quartet in the following way: the elements are separated by spaces; the first two elements are on one side of the split, the last two elements are on the other side. The two sides may optionally be separated by the sign "|".
The following line is empty.
The last line contains only one element: the sign zero "0".
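
The layout above could be read with a few lines of Python. This is only a sketch under assumptions: the function name parse_quartet_splits is hypothetical, leaf labels are kept as strings because the description does not say whether they must be numbers, and the sample content is invented.

```python
def parse_quartet_splits(text):
    """Return (number_of_leaves, list of ((a, b), (c, d)) splits)."""
    lines = text.splitlines()
    n_leaves = int(lines[0].strip())
    splits = []
    for line in lines[1:]:
        # The "|" between the two sides is optional, so treat it as a space.
        tokens = line.replace("|", " ").split()
        if not tokens:
            continue  # empty separator line
        if tokens == ["0"]:
            break  # terminating line
        if len(tokens) != 4:
            raise ValueError(f"not a quartet split: {line!r}")
        a, b, c, d = tokens
        splits.append(((a, b), (c, d)))
    return n_leaves, splits

# Invented sample: 5 leaves, two quartet splits, then the closing "0".
sample_dctc = "5\n\n1 2 | 3 4\n1 2 3 5\n\n0\n"
n_leaves, splits = parse_quartet_splits(sample_dctc)
print(n_leaves, splits)
```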

dist.txt, nhdist.txt:

First line: the number of leaves. It is the same as the number of rows or columns in these matrices.
From the second line come the elements of the matrix. They are given from row to row. Each element is separated by a space. Each row is separated by the sign "|".
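
A minimal Python sketch of a reader for this layout follows. The name parse_distance_matrix and the sample distances are assumptions made for the example; the check that the matrix is square follows from the first line giving the number of rows and columns.

```python
def parse_distance_matrix(text):
    """Read the leaf count and the matrix rows (as floats)."""
    lines = text.strip().splitlines()
    n = int(lines[0].strip())
    # Join the remaining lines, then cut rows at "|" and elements at spaces.
    body = " ".join(lines[1:])
    rows = [
        [float(x) for x in chunk.split()]
        for chunk in body.split("|")
        if chunk.strip()
    ]
    if len(rows) != n or any(len(row) != n for row in rows):
        raise ValueError(f"expected a {n}x{n} matrix")
    return rows

# Invented sample with 3 leaves.
sample_dist = "3\n0 2 3 |\n2 0 1 |\n3 1 0 |\n"
dist = parse_distance_matrix(sample_dist)
print(dist)
```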

seq.txt:

First line: "#number_of_leaves:"
Second line: the number of leaves of the tree. This is the same as the number of sequences.
Third line: "#number_of_letters_in_the_ABC:"
Fourth line: The number of elements from which the sequences are built. This is 2 in the case of a Neyman model, 4 for DNA.
Fifth line: "#sequence_length:".
Sixth line: the length of the sequences. Longer sequences tend to increase the probability of a successful tree reconstruction.
Seventh line: "#rooted:"
Eighth line: "true" or "false". If the tree is rooted, the answer is "true", otherwise the answer is "false".
Ninth line: empty.
From the next line start the sequences. Each sequence is written on a separate line. The elements of a sequence are separated by a space. The end of the sequence is marked by the sign "|".
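
To make the layout concrete, here is a hedged Python sketch of a reader. The function name parse_seq_file is hypothetical, the sample content is invented, and sequence elements are kept as strings because the description does not fix how letters are written.

```python
def parse_seq_file(text):
    """Return (header dict, list of sequences as lists of tokens)."""
    lines = text.splitlines()
    header = {}
    i = 0
    # Header: a "#key:" line followed by a value line, repeated.
    while i + 1 < len(lines) and lines[i].strip().startswith("#"):
        key = lines[i].strip().lstrip("#").rstrip(":")
        header[key] = lines[i + 1].strip()
        i += 2
    # Body: each sequence ends with "|", elements are space separated.
    body = " ".join(lines[i:])
    sequences = [chunk.split() for chunk in body.split("|") if chunk.strip()]
    n = int(header["number_of_leaves"])
    length = int(header["sequence_length"])
    if len(sequences) != n or any(len(s) != length for s in sequences):
        raise ValueError("sequence count or length does not match the header")
    return header, sequences

# Invented sample: 3 leaves, 4 letters, sequences of length 5, unrooted.
sample_seq = """\
#number_of_leaves:
3
#number_of_letters_in_the_ABC:
4
#sequence_length:
5
#rooted:
false

0 1 2 3 0 |
0 1 2 3 1 |
3 1 2 3 0 |
"""
header, sequences = parse_seq_file(sample_seq)
print(header["rooted"], len(sequences))
```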

tree.txt:

First line: "Number of leaves:" followed by the number of leaves of the tree.
Second line: "Number of cuts:" followed by the number of internal edges.
Third line: empty.
From the next line come the internal edges. Each edge is described by the same syntax. The edges are separated by an empty line, which may also contain "-" signs. The syntax of an edge is as follows:

First line: the label of the internal edge followed by ". Internal edge:".
Second line: empty.
Third line: "Left:" followed by the labels of the leaves on one side of the internal edge.
Fourth line: "Right:" followed by the labels of the leaves on the other side of the internal edge.

After the last edge there is an empty line that may contain "-" signs.
The last line takes one of two forms:

"Unrooted tree" if the tree does not have a root. Otherwise it is:
"Rooted tree: root = " followed by the label of the edge where the root is and ". Internal edge"
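
The description above could be turned into a small reader along the following lines. This is a sketch, not the program's own parser: the name parse_tree_file is hypothetical, the sample is invented, and it assumes that leaf labels after "Left:" and "Right:" are separated by spaces, which the description does not state.

```python
import re

def parse_tree_file(text):
    """Return (number_of_leaves, edges dict, root edge label or None)."""
    lines = [ln.strip() for ln in text.splitlines()]
    n_leaves = int(lines[0].split(":", 1)[1])
    n_cuts = int(lines[1].split(":", 1)[1])
    edges = {}
    current = None
    root = None
    for ln in lines[2:]:
        m = re.fullmatch(r"(\S+)\. Internal edge:", ln)
        if m:
            current = m.group(1)
            edges[current] = {"left": [], "right": []}
        elif ln.startswith("Left:") and current is not None:
            edges[current]["left"] = ln[len("Left:"):].split()
        elif ln.startswith("Right:") and current is not None:
            edges[current]["right"] = ln[len("Right:"):].split()
        elif ln.startswith("Rooted tree:"):
            m = re.search(r"root = (\S+?)\. Internal edge", ln)
            root = m.group(1) if m else None
        # Empty lines, "-" separator lines and "Unrooted tree" need no action.
    if len(edges) != n_cuts:
        raise ValueError("number of edges does not match 'Number of cuts'")
    return n_leaves, edges, root

# Invented sample: 5 leaves, 2 internal edges, unrooted.
sample_tree = """\
Number of leaves: 5
Number of cuts: 2

1. Internal edge:

Left: 1 2
Right: 3 4 5
----------
2. Internal edge:

Left: 1 2 3
Right: 4 5
----------
Unrooted tree
"""
n_tree_leaves, edges, root = parse_tree_file(sample_tree)
print(n_tree_leaves, edges, root)
```

For a rooted tree the same function returns the label of the root edge as the third value instead of None.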
