A program for counting leaf-multi-labeled trees

The formulas used for these computations are described in
É. Czabarka, P.L. Erdős, V. Johnson, V. Moulton: Generating functions for multi-labeled trees Discrete Applied Mathematics 161 (2013) 107-117
and in Virginia Johnson's Ph.D. thesis, (Aug. 2012) Enumeration Results on Leaf-Labeled Trees.

Some background

In evolutionary biology, it is common practice to use leaf-labeled (or phylogenetic) trees to represent the evolution of species, populations, organisms, and the like. Technically speaking, such a tree is a simple, connected graph with no cycles, and it is leaf-labeled in case each of its leaves (i.e. vertices of degree 1) is labeled by precisely one element from some set. The set of labels corresponds to the set of species, populations or organisms under consideration.

It has become apparent that it can also be useful to employ a slightly more general type of tree when trying to understand, for example, gene evolution. In particular, due to processes such as gene (or genome) duplication or lateral gene transfer, trees can often arise in which more than one leaf is labeled by the same element of the label set. We call such trees leaf-multi-labeled trees. Note that in case each leaf if labeled only once, a leaf-multi-labeled tree is just a leaf-labeled tree. In addition to arising in the study of gene versus species evolution, leaf-multi-labeled trees have been used to construct phylogenetic networks, and they naturally arise in biogeography.

The code and some caviats

Code for binary trees and code for non-binary trees are available by clicking on the appropriate links; and can be run using SageMath. This code is old: when it was written, the scripting language for Sage was Python 2. Currently the scripting language is Python 3. (e.g. what used to be "print var" is now "print(var)") There is a Python 2 to Python 3 converter online: https://python2to3.com/ I do not know how well it works.

Output description

The function T(n,k) (where n is the number of leaves and k is the number of labels) returns six lists of counts, each containing the number of trees using at most k labels on 0,1,2,...,n leaves. The first and second lists count rooted leaf-multi-labeled binary trees, the difference is that the second list gives the count for those trees that use all k labels, while the first gives the count that do not use more than k labels. The third list is for marked leaf-multi-labeled binary trees, the fourth for marked leaf-multi-labeled binary trees which use all k labels, the fifth is for unrooted leaf-multi-labeled binary trees, and the sixth is for unrooted leaf-labeled binary trees which use all k labels.

For example for the call T(10,5) the program returns:

Number of leaves,number of labels
10 5
Rooted MUL Binary Trees
[0, 5, 15, 75, 495, 3600, 28275, 232500, 1979385, 17287050, 154041450]
Rooted MUL Binary Trees using all k labels
[0, 0, 0, 0, 0, 105, 2625, 42075, 554820, 6578550, 73169250]
Marked MUL Binary Trees
[0, 5, 25, 110, 600, 4200, 31730, 255750, 2144400, 18530875, 163771800]
Marked MUL Binary Trees using all k labels
[0, 0, 0, 0, 0, 120, 2925, 46065, 599670, 7041285, 77720100]
Unrooted MUL Binary Trees
[0, 5, 15, 35, 120, 600, 3530, 23250, 165510, 1243825, 9733950]
Unrooted MUL Binary Trees using all k labels
[0, 0, 0, 0, 0, 15, 300, 3990, 44850, 462735, 4550955]

This output tells you that for example there are 28275 rooted leaf-multi-labeled binary trees on 6 leaves that use at most 5 labels and 2625 of these use all 5 labels, and there are 9733950 unrooted leaf-multi-labeled binary trees on 10 leaves that use at most 5 labels.


The function G(n,k) (where n is the number of leaves and k is the number of labels) returns two lists of counts, each containing the number of trees using at most k labels on 0,1,2,...,n leaves. The first list counts rooted non-binary leaf-multi-labeled trees, labeled with at most k labels. The second list is for unrooted leaf-multi-labeled non-binary trees, labeled with at most k labels.

For example for the call G(9, 5) the program returns:

Rooted Non-binary Multi-leaf-labeled Trees
[0, 5, 15, 110, 965, 9376, 97775, 1068450, 12081605, 140160650]
Unrooted Non-binary Multi-leaf-labeled Trees
[0, 5, 65, 335, 2840, 27151, 279465, 3028655, 34035550, 393044405]

This output tells you that for example there are 97,775 rooted leaf-multi-labeled non-binary trees on 6 leaves that use at most 5 labels and there are 34,035,550 unrooted leaf-multi-labeled non-binary trees on 8 leaves that use at most 5 labels.
Return to home page