PhyloX is a Python package with tools for generating, manipulating,
and analysing phylogenetic networks. It uses the NetworkX package
(
In the study of the evolutionary history of biological species and
languages, it is common to represent putative histories using graphs.
Traditionally, at least in biology, these graphs are most often trees,
such as the well-known tree drawn by Charles Darwin in one of his
notebooks. A tree like this is called a phylogenetic tree. In some
cases, the evolutionary history includes complex processes like
horizontal gene transfer and hybridisation. These processes cause a
A
A network is
When analysing or comparing phylogenetic networks or phylogenetic
network methods, it can be helpful to extract some (numerical)
parameters from the networks. Some of the most used properties are
the
In research on phylogenetic networks, it is common to restrict
attention to some well-known classes of phylogenetic networks. These
classes put additional restrictions on the definition of a network,
for the benefit of computational efficiency, to model certain
biological constraints, or for both.
Kong et al.
(
A basic structure in any network or tree is the
A common modification to a phylogenetic network is to
These modifications are used in computational tools, for example
to reconstruct networks from ancestral profiles
(
For phylogenetic inference problems, it is often necessary to use
heuristics that search through a space of networks. Such a space of
networks takes the shape of a graph, whose objects are all networks
with a common set of leaf labels (the sampled taxa) and sometimes
also a set number of reticulations. The edges of the graph
correspond to small changes made to a network: there is an edge
between two networks if one can make a modification to one of the
networks to arrive at the second network.
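As a rough illustration of a heuristic that walks through such a space, the sketch below implements a plain greedy local search. It is not part of PhyloX: the names `hill_climb`, `toy_neighbours`, and `toy_score` are hypothetical, and integers stand in for networks so that the example stays self-contained. In a real application, `neighbours` would return the networks reachable by one modification, and `score` would evaluate a candidate network.

```python
def hill_climb(start, neighbours, score, max_steps=1000):
    """Greedy local search: move to the best-scoring neighbour until none improves."""
    current = start
    current_score = score(current)
    for _ in range(max_steps):
        # Pick the neighbour with the highest score (None if there are no neighbours).
        best = max(neighbours(current), key=score, default=None)
        if best is None or score(best) <= current_score:
            break  # local optimum reached
        current, current_score = best, score(best)
    return current


# Toy search space (assumption): states are integers, an "edge" joins n and n +/- 1.
def toy_neighbours(n):
    return [n - 1, n + 1]


# Toy objective (assumption): a single optimum at 7.
def toy_score(n):
    return -((n - 7) ** 2)


result = hill_climb(0, toy_neighbours, toy_score)
```

Greedy search like this can get stuck in local optima, which is one reason why sampling-based approaches over the same space are also used.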
The modifications that are allowed are well-defined as types of
The names for vertical moves have not been standardised, but they
generally do the same thing. A vertical move that removes a reticulation
removes an incoming edge of a reticulation node, and then suppresses
the resulting degree-2 nodes. A vertical move that adds a
reticulation does the reverse: it
As mentioned, rearrangement moves can be used to traverse a space
of networks. This is used, for example, to sample posterior
distributions in Bayesian analyses Zhang et al.
(
To test phylogenetic network methods, one needs to either source
or create a test set of networks. Creating them is often the simpler
option, so methods to randomly generate phylogenetic networks are
readily available. Moreover, these methods are often based on
evolutionary models that are defined on a high level, i.e., with
explicit events for processes such as speciation, extinction, and
hybridisation.
The paper
(
Because phylogenetic networks are graphs, a common representation
is as a list of edges. Another commonly used representation is the
extended Newick format
(
PhyloX is equipped to handle all the aspects of phylogenetic
networks mentioned in the previous section. It is written primarily
for explorative research into algorithmic aspects of phylogenetic
networks, although application-focused implementations can also be
realised with it. An example is the software
(
PhyloX handles all stages of a phylogenetic workflow involving
networks. This starts and ends with the input/output of networks.
The
from phylox import DiNetwork
from phylox.constants import LABEL_ATTR

network = DiNetwork()
network.add_edges_from(((0,1),(1,2),(1,3)))
network.nodes[2][LABEL_ATTR] = "leaf1"
network.nodes[3][LABEL_ATTR] = "leaf2"
The same can be achieved with a modified initialisation of
DiNetwork:
from phylox import DiNetwork

network = DiNetwork(
    edges=((0,1),(1,2),(1,3)),
    labels=[(2,"leaf1"), (3,"leaf2")]
)
Alternatively, the network can be initialised from a Newick
string with
from phylox import DiNetwork

network = DiNetwork.from_newick("((leaf1,leaf2));")
NetworkX also provides functionality to output networks in
several formats. For example, it is possible to output the list of
edges or to create a drawing of the network. Of course, output as
a Newick string is also available with PhyloX (with
Networks can also be generated randomly in PhyloX, which can be
utilised to create test sets for new methods. The implemented
generators are based on the code from
(
The latter makes use of a large part of the functionality of
PhyloX, especially when sampling orchard networks: after generating
or choosing a starting network, the
from phylox.generators.randomTC import generate_network_random_tree_child_sequence
from phylox.generators.mcmc import sample_mcmc_networks
from phylox.classes import is_orchard
from phylox.rearrangement.move import MoveType

# Generate an arbitrary orchard network with 10 leaves and 5 reticulations
start_network = generate_network_random_tree_child_sequence(10, 5, seed=4321)
# Generate 100 orchard networks with 10 leaves and 5 reticulations
sampled_networks = sample_mcmc_networks(
    start_network,
    {MoveType.TAIL: 0.5, MoveType.HEAD: 0.5},
    number_of_samples=100,
    burn_in=5,
    restriction_map=is_orchard,
    add_root_if_necessary=True,
    correct_symmetries=False,
    seed=1234,
)
# Write the sampled networks to a file
with open("sampled_networks.nwk", "w") as f:
    for network in sampled_networks:
        f.write(network.newick() + "\n")
For this sampler to work correctly, the space of networks that is
sampled from needs to be connected. That is, it has to be possible
to transform each network into each other network in the space using
the selected rearrangement moves. In the example above, this means
that the space of orchard networks with 10 leaves and 5
reticulations needs to be connected under tail moves and head moves
(i.e., rSPR moves).
This is something the user needs to check or prove themselves, as
it is not viable to check this computationally. Fortunately, such
connectivity results have been studied in detail
(
Based on all the properties above, PhyloX provides a toolkit to
compare networks. For example, it can be used to determine whether
two networks are
Currently, no Python package enables a full workflow for analysing
properties and methods of phylogenetic networks. Isolated scripts for
this purpose do appear on GitHub or as pseudocode regularly, most
often as part of publications studying one method or one property
(
This package, PhyloX, aims to bring these scripts together: it
standardises implementations of several basic objects related to
phylogenetic networks, such as the networks themselves, the labelling
of the nodes, and rearrangement moves. It currently implements a
limited but important set of basic functions: I/O for networks (e.g.,
lists of edges and extended Newick format), network generation for
test sets, comparing networks resulting from reconstruction methods,
and computing several well-used network properties such as the
reticulation number, the level, and the number of cherries.
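To make two of these properties concrete, the following is a minimal pure-Python sketch that computes the reticulation number and the number of cherries from a list of directed edges. It does not use the PhyloX API, and the function name `network_properties` is hypothetical. It assumes the usual definitions: the reticulation number is the sum of (in-degree minus 1) over all nodes with more than one parent, and a cherry is a pair of leaves that share a parent. Parallel edges are ignored in this sketch.

```python
from collections import defaultdict


def network_properties(edges):
    """Return (reticulation_number, number_of_cherries) for a directed edge list."""
    parents = defaultdict(set)
    children = defaultdict(set)
    nodes = set()
    for u, v in edges:
        parents[v].add(u)
        children[u].add(v)
        nodes.update((u, v))

    # Leaves are the nodes without outgoing edges.
    leaves = {n for n in nodes if not children[n]}

    # Every parent beyond the first contributes one reticulation.
    reticulation_number = sum(
        len(parents[n]) - 1 for n in nodes if len(parents[n]) > 1
    )

    # Count unordered pairs of leaves that hang off the same parent.
    cherries = 0
    for n in nodes:
        k = sum(1 for c in children[n] if c in leaves and len(parents[c]) == 1)
        cherries += k * (k - 1) // 2

    return reticulation_number, cherries


# The two-leaf tree used elsewhere in this paper: one cherry, no reticulations.
tree_edges = [(0, 1), (1, 2), (1, 3)]
# A small network in which node 4 has two parents (2 and 3).
network_edges = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4), (2, 5), (3, 6), (4, 7)]

tree_props = network_properties(tree_edges)
network_props = network_properties(network_edges)
```

The level is omitted here because it requires the biconnected components of the network, which is easier to obtain from a graph library than to reimplement.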
As mentioned above, there are currently no Python packages that
enable a complete workflow for phylogenetic networks. However, some
Python packages are available that enable part of this workflow or a
very similar one. In this section, we compare the functionality of
several of these packages to PhyloX, focussing only on usability for
phylogenetic networks.
Like PhyloX,
However, it has very few methods for phylogenetic networks, and
most of those methods are also included in PhyloX. Another
advantage of using PhyloX over PhyloNetwork is the inclusion of
explicit random seeds. This is an important factor for the
reproducibility of research.
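Why explicit seeds matter for reproducibility can be shown with the standard library alone. The sketch below is not PhyloX code: `random_leaf_order` is a hypothetical stand-in for any randomised generator. It only mirrors the pattern of passing a `seed` argument, as in the generator and sampler calls shown earlier.

```python
import random


def random_leaf_order(leaves, seed=None):
    """Return the leaves in a random order, driven by a private, seeded RNG."""
    rng = random.Random(seed)  # local generator: no hidden global state
    order = list(leaves)
    rng.shuffle(order)
    return order


leaves = ["leaf1", "leaf2", "leaf3", "leaf4", "leaf5"]

# Two runs with the same seed produce identical output, so an experiment
# can be repeated exactly by recording the seed alongside the results.
first_run = random_leaf_order(leaves, seed=1234)
second_run = random_leaf_order(leaves, seed=1234)
```

Using a local `random.Random` instance rather than the module-level functions keeps the result independent of any other code that draws random numbers in the same process.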
Note that code from PhyloNetwork and PhyloX may be easy to
combine, as both use NetworkX to implement the phylogenetic
network class.
This phylogenetics module,
Like Biopython’s phylogenetics package, the
The code of PhyloX is available as an open-source project on
Most of the code has been written in the form of separate scripts
during the author’s PhD project, which was conducted under Leo van
Iersel’s Vidi grant: 639.072.6
Anyone willing to contribute is very welcome to do so via pull
requests and issues on GitHub!