TREE-QMC is a quartet-based method for estimating species trees directly from gene trees or characters, like the popular methods ASTRAL and Weighted ASTRAL/ASTER. However, TREE-QMC uses a different algorithmic approach than ASTRAL/ASTER, based on the Quartet Max Cut (QMC) framework of Snir and Rao.
To learn more about the TREE-QMC and weighted TREE-QMC, check out Han & Molloy, Genome Res, 2023 and Han & Molloy, bioRxiv, 2024, respectively.
TREE-QMC also implements some convenient features:
- TREE-QMC allows multi-labeled trees (i.e., MUL-trees) as input.
- TREE-QMC allows character data as input; in particular, it can be used to estimate species trees under the neutral Wright-Fisher model with MLE branch lengths, referred to as
bp
mode in Springer et al., J Heredity, 2020 and Molloy, Gatesy & Springer, Syst Biol, 2022. Although species trees can also be constructed from multi-state characters we do not make guarantees about performance. - TREE-QMC computes Partitioned Coalescence Support (PCS), specifically Quartet Quadrapartition Support (QQS) values, around a focal branch, as described by Gatesy et al., Mol Phy Evol, 2019.
Check out the tutorial for gene trees and the tutorial for character matrices.
TREE-QMC is based on the Quartet Max Cut (QMC) framework introduced by Sagi Snir and Satish Rao; see Snir & Rao, IEEE/ACM TCBB, 2010 and Avni, Cohen & Snir, Syst Biol, 2015. TREE-QMC contributes fast algorithms for constructing the quartet graph directly from the input trees, rather than explicitly enumerating all quartets or sampling quartets.
TREE-QMC now implements efficient (and brute force) algorithms for the quartet weighting schemes introduced by Chao Zhang and Siavash Mirarab; see Zhang & Mirarab, Mol Biol Evol, 2022. It also implements quartet support with the approach of Sayyari & Mirarab, Mol Biol Evol, 2016.
TREE-QMC uses MQLib for its max cut heuristic; see Dunning, Gupta, & Silberholz, INFORMS Journal on Computing, 2018.
TREE-QMC uses toms743 for its Lambert's W approximation; see Fritsch, Shafer & Crowley, Communications of the ACM, 1973 and Barry, Barry & Culligan-Hensley, ACM Transactions on Mathematical Software, 1995.
To build TREE-QMC, use commands:
git clone https://github.com/molloy-lab/TREE-QMC.git
cd TREE-QMC/external/MQLib
make
cd ../..
g++ -std=c++11 -O2 \
-I external/MQLib/include -I external/toms743 \
-o tree-qmc \
src/*.cpp external/toms743/toms743.cpp \
external/MQLib/bin/MQLib.a -lm \
-DVERSION=\"$(cat version.txt)\"
To run TREE-QMC, use command:
./tree-qmc -i <input file>
#1. For convenience, add the directory containing the tree-qmc
binary to your PATH
variable so that you can type tree-qmc
instead of <path to tree-qmc binary>tree-qmc
. For bash, this can be done by adding
export PATH=$PATH:"path to tree-qmc binary"
to ~/.bash_profile
file (or ~/.bashrc
file).
#2. If you are having strange problems, try removing the \r
characters from the input files and trying again:
cat file.txt | tr -d '\r' > newfile.txt
To see the TREE-QMC usage options, use command:
./tree-qmc -h
The output help message should be
=================================== TREE-QMC ===================================
BASIC USAGE:
tree-qmc (-i|--input) <input file>
**If the directory containing the tree-qmc binary is not part of $PATH, replace
eplace tree-qmc with <path to tree-qmc binary>/tree-qmc in the command above**
Help Options:
[-h|--help]
Prints this help message.
Input Options:
(-i|--input) <input file>
File with gene trees in newick format (required)
[(--chars)]
Input are characters rather than trees
Missing states are N, -, and ?
[(--bp)]
Input are binary characters i.e. bipartitions
Missing states are N, -, and ?
[(-a|-mapping) <mapping file>]
File with individual/leaf names (1st col) mapped to species (2nd col)
[(--root) <list of species separated by commas>]
Root species tree at given species if possible
[(--rootonly) <species tree file>]
Root species tree in file and then exit
[(--supportonly) <species tree file>]
Compute quartet support for species tree in file and then exit
[(--pcsonly) <species tree file>]
Compute partitioned coalescent support (PCS) for specified branch in
species tree in file (anotate branch with PCS) and then exit
Output Options:
[(-o|--output) <output file>]
File for writing output species tree (default: stdout)
[(--support)]
Compute quartet support for output species tree
[(--writetable) <table file>]
Write branch and quartet support information to CSV
[(--char2tree)]
Write character matrix as trees (newick strings) to output and exit
Algorithm Options:
[(--hybrid)]
Use hybrid weighting scheme (-w h)
[(--fast)]
Use fast algorithm that does not support weights or polytomies (-w f)
[(-B|--bayes)]
Use presets for bayesian support (-n 0.333 -x 1.0 -d 0.333)
[(-L|--lrt)]
Use presets for likelihood support (-n 0.0 -x 1.0 -d 0.0)
[(-S|--bootstrap)]
Use presets for boostrap support (-n 0 -x 100 -d 0)
[(-c|--contract) <float>]
Contract internal branches with support less than specified threshold
after mapping suport to the interval 0 to 1
Advanced Options:
[(-w|--weight) <character>]
Weighting scheme for quartets; see paper for details
-w n: none (default)
-w h: hybrid of support and length (recommended)
-w s: support only
-w l: length only
-w f: none fast
Refines polytomies arbitrarily so faster algorithm can be used
[(-n|--min) <float>]
Minimum value of input branch support
[(-x|--max) <float>]
Maximum value of input branch support
[(-d|--default) <float>]
Default branch support to use if branch input tree is missing support
Default branch length is 0.0
[(--norm_atax) <integer>]
Normalization scheme for artificial taxa; see paper for details
--norm_atax 0: none
--norm_atax 1: uniform
--norm_atax 2: non-uniform (default)
[(-e|--execution) <execution mode>]
-e 0: run efficient algorithm (default)
-e 1: run brute force algorithm for testing
-e 2: compute weighted quartets, then exit
-e 3: compute good and bad edges, then exit
[(-v|--verbose) <verbose mode>]
-v 0: write no subproblem information (default)
-v 1: write CSV with subproblem information (subproblem ID, parent
problem ID, depth of recursion, total # of taxa, # of artifical
taxa, species names)
-v 2: write CSV with subproblem information (info from v1 plus # of
of elements in f, # of pruned elements in f, # of zeroes in f)
[(--polyseed) <integer>]
Seeds random number generator with prior to arbitrarily resolving
polytomies. If seed is set to -1, system time is used;
otherwise, seed should be positive (default: 12345).
[(--maxcutseed) <integer>]
Seeds random number generator prior to calling the max cut heuristic
but after the preprocessing phase. If seed is set to -1, system time
is used; otherwise, seed should be positive (default: 1).
[--shared <use shared taxon data structure to normalize quartet weights>]
Do NOT use unless there are NO missing data!!!
Contact: Post issue to Github (https://github.com/molloy-lab/TREE-QMC/)
or email Yunheng Han ([email protected]) & Erin Molloy ([email protected])
If you use TREE-QMC or weighted TREE-QMC in your work, please cite:
Han and Molloy, 2023, Improving quartet graph construction for scalable
and accurate species tree estimation from gene trees, Genome Research,
http:doi.org/10.1101/gr.277629.122.
Han and Molloy, 2024, Improved robustness to gene tree incompleteness,
estimation errors, and systematic homology errors with weighted TREE-QMC,
bioRxiv, https://doi.org/10.1101/2024.09.27.615467.
================================================================================