Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with the read.paml_rst function while reading rst file from CodeML/PAML #72

Open
JPFQueiroz opened this issue Nov 27, 2021 · 0 comments

Comments

@JPFQueiroz
Copy link

Hello everyone,

I used CodeML in PAML through the graphical user interface PamlX to estimate branch lengths for a phylogeny of orthologous proteins given a known species tree. Also, I used the marginal reconstruction option to infer ancestral states at internal nodes that is useful to map substitutions along the branches.

I used the function read.paml_rst from treeio package (v 1.18.0) to read the rst file generated by CodeML/PAML. When I plotted the phylo object and the substitutions (subs) contained in the treedata class generated by read.paml_rst, I noted that branch lengths were atributted erroneously to their respective branches. To verify whether this error occurred with read.paml_rst and not with PamlX, I manually removed the newick text of the tree within the rst file and I saved it on a separate newick file. I read this file on R and I plotted it using ggtree, and the branch lengths were attributed correctly to their respective branches. So, I tried to fix the phylo object within the treedata class using the correct phylo object, but when I did it, I noted that substitutions were mapped to the wrong branches.

Example of the code I used to read the rst file and to plot the tree mapping the substitutions along branches.

library(treeio) library(ggtree) library(ggplot2) file.path("CodeML", "Run1", "rst") -> fp1 paml.data_01 <- read.paml_rst(rstfile = fp1, type = "Marginal") ggtree(paml.data_01, color = "darkgray", root.position = 0, ladderize = FALSE, branch.length = "branch.length") + xlim_tree(1.5) + geom_hilight(node = 76, fill = "steelblue", alpha = .25) + geom_hilight(node = 72, fill = "yellow", alpha = .25) + geom_hilight(node = 64, fill = "firebrick", alpha = .25) + geom_hilight(node = 59, fill = "green", alpha = .25) + geom_hilight(node = 46, fill = "orange", alpha = .25) + geom_tiplab(size = 2, hjust = -.5) + ylim(0, 50) + geom_treescale(x = .2, y = 10, width = .1, linesize = .5, fontsize = 3, offset = .5) + geom_rootedge(rootedge = .01, colour = "darkgray") + geom_text(aes(x=branch, label=subs), size=1, vjust=-.5, color="darkgreen")

I'm trying to understand how read.paml_rst is generating a erroneous phylo object. I did note that values of branch length are all correct, but they were attributed to the wrong branches. More specifically, the edge and edge.length listed on the phylo object within the treedata class do not match.

Is this a bug or did I specify something wrong while reading and plotting data from the rst file?

Sincerely,

Pedro.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant