Merge branch 'master' into statSTR_onlypassing
aryarm authored Nov 6, 2024
2 parents cce4f0d + 42740ec commit 91091ec
Showing 71 changed files with 4,711 additions and 268 deletions.
10 changes: 5 additions & 5 deletions .devcontainer.json
@@ -5,10 +5,7 @@
// Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
"image": "mcr.microsoft.com/devcontainers/base:jammy",
"features": {
"ghcr.io/rocker-org/devcontainer-features/miniforge:1": {
"version": "latest",
"variant": "Mambaforge"
}
"ghcr.io/rocker-org/devcontainer-features/miniforge:2": {}
},

// Features to add to the dev container. More info: https://containers.dev/features.
@@ -23,7 +20,10 @@
// Configure tool-specific properties.
"customizations": {
"vscode": {
"extensions": ["ms-python.python"],
"extensions": [
"ms-python.python",
"ms-vscode.live-server"
],
"settings": {
"python.condaPath": "/opt/conda/condabin/conda",
"python.defaultInterpreterPath": "/opt/conda/envs/trtools/bin/python",
2 changes: 1 addition & 1 deletion .github/workflows/conventional-prs.yml
@@ -18,6 +18,6 @@ jobs:
statuses: write # for amannn/action-semantic-pull-request to mark status of analyzed PR
runs-on: ubuntu-latest
steps:
- uses: amannn/action-semantic-pull-request@v5.4.0
- uses: amannn/action-semantic-pull-request@v5
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
6 changes: 3 additions & 3 deletions .github/workflows/tests.yml
@@ -22,6 +22,7 @@ jobs:
- { python: "3.12", os: "ubuntu-latest", session: "tests" }
# - { python: "3.11", os: "windows-latest", session: "tests" }
- { python: "3.8", os: "macos-13", session: "tests" }
- { python: "3.9", os: "macos-latest", session: "tests" }
- { python: "3.8", os: "ubuntu-latest", session: "size" }
- { python: "3.8", os: "ubuntu-latest", session: "coverage" }

@@ -34,11 +35,10 @@ jobs:
- name: Check out the repository
uses: actions/checkout@v4

- name: Setup Mambaforge
- name: Setup Miniforge
uses: conda-incubator/setup-miniconda@v3
with:
activate-environment: trtools
miniforge-variant: Mambaforge
auto-activate-base: false
miniforge-version: latest
use-mamba: true
@@ -101,7 +101,7 @@ jobs:
uses: actions/checkout@v4

- name: Check for large files
uses: actionsdesk/lfs-warning@v3.2
uses: ppremk/lfs-warning@v3.3
with:
token: ${{ secrets.GITHUB_TOKEN }} # Optional
filesizelimit: 500000b
178 changes: 1 addition & 177 deletions README.rst
@@ -1,7 +1,3 @@

.. a location that the doc/index.rst uses for including this file
.. before_header
.. image:: https://github.com/codespaces/badge.svg
:width: 160
:target: https://codespaces.new/gymrek-lab/TRTools
@@ -13,183 +9,11 @@
:target: https://codecov.io/gh/gymrek-lab/TRTools


.. a location that the doc/index.rst uses for including this file
.. after_header
TRTools
=======

.. a location that the doc/index.rst uses for including this file
.. after_title
TRTools includes a variety of utilities for filtering, quality control and analysis of tandem repeats downstream of genotyping them from next-generation sequencing. It supports multiple recent genotyping tools (see below).

See full documentation and examples at https://trtools.readthedocs.io/en/stable/.
See full documentation and examples at https://trtools.readthedocs.io

If you use TRTools in your work, please cite: Nima Mousavi, Jonathan Margoliash, Neha Pusarla, Shubham Saini, Richard Yanicky, Melissa Gymrek. (2020) TRTools: a toolkit for genome-wide analysis of tandem repeats. Bioinformatics. (https://doi.org/10.1093/bioinformatics/btaa736)

Install
-------

Note: TRTools supports Python versions 3.8 and up. We do not officially support python version 3.7 as it is `end of life <https://devguide.python.org/versions/#status-of-python-versions>`_, but we believe TRTools likely works with it from previous testing results.

With conda
^^^^^^^^^^

::

conda install -c conda-forge -c bioconda trtools

Optionally install :code:`bcftools` which is used to prepare input files for TRTools (and :code:`ART` which is used by simTR) by running:

::

conda install -c conda-forge -c bioconda bcftools art

With pip
^^^^^^^^

First install :code:`htslib` (which contains :code:`tabix` and :code:`bgzip`). Optionally install :code:`bcftools`.
These are used to prepare input files for TRTools and aren't installed by pip.

Then run:

::

pip install --upgrade pip
pip install trtools

Note: TRTools installation may fail for pip version 10.0.1, hence the need to upgrade pip first

Note: if you will run or test :code:`simTR`, you will also need to install `ART <https://www.niehs.nih.gov/research/resources/software/biostatistics/art/index.cfm>`_. The simTR tests will only run if the executable :code:`art_illumina` is found on your :code:`PATH`. If it has been installed, :code:`which art_illumina` should return a path.

From source
^^^^^^^^^^^

To install from source (only recommended for development) clone the TRTools repository from `github <https://github.com/gymrek-lab/TRTools/>`_ and checkout the branch you're interested in::

git clone -b master https://github.com/gymrek-lab/TRTools
cd TRTools/

Now, create 1) a conda environment with our development tools and 2) a virtual environment with our dependencies and an editable install of TRTools::

conda env create -n trtools -f dev-env.yml
conda run -n trtools poetry install

Now, whenever you'd like to run/import pytest or TRTools, you will first need to activate both environments::

conda activate trtools
poetry shell

With Docker
^^^^^^^^^^^

Please refer to `the biocontainers registry for TRTools <https://biocontainers.pro/tools/trtools>`_ for all of our images. To use the most recent release, run the following command::

docker pull quay.io/biocontainers/trtools:latest

Tools
-----
TRTools includes the following tools.

* `mergeSTR <https://trtools.readthedocs.io/en/stable/source/mergeSTR.html>`_: a tool to merge VCF files across multiple samples genotyped using the same tool
* `dumpSTR <https://trtools.readthedocs.io/en/stable/source/dumpSTR.html>`_: a tool for filtering VCF files with TR genotypes
* `qcSTR <https://trtools.readthedocs.io/en/stable/source/qcSTR.html>`_: a tool for generating various quality control plots for a TR callset
* `statSTR <https://trtools.readthedocs.io/en/stable/source/statSTR.html>`_: a tool for computing various statistics on VCF files
* `compareSTR <https://trtools.readthedocs.io/en/stable/source/compareSTR.html>`_: a tool for comparing TR callsets
* `associaTR <https://trtools.readthedocs.io/en/stable/source/associaTR.html>`_: a tool for testing TR length-phenotype associations (e.g., running a TR GWAS)
* `prancSTR <https://trtools.readthedocs.io/en/stable/source/prancSTR.html>`_: a tool for identifying somatic mosaicism at TRs. Currently only compatible with HipSTR VCF files. (*beta mode*)
* `simTR <https://trtools.readthedocs.io/en/stable/source/simTR.html>`_: a tool for simulating next-generation sequencing reads from TR regions. (*beta mode*)
* `annotaTR <https://trtools.readthedocs.io/en/stable/source/annotaTR.html>`_: a tool for annotating TR VCF files with dosage or other metadata and optionally converting to PGEN output.

Type :code:`<command> --help` to see a full set of options.

It additionally includes a python library, :code:`trtools`, which can be accessed from within Python scripts. e.g.::

import trtools.utils.utils as stls
allele_freqs = {5: 0.5, 6: 0.5} # 50% of alleles have 5 repeat copies, 50% have 6
stls.GetHeterozygosity(allele_freqs) # should return 0.5
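
The value of 0.5 in the comment above agrees with the standard definition of expected heterozygosity, one minus the sum of squared allele frequencies. The following is a minimal standalone sketch of that formula, assuming this is the definition intended. It does not use TRTools, and ``heterozygosity`` is a hypothetical helper name, not part of the library.

```python
def heterozygosity(allele_freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies.

    allele_freqs maps an allele (here, a repeat copy number) to its frequency.
    This is an illustrative helper, not the TRTools implementation.
    """
    total = sum(allele_freqs.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(freq ** 2 for freq in allele_freqs.values())


# Same input as the README example: two alleles at 50% each.
allele_freqs = {5: 0.5, 6: 0.5}
print(heterozygosity(allele_freqs))  # 1 - (0.25 + 0.25) = 0.5
```

A locus with a single allele at frequency 1.0 gives 0 under this formula, and the value approaches 1 as frequency is spread over more alleles.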

Usage
-----

We recommend new users start with the example commands described in the `command-line interface for each tool <https://trtools.readthedocs.io/en/stable/UTILITIES.html>`_.
We also suggest going through our `vignettes <https://trtools.readthedocs.io/en/stable/VIGNETTES.html>`_ that walk through some example workflows using TRTools.

Supported TR Callers
--------------------
TRTools supports VCFs from the following TR genotyping tools:

* AdVNTR_
* ExpansionHunter_
* GangSTR_ version 2.4 or higher
* HipSTR [`main repo <https://github.com/tfwillems/HipSTR>`_] [`Gymrek Lab repo <https://github.com/gymrek-lab/hipstr>`_]
* PopSTR_ version 2 or higher

See our description of the `features and example use-cases <https://trtools.readthedocs.io/en/stable/CALLERS.html>`_ of each of these tools.

..
please ensure this list of links remains the same as the one in the main README
.. _AdVNTR: https://advntr.readthedocs.io/en/latest/
.. _ExpansionHunter: https://github.com/Illumina/ExpansionHunter
.. _GangSTR: https://github.com/gymreklab/gangstr
.. _HipSTR: https://hipstr-tool.github.io/HipSTR/
.. _PopSTR: https://github.com/DecodeGenetics/popSTR

Testing
-------
After you've installed TRTools, we recommend running our tests to confirm that TRTools works properly on your system. Just execute the following::

test_trtools.sh

Development Notes
-----------------

* TRTools only currently supports diploid genotypes. Haploid calls, such as those on male chrX or chrY, are not yet supported but should be coming soon.

Contact Us
----------
Please submit an issue on the `trtools github <https://github.com/gymrek-lab/TRTools>`_

.. _Contributing:

Contributing
------------
We appreciate contributions to TRTools. If you would like to contribute a fix or new feature, follow these guidelines:

1. Consider `discussing <https://github.com/gymrek-lab/TRTools/issues>`_ your solution with us first so we can provide help or feedback if necessary.
#. Install TRTools from source `as above <From source_>`_.
#. Fork the TRTools repository.
#. Create a branch off of :code:`master` titled with the name of your feature.
#. Make your changes.
#. If you need to add a dependency or update the version of a dependency, you can use the :code:`poetry add` command.

* You should specify a `version constraint <https://python-poetry.org/docs/master/dependency-specification#version-constraints>`_ when adding a dependency. Use the oldest version compatible with your code. Don't worry if you're not sure at first, since you can (and should!) always update it later. For example, to specify a version of :code:`numpy>=1.23.0`, you can run :code:`poetry add 'numpy>=1.23.0'`.
* Afterwards, double-check that the :code:`poetry.lock` file contains 1.23.0 in it. **All of our dependencies should be locked to their minimum versions at all times.** To downgrade to a specific version of :code:`numpy` in our lock file, you can explicitly add the version via :code:`poetry add 'numpy==1.23.0'`, manually edit the pyproject.toml file to use a :code:`>=` sign in front of the version number, and then run :code:`poetry lock --no-update`.

#. Document your changes.

* Ensure all functions, modules, classes etc. conform to `numpy docstring standards <https://numpydoc.readthedocs.io/en/latest/format.html>`_.

If applicable, update the READMEs in the directories of the files you changed with new usage information.

* New doc pages for `the website <https://trtools.readthedocs.io/en/stable/>`_ can be created under :code:`<project-root>/doc` and linked to as appropriate.
* If you have added significant amounts of documentation in any of these ways, build the documentation locally to ensure it looks good.

:code:`cd` to the :code:`doc` directory and run :code:`make clean && make html`, then view :code:`doc/_build/html/index.html` and navigate from there

#. Add tests to test any new functionality. Add them to the :code:`tests/` folder in the directory of the code you modified.

* :code:`cd` to the root of the project and run :code:`poetry run pytest --cov=. --cov-report term-missing` to make sure that (1) all tests pass and (2) any code you have added is covered by tests. (Code coverage may **not** go down).
* :code:`cd` to the root of the project and run :code:`nox` to make sure that the tests pass on all versions of python that we support.

#. Submit a pull request (PR) **to the master branch** of the central repository with a description of what changes you have made. Prefix the title of the PR according to the `conventional commits spec <https://www.conventionalcommits.org>`_.
A member of the TRTools team will reply and continue the contribution process from there, possibly asking for additional information/effort on your part.

* If you are reviewing a pull request, please double-check that the PR addresses each item in `our PR checklist <https://github.com/gymrek-lab/TRTools/blob/master/.github/pull_request_template.md>`_

Publishing
----------
If you are a TRTools maintainer and wish to publish changes and distribute them to PyPI and bioconda, please see PUBLISHING.rst in the root of the git repo.
If you are a community member and would like that to happen, contact us (see above).
8 changes: 4 additions & 4 deletions dev-env.yml
@@ -4,12 +4,12 @@ channels:
- nodefaults
dependencies:
- conda-forge::python=3.8 # the lowest version of python that we formally support
- conda-forge::pip==23.3.2
- bioconda::bcftools==1.19
- conda-forge::pip==24.0
- bioconda::bcftools==1.20
- bioconda::art==2016.06.05
- conda-forge::poetry==1.8.3 # keep in sync with release.yml
- conda-forge::nox==2023.04.22
- conda-forge::poetry-plugin-export==1.6.0
- conda-forge::nox==2024.4.15
- conda-forge::poetry-plugin-export==1.8.0
- pip:
- nox-poetry==1.0.3
- poetry-conda==0.1.1
12 changes: 10 additions & 2 deletions doc/CALLERS.rst
@@ -1,7 +1,9 @@
.. _CALLERS:

Supported TR Genotypers
=======================

TRTools currently supports 5 tandem repeat genotypers. It also supports the Beagle imputation software (see :ref:`below <Beagle_section>`).
TRTools currently supports 6 tandem repeat genotypers. It also supports the Beagle imputation software (see :ref:`below <Beagle_section>`).
We summarize them in the first table and provide some basic parameters of their functionality in the second.
For more information on a genotyper, please see its website linked below.

@@ -37,6 +39,9 @@ For more information on a genotyper, please see its website linked below.
| PopSTR_ (v2.0) | Designed for genome-wide genotyping |
| | of short or expanded TRs. |
+----------------------------+--------------------------------------+
| LongTR_ (v1.0) | Designed for genome-wide genotyping |
| | of STRs and VNTRs from long reads. |
+----------------------------+--------------------------------------+

|
@@ -56,6 +61,8 @@ For more information on a genotyper, please see its website linked below.
+----------------------------+--------------------------+----------------------------+------------------------+--------------------------+-------------------------+------------------------+
| PopSTR_ (v2.0) | 1-6bp | Yes | Length | 540,1401 (hg38) | Illumina | Many |
+----------------------------+--------------------------+----------------------------+------------------------+--------------------------+-------------------------+------------------------+
| LongTR_ (v1.0) | 1+bp | No | Length, sequence | No ref provided | PacBio HiFi, ONT | Many |
+----------------------------+--------------------------+----------------------------+------------------------+--------------------------+-------------------------+------------------------+

Since each of these tools takes as input a list of TRs to genotype, they could also be used on custom panels of TR loci.
Tool information and reference panel numbers shown above are based on downloads from the github repository of each tool as of July 2, 2020.
@@ -67,11 +74,12 @@ see :ref:`Contributing` for more information.
..
please ensure this list of links remains the same as the one in the main README
.. _AdVNTR: https://advntr.readthedocs.io/en/latest/
.. _AdVNTR: https://advntr.readthedocs.io
.. _ExpansionHunter: https://github.com/Illumina/ExpansionHunter
.. _GangSTR: https://github.com/gymreklab/gangstr
.. _HipSTR: https://hipstr-tool.github.io/HipSTR/
.. _PopSTR: https://github.com/DecodeGenetics/popSTR
.. _LongTR: https://github.com/gymrek-lab/longtr

.. _Beagle_section:

2 changes: 2 additions & 0 deletions doc/LIBRARY_SPEC.rst
@@ -1,3 +1,5 @@
.. _LIBRARY_SPEC:

TRHarmonizer Library Details
============================

3 changes: 3 additions & 0 deletions doc/UTILITIES.rst
@@ -1,3 +1,5 @@
.. _UTILITIES:

Command-Line Tools
===========================

@@ -6,6 +8,7 @@ TRTools offers a variety of command-line tools for performing manipulations to T

.. toctree::
:maxdepth: 1
:name: source
:hidden:

source/dumpSTR.rst
2 changes: 2 additions & 0 deletions doc/VIGNETTES.rst
@@ -1,3 +1,5 @@
.. _VIGNETTES:

TRTools Vignettes
==================
