Utility for merging RNA-seq expression counts files from St. Jude Cloud.
You can install stjudecloud-merge-counts using the Python Package Index (PyPI).
pip install stjudecloud-merge-counts
stjudecloud-merge-counts has 4 subcommands:
- concordance-test: Performs a recursive and a sequential merge and verifies that the results are concordant.
- metadata: Compiles file metadata into a tab-delimited matrix.
- recursive: Merges count files using a recursive, divide-and-conquer strategy.
- sequential: Merges count files sequentially. This method should produce the same results as recursive, but it takes significantly more time than the recursive approach.
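The difference between the recursive and sequential strategies, and what concordance-test checks, can be illustrated with a minimal, stdlib-only Python sketch. This is not the package's implementation. It assumes each count file has already been parsed into an in-memory mapping of gene -> {sample: count}, and the names Matrix, merge_two, merge_sequential, merge_recursive and concordant are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: gene ID -> {sample name: count}.
Matrix = Dict[str, Dict[str, int]]


def merge_two(left: Matrix, right: Matrix) -> Matrix:
    """Combine two matrices column-wise (outer join on gene IDs) without mutating inputs."""
    merged: Matrix = {gene: dict(row) for gene, row in left.items()}
    for gene, row in right.items():
        merged.setdefault(gene, {}).update(row)
    return merged


def merge_sequential(parts: List[Matrix]) -> Matrix:
    """Fold the parts in one at a time; the growing result is re-copied at every step."""
    result: Matrix = {}
    for part in parts:
        result = merge_two(result, part)
    return result


def merge_recursive(parts: List[Matrix]) -> Matrix:
    """Divide and conquer: merge each half, then merge the two halves."""
    if not parts:
        return {}
    if len(parts) == 1:
        return {gene: dict(row) for gene, row in parts[0].items()}
    mid = len(parts) // 2
    return merge_two(merge_recursive(parts[:mid]), merge_recursive(parts[mid:]))


def concordant(parts: List[Matrix]) -> bool:
    """Mimic the idea behind concordance-test: both strategies must agree."""
    return merge_recursive(parts) == merge_sequential(parts)
```

In this sketch the sequential fold copies the accumulated result on every step (roughly quadratic in the number of samples), while the divide-and-conquer version copies each column once per level of recursion (roughly n log n). That is one plausible reason a sequential merge is slower; the package's actual performance characteristics may differ.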
All four subcommands require a set of DNAnexus file IDs to be supplied as command-line arguments.
For feature counts vended from the St. Jude Cloud platform, the following example merges the vended counts into a tab-delimited matrix. Replace project-G2KfyQ09XB5BBKKf1BXx9ZkK with the identifier of the DNAnexus project that contains your feature counts.
dx ls --brief project-G2KfyQ09XB5BBKKf1BXx9ZkK:/immediate/FEATURE_COUNTS/ | xargs stjudecloud-merge-counts recursive
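If you prefer to drive the same pipeline from Python, the sketch below runs the dx ls --brief command shown above and passes every returned file ID to a single stjudecloud-merge-counts invocation. It only shells out to the two commands already used in this README; the helper names list_file_ids, build_merge_command and merge_project_counts are hypothetical, and it assumes both executables are on your PATH.

```python
import subprocess
from typing import List

# The four subcommands listed in this README.
SUBCOMMANDS = {"concordance-test", "metadata", "recursive", "sequential"}


def list_file_ids(project: str, folder: str = "/immediate/FEATURE_COUNTS/") -> List[str]:
    """Run `dx ls --brief` and return one DNAnexus file ID per non-empty output line."""
    result = subprocess.run(
        ["dx", "ls", "--brief", f"{project}:{folder}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def build_merge_command(file_ids: List[str], subcommand: str = "recursive") -> List[str]:
    """Build one stjudecloud-merge-counts command line covering every file ID."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    if not file_ids:
        raise ValueError("at least one DNAnexus file ID is required")
    return ["stjudecloud-merge-counts", subcommand, *file_ids]


def merge_project_counts(project: str, subcommand: str = "recursive") -> None:
    """List the project's feature counts, then merge them in a single run."""
    command = build_merge_command(list_file_ids(project), subcommand)
    subprocess.run(command, check=True)
```

For example, merge_project_counts("project-G2KfyQ09XB5BBKKf1BXx9ZkK") mirrors the shell one-liner. One reason to build the argument list explicitly: xargs may split a very long list of IDs across several invocations of the command, each producing its own merge. Extremely long lists can still exceed the operating system's argument-length limit, in which case subprocess.run raises an OSError.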
- When a file belongs to multiple datasets in St. Jude Cloud, we will pick a single dataset name to use for that file based on an ordered priority list. In short, we'll generally choose the best dataset (from the St. Jude Cloud team's perspective). You can find the priority of datasets listed here: https://github.com/stjudecloud/merge-counts/blob/master/mergecounts/utils/dx.py#L19.
- So, for example, if the file for SJABCD1234 belongs to both the PCGP and CSTN datasets, the sample name generated will include the PCGP dataset name (since it is higher priority), giving you the column name SJABCD1234 (PCGP).
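The naming rule above can be sketched in a few lines of Python. The priority list here is hypothetical: the authoritative ordering lives in mergecounts/utils/dx.py, and only "PCGP ranks above CSTN" is taken from this README. The names DATASET_PRIORITY, pick_dataset and column_name are illustrative, and raising an error for a dataset missing from the list is a choice of this sketch, not necessarily the tool's behavior.

```python
from typing import Iterable, List

# Hypothetical, truncated priority list (highest priority first).
# The real list is defined in mergecounts/utils/dx.py.
DATASET_PRIORITY: List[str] = ["PCGP", "CSTN"]


def pick_dataset(datasets: Iterable[str], priority: List[str] = DATASET_PRIORITY) -> str:
    """Return the dataset that appears earliest in the priority list."""
    candidates = list(datasets)
    known = [d for d in candidates if d in priority]
    if not known:
        raise ValueError(f"no dataset from the priority list among {candidates}")
    return min(known, key=priority.index)


def column_name(sample: str, datasets: Iterable[str]) -> str:
    """Build a column label of the form 'SAMPLE (DATASET)'."""
    return f"{sample} ({pick_dataset(datasets)})"
```

With this list, column_name("SJABCD1234", ["CSTN", "PCGP"]) yields "SJABCD1234 (PCGP)", matching the example above regardless of the order in which the datasets are reported.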
If you are interested in contributing to the code, please first review our CONTRIBUTING.md document.
To bootstrap a development environment, please use the following commands.
# Clone the repository
git clone git@github.com:stjudecloud/merge-counts.git
cd merge-counts
# Install the project using poetry
poetry install
merge-counts provides a (currently patchy) set of unit and end-to-end tests, which you can run with:
py.test
Contributions, issues and feature requests are welcome!
Feel free to check the issues page. You can also take a look at the contributing guide.
This project is licensed under the MIT License; see the LICENSE.md file for details.
Copyright © 2020 St. Jude Cloud Team.