merge-counts

Utility for merging RNA-seq expression counts files from St. Jude Cloud.

Request Feature · Report Bug · ⭐ Consider starring the repo! ⭐

📚 Getting Started

Installation

You can install stjudecloud-merge-counts using the Python Package Index (PyPI).

pip install stjudecloud-merge-counts

Usage

stjudecloud-merge-counts has four subcommands:

  • concordance-test - Performs a recursive and sequential merge and verifies that the results are concordant.
  • metadata - Compiles file metadata into a tab-delimited matrix.
  • recursive - Merges count files in a recursive, divide-and-conquer strategy.
  • sequential - Merges count files sequentially. This method should produce the same results as recursive, but it takes significantly longer.
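
To illustrate how the recursive and sequential strategies differ, and what a concordance check between them amounts to, here is a minimal Python sketch. It is not the tool's actual implementation: the table representation (gene name mapped to per-sample counts), the function names, and the sample and gene identifiers are all assumptions made for this example.

```python
from functools import reduce

# Hypothetical representation: gene name -> {sample column -> count}.
Table = dict[str, dict[str, int]]


def merge_pair(left: Table, right: Table) -> Table:
    """Outer-join two count tables on gene name, returning a new table."""
    merged = {gene: dict(cols) for gene, cols in left.items()}
    for gene, cols in right.items():
        merged.setdefault(gene, {}).update(cols)
    return merged


def merge_sequential(tables: list[Table]) -> Table:
    """Fold every table into one growing accumulator, left to right."""
    return reduce(merge_pair, tables, {})


def merge_recursive(tables: list[Table]) -> Table:
    """Split the list in half, merge each half, then join the two results."""
    if not tables:
        return {}
    if len(tables) == 1:
        return merge_pair({}, tables[0])  # return a copy of the single table
    mid = len(tables) // 2
    return merge_pair(merge_recursive(tables[:mid]), merge_recursive(tables[mid:]))


def concordant(tables: list[Table]) -> bool:
    """Check that both strategies yield the same merged matrix."""
    return merge_recursive(tables) == merge_sequential(tables)


# Made-up sample and gene names, for illustration only.
samples = [
    {"TP53": {"SJ001": 10}, "MYC": {"SJ001": 3}},
    {"TP53": {"SJ002": 7}, "MYC": {"SJ002": 0}},
    {"TP53": {"SJ003": 12}, "MYC": {"SJ003": 5}},
]

print(concordant(samples))  # True
```

In this sketch the sequential fold re-copies an accumulator that grows with every file, while the recursive version joins tables of similar size at each level. That is one plausible reason a divide-and-conquer merge would finish sooner; the source does not state why the tool's recursive mode is faster.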

All four subcommands require a set of DNAnexus file IDs to be supplied as command-line arguments.

For feature counts vended from the St. Jude Cloud platform, the following example merges the vended counts into a tab-delimited matrix. Replace project-G2KfyQ09XB5BBKKf1BXx9ZkK with the identifier of the DNAnexus project that contains your feature counts.

dx ls --brief project-G2KfyQ09XB5BBKKf1BXx9ZkK:/immediate/FEATURE_COUNTS/ | xargs stjudecloud-merge-counts recursive

Caveats

  • When a file belongs to multiple datasets in St. Jude Cloud, we will pick a single dataset name to use for that file based on an ordered priority list. In short, we'll generally choose the best dataset (from the St. Jude Cloud team's perspective). You can find the priority of datasets listed here: https://github.com/stjudecloud/merge-counts/blob/master/mergecounts/utils/dx.py#L19.
    • So, for example, if the file for SJABCD1234 belongs to both the PCGP and CSTN datasets, the sample name generated will include the PCGP dataset name (since it is higher priority), giving you the column name SJABCD1234 (PCGP).
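
The priority rule described above can be sketched in a few lines of Python. This is an illustration only: the two-entry priority list reflects just the PCGP-over-CSTN ordering stated in the example, the function names are hypothetical, and raising an error for unranked datasets is an assumption rather than documented behavior. The real list lives in mergecounts/utils/dx.py.

```python
# Illustrative order only: the caveat above says PCGP outranks CSTN.
# The full, authoritative list is in mergecounts/utils/dx.py.
DATASET_PRIORITY = ["PCGP", "CSTN"]


def pick_dataset(datasets: list[str]) -> str:
    """Return the highest-priority dataset among those a file belongs to."""
    ranked = [name for name in DATASET_PRIORITY if name in datasets]
    if not ranked:
        # Assumption: this sketch rejects datasets that are not in the list.
        raise ValueError(f"no known dataset among {datasets!r}")
    return ranked[0]


def column_name(sample: str, datasets: list[str]) -> str:
    """Build the matrix column name in the form 'SAMPLE (DATASET)'."""
    return f"{sample} ({pick_dataset(datasets)})"


print(column_name("SJABCD1234", ["CSTN", "PCGP"]))  # SJABCD1234 (PCGP)
```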

🖥️ Development

If you are interested in contributing to the code, please first review our CONTRIBUTING.md document.

To bootstrap a development environment, please use the following commands.

# Clone the repository
git clone git@github.com:stjudecloud/merge-counts.git
cd merge-counts

# Install the project using poetry
poetry install

🚧️ Tests

merge-counts provides a (currently patchy) set of unit and end-to-end tests. Run them with:

py.test

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check the issues page. You can also take a look at the contributing guide.

📝 License

This project is licensed under the MIT License—see the LICENSE.md file for details.

Copyright © 2020 St. Jude Cloud Team.
