Searching large collections of sequencing data with genome-scale queries

sourmash-bio/branchwater

branchwater

This is the central repository for branchwater.

branchwater is the framework we use for searching large collections of sequencing data with genome-scale queries. At its core it is a new search index for sourmash signatures, allowing near real-time search of large scale databases. It is an inverted index implemented on top of RocksDB.
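To make the inverted-index idea concrete, here is a minimal sketch in Python. It is not branchwater's implementation: the class `ToyInvertedIndex`, its methods, and the dataset names are hypothetical, and a plain in-memory dict stands in for the RocksDB key-value store. It only illustrates the general shape of the technique, which is mapping each hash to the datasets that contain it so that a query touches only the entries for its own hashes.

```python
from collections import Counter, defaultdict


class ToyInvertedIndex:
    """Hypothetical in-memory inverted index: hash -> names of datasets containing it."""

    def __init__(self):
        # A dict stands in for the on-disk key-value store (an assumption for illustration).
        self.postings = defaultdict(set)

    def add(self, name, hashes):
        """Register a dataset by listing it under every hash it contains."""
        for h in set(hashes):
            self.postings[h].add(name)

    def search(self, query_hashes, threshold=0.0):
        """Return (dataset, containment) pairs, best match first.

        Containment is the fraction of query hashes found in the dataset.
        """
        query = set(query_hashes)
        if not query:
            return []
        counts = Counter()
        for h in query:
            # Only the posting lists for the query's own hashes are visited.
            for name in self.postings.get(h, ()):
                counts[name] += 1
        results = [
            (name, shared / len(query))
            for name, shared in counts.items()
            if shared / len(query) >= threshold
        ]
        results.sort(key=lambda item: (-item[1], item[0]))
        return results


index = ToyInvertedIndex()
index.add("metagenome_A", [1, 2, 3, 4, 5])
index.add("metagenome_B", [4, 5, 6])
index.add("metagenome_C", [7, 8])

hits = index.search([3, 4, 5, 9], threshold=0.5)
for name, containment in hits:
    print(f"{name}\t{containment:.2f}")
```

Because the lookup cost depends on the number of hashes in the query rather than on the number of indexed datasets, this layout suits searching a very large collection with a single genome-sized query.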

You can read more about branchwater in Sourmash Branchwater Enables Lightweight Petabyte-Scale Sequence Search, Irber et al., 2022, and you can read about one of the earliest use cases in Biogeographic Distribution of Five Antarctic Cyanobacteria Using Large-Scale k-mer Searching with sourmash branchwater, Lumian et al., 2022.

branchwater had a couple of names over time:

Here are a few blog posts:

Code repository links and details.

branchwater is based on sourmash, and the search index data structure has lived there since version 0.12 of the Rust crate.

As of January 2024, branchwater is mostly contained in this repo, together with the tools developed to work with the new index:

  • branchwater-api, a search server indexing ~946,000 SRA metagenomes.
  • branchwater-web, a webapp that takes a genome of interest and uses branchwater to rapidly search the publicly available metagenomes in NCBI's Sequence Read Archive (SRA). Metadata associated with the matching metagenome accessions are summarized in interactive tables, plots, and maps.
  • branchwater-index, a command-line interface to build the search index. See the Query README for more details.
  • branchwater-query, a command-line interface to submit queries to a search server.

There are also additional resources:

  • The code for monitoring the SRA and building sourmash sketches from genomes and metagenomes is in wort.
  • sourmash_plugin_branchwater is a sourmash plugin exposing more features from branchwater in sourmash.

Need help? Have questions? Want to make a suggestion?

Please file branchwater-specific issues and pull requests in the branchwater repo. We also hang out in the sourmash repo a lot, if you have more general questions about sourmash. There is also a Gitter/Matrix channel where you can contact a number of the sourmash collaborators.

License information

branchwater is AGPL licensed.

The webapp was developed by the USDA Agricultural Research Service, Genomics and Bioinformatics Research Unit group in Gainesville, FL. It was primarily authored by Suzanne Fleishman, and the work was led by Adam Rivers. Check out their other work at https://tinyecology.com. As a work of the United States Government, the original code is available under the CC0 1.0 Universal Public Domain Dedication (CC0 1.0).