The goal of this PR is to discuss introducing a new file type to replace the beddb and bed2ddb formats. This file type will be able to store any type of data and serve as the backing for gene annotations, BED-like regions, arbitrary points, etc.
Questions to address:
Tile API: the current API takes a zoom level, start and end position. It works right now because any genomic data is converted to a linearized representation where chromosomes are concatenated using a given chromosome order.
Dataframe-backed files will not have this limitation. The tile API will have to have a chromosome order associated with it to indicate which data should be retrieved between coordinates x0 and x1.
Column to use as the index: A dataframe may have its start and end positions in arbitrary columns. The request should indicate which columns hold the positions of the data.
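To make the two questions above concrete, here is a minimal sketch of how a tile request could map linearized coordinates back to chromosomes and then filter on caller-specified columns. It is not an existing API: the function names (`abs_to_chr_ranges`, `fetch_rows`), the parameter names (`chrom_col`, `start_col`, `end_col`) and the list-of-dicts stand-in for a dataframe are all assumptions made for illustration, and intervals are assumed to be half-open.

```python
def abs_to_chr_ranges(x0, x1, chrom_sizes):
    """Split the linearized interval [x0, x1) into per-chromosome intervals.

    chrom_sizes is a list of (name, length) pairs; its order IS the
    chromosome order used for linearization (hypothetical parameter).
    """
    ranges = []
    offset = 0  # linearized coordinate where the current chromosome begins
    for name, length in chrom_sizes:
        start = max(x0, offset)
        end = min(x1, offset + length)
        if start < end:
            # Convert back to chromosome-local coordinates.
            ranges.append((name, start - offset, end - offset))
        offset += length
    return ranges


def fetch_rows(rows, x0, x1, chrom_sizes, chrom_col, start_col, end_col):
    """Return the rows overlapping [x0, x1), using caller-chosen columns.

    rows is a list of dicts standing in for a dataframe (an assumption
    that keeps this sketch dependency-free).
    """
    hits = []
    for chrom, start, end in abs_to_chr_ranges(x0, x1, chrom_sizes):
        for row in rows:
            if (row[chrom_col] == chrom
                    and row[start_col] < end
                    and row[end_col] > start):
                hits.append(row)
    return hits
```

Under this sketch the chromosome order and the column names travel with the request rather than with the file, which is one way to meet both requirements; whether they belong in the request or in track metadata is still open.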
Use cases
Replacing the current beddb and bed2ddb tile formats.
Performance
Filtering a 970K line file takes about 200ms. It may be possible to improve this through parallelization, sorting, indexing or subdividing the file into sections (e.g. chromosomes)
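As a rough illustration of the sorting, indexing and per-chromosome subdivision ideas, the sketch below groups records by chromosome, sorts each group by start position, and uses binary search to narrow the candidates before the overlap check. The names `build_index` and `query` are hypothetical, intervals are assumed to be half-open, and this has not been benchmarked against the 200ms figure above.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_index(records):
    """Group (chrom, start, end) records by chromosome and sort by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end in records:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        # The longest interval bounds how far left an overlap can begin.
        max_len = max(e - s for s, e in intervals)
        index[chrom] = (starts, intervals, max_len)
    return index


def query(index, chrom, q_start, q_end):
    """Return the intervals on chrom that overlap [q_start, q_end)."""
    if chrom not in index:
        return []
    starts, intervals, max_len = index[chrom]
    # An overlapping interval must start after q_start - max_len
    # and before q_end, so only that slice needs to be scanned.
    lo = bisect_right(starts, q_start - max_len)
    hi = bisect_left(starts, q_end)
    return [(s, e) for s, e in intervals[lo:hi] if e > q_start]
```

The scan is limited to the slice between the two binary searches, so its cost depends on the query width and the longest interval on that chromosome rather than on the file size. A single very long interval widens that slice, which is the case interval trees or binned indexes are designed to handle.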
If you have sorted BED data and absToChr coordinates, you could use bedextract or tabix. Tabix might be better for one-off queries of individual ranges, while bedextract may do better with a query containing multiple ranges.
This point came up on Slack. The replacement for beddb should be treatable both as an interval track and as a coverage/vector track. For example, bigBed natively supports both.