This repository has been archived by the owner on Aug 11, 2021. It is now read-only.

Tabix error using UCSC bedgraph example #4

Open
lynxoid opened this issue Nov 8, 2013 · 3 comments
Comments

lynxoid commented Nov 8, 2013

I am trying to create a custom track for the WashU EpiGenome browser (instructions here: http://washugb.blogspot.com/2012/09/generate-tabix-files-from-bigwig-files.html), so I am using the example bedGraph file posted on the UCSC page: http://genome.ucsc.edu/goldenPath/help/bedgraph.html

The file looks like this:

browser position chr19:49302001-49304701
browser hide all
browser pack refGene encodeRegions
browser full altGraph
#   300 base wide bar graph, autoScale is on by default == graphing
#   limits will dynamically change to always show full range of data
#   in viewing window, priority = 20 positions this as the second graph
#   Note, zero-relative, half-open coordinate system in use for bedGraph format
track type=bedGraph name="BedGraph Format" description="BedGraph format" visibility=full color=200,100,0 altColor=0,100,200 priority=20
chr19 49302000 49302300 -1.0
chr19 49302300 49302600 -0.75
chr19 49302600 49302900 -0.50
chr19 49302900 49303200 -0.25
chr19 49303200 49303500 0.0
chr19 49303500 49303800 0.25
chr19 49303800 49304100 0.50
chr19 49304100 49304400 0.75
chr19 49304400 49304700 1.00

I run bgzip first:

bgzip input.bedgraph 

and then I run tabix:

tabix -p bed input.bedgraph.gz

at which point I get these errors:

[get_intv] the following line cannot be parsed and skipped: browser position chr19:49302001-49304701
[ti_index_core] the indexes overlap or are out of bounds

If bedgraph is not the file format tabix expects, what is the input file format?

Thanks!

Member

pd3 commented Nov 11, 2013

I was unable to reproduce the problem. What version of tabix and bgzip are you using? This is what I did:

# cat | while read A B C D; do echo -e "$A\t$B\t$C\t$D"; done | bgzip -c > rmme.bed.gz
chr19 49302000 49302300 -1.0
chr19 49302300 49302600 -0.75
chr19 49302600 49302900 -0.50
chr19 49302900 49303200 -0.25
chr19 49303200 49303500 0.0
chr19 49303500 49303800 0.25
chr19 49303800 49304100 0.50
chr19 49304100 49304400 0.75
chr19 49304400 49304700 1.00
$ tabix -p bed rmme.bed.gz
$ tabix rmme.bed.gz chr19:49303800-49304100
chr19   49303500    49303800    0.25
chr19   49303800    49304100    0.50

Author

lynxoid commented Nov 12, 2013

I figured it out: I was using spaces as separators, while tabix expects tabs by default. I didn't see this mentioned anywhere obvious, so maybe it's a good idea to make it more transparent -- or to parse on whitespace instead.
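
For anyone hitting the same error, here is a minimal sketch of one way to convert a space-separated bedGraph file to tab-separated form before compressing and indexing it. The file names `input.bedgraph` and `input.tab.bedgraph` are placeholders, the sample data is a shortened copy of the UCSC example above, and dropping the `browser`, `track`, and `#` lines is an assumption about what you want in the indexed file, not something tabix requires.

```shell
# Shortened copy of the space-separated UCSC example (placeholder file name).
cat > input.bedgraph <<'EOF'
browser position chr19:49302001-49304701
browser hide all
#   Note, zero-relative, half-open coordinate system in use for bedGraph format
track type=bedGraph name="BedGraph Format" description="BedGraph format"
chr19 49302000 49302300 -1.0
chr19 49302300 49302600 -0.75
chr19 49302600 49302900 -0.50
EOF

# Skip browser/track/comment lines, then re-join the first four
# whitespace-separated fields with tabs.
awk 'BEGIN { OFS = "\t" }
     $1 == "browser" || $1 == "track" || /^#/ { next }
     NF >= 4 { print $1, $2, $3, $4 }' input.bedgraph > input.tab.bedgraph
```

The resulting file can then be processed with the same two commands used earlier in this thread: `bgzip input.tab.bedgraph` followed by `tabix -p bed input.tab.bedgraph.gz`.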

Member

lh3 commented Nov 12, 2013

I just realized that the UCSC format page does not explicitly require that BED should be TAB delimited. However, the "BED detail format" does require TAB as the only separator. Conventionally, BED files are TAB delimited, too. That said, it would be good to throw a warning/error when the line is space delimited or optionally parse space-delimited files. This is not of high priority, though.
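
Until tabix itself emits such a warning, a pre-flight check along these lines can be run before indexing. This is only a sketch of the idea, not tabix behavior: the file name `check.bed` and the rule "a data line must have at least three tab-separated fields" are assumptions made for illustration.

```shell
# Two sample records: the first is tab-delimited, the second space-delimited.
printf 'chr19\t49302000\t49302300\t-1.0\nchr19 49302300 49302600 -0.75\n' > check.bed

# Count data lines with fewer than three tab-separated fields,
# ignoring comments, browser/track lines, and empty lines.
bad=$(awk -F '\t' '
    /^#/ || /^(browser|track)[ \t]/ || $0 == "" { next }
    NF < 3 { n++ }
    END { print n + 0 }' check.bed)

if [ "$bad" -gt 0 ]; then
    echo "warning: $bad line(s) in check.bed do not look TAB-delimited" >&2
fi
```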
