better version/build management #8

stephenturner · 2017-01-19T21:24:03Z

With the changes in #6 it's much easier to recreate annotation tables. The files are named e.g. galgal5, but which version/build is actually used depends on what's current in Ensembl. E.g., when I first built this package, chicken was on Galgal4. I had to manually update the filenames, and I probably did the wrong thing by just deleting (rather than deprecating) the old datasets. Maybe that's okay since it's still versioned in a release. Not sure how best to handle these issues.

aaronwolen · 2017-01-20T13:41:15Z

🤔...

One potential solution: name recipes and tables based on species, so hsapiens.yml would create a table called hsapiens that includes annotations for whatever the most recent build/version happens to be.

Previous versions could be specified by appending the version number. Most users will (probably) want the most up-to-date info and would only need to type hsapiens; users with more specific needs would have to type something like hsapiens_GRCh37.

What's your opinion on providing previous genome versions?

We could maintain recipes for older builds and provide a function that allows users to build them locally. That way they're still easily accessible for reproducibility purposes without causing the package size to explode.
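The naming scheme proposed above could be sketched roughly as follows. This is a language-neutral illustration in Python, not code from the package: `BUILDS` and `resolve` are hypothetical names, and the build lists contain only the examples mentioned in this thread.

```python
# Hypothetical registry: species -> known builds, ordered oldest to newest.
# The entries are only the examples discussed in this thread.
BUILDS = {
    "hsapiens": ["GRCh37", "GRCh38"],
    "ggallus": ["Galgal4", "Galgal5"],
}


def resolve(name):
    """Map 'hsapiens' or 'hsapiens_GRCh37' to a (species, build) pair.

    A bare species name resolves to the newest build in the registry;
    a '_<build>' suffix pins a specific, possibly older, build.
    """
    species, _, build = name.partition("_")
    if species not in BUILDS:
        raise KeyError(f"unknown species: {species}")
    builds = BUILDS[species]
    if not build:
        return species, builds[-1]  # no suffix: newest build
    if build not in builds:
        raise KeyError(f"unknown build for {species}: {build}")
    return species, build


print(resolve("hsapiens"))         # bare name: newest registered build
print(resolve("hsapiens_GRCh37"))  # suffix: pinned build
```

Under this scheme an older build stays addressable by its explicit name even after the bare species name moves on to a newer one.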

stephenturner · 2017-01-20T18:06:40Z

I do think there's a need to be able to maintain or recreate older versions. I operate a core facility, and I have folks whose analyses I ran years ago using, e.g., Galgal4; if I created or recreated the data now, it'd be Galgal5. Also, for human specifically, lots of folks (me included) are still using GRCh37.

There might be a few ways to manage this. I think you'd need to know which archive version of ensembl you'd need to go after to get the build you're interested in. Also, maybe there's some way to retrieve and record this information from the biomart query.
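One way to picture the "retrieve and record" idea is a small provenance record stored next to each table, plus a check that the build reported by the query matches the table's name. The sketch below is in Python for illustration only; `Provenance` and `record_provenance` are hypothetical names, and it assumes the reported build name matches the table name apart from letter case and an optional patch suffix such as `.p13`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    table: str                      # table name, e.g. "galgal5"
    build: str                      # build name as reported by the data source
    ensembl_release: Optional[int]  # archive release number, if known


def record_provenance(table, reported_build, ensembl_release=None):
    """Return a Provenance record, refusing mismatched table/build pairs."""
    # Assumption: a patch suffix (".p13") is not part of the table name.
    base = reported_build.split(".")[0]
    if base.lower() != table.lower():
        raise ValueError(
            f"table '{table}' would be built from '{reported_build}'"
        )
    return Provenance(table, reported_build, ensembl_release)


print(record_provenance("galgal5", "Galgal5"))
```

A check like this would fail loudly when the current release has moved to a newer build than the one the table is named after, instead of silently storing mismatched annotations.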

I do like the idea of just typing hsapiens... I'm sure there's a way to "alias" different names to the same dataset. I'm not very experienced with R data package creation. This is my first/only.

aaronwolen · 2017-01-20T20:42:18Z

This is a good point. Attaching GRCh38 data to an object called hsapiens would probably violate user assumptions. Perhaps it's better to be more explicit and stick to naming objects after the relevant genome version?

I'm also in a bioinformatics core and frequently switching between different projects that require different genomes/builds, so I loved the idea of annotables. It can be a real time saver!

stephenturner mentioned this issue Aug 8, 2023

Grcm38 gene annotations are Grcm39 based! #25

Open
