Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to htslib 1.10.2 #184

Merged
merged 81 commits into from
Mar 17, 2020
Merged

Update to htslib 1.10.2 #184

merged 81 commits into from
Mar 17, 2020

Conversation

brainstorm
Copy link
Member

@brainstorm brainstorm commented Feb 3, 2020

Last mile TODO:

Please feel free to jump in and pullrequest against this branch and I'll merge in/rebase/coordinate the changes as fast as I can since I would like to have this shipped relatively soon.

@brainstorm
Copy link
Member Author

brainstorm commented Feb 3, 2020

Small side-detail CI/CD detail: How come there's a "local", deprecated rust-bio htslib fork when we could be tracking tags/versions from upstream's htslib?

git clone --recursive https://github.com/brainstorm/rust-htslib/ and changing to this PR's branch should work fine if you are interested in working with it.

I just noticed TravisCI failing right now since I didn't realise that rust-htslib was pointing to rust-bio's outdated htslib fork :-S

@juliangehring
Copy link
Contributor

juliangehring commented Feb 3, 2020

@brainstorm https://github.com/brainstorm/rust-htslib/pull/1 should fix the issue with the unknown Z_* constants from zlib. Since they are constant anyway, one might as well use the actual values for now.

brainstorm and others added 2 commits February 4, 2020 08:07
…large_positions.md to spot 64 bit reference position changes and sam_* function renaming. Only 4 errors left, but need to closely check types afterwards very carefully. Specially for the binary formats where this change does not apply and (sub)fields that do not necessarily require big representations
@brainstorm
Copy link
Member Author

brainstorm commented Feb 4, 2020

Thanks @juliangehring, it's good to have more eyes on this! Please feel free to pull in my last commit and quickly review? I'm paranoid on putting an i64 on the wrong place and blow things up for somebody down the line ;)

Speaking of which I'll try to fix the TravisCI submodules issue next so that the tests run.

I still don't get why rust-bio keeps an outdated fork in the org when it can perfectly point the submodule upstream, which is to my eyes more desirable? :-S

…t-bio/rust-htslib/jobs/645812050#L166, rollback and try with bionic ubuntu distro. It does compile locally on OSX
…compiles fine in OSX and fails on Linux now...
@brainstorm
Copy link
Member Author

brainstorm commented Feb 4, 2020

@nlhepler Since you are involved in hts-sys packaging, could you please have a look at the curl functionality in this PR and see if I'm missing some -L or other compiler flag somewhere?

EDIT: nevermind, fixed, at least for OSX, need to test other platforms ;)

@@ -8,7 +8,7 @@ description = "This library provides HTSlib bindings."
readme = "README.md"
keywords = ["htslib", "bam", "bioinformatics", "pileup", "sequencing"]
license = "MIT"
repository = "https://github.com/rust-bio/rust-htslib.git"
repository = "https://github.com/samtools/rust-htslib.git"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That line now points to the wrong repo, I'm afraid :)

Suggested change
repository = "https://github.com/samtools/rust-htslib.git"
repository = "https://github.com/rust-bio/rust-htslib.git"

Copy link
Member Author

@brainstorm brainstorm Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jch-13 for the quick insight but as I pointed out earlier I think that rust-htslib should avoid this antipattern, for the following reasons:

  1. Keeping a local, rust-bio org, htslib fork might encourage small tweaks to that package, making it diverge from upstream and making future rust-htslib upgrades painful.
  2. Any required change to htslib should be upstreamed anyway, so there's no need to keep a local (outdated) fork that needs to be updated periodically.
  3. Since we use submodules to point to htslib, we might as well just point to specific tags and commits from upstream as I'm doing here since I'm pointing to version 1.10.2 of htslib from the samtools repo.

Hope those make sense? I would definitely like to hear more opinions, perhaps I'm missing very valid reasons to keep that htslib repo/fork around?

Cheers!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's some confusion here. The repository = entry in the package section of the Cargo.toml merely points to a place, where more information / documentation can be found about this package. Thus, to accurately reflect how hts-sys currently points to the rust-bio fork of htslib, it should actually read:

repository =  "https://github.com/rust-bio/htslib.git"

However, I think we can revert to actually pointing to the original htslib repo (see below for the history of this). Thus, this submodule import should be changed to point to the htslib 1.10.2 release you are updating the wrapper to and the pointer here should be changed to:

repository = "https://github.com/samtools/htslib.git"

However, the current suggested change (repository = "https://github.com/samtools/rust-htslib.git") simply doesn't exist (with the rust-prefix remaining in the repo name).

As to the history of this, I did a bit of digging:

This was originally introduced to have a commit applied that was not part of the htslib releases at the time. See for example PR #121 and especially the commit that introduced this. However, if you go over to the htslib repo, you will find that the respective commit 52368d672ce4b33a607688fc987ff396bdce68a1 has been released as of htslib 1.10, see e.g. the 1.10 release notes and the responsible htslib pull request 766 which contains this commit. Thus, as part of this update of rust-htslib to htslib 1.10.2, it should be OK to revert to pointing to the original samtools/htslib.git repo in its release 1.10.2 state.

Hope, I cleared this up a bit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S.: I see that the submodule obviously already points to the correct upstream release tag, so just correct the repository entry with the rust-prefix released. And we can now all feel safe about doing this, knowing the history of it all... :D

Copy link
Member Author

@brainstorm brainstorm Feb 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, you are absolutely right, thanks @dlaehnemann, sorry @jch-13 for the misunderstanding. I'll amend that right now ;)

@brainstorm
Copy link
Member Author

brainstorm commented Mar 1, 2020

For future reference, the flags required (shown as htslib Makefile diff, but can be passed as env, of course) for an OSX host with musl-cross installed to cross-compile htslib with make lib-static (3.3MB htslib.a generated at the end):

(base) rvalls@umccr htslib % git diff
diff --git a/Makefile b/Makefile
index e8101d5..5f2eb8f 100644
--- a/Makefile
+++ b/Makefile
@@ -22,20 +22,20 @@
 # FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 # DEALINGS IN THE SOFTWARE.

-CC     = gcc
-AR     = ar
-RANLIB = ranlib
+CC     = x86_64-linux-musl-gcc -static
+AR     = x86_64-linux-musl-ar
+RANLIB = x86_64-linux-musl-ranlib

 # Default libraries to link if configure is not used
 htslib_default_libs = -lz -lm -lbz2 -llzma -lcurl

-CPPFLAGS =
+CPPFLAGS = -I/usr/local/opt/bzip2/include -I/usr/local/Cellar/curl-openssl/7.68.0/include -I/usr/local/Cellar/xz/5.2.4/include -I/usr/local/opt/zlib/include
 # TODO: make the 64-bit support for VCF optional via configure, for now add -DVCF_ALLOW_INT64
 #       to CFLAGS manually, here or in config.mk if the latter exists.
 # TODO: probably update cram code to make it compile cleanly with -Wc++-compat
 # For testing strict C99 support add -std=c99 -D_XOPEN_SOURCE=600
-#CFLAGS   = -g -Wall -O2 -pedantic -std=c99 -D_XOPEN_SOURCE=600
-CFLAGS   = -g -Wall -O2 -fvisibility=hidden
+CFLAGS   = -g -Wall -O2 -pedantic -std=c99 -D_XOPEN_SOURCE=600
+#CFLAGS   = -g -Wall -O2 -fvisibility=hidden
 EXTRA_CFLAGS_PIC = -fpic
 LDFLAGS  = -fvisibility=hidden
 LIBS     = $(htslib_default_libs)

This is assuming that the required packages are installed via Homebrew:

brew install bzip2 zlib xz curl-openssl brainstorm/musl-cross/musl-cross

Next up, let's figure out a more portable/clean version of that leveraging pkg-config and/or build.rs... ideally getting rid of the docker container(s) route as well if possible.

@brainstorm
Copy link
Member Author

brainstorm commented Mar 11, 2020

I just noticed that @johanneskoester went for GitHub Actions instead of TravisCI and musl via #191, cool!

Today I've homebrew-bottled musl for OSX catalina and mojave, otherwise the builds take unnecessarily way too long, so the github actions can use it via homebrew too as:

$ brew install brainstorm/musl-cross/musl-cross

This could bypass the need to use docker containers, but it's really up to you guys on how you want to deliver this crate for devs?

Of course I'll be rebasing all the TravisCI tests and rollbacks out from this PR soon, btw, too many unrelated CI tests for what was supposed to be a htslib-1.10.2 version bump.

@brainstorm
Copy link
Member Author

@brainstorm
Copy link
Member Author

brainstorm commented Mar 13, 2020

After a few hours of testing if we can get away without docker for the musl-static building, I don't think that it's very ergonomic to have reliable/portable musl builds for rust-htslib devs without docker due to tricky curl+openssl dependencies on the new htslib. As an example, see the following Dockerfile and imagine a dev machine getting tweaked that way to have a proper setup:

https://github.com/clux/muslrust/blob/master/Dockerfile

So, given that @johanneskoester is already on track with the new github actions with musl support via cross, I will assume this docker-based building strategy (or clux, or muslrust or whatever other container) will be the default from now on.

@johanneskoester
Copy link
Contributor

@brainstorm sounds good. I have no real opinion on musl. Cross seems the way to go indeed.

@johanneskoester
Copy link
Contributor

I would like to merge this as soon as possible, cause we are hit by yet another htslib bug which I hope to fix by upgrading: varlociraptor/varlociraptor#101.

@johanneskoester
Copy link
Contributor

Do you think you can quickly fix the musl-related errors with the docker approach? If not, let us deactivate musl tests for now, and fix it in a later PR.

@brainstorm
Copy link
Member Author

@johanneskoester The musl stuff is indeed non trivial, specially with the openssl+curl combo. I guess it'll take me a few more hours to crack that nut, imho. So yes, please, go ahead and prepare to merge this PR w/o musl for now?

Feel free to rebase and make any changes you see fit from this point on, going to sleep now ;)

/cc @ohofmann

@johanneskoester johanneskoester changed the base branch from master to htslib-1.10.2 March 17, 2020 11:31
@johanneskoester johanneskoester merged commit 3976c80 into rust-bio:htslib-1.10.2 Mar 17, 2020
johanneskoester added a commit that referenced this pull request Mar 17, 2020
* Update to htslib 1.10.2 (#184)

* Pull 1.10.2 submodule. Initial pass on i32->i64 type conversion, will eventually fix issue #183

* Fixes issue #109 while I'm at it, long hanging fruit fix

* Fix compression levels

* Going through https://github.com/samtools/htslib/blob/develop/README.large_positions.md to spot 64 bit reference position changes and sam_* function renaming. Only 4 errors left, but need to closely check types afterwards very carefully. Specially for the binary formats where this change does not apply and (sub)fields that do not necessarily require big representations

* Repoint submodule to upstream release tag

* Switch from rust-bio hosted htslib to upstream's samtools htslib.

* Try to amend submodule

* Handle submodules manually on TravisCI, since it is failing now: https://travis-ci.org/rust-bio/rust-htslib/builds/645772614

* Odd balls that would need review marked with XXX. New htslib seems to require libcurl

* Add curl-sys crate for hplugin_curl, aws s3 et al

* Add conditional libcurl compilation flags

* Be nice with cargo fmt [ci skip]

* Try installing libcurl meta-package instead of the openssl dependent one?: https://travis-ci.org/rust-bio/rust-htslib/builds/645790138#L1529

* Does not work with libcurl-dev metapackage: https://travis-ci.org/rust-bio/rust-htslib/jobs/645812050#L166, rollback and try with bionic ubuntu distro. It does compile locally on OSX

* Add matrix compile support for TravisCI, I would like to know why it compiles fine in OSX and fails on Linux now...

* PPA source no longer needed for ubuntu bionic? [ci skip]

* Bionic packages htslib with gnutls instead of openssl: https://packages.ubuntu.com/source/disco/htslib ... and it even comes with pre-shipped libcurl4-gnutls-dev, so no need to install it explicitly?: https://travis-ci.org/rust-bio/rust-htslib/jobs/645816116#L160

* sed is not gsed on OSX... not entirely sure this substitution is 100% needed

* Add libssl-dev to the mix for the openssl-sys: https://travis-ci.org/rust-bio/rust-htslib/jobs/645825022#L1000. Testsuite seems to run partially now: https://travis-ci.org/rust-bio/rust-htslib/jobs/645825022#L623

* Add build-essential in hopes to fix libcurl issue, incorporate #185, fix one testsuite warning

* Add curl on default features for hts-sys

* Point to the right htslib upstream repo, thanks @dlaehnemann [skip ci]

* Appease OSX musl-cross toolchain requirements: https://travis-ci.org/rust-bio/rust-htslib/jobs/645907695#L1222

* TravisCI timesout installing brew deps: https://travis-ci.org/rust-bio/rust-htslib/jobs/647667984#L212 ... Not sure if travis_wait will work with homebrew addons?: https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received... trying

* Try workaround for https://twitter.com/braincode/status/1226103002550849542

* Do not update homebrew (time intensive), switch from travis_wait to verbose to keep the build output being generated, hopefully

* brew install --verbose is not verbose enough for travis, reverting travis_wait and bumping to 40min for brew deps install and musl-cross

* Explicitly install system libcurl on linux and osx

* This .cargo/config makes it test right locally: rust-lang/rust#60149 (comment), only 5 tests failing now and no linking errors from libcurl (on OSX, at least) :)

* Cargo.toml missing quote misshap

* Same linker flags should apply for Linux builds? build-essential not necessary

* tid is i64 now, CRAM vs BAM test l_data is off by 3 and m_data is off too

* There should be no need for travis_wait with -v

* Debugging htslib structs with @jkbonfield guidance...

* Disable osx and musl-cross homebrew on the TravisCI matrix since it times out anyway, start looking into l_extranul and alignment issues for CRAM/BAM. Add curl_sys crate in hts-sys/src/lib.rs.

* Mystery solved for test_read_cram, testing some of the struct fields, explicitly state that l_data should differ according to #184 (comment)

* Extract selective CRAM/BAM field comparison to test helper function

* Formatting

* Temporarily commenting the hfile format category detection makes .bed filetype tests pass but .BAI idx locator fails

* Format detection seems to be broken upstream on htslib, see #184 (comment)

* Revert to libcurl4-openssl-dev instead of gnutls. All TravisCI script targets compiling and passing tests otherwise

* Bump up curl-sys version

* Experimenting musl cross-compilation with rustembedded cross. Ideally this would get rid of the homebrew musl-cross issue (too long compile times). /cc @nlhepler

* Remove assert_ne since cannot guarantee that l_data is different on all bam/crams

* Add minimum deps needed for rustembedded cross docker container

* Run docker in TravisCI, even if it is only supported on linux jobs :/

* Add CFLAGS suggested by @nlhepler

* Separate musl from GNU toolchains on Rust cross

* Bump up bindgen, vendorize openssl, further experimentation with compile flags

* Deprecate the mixed GNU/MUSL container

* Add minimal explanation on how to use the new docker files [ci skip]

* cross 0.2.0 with openssl vendored. Progress but more htslib compile time issues remain still

* Make sure libzma is present in both containers [ci skip]

* Focusing on LLVMv8 GNU cross build first, musl will come later on (if possible w/ htslib)

* Two more usize that should be u64

* Amend and settle BED header debate

* Rename test name accordingly

* Transition to cross instead of cargo for builds and tests

* All GNU/clang/llvm8 builds and tests seem to succeed, transitioning to musl testing now...

* Remove stray (repeated) musl build

* Remove @FiloSottile musl-cross homebrew formula since it takes too long to compile and timeouts CI, using clux/muslrust gets me a bit farther. Remove local rustup musl target... since htslib 1.10, the introduction of openssl complicated local compilation w/ musl greatly, unfortunately we have to resort to docker these days instead :/

* Experiment with toolchain setup in https://gitlab.com/rust_musl_docker/image. Remove vendored openssl since it's shipped in cross containers.

* Remove build.rs flags noise and different tests, containers should handle them transparently depending on the defined environment within, thanks @nlhepler

* Indicate dependencies and build flags required for OSX [ci skip].

* enable musl builds

* add libclang

* minor

* dbg

* revert

* dbg

* dbg

* Revert cross/docker madness, bottled up musl-cross myself over here: https://github.com/brainstorm/homebrew-musl-cross/commit/9304bf9cd6bc2b784083948771977d502030815a, high sierra bottle coming up soon. Cargo build script draft

* Add in openssl to hts-sys

* Make clippy boolean logic happy

* Remove homebrew OSX check since we'll just use rustembedded cross

Co-authored-by: Julian Gehring <[email protected]>
Co-authored-by: Johannes Köster <[email protected]>

* disable musl tests

Co-authored-by: Roman Valls Guimera <[email protected]>
Co-authored-by: Julian Gehring <[email protected]>
@brainstorm
Copy link
Member Author

Thanks to everybody!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants