-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
packaging imports with ../ #6
Comments
I am all supporting a wide variety of use cases. But I am not for supporting use cases that are inheritantly unpackagable. There is no intuitive way to solve these problems, so the only solution is one where wdl-packager would behave unintuitively. Having unintuitive behavior causes a lot of friction for developers and users, so that is a big sacrifice to make in my opinion. As for the broad case. Instead of importing Maybe we can provide this sort of information in a packaging guide? That would be more helpful than coding around all possible package problems that might occur. |
Hmm, you're making me reconsider whether to merge my custom file format! 😅 |
Hmm, maybe wdl-packager can do some rewriting of import statements? i.e. |
Would that be a solution? It sounds like something WDL-packager can support. Also I can add |
Although uncomfortable to change the source, I think it's slightly better than my approach that needs special logic in the source loader to "reinterpret" the import paths. Despite miniwdl not supporting full code generation, we could do a good enough job on the constrained problem of rewriting |
That would be doable indeed. It would just be an ad verbatim copy except the import statement. That is if, we think wdl-packager got WDL packaging right the first try. We might want to think about more about that. |
The only downside of ZIP is that it doesn't achieve the best possible compression, because it compresses each file separately without sharing a "dictionary." Usually the difference would be negligible, except that the reason I just got interested in packaging (after letting chanzuckerberg/miniwdl#292 languish for years!) was to send the package in an AWS Batch SubmitJob environment variable, which has a 30 KiB total size limit. It'd be nice not to have to arrange for some S3 bucket or public web server to host the package. Anyway, I admit this is a weird case, it's just factually what got me onto this topic 😄 |
wdl-packager now uses uncompressed zip files. Because they are still relatively small compared to checking out an entire repo including test files. I suppose WDL packager could also use tar files. It's just packing an archive.
Sorted on size. Tar uncompressed is bigger than zip. zip deflated is already quite small. But tar is indeed 33% smaller with the same deflate compression (level 9). XZ -9 works very well. Less than half the size of gzip. EDIT: I uses zip initially because that is what cromwell supports. But as soon as tar.xz gets into the spec wdl-packager can also support that. I prefer to support only formats that can be parsed by Python's stdlib. (gzip, bzip2, lzma). Bzip2 is not a suitable candidate in my opinion as decompression speeds are very slow. Lzma works much better in that regard. Also I see that XZ format is supported in the java world as well. |
The thought just occurred to me, since my need for maximum compression is kind of a corner case, for that purpose I can simply use XZ on the uncompressed ZIP file. It's probably better to stick with ZIP for general WDL packaging, which will be less surprising to the community. |
If it helps I can add a compression level flag to wdl-packager. That should shrink the size somewhat. The reason the zipfiles are uncompressed by default.
|
@mlin while writing the package specification I thought of another way to handle Given a file called
Done, and no need to rewrite the source. It won't work for http style imports, but I hope this helps a bit with doing a minimum of import rewriting. |
@rhpvorderman I had some similar thoughts early on in looking at this,
I'd still be concerned about producing a zip that wouldn't readily work with Cromwell. Another thing I was unsure about was where to put the default input JSON (and "additional files" as wdl-packager supports). Do they go in the root or alongside the main WDL? Depending on that, what kind of relative paths does the input JSON use to refer to the additional files? By putting the main WDL in the root, I at least avoided having to make those decisions. 😅 Thanks for the great PR on miniwdl btw, I'm a bit tied up the next few days but will send comments asarp. |
Ah, yes that is a very good reason. I can't believe we overlooked that.
Take your time. I wrote a spec for WDL packaging too and put it up for discussion on the OpenWDL repo. Packaging is mainly a BioWDL issue, since we have all these modular repositories, so we had a discussion with the team and decided to push for this ourselves. |
@rhpvorderman @mlin is this still open for discussion? I would like to make this a standard we can put in OpenWDL |
@vsmalladi Sure! The reason this has not had much activity is that currently this would only benefit the LUMC and we also have other things we do besides working on WDL pipelines. So these not very pressing concerns tend to get stuck on the backlog. It would be very nice if more people get involved. |
@vsmalladi Also FYI- this repo & thread were the original inspiration for |
@vsmalladi I already started on a discussion about packaging: openwdl/wdl#499 |
Hi @rhpvorderman @DavyCats
I noticed wdl-packager rejects top-level WDLs that import from outside their own directory e.g.
import "../../../foo/bar/bas.wdl"
. I'm concerned that several of the important Broad WDL repos (warp, viral-pipelines, etc.) have settled on that approach (example), so we should try to support it even if we organize our own source code differently...It seems possible in principle to embed in the ZIP file, the minimum directory subtree needed to make the relative paths work. The problem with that is the "main" WDL file gets buried in some subdirectory which isn't straightforward to find upon opening the ZIP file.
We could add a manifest file to the ZIP, with the equivalent of
Main-Class:
from a JAR manifest and that also could hold default inputs if desired. I could make miniwdl read that, but obviously no control over Cromwell. Come to think of it, I wonder how are repos with the above structure sent into Cromwell/Terra anyhow? (Probably by public URL instead of ZIP file?)Have you thought about this, any other approaches?
(I spent this past weekend educating myself about the packaging problem by implementing an alternative approach that inlines all the WDL source in a YAML file along with a manifest, and relies on dedicated logic in miniwdl's source loader to resolve the paths. However, after finishing it, I think I should just go back to ZIP files.)
The text was updated successfully, but these errors were encountered: