Subworkflows can only use modules present in the same repo #1927

Open
awgymer opened this issue Oct 11, 2022 · 8 comments

@awgymer
Contributor

awgymer commented Oct 11, 2022

Description of feature

Subworkflows can only use modules that live in the same repo. This is a current (possibly permanent) limitation of subworkflows.
It means you cannot define a subworkflow in an nf-core structured repo which uses nf-core modules directly.

Allowing this would make updating and installing more complex.

The limitation should be clearly documented.

@mberacochea

My team is currently adopting nf-core tools and we've noticed this limitation. I'm interested in working on adding support for 'hybrid' subworkflows. Any guidance on how to begin would be helpful.

@awgymer
Contributor Author

awgymer commented Sep 14, 2023

This is quite a thorny problem and right now there is no proper solution, I am afraid. You could mirror the GitHub modules repo and add your own subworkflows and modules to that, but that has its own wrinkles.

I hope we can find a better solution eventually but obviously, as an open source project, supporting split open-source/in-house work is probably not a priority issue.

@mberacochea

I understand... I'll share our solution or workaround as soon as we find one that we are happy with. Thank you

@GallVp
Member

GallVp commented Dec 13, 2023

Hi @awgymer and @mberacochea

Thanks for the hints. Here is what I have settled on for now:

  1. Inside the organisation (XYZ) repo, create an nf-core-modules directory, then run:
cd nf-core-modules
touch main.nf
touch nextflow.config

cat <<-EOF > .nf-core.yml
repository_type: pipeline
EOF
  2. The nf-core-modules directory will now behave as a pipeline, so the nf-core modules can be installed into it with version control using nf-core tools.

  3. Inside the organisation (XYZ) repo, create an nf-core-hybridisation.sh script to keep track of hybrid modules. Example:

#!/usr/bin/env bash

cp -r ./nf-core-modules/modules/nf-core/gunzip ./modules/nf-core/ # needed for hybrid testing

mkdir -p ./modules/XYZ/cat
cp -r ./nf-core-modules/modules/nf-core/cat/cat ./modules/XYZ/cat # Needed for a hybrid sub-workflow

This way the hybridisation can be version controlled. I am not sure it will work in every situation. Looking forward to your thoughts.
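
The copy steps in the script above could also be wrapped in a small helper so that repeated runs do not leave stale files behind. This is only a minimal sketch: the function name `sync_module` is hypothetical, and the paths in the commented example calls are simply the ones from the script above.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not part of nf-core tools): copy one module directory
# from the mirror into a destination parent directory, replacing any previous
# copy so that files deleted upstream do not linger.
# Arguments: <source module dir> <destination parent dir>
sync_module() {
  local src="$1" dest_parent="$2"
  local name
  name="$(basename "$src")"
  if [ ! -d "$src" ]; then
    echo "error: module not found: $src" >&2
    return 1
  fi
  mkdir -p "$dest_parent"
  # ':?' aborts if dest_parent is empty, guarding against 'rm -rf /<name>'.
  rm -rf "${dest_parent:?}/$name"
  cp -r "$src" "$dest_parent/"
}

# Example calls, mirroring the paths used in the script above:
# sync_module ./nf-core-modules/modules/nf-core/gunzip  ./modules/nf-core
# sync_module ./nf-core-modules/modules/nf-core/cat/cat ./modules/XYZ/cat
```

Removing the old copy before copying keeps the hybrid directory an exact snapshot of whatever version nf-core tools installed in the mirror.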

@awgymer
Contributor Author

awgymer commented Dec 13, 2023

If I understand this correctly, you are basically using a "pipeline repo" to mirror modules into your remote and then syncing them with bash?

This is a little like an idea that has been raised here which would see subworkflows package their modules alongside themselves.

I've only thought about it a little bit, but the idea in my head would be to create a 3rd "repository_type" of "subworkflow". This would mostly behave like a "pipeline" but with a few differences (some assumptions about pipeline repos wouldn't be quite the same).

The tooling could then be refactored to basically do a recursive pass of "subworkflows" updating/installing modules within (or perhaps they should be frozen I'm not sure).

@GallVp
Member

GallVp commented Dec 14, 2023

> If I understand this correctly you are basically using a "pipeline repo" to mirror modules into your remote and then syncing them with bash then?

Yes, that's true. Essentially I am creating two copies in the same repo. Not ideal. But it is explicit and allows me to use nf-core tools to stay up to date with nf-core/modules. For me, it is really a temporary solution as I intend to eventually contribute all the local org modules and sub-workflows to nf-core/modules.

> This is a little like an idea that has been raised here which would see subworkflows package their modules alongside themselves.
>
> I've only thought about it a little bit, but the idea in my head would be to create a 3rd "repository_type" of "subworkflow". This would mostly behave like a "pipeline" but with a few differences (some assumptions about pipeline repos wouldn't be quite the same).
>
> The tooling could then be refactored to basically do a recursive pass of "subworkflows" updating/installing modules within (or perhaps they should be frozen I'm not sure).

Yes, I like the idea of freezing modules inside sub-workflows. When a sub-workflow is downloaded by a pipeline developer, the nf-core tools can generate a warning saying that the sub-workflow modules are outdated. The developer can choose to keep using the outdated modules or create a sub-workflow update pull request, which goes through the nf-test GitHub Actions along with the community review. Would this also prevent sub-workflows from malfunctioning due to breaking module updates? Or is that already taken care of by some other mechanism?
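
To make the "frozen modules plus warning" idea a bit more concrete, here is a rough sketch of what such a check could look like. Everything in it is an assumption for illustration: the function name `report_outdated` and the plain-text input format (one `<module> <git_sha>` pair per line) are hypothetical and are not how nf-core tools tracks versions.

```shell
#!/usr/bin/env bash

# Hypothetical sketch: print a warning for every frozen module whose pinned
# commit differs from the latest known commit.
# Both inputs are text files with lines of the form "<module> <git_sha>";
# this file format is an assumption made for the example.
# Arguments: <frozen file> <latest file>
report_outdated() {
  local frozen_file="$1" latest_file="$2"
  local module pinned latest
  while read -r module pinned || [ -n "$module" ]; do
    [ -n "$module" ] || continue
    # Look up the latest SHA for this module (empty if the module is unknown).
    latest="$(awk -v m="$module" '$1 == m { print $2; exit }' "$latest_file")"
    if [ -n "$latest" ] && [ "$latest" != "$pinned" ]; then
      echo "WARNING: $module is frozen at $pinned, latest is $latest"
    fi
  done < "$frozen_file"
}
```

The check only reports; whether to keep the frozen version or open an update pull request is still left to the developer, as described above.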

@drpatelh
Member

We could also allow multiple --git-remote options on the CLI, with some sort of fallback mechanism to decide where the appropriate components are sourced from? I don't know how the dependencies between modules and subworkflows are currently tracked in tools, but this would need to be mirrored in modules.json somehow.

For example, --git-remote <MYGITHUB_REPO> --git-remote <NF_CORE_MODULES_REPO>. The tricky part will be deciding which one takes precedence if the same modules exist in both of these repos, especially with more than two --git-remote options.

Blasting some ideas out there. What do you think @mashehu @mirpedrol ?

@ghost

ghost commented Jan 19, 2024

Thank you @drpatelh .

To give some perspective from my case: I developed a subworkflow that uses internal (our nf-core_modules-like repo) and external (public nf-core/modules) modules. When I try to install it with nf-core install --git-remote <internal nf-core modules URL> ..., nf-core tools can't find the modules.

What I would suggest is something like pip's extra index (https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-extra-index-url). I would add --extra-git-url or something like this, where the extra remote is only consulted for what is not found in the --git-remote. This way, the --git-remote would take precedence over the --extra-git-url.

We could then still use public modules and subworkflows and keep up to date with new releases, with occasional local patches, without having to internalize modules that we do not intend to modify heavily.
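
The precedence rule discussed above (first remote wins, later ones act as fallbacks) can be sketched in a few lines. This is an illustration only: local directories stand in for git remotes, the function name `resolve_component` is hypothetical, and neither multiple --git-remote options nor --extra-git-url are confirmed in this thread to exist in nf-core tools.

```shell
#!/usr/bin/env bash

# Hypothetical sketch: find which source provides a component, checking the
# sources in the order given so that earlier ones take precedence.
# Local directories stand in for git remotes here (an assumption).
# Arguments: <component path, e.g. nf-core/cat/cat> <source dir>...
resolve_component() {
  local component="$1"
  shift
  local remote
  for remote in "$@"; do
    if [ -d "$remote/modules/$component" ]; then
      # First match wins; later sources are only fallbacks.
      printf '%s\n' "$remote"
      return 0
    fi
  done
  echo "error: $component not found in any source" >&2
  return 1
}

# Example: prefer the internal mirror, fall back to the public checkout.
# resolve_component nf-core/gunzip ./internal-modules ./nf-core-modules
```

Passing the sources as an ordered list makes the precedence explicit, which addresses the question of what happens when the same module exists in more than one repo.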
