Investigate options for running on a spark cluster #27

Open
dolegi opened this issue Oct 30, 2024 · 1 comment

dolegi commented Oct 30, 2024

There are some problems using the DirectRunner on JASMIN (it crashes on HTTP errors).
We need a more robust way to run the scripts. A Spark cluster, using Beam's Spark runner, would probably be the most stable option.

Investigate how we can do this: does JASMIN have an existing Spark cluster, or is there one available to CEH somewhere? (Iain might know, or be able to point us in the right direction.)

Another option is spinning up our own Spark cluster on JASMIN; investigate whether this is possible and reasonable.

Acceptance criteria

  • We know how to move forward with running the scripts on a spark cluster
@mattjbr123

UKCEH's DataLabs supports Spark clusters. I will try a minimal version of the pipeline there first, to see whether the approach is viable at all.
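
For reference, a minimal sketch of the flags a Beam Python pipeline might need to target Spark rather than the DirectRunner. It assumes a Beam Spark job server is already running and reachable; the endpoint `localhost:8099` and the `LOOPBACK` environment are placeholder assumptions for a local trial, not values confirmed for DataLabs or JASMIN, and `spark_runner_flags` is a hypothetical helper name.

```python
def spark_runner_flags(job_endpoint="localhost:8099", environment_type="LOOPBACK"):
    """Build Beam pipeline flags for submitting through a Spark job server.

    job_endpoint and environment_type are assumptions for a local trial;
    a real cluster would need its own values.
    """
    return [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        f"--environment_type={environment_type}",
    ]


if __name__ == "__main__":
    flags = spark_runner_flags()
    print(" ".join(flags))
    # With apache_beam installed, the flags would be used roughly like this:
    #   import apache_beam as beam
    #   from apache_beam.options.pipeline_options import PipelineOptions
    #   with beam.Pipeline(options=PipelineOptions(flags)) as p:
    #       ...  # the existing pipeline steps, unchanged
```

Only the options change; the pipeline steps themselves should stay the same, which is what makes a minimal trial cheap to set up.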
