Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide method for stopping Batch 5 Jobs upon user request #5990

Closed
wants to merge 4 commits into from

Conversation

cppwfs
Copy link
Contributor

@cppwfs cppwfs commented Oct 16, 2024

In the previous release of SCDF we used the JsrJobOperator to stop job executions. The 2 stages of stopping jobs is as follows:

  1. Sets the Batch Status of the job execution to STOPPING. This signals to Spring Batch to stop execution at the next step.
  2. If the Job is a StepLocator it will go through each of the StoppableTasklets and stop them.

Since Dataflow never deals with a Job directly, it never executed the 2nd stage. Meaning the Job is in the application that SCDF launched, not SCDF itself. So when using Batch 4.x it was just a quick check of the JobRegistry to retrieve the job, which was always empty since SCDF never deals with Jobs directly. But with Batch 5.x they loaded the Job Registry and attempted to retrieve the Job in a different way using the SimpleJobOperator. This caused a NPE when it tried to retrieve and determine if the Job was a StepLocator. In the updated solution, SCDF doesn't use the SimpleJobOperator since SCDF doesn't have access to the StepLocator for the Job, nor does it use the JobRegistry and even if it did, it would always be empty.

Resolves #5986

In the previous release of SCDF we used the JsrJobOperator to stop job executions.
The 2 stages of stopping jobs is as follows:
1) Sets the Batch Status of the job execution to STOPPING.  This signals to Spring Batch to stop execution at the next step.
2) If the Job is a StepLocator it will go through each of the StoppableTasklets and stop them.

Since Dataflow never deals with a Job directly, it never executed the 2nd stage.  Meaning the Job is in the application that SCDF launched, not SCDF itself.
So when using Batch 4.x it was just a quick check of the JobRegistry to retrieve the job, which was always empty since SCDF never deals with Jobs directly.
But with Batch 5.x they loaded the Job Registry and attempted to retrieve the Job in a different way using the SimpleJobOperator.
In the updated solution, SCDF doesn't use the SimpleJobOperator since SCDF doesn't have access to the StepLocator for the Job, nor does it use the JobRegistry and even if it did, it would always be empty.

w
@cppwfs cppwfs added this to the 3.0.x milestone Oct 16, 2024
Copy link
Contributor

@onobc onobc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @cppwfs . Makes sense. I have one suggestion/comment to address but other than that 👍🏻

* Modify stopAll so that it calls stop() instead of using its own logic
* Add Test for stopAll
jobExecution = jobService.getJobExecution(jobExecution.getId());
assertThat(jobExecution.isRunning()).isTrue();
assertThat(jobExecution.getStatus()).isNotEqualTo(BatchStatus.STOPPING);
jobService.stop(jobExecution.getId());
jobExecution = jobService.getJobExecution(jobExecution.getId());
assertThat(jobExecution.getStatus()).isEqualTo(BatchStatus.STOPPED);
assertThat(jobExecution.getStatus()).isEqualTo(BatchStatus.STOPPING);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we wait for STOPPED as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

STOPPED only occurs once the batch job has completed. In this test we are just testing that SCDF sets the state to STOPPING. We are not running real jobs here, just creating / modifying the state of the JobExecution.

Extract job stop code from stop(long) and place into stopJobExecution method.
The stopJobExecution  method will be used by stop(long) and stopAll.
Moved Job Stop Code from assert methods to the tests.
Updated the JobExecution Create methods so that they verify that the job is not Stopping
pom.xml Show resolved Hide resolved
Copy link
Contributor

@onobc onobc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @cppwfs . Other than the commented out skipper in pom.xml - LGTM!

@cppwfs
Copy link
Contributor Author

cppwfs commented Oct 17, 2024

Thank you @onobc for the reviews that refined this PR.
Thanks @corneil for the great question!
Polished, Squashed, Merged.

@cppwfs cppwfs closed this Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SimpleJobService stop/stopAll need to support JobOperator
3 participants