Job Status shows as STARTED if the SCDF server goes down in between Job execution. #5498

Closed
PSHREYASHOLLA opened this issue Oct 6, 2023 · 10 comments
Labels
for/team-attention For team attention

Comments

PSHREYASHOLLA commented Oct 6, 2023

I am trying out the sample BillRun application from https://dataflow.spring.io/docs/batch-developer-guides/batch/data-flow-spring-batch/.

We want to check how a job behaves if the SCDF server crashes.

To test this, I modified the BillProcessor class to add a 2-minute delay before the third record is written to the database:
import org.springframework.batch.item.ItemProcessor;

public class BillProcessor implements ItemProcessor<Usage, Bill> {

    @Override
    public Bill process(Usage usage) {
        // Test-only delay: pause for 2 minutes on the "michael" record
        // so the SCDF server can be brought down mid-job.
        if ("michael".equals(usage.getFirstName())) {
            try {
                Thread.sleep(120000);
            } catch (InterruptedException e) {
                // Restore the interrupt flag instead of swallowing the interruption.
                Thread.currentThread().interrupt();
            }
        }
        Double billAmount = usage.getDataUsage() * .001 + usage.getMinutes() * .01;
        return new Bill(usage.getId(), usage.getFirstName(), usage.getLastName(),
                usage.getDataUsage(), usage.getMinutes(), billAmount);
    }

}

Now we launch the billRun task. The batch job inserts the first 2 records into the DB and then sleeps for 2 minutes:
image

We bring down the server at this point. After the server restarts, the job execution status remains STARTED:
image

In this status the job neither continues after server startup, nor can I restart it, because the restart button is disabled:
image

So,

  1. What should the status and behavior be when the server crashes in the middle of a job execution?
  2. How can this job be continued? It has already inserted 2 records, so how do we make it continue from the 3rd?
  3. If I click STOP here, the job goes into STOPPING status and hangs, while the step still shows STARTED status:
    image
    image
Contributor

cppwfs commented Oct 6, 2023

Hello @PSHREYASHOLLA ,
Is this an SCDF instance that is deploying the tasks on your local environment (vs Kubernetes or Cloud Foundry)?
If so, when SCDF is terminated it also terminates any apps it deployed, which produces the behavior you see. However, if you are deploying to Kubernetes or Cloud Foundry,

Author

PSHREYASHOLLA commented Oct 9, 2023

Hi @cppwfs,

Yes, it's deployed as a local instance.
When you say the apps are terminated, what should the status be? As I mentioned above, the job is in STARTED state and the UI shows the option to STOP, but clicking the STOP button does not work.

Contributor

cppwfs commented Oct 9, 2023

What you are experiencing is that when SCDF (using the local deployer) terminates, it also terminates the JVMs of the apps it launched. The task cannot update its state in the DB because the JVM it runs on is killed with SIGKILL.
Terminating SCDF when it launches tasks using the Kubernetes or Cloud Foundry deployer does not have this issue.
You can also use SCDF to view the state of task executions for tasks that are launched externally from Data Flow; terminating SCDF has no effect on those task executions.

@PSHREYASHOLLA
Author

Hi @cppwfs,

Please provide answers to the questions below:

  1. Understood that when SCDF (using the local deployer) terminates, it also terminates the JVMs of the apps it launched. But once SCDF is started again, we should be able to update the status to STOPPED by calling the STOP API, right?

  2. "Terminating the SCDF when it launches tasks using the Kubernetes or Cloud Foundry deployer will not have this issue." What if the Kubernetes instance goes down in the middle of a job execution? How is the job treated then?

We are trying to solve our job resumability problem.

Contributor

cppwfs commented Oct 9, 2023

Data Flow does not update the state of a task or job execution once it has been started. You can add an issue to SCDF requesting this feature.
Each Task or Batch Job manages the information in its TASK_EXECUTION or JOB_EXECUTION tables. If the app receives a SIGTERM, it updates the tables correctly; however, if it receives a SIGKILL, the tables are left in the state they were in at the time of the SIGKILL.
You can manually set the state of the executions as well.
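
To illustrate the SIGTERM versus SIGKILL difference in general JVM terms, here is a minimal sketch. It is not Spring Batch or Spring Cloud Task code: `ShutdownHookDemo`, `status`, and `recordStopped` are hypothetical names, and the in-memory `status` field only stands in for a status column in the database.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ShutdownHookDemo {

    // Hypothetical stand-in for the STATUS column of an execution row.
    static final AtomicReference<String> status = new AtomicReference<>("STARTED");

    // The bookkeeping a graceful shutdown gets a chance to perform.
    static void recordStopped() {
        status.compareAndSet("STARTED", "STOPPED");
    }

    public static void main(String[] args) throws InterruptedException {
        // The JVM runs shutdown hooks on normal exit and on SIGTERM,
        // but never on SIGKILL, because the process is removed immediately.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            recordStopped();
            System.out.println("shutdown hook ran, status=" + status.get());
        }));

        System.out.println("pid=" + ProcessHandle.current().pid()
                + ", status=" + status.get());

        // Same 2-minute pause as in the modified BillProcessor.
        Thread.sleep(120_000);
    }
}
```

Sending `kill <pid>` (SIGTERM) while this sleeps lets the JVM run the hook before exiting, so the final status is recorded. With `kill -9 <pid>` (SIGKILL) the process ends at once, the hook never runs, and the status stays at STARTED, which matches the stuck state described in this issue.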

@PSHREYASHOLLA
Author

"Each Task or Batch Job manages the information in its TASK_EXECUTION or JOB_EXECUTION tables." So in the case of a local deployment, can we update this status manually by calling an API?

Contributor

cppwfs commented Oct 11, 2023

At this time we don't have an API to do this. It would have to be done manually through the database.
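
As a rough sketch of what such a manual update could look like, the Java below marks one stale job execution and its steps as FAILED over plain JDBC. It assumes the default Spring Batch schema (tables `BATCH_JOB_EXECUTION` and `BATCH_STEP_EXECUTION` with `STATUS`, `EXIT_CODE`, `END_TIME`, and `LAST_UPDATED` columns); an installation may use a different table prefix, so check the actual schema first. `StaleExecutionFixer` and `markFailed` are hypothetical names, not part of SCDF or Spring Batch, and this should only be run once you are sure the job's JVM is no longer running.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;

public class StaleExecutionFixer {

    // Assumption: default Spring Batch table names with the BATCH_ prefix.
    static final String STEP_SQL = updateSql("BATCH_STEP_EXECUTION");
    static final String JOB_SQL = updateSql("BATCH_JOB_EXECUTION");

    // Builds an UPDATE that only touches rows still marked as running.
    static String updateSql(String table) {
        return "UPDATE " + table
                + " SET STATUS = 'FAILED', EXIT_CODE = 'FAILED',"
                + " END_TIME = ?, LAST_UPDATED = ?"
                + " WHERE JOB_EXECUTION_ID = ?"
                + " AND STATUS IN ('STARTED', 'STOPPING')";
    }

    // Marks the given job execution and its steps as FAILED in one transaction.
    // Returns the total number of rows changed.
    static int markFailed(Connection conn, long jobExecutionId) throws SQLException {
        Timestamp now = Timestamp.from(Instant.now());
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try {
            int rows = update(conn, STEP_SQL, now, jobExecutionId)
                    + update(conn, JOB_SQL, now, jobExecutionId);
            conn.commit();
            return rows;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }

    private static int update(Connection conn, String sql, Timestamp now, long id)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setTimestamp(1, now);
            ps.setTimestamp(2, now);
            ps.setLong(3, id);
            return ps.executeUpdate();
        }
    }
}
```

Spring Batch generally treats a FAILED or STOPPED execution as restartable, so after an update like this the restart option should become available. Back up the tables before trying it.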

Contributor

cppwfs commented Oct 11, 2023

Closing this issue; we'll refer to #5502 instead.

@cppwfs cppwfs closed this as completed Oct 11, 2023
@PSHREYASHOLLA
Author

"At this time we don't have an API to do this. It would have to be done manually through the database." Are there any steps I can follow?
