Task ID returned before database doc is created #722

Open
uniqueg opened this issue Oct 3, 2023 · 0 comments

Problem

We have run into a problem that suggests that, in certain situations, Funnel may return a task ID to the client before a database document for the task has been successfully created.

Solution

Return the task ID to the client only after the database document is successfully created. If an error occurs during creation of the task, return a 500 error.
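
The proposed ordering can be sketched as follows. This is only an illustration in Python, not Funnel's actual Go code; `InMemoryTaskStore`, `StoreError`, `create_task`, and the response shapes are all hypothetical names made up for the example.

```python
import uuid


class StoreError(Exception):
    """Raised when the task document cannot be written (hypothetical)."""


class InMemoryTaskStore:
    """Stand-in for the real database; pass fail=True to simulate an outage."""

    def __init__(self, fail=False):
        self.fail = fail
        self.docs = {}

    def insert(self, task_id, doc):
        if self.fail:
            raise StoreError("database unavailable")
        self.docs[task_id] = doc


def create_task(store, task):
    """Persist the task first; hand out the ID only after the write succeeded."""
    task_id = uuid.uuid4().hex
    try:
        store.insert(task_id, {"id": task_id, "state": "QUEUED", **task})
    except StoreError as exc:
        # No document exists, so the client must not receive an ID.
        return 500, {"error": f"could not create task: {exc}"}
    return 200, {"id": task_id}
```

With this ordering, a client that receives an ID should always be able to look the task up afterwards, and a failed write surfaces as a 500 instead of a dangling ID.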

Details

We tried to run a task via Snakemake against a Funnel instance set up in front of a Slurm cluster.

The client logged the following:

[TES] Task submitted: ckdu7ltckctsbus621rg
[TES] Task errored: ckdu7ltckctsbus621rg

No specific error message was made available to the client (or, at least, it wasn't logged).

When polling Funnel for the job info with

curl https://our.funnel.instance/v1/tasks/ckdu7ltckctsbus621rg
# FUNNEL_SERVER_USER and FUNNEL_SERVER_PASSWORD were already set

we received:

{
  "error": "task not found: taskID: ckdu7ltckctsbus621rg",
  "code": 5,
  "message": "task not found: taskID: ckdu7ltckctsbus621rg"
}
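
A client could recognise this state programmatically. The sketch below assumes the error body always has the shape shown above. `is_task_missing` is a hypothetical helper, and treating `code` 5 as the gRPC `NOT_FOUND` status is an assumption about how Funnel maps its errors.

```python
import json

GRPC_NOT_FOUND = 5  # numeric value of the gRPC NOT_FOUND status code


def is_task_missing(body: str) -> bool:
    """Return True if a task lookup response reports that the task does not exist."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("code") == GRPC_NOT_FOUND


response = """{
  "error": "task not found: taskID: ckdu7ltckctsbus621rg",
  "code": 5,
  "message": "task not found: taskID: ckdu7ltckctsbus621rg"
}"""
missing = is_task_missing(response)
```

A "not found" answer for an ID that was just handed out by the server is the symptom described in this issue, so a client may want to log it separately from ordinary task failures.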

Digging into the Funnel service, we found that a directory for the task had been created but was empty.

Logging information

slurm-jobs.txt

JobId=360 UserId=slurmer(1010) GroupId=slurmer(1010) Name=ckdu7ltckctsbus621rg JobState=FAILED Partition=knls TimeLimit=720 StartTime=2023-10-03T09:50:16 EndTime=2023-10-03T09:50:17 NodeList=compute0 NodeCnt=1 ProcCnt=2 WorkDir=/ ReservationName= Gres= Account= QOS=normal WcKey= Cluster=unknown SubmitTime=2023-10-03T09:50:15 EligibleTime=2023-10-03T09:50:15 DerivedExitCode=0:0 ExitCode=1:0 
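
To read the record above more easily, it can be split into fields. This is a minimal sketch that assumes the record is a flat list of space-separated `KEY=VALUE` tokens and that `ExitCode` follows Slurm's `exit_code:signal` convention. `parse_job_record` is a hypothetical helper, and only a subset of the fields is repeated here.

```python
from datetime import datetime


def parse_job_record(line: str) -> dict:
    """Split a space-separated KEY=VALUE job record into a dict (empty values kept)."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


record = parse_job_record(
    "JobId=360 Name=ckdu7ltckctsbus621rg JobState=FAILED "
    "StartTime=2023-10-03T09:50:16 EndTime=2023-10-03T09:50:17 "
    "SubmitTime=2023-10-03T09:50:15 ReservationName= ExitCode=1:0"
)

# Wall-clock time the job spent running on the node.
runtime = datetime.fromisoformat(record["EndTime"]) - datetime.fromisoformat(record["StartTime"])

# Exit code of the job script and the signal that ended it, if any.
exit_code, signal = (int(part) for part in record["ExitCode"].split(":"))
```

Read this way, the job ran for about one second and ended with exit code 1, so it appears to have failed right after starting.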

slurmdbd.log

[2023-10-03T08:43:20.226] error: mysql_real_connect failed: 2002 Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
[2023-10-03T08:43:20.237] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
[2023-10-03T08:43:25.271] error: Database settings not recommended values: innodb_buffer_pool_size innodb_log_file_size innodb_lock_wait_timeout
[2023-10-03T08:43:25.448] slurmdbd version 19.05.2 started

The slurmdbd log above made us suspect that something went wrong with the database when Funnel started up. However, we have no other information to corroborate that, and the failed task was executed about an hour later.
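
The gap between the two events can be worked out from the timestamps in the logs. This sketch assumes both logs use the same time zone; `parse_log_timestamp` is a hypothetical helper.

```python
from datetime import datetime


def parse_log_timestamp(line: str) -> datetime:
    """Extract the leading [timestamp] from a slurmdbd log line."""
    stamp = line[1:line.index("]")]
    return datetime.fromisoformat(stamp)


# Last line of the slurmdbd.log excerpt and SubmitTime from the job record.
started = parse_log_timestamp("[2023-10-03T08:43:25.448] slurmdbd version 19.05.2 started")
submitted = datetime.fromisoformat("2023-10-03T09:50:15")

gap = submitted - started
gap_minutes = gap.total_seconds() / 60
```

The gap comes out at roughly 67 minutes between slurmdbd starting and the task being submitted.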
