Problem
We have run into a problem that suggests that, in certain situations, Funnel may return a task ID to the client before a database document for the task has been successfully created.
Solution
Return the task ID to the client only after the database document is successfully created. If an error occurs during creation of the task, return a 500 error.
Details
We tried to execute a task via Snakemake to a Funnel instance set up in front of a Slurm cluster.
The client logged the following:
No specific error message was made available to the client (or, at least, it wasn't logged).
When polling Funnel for the job info with
curl https://our.funnel.instance/v1/tasks/ckdu7ltckctsbus621rg # FUNNEL_SERVER_USER and FUNNEL_SERVER_PASSWORD were already set
we received:
Digging into the Funnel service we found that a directory for the task was created, but was empty.
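To make the ordering proposed under Solution concrete, here is a minimal Go sketch. It is not Funnel's actual code: `Task`, `TaskStore`, `memStore`, `failingStore`, `createTaskHandler`, and `submit` are hypothetical names invented for illustration, and the `POST /v1/tasks` route is only assumed from the `/v1/tasks/...` path used in the curl call above. The point is simply that the handler writes the task to the store first and returns the ID only if that write succeeds; otherwise it answers with a 500.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Task is a stand-in for a task record (hypothetical, not Funnel's type).
type Task struct {
	ID string `json:"id"`
}

// TaskStore is a hypothetical persistence interface for this sketch.
type TaskStore interface {
	CreateTask(t Task) error
}

// memStore keeps tasks in memory and never fails.
type memStore struct{ tasks map[string]Task }

func (m *memStore) CreateTask(t Task) error {
	m.tasks[t.ID] = t
	return nil
}

// failingStore simulates a database that cannot be reached.
type failingStore struct{}

func (failingStore) CreateTask(Task) error {
	return errors.New("database unavailable")
}

// createTaskHandler persists the task first and only then returns its ID.
func createTaskHandler(store TaskStore, newID func() string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		task := Task{ID: newID()}
		if err := store.CreateTask(task); err != nil {
			// The write failed, so no task ID is handed to the client.
			http.Error(w, "failed to create task: "+err.Error(),
				http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(task)
	}
}

// submit sends one POST to the handler and returns the status code and body.
func submit(store TaskStore) (int, string) {
	h := createTaskHandler(store, func() string { return "task-1" })
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/v1/tasks", strings.NewReader("{}"))
	h.ServeHTTP(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	// Healthy store: the client receives the task ID.
	fmt.Println(submit(&memStore{tasks: map[string]Task{}}))
	// Failing store: the client receives a 500 and no task ID.
	fmt.Println(submit(failingStore{}))
}
```

With this ordering, a client such as Snakemake would see the failure at submission time instead of receiving an ID for a task that has no database document behind it.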
Logging information
slurm-jobs.txt
JobId=360 UserId=slurmer(1010) GroupId=slurmer(1010) Name=ckdu7ltckctsbus621rg JobState=FAILED Partition=knls TimeLimit=720 StartTime=2023-10-03T09:50:16 EndTime=2023-10-03T09:50:17 NodeList=compute0 NodeCnt=1 ProcCnt=2 WorkDir=/ ReservationName= Gres= Account= QOS=normal WcKey= Cluster=unknown SubmitTime=2023-10-03T09:50:15 EligibleTime=2023-10-03T09:50:15 DerivedExitCode=0:0 ExitCode=1:0
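slurmdbd.log
[2023-10-03T08:43:20.226] error: mysql_real_connect failed: 2002 Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)
[2023-10-03T08:43:20.237] error: The database must be up when starting the MYSQL plugin. Trying again in 5 seconds.
[2023-10-03T08:43:25.271] error: Database settings not recommended values: innodb_buffer_pool_size innodb_log_file_size innodb_lock_wait_timeout
[2023-10-03T08:43:25.448] slurmdbd version 19.05.2 started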
^^ This got us thinking that perhaps something went wrong with the database when Funnel started up. However, we do not have other information to corroborate that. The actual failed task was executed an hour later.