Should worker process exit when the worker commits seppuku? #501

Open · psugihara opened this issue Nov 4, 2024 · 9 comments · May be fixed by #474

@psugihara

Summary

I have a single graphile-worker process that processes a small queue of mostly low priority web scraper stuff with occasional high priority items (emails).

I'm seeing my worker die without the process exiting. There's a supervisor that is supposed to restart the process if it fails, but because the process never exits, the supervisor never gets triggered. I can work around this by parsing the log output (e.g. watching for "seppuku" and restarting), but I'm guessing I'm just missing something in my setup. Is there a setting that will make the process exit when it commits seppuku?

Additional context

Node v20.15.1

Here's the script I run on the server to start the worker:
bunx --bun graphile-worker --cleanup DELETE_PERMAFAILED_JOBS,GC_TASK_IDENTIFIERS,GC_JOB_QUEUES && bunx --bun graphile-worker --no-prepared-statements

Here's an example of the error I see when it stops processing without exiting the process:

[job(worker-6f1b0ec5b9b6945331: enrich-job-posts{39050})] ERROR: [scrapeJob] error scraping job data: Request failed with status code 429

[worker(worker-6f1b0ec5b9b6945331)] ERROR: Failed task 39050 (enrich-job-posts, 11853.30ms, attempt 2 of 25) with error 'Request failed with status code 429':

  Error

      at new AxiosError (/opt/render/project/src/node_modules/.pnpm/[email protected]/node_modules/axios/lib/core/AxiosError.js:21:4)

      at settle (/opt/render/project/src/node_modules/.pnpm/[email protected]/node_modules/axios/lib/core/settle.js:2)

      at handleStreamEnd (/opt/render/project/src/node_modules/.pnpm/[email protected]/node_modules/axios/lib/adapters/http.js:599:9)

      at endReadableNT (native)

      at processTicksAndRejections (native)

      at /opt/render/project/src/node_modules/.pnpm/[email protected]/node_modules/axios/lib/core/Axios.js:48:12

      at asyncFunctionResume (native)

      at promiseReactionJobWithoutPromiseUnwrapAsyncContext (native)

      at promiseReactionJob (native)

      at processTicksAndRejections (native)

[worker(worker-6f1b0ec5b9b6945331)] ERROR: Failed to release job '39050' after failure 'Request failed with status code 429'; committing seppuku

Failed to connect

[core] ERROR: Worker exited, but pool is in continuous mode, is active, and is not shutting down... Did something go wrong?

41 |     rej = reject

42 |   }).catch((err) => {

43 |     // replace the stack trace that leads to `TCP.onStreamRead` with one that leads back to the

44 |     // application that created the query

45 |     Error.captureStackTrace(err)

46 |     throw err

       ^

ECONNREFUSED: Failed to connect

 syscall: "connect"


      at object.<anonymous> (/opt/render/project/src/node_modules/.pnpm/[email protected][email protected]/node_modules/pg-pool/index.js:46:3)

      at promiseReactionJob (native:1:1)

      at processTicksAndRejections (native:1:1)


      at /opt/render/project/src/node_modules/.pnpm/[email protected][email protected]/node_modules/graphile-worker/dist/main.js:610:13

[core] ERROR: Worker exited with error: ECONNREFUSED: Failed to connect
@benjie
Member

benjie commented Nov 4, 2024

What version of worker?

@psugihara
Author

[email protected]

@benjie
Member

benjie commented Nov 4, 2024

Can you reproduce the issue using Node rather than bun?

@psugihara
Author

I haven't tried yet or done much debugging other than confirming that the process is still around. I can try to put together a minimal repro at some point. For clarification, the process is supposed to die after the seppuku message?

Thanks for the quick help (and this amazing tool)!

@benjie
Member

benjie commented Nov 4, 2024

It doesn’t explicitly exit the process, it just stops running the worker; that should mean everything then shuts down relatively cleanly and then the process should exit.
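A possible workaround in the meantime is to run the worker in library mode and exit explicitly when a worker dies, so an external supervisor can restart the process. The sketch below is not an official recipe: it assumes the `events` option, the `WorkerEvents` export, and the `worker:fatalError` event name from the graphile-worker events documentation; verify all of these against the version you have installed, and treat the `enrich-job-posts` task body as a placeholder.

```ts
// Library-mode sketch, not the bunx CLI invocation above. Assumptions to
// verify: the `events` option, the WorkerEvents export, and the
// "worker:fatalError" event name.
import { EventEmitter } from "node:events";
import { run, type WorkerEvents } from "graphile-worker";

async function main() {
  const events: WorkerEvents = new EventEmitter();

  // When a worker aborts fatally (the "seppuku" path), exit the whole process
  // so the external supervisor can restart it.
  events.on("worker:fatalError", ({ error }) => {
    console.error("Worker fatal error; exiting so the supervisor restarts us:", error);
    process.exit(1);
  });

  const runner = await run({
    connectionString: process.env.DATABASE_URL,
    concurrency: 5,
    events,
    taskList: {
      // Placeholder task named after the logs above; replace with your real tasks.
      "enrich-job-posts": async (payload, helpers) => {
        helpers.logger.info(`processing ${JSON.stringify(payload)}`);
      },
    },
  });

  await runner.promise;
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

With something like this in place, the supervisor sees a real exit code and can bring up a fresh process instead of watching a zombie that has stopped working jobs.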

@JosephHalter

I just had a similar issue with version 0.16.5: I had 2 scheduled tasks configured in the crontab; they both committed seppuku and just stopped running entirely, despite being scheduled to run every 5 minutes. I didn't expect a job failure to cause them to stop being scheduled entirely; I only noticed after many days, and it caused a lot of issues.

Is there any way to ensure that a job failing doesn't prevent it from running at the next scheduled time?

@benjie
Member

benjie commented Nov 22, 2024

@JosephHalter can you provide a reproduction?

@benjie
Member

benjie commented Nov 22, 2024

It's not 100% clear how to handle this. A force exit would kill all concurrently working jobs, which isn't ideal. A graceful shutdown might take forever. Spinning up a new worker to replace the faulty one may cause a thundering herd problem.

Here's my first punt at it:

fdd041c
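For readers following along, here is a rough sketch of what the "graceful first, forceful after a deadline" compromise could look like on the application side in library mode. It is not the linked commit; it assumes the exported `Runner` type and that `runner.stop()` waits for in-flight jobs, and the 30-second grace period is an arbitrary example value.

```ts
// Sketch only: graceful shutdown with a hard deadline, assuming library mode.
import type { Runner } from "graphile-worker";

export async function stopThenForceExit(runner: Runner, graceMs = 30_000) {
  // If the graceful stop takes longer than the grace period, force the
  // process down so a supervisor can start a fresh worker.
  const forceTimer = setTimeout(() => {
    console.error(`Graceful shutdown exceeded ${graceMs}ms; forcing exit`);
    process.exit(1);
  }, graceMs);
  forceTimer.unref(); // don't let the timer itself keep the process alive

  try {
    await runner.stop(); // lets currently-running jobs finish
    process.exit(0);
  } catch (err) {
    console.error("Error during graceful shutdown:", err);
    process.exit(1);
  }
}
```

A configurable grace period covers both preferences discussed in this thread: a long delay for those who want running jobs to finish, and a short one for those who prefer the process to restart quickly.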

benjie linked a pull request Nov 22, 2024 that will close this issue
@JosephHalter

JosephHalter commented Nov 23, 2024

@benjie Sadly it's pretty hard to reproduce; I only have the logs to show that the tasks failed and seppuku happened, but I don't know exactly how they failed. It's also in a private repo that I can't share. Failure in itself can happen and isn't a big issue; the problem was that the job stopped being scheduled entirely. If it were in the system crontab, new jobs would have been spawned no matter how the previous ones failed. Indeed it's possible to have herd issues if you keep spawning new jobs, but that would have been easier to notice. I have no problem with force-killing concurrently working jobs, although, like you suggested in the comments of your pull request, going for a graceful exit and only forcefully killing it after a delay would be best. Even better if the delay is configurable, and if you put 0 then it doesn't forcefully kill the workers at all, for people who don't want the behavior.

Thank you for taking this issue seriously and immediately creating a pull request 👍
