:poolboy.checkout returning :full #59

jeregrine · 2016-04-27T22:00:29Z

https://github.com/edgurgel/verk/blob/master/lib/verk/workers_manager.ex#L188 on version 0.9.13 (but would happen the same on 10) returns :full

If you need more details from our redis instance we can grab them for you.

jeregrine · 2016-04-27T22:00:38Z

cc @DavidAntaramian

edgurgel · 2016-04-27T22:17:03Z

WorkersManager watches the workers and keep track of the free workers. We do this because :poolboy.status is a GenServer.call and we already have the workers information available.

We could simply start using the status to know if we have available workers.

I would try this approach changing this private function: https://github.com/edgurgel/verk/blob/v0.9.13/lib/verk/workers_manager.ex#L210-L212

Then doing some quick benchmarks to see if it affects too much. If it's decreasing the performance we could potentially call status just when we think that we have free workers. Basically use both the WorkersManager ets table AND :poolboy.status.

Does this make any sense? :P

jeregrine · 2016-04-28T13:39:01Z

Makes sense to me.

mitchellhenke · 2016-05-03T21:56:30Z

going to have a go at this

edgurgel · 2016-05-05T22:42:10Z

Awesome @mitchellhenke. I was thinking on a simple way to run benchmarks for workers.

I was thinking on:

Enqueue a "StartBenchWorker" that prints the time
Enqueue 10k/100k NoOpWorkers
Enqueue a "FinishBenchWorker" that prints the time

We would get for free a pipeline of workers running. OFC that the FinishBenchWorker wouldn't be the strict last worker to run as it will run concurrently with the NoOpworkers in the end. It will be a good estimate still.

This may even become part of our test suite somehow.

Thoughts?

mitchellhenke · 2016-05-13T16:31:47Z

So I started investigating this on my machine (2.7 GHz Intel Core i5 / 8 GB 1867 MHz DDR3 for what it's worth).

From this limited benchmark, it doesn't look like :poolboy.status makes a significant difference in throughput for 10k or 100k jobs:

10k Jobs
Current (ms)	8250.157	6568.419	6970.148	7006.6	6191.896	6966.409	6132.113	6773.351	6320.074	7959.506
Average	6913.8673
Std Dev	708.4890134

Poolboy Status (ms)	5777.431	7104.012	6427.04	6457.407	8404.173	6383.751	6420.81	8187.788	7864.628	6475.438
Average	6950.2478
Std Dev	895.9131822

100k Jobs
Current (ms)	71920.293	73990.513	68666.102	65416.878	68894.512	71506.667	63874.567	62768.951	64139.824	68178.992
Average	67935.7299
Std Dev	3811.426986

Poolboy Status (ms)	77480.684	75714.092	71881.198	68302.955	68916.468	62165.245	61049.917	62572.434	63234.677	61973.636
Average	67329.1306
Std Dev	6071.034864

The code for the benchmark is here: https://github.com/mitchellhenke/verk/compare/master...benching?expand=1

The code for the :poolboy.status change is here: https://github.com/mitchellhenke/verk/compare/benching...poolboy_status?expand=1

What I did notice was that the current bottleneck is likely the WorkersManager handling the large amount of DOWN and :done messages, which would get in the way of sending jobs to the Workers?

Would it make sense to have the workers handle the dequeuing of jobs? Sidekiq switched from something resembling verk's current manager style dequeuing to having the workers themselves dequeue jobs. It would be a heavier undertaking, but would solve the issue at hand and potentially increase throughput. I will probably play around with it and see what happens.

Outside of that, should I PR the :poolboy.status change for now? Happy to explore other options to solve this issue as well if you have other ideas.

edgurgel · 2016-05-13T22:14:54Z

Great findings. I already have a local branch with some work to investigate extracting part of the workers manager processing to the workers. I will push this soon so we can start discussing. Is this fair?

I would still push forward the :poolboy.status change so we fix the current issue. What do you think?

mitchellhenke · 2016-05-13T23:27:11Z

That sounds great! I've opened #67 for this issue.

I'd be glad to take a look or help for the worker manager/worker changes whenever :)

edgurgel · 2016-05-14T04:59:18Z

I should also point that it's important to tweak the benchmark as well to check with workers that spend more time doing something. I think it's safe to say you wouldn't put stuff to be done as a job if you can do it instantly. I'm keen to try with some random sleep (say 0 to 60 seconds). So then we have a less artificial benchmark. I will play with the benchmark today :)

edgurgel · 2016-05-14T11:23:16Z

I forgot to add a comment to this:

Would it make sense to have the workers handle the dequeuing of jobs? Sidekiq switched from something resembling verk's current manager style dequeuing to having the workers themselves dequeue jobs. It would be a heavier undertaking, but would solve the issue at hand and potentially increase throughput. I will probably play around with it and see what happens.

Sidekiq can't be a real comparison because of the concurrency that the Erlang VM can deal with. The amount of workers and queues that Verk can handle is much greater. We have 400+ queues running each more than 10 workers per machine. If all of them start hitting Redis to dequeue jobs we may overload the Redis connections with unnecessary requests. The monitoring also can't be avoided completely as a Worker can simply die and we need to keep track of such failures.

(I thought that it would be a good addition to this discussion.)

I'm currently thinking of different approaches to speed up the "happy path".

mitchellhenke · 2016-05-18T14:47:48Z

That makes sense, yeah. Appreciate the explanation.

Is splitting up the things the workers manager has to do (queueing, handling monitoring, etc.) into separate processes worth looking into?

edgurgel · 2016-05-18T22:30:02Z

@mitchellhenke yeah. I just created this issue so we can discuss more! I'm keen to improve the WorkersManager :)

edgurgel added the bug label Apr 27, 2016

edgurgel added this to the 1.0.0 milestone Apr 27, 2016

mitchellhenke mentioned this issue May 13, 2016

use :poolboy.status to get free workers #67

Merged

edgurgel closed this as completed in #67 May 14, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

:poolboy.checkout returning :full #59

:poolboy.checkout returning :full #59

jeregrine commented Apr 27, 2016

jeregrine commented Apr 27, 2016

edgurgel commented Apr 27, 2016

jeregrine commented Apr 28, 2016

mitchellhenke commented May 3, 2016

edgurgel commented May 5, 2016

mitchellhenke commented May 13, 2016

edgurgel commented May 13, 2016 •

edited

Loading

mitchellhenke commented May 13, 2016

edgurgel commented May 14, 2016

edgurgel commented May 14, 2016 •

edited

Loading

mitchellhenke commented May 18, 2016

edgurgel commented May 18, 2016 •

edited

Loading

:poolboy.checkout returning :full #59

:poolboy.checkout returning :full #59

Comments

jeregrine commented Apr 27, 2016

jeregrine commented Apr 27, 2016

edgurgel commented Apr 27, 2016

jeregrine commented Apr 28, 2016

mitchellhenke commented May 3, 2016

edgurgel commented May 5, 2016

mitchellhenke commented May 13, 2016

edgurgel commented May 13, 2016 • edited Loading

mitchellhenke commented May 13, 2016

edgurgel commented May 14, 2016

edgurgel commented May 14, 2016 • edited Loading

mitchellhenke commented May 18, 2016

edgurgel commented May 18, 2016 • edited Loading

edgurgel commented May 13, 2016 •

edited

Loading

edgurgel commented May 14, 2016 •

edited

Loading

edgurgel commented May 18, 2016 •

edited

Loading