
Lower Ruby Thread priority for jobs by default when running in Async mode #1554

bensheldon opened this issue Nov 26, 2024 · 1 comment

bensheldon commented Nov 26, 2024

It's possible for GoodJob to introduce latency into web requests when running in async mode, especially if the job is heavily CPU-bound, as can happen, for example, when rendering views for Turbo broadcasts.

Thread scheduling and priority seem to be a less-well-understood area of Ruby.

An attempt to explain

Ruby threads are OS threads, and OS threads are preemptively scheduled, meaning the OS is entirely responsible for switching execution between threads. But because of the GVL (Global VM Lock), the Ruby VM actually has a say in when that switching happens.

The Ruby VM has a default thread "Quantum" of 100ms. That means the Ruby VM will grant a thread the GVL for a maximum of 100ms before taking it back and giving it to another thread. That is 100ms of Ruby processing, unless the thread goes into IO, sleeps, or otherwise releases the GVL on its own.

This is an ok way to balance execution across threads, unless those thread workloads are wildly different (homogeneous tasks are always better!).

The dreaded "Tail Latency" of multithreaded behavior can show up, thanks to the Ruby Thread Quantum, when you have what would otherwise be a very short request, for example:

  • A request that could be 10ms because it's making ten 1ms calls to Memcached/Redis to fetch some cached values and then returns them (IO-bound Thread)

...but when it's running in a thread next to:

  • A request that takes 1 second and largely spends its time doing string manipulation, for example a background thread that is taking a bunch of complex hashes and arrays and serializing them into a payload to send to a metrics server. Or rendering slow/big/complex views for Turbo Broadcasts (CPU-bound Thread)

...then the CPU-bound thread will be very greedy about holding the GVL, and the execution will look like this:

  1. IO-bound Thread: Starts 1ms network request and releases GVL
  2. CPU-bound Thread: Does 100ms of work on the CPU before the GVL is taken back
  3. IO-bound Thread: Gets GVL back and starts next 1ms network request and releases GVL
  4. CPU-bound Thread: Does 100ms of work on the CPU before the GVL is taken back
    ....

See where this is going? The IO-bound thread is taking waaaaaaay longer than the 10ms it could ideally take if the other thread wasn't so greedy with the GVL.

I wrote a quick script to simulate this (starting from the script Aaron Patterson wrote in the Ruby issue linked above). As you can see, the IO-bound thread took more than 1 second to complete, far more than the 10ms we expected!

❯ vernier run --interval 1 -- ruby script.rb
starting profiler with interval 1 and allocation interval 0
fib(36) took 1.3947540000081062 seconds
io_total: 1.099480000033509 seconds
cpu_total: 1.3967440000269562 seconds
#<Vernier::Result 6.430636 seconds, 3 threads, 191429 samples, 294 unique>
written to /var/folders/5y/9zpy_s_n6sd6vv3wr62qvp9m0000gn/T/profile20241126-21607-do86i3.vernier.json.gz

And here's what that looks like in Vernier, where you can see the GVL switch back to the IO-bound Thread every 100ms to do the teensy amount of work before handing back to the CPU-bound Thread:

[Image: Vernier profile of the example script]

Example script
def measure
  x = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - x
end

def fib(n)
  if n < 2
    n
  else
    fib(n - 2) + fib(n - 1)
  end
end

# find fib that takes ~1 second
fib_i = 50.times.find { |i| measure { fib(i) } >= 1 }
sleep_i = measure { fib(fib_i) }

puts "fib(#{fib_i}) took #{sleep_i} seconds"

# Simulate a thread that makes ten 1ms IO calls in quick succession
io_thread = Thread.new {
  Thread.current.name = "io_thread"
  io_total = measure {
    10.times { sleep 0.001 }
  }
  puts "io_total: #{io_total} seconds"
}
Thread.pass

# Simulate a thread that makes a CPU-bound call for 1 second
cpu_thread = Thread.new {
  Thread.current.name = "cpu_thread"
  cpu_total = measure {
    fib(fib_i)
  }
  puts "cpu_total: #{cpu_total} seconds"
}
Thread.pass

io_thread.join
cpu_thread.join

How Thread Priority works:

Ruby Thread Priority "is just hint for Ruby thread scheduler. It may be ignored on some platform." But now that that's out of the way, C Ruby's thread priority is calculated as:

The number of bit-shifts applied to the default Thread Quantum (100ms), meaning the quantum is either multiplied (if the priority is positive) or divided (if negative) by a power of 2.

Thread#priority    Calculation      Result
-N                 100ms / (2^N)
-3                 100ms / (2^3)    12.5ms
-2                 100ms / (2^2)    25ms
-1                 100ms / (2^1)    50ms
 0                 100ms            100ms
 1                 100ms * (2^1)    200ms
 2                 100ms * (2^2)    400ms
 3                 100ms * (2^3)    800ms
 N                 100ms * (2^N)

This makes sense because a thread with a lower (negative) priority should release the GVL more frequently (and thus be less greedy).
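
To make that concrete, here's a minimal tweak to the script above (a sketch, I haven't profiled this variant carefully): the CPU-bound thread lowers its own priority to -3, which should shrink its quantum from 100ms to ~12.5ms and let the IO-bound thread get the GVL back roughly 8x more often. As far as I can tell, CRuby clamps Thread#priority to the -3..3 range, so -3 is as low as it goes.

def measure
  x = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - x
end

def fib(n)
  n < 2 ? n : fib(n - 2) + fib(n - 1)
end

# find fib that takes ~1 second
fib_i = 50.times.find { |i| measure { fib(i) } >= 1 }

# Same IO-bound thread as before: ten 1ms sleeps
io_thread = Thread.new {
  io_total = measure { 10.times { sleep 0.001 } }
  puts "io_total: #{io_total} seconds"
}
Thread.pass

# CPU-bound thread, now asking for the smallest GVL quantum
cpu_thread = Thread.new {
  Thread.current.priority = -3 # 100ms / (2^3) = 12.5ms quantum
  cpu_total = measure { fib(fib_i) }
  puts "cpu_total: #{cpu_total} seconds"
}
Thread.pass

io_thread.join
cpu_thread.join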

What does this mean for GoodJob

When running jobs async, in the same process as web requests, we should probably lower the priority. Maybe to -3?

I dunno if we should allow the priority to be set directly via config, or just have a configuration setting like lower_thread_priority = true with a default based on the execution mode that could be overridden.
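
Purely as a sketch of that second option (the option name and the wiring here are hypothetical, nothing like this exists in GoodJob today):

# Hypothetical Rails config, following the existing config.good_job.* style.
# lower_thread_priority is the flag proposed above; it is not a real option yet.
config.good_job.execution_mode = :async
config.good_job.lower_thread_priority = true # could default to true for async execution

# ...and inside GoodJob, the block that runs on each job execution thread
# would do something like this before picking up work (sketch only):
lower_priority = true # stand-in for reading the configured value
Thread.current.priority = -3 if lower_priority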


mperham commented Nov 27, 2024

Thanks for doing this research, Ben. Lower priority seems to make sense when embedding in another process.
