refactor: decouple worker threads from non-worker threads #1137

Open · wants to merge 38 commits into main

Conversation

Alliballibaba2 (Collaborator)

This PR refactors how threads are started and is meant as a step towards scaling threads at runtime.

How worker threads are currently started:

Currently, worker threads are started from regular threads by sending a special request
to ServeHTTP. The disadvantage is that between sending the special worker request
and receiving it, we lose control over which thread becomes a worker thread.
Worker threads and regular threads are inevitably coupled to each other.

How worker threads are started with this PR:

This PR decouples worker threads from regular threads and makes the php_thread struct
a wrapper around the thread's lifetime.

A 'PHP thread' is currently just a pthread with its own TSRM storage (this doesn't
necessarily have to be tied to a real thread in the future as discussed in #1090).

The thread starts, does some work in a loop and then stops. This PR makes it possible
to configure these three lifetime hooks from the Go side via the php_thread struct
(a rough sketch follows the list):

  • onStartup: Right before the thread is ready (once)
  • onWork: The actual work the thread does (in a loop)
  • onShutdown: Right before the thread is stopped (once)
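
A minimal sketch of how such a wrapper could look on the Go side (only the three hook names come from this PR; the struct fields and the loop are illustrative assumptions):

type phpThread struct {
	threadIndex int
	onStartup   func(*phpThread)      // runs once, right before the thread is ready
	onWork      func(*phpThread) bool // runs in a loop; returning false ends the loop
	onShutdown  func(*phpThread)      // runs once, right before the thread stops
}

// run models the thread's lifetime: startup, work loop, shutdown.
func (t *phpThread) run() {
	t.onStartup(t)
	for t.onWork(t) {
	}
	t.onShutdown(t)
}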

This allows re-using the same mechanism for regular threads as well as worker threads.
It also makes it easier to create other potential types of threads in the future
(like 'scheduled workers' or 'task workers').

Additionally, it would now also be possible to grab an 'idle thread', exchange its hooks and
turn it into a different type of thread at runtime without stopping the underlying thread.
(This PR doesn't go that far though.)
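
Building on the sketch above, such a hook swap could look roughly like this (hypothetical: runWorkerScript is a made-up stand-in, and a real implementation would have to synchronize the swap with the running work loop):

// runWorkerScript is a hypothetical stand-in for executing a worker script;
// it returns false once the thread should stop.
func runWorkerScript(script string) bool { return false }

// convertToWorker repurposes an idle thread by swapping its onWork hook
// without stopping the underlying thread (this PR doesn't go that far).
func convertToWorker(t *phpThread, workerScript string) {
	t.onWork = func(t *phpThread) bool {
		return runWorkerScript(workerScript)
	}
}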

@Alliballibaba2 (Collaborator, Author)

Hmm, that segfault is interesting. It's probably not fully safe to execute a PHP script while calling (void)ts_resource(0); I'll adjust the logic.

Resolved review threads on frankenphp.c, frankenphp.go, php_thread.go, php_threads_test.go, testdata/sleep.php and worker.go.
@AlliBalliBaba (Collaborator)

Or wait... Are fibers actually still broken? They don't seem to be on main

@AlliBalliBaba (Collaborator)

This approach actually seems to fix fibers? @dunglas @withinboredom At least, tests were red in #1151 and now the fiber-basic.php test is green.
All I did was remove the execute_script call from Go to C and instead execute the PHP script directly inside the C loop. The only calls from C back into Go now happen when registering $_SERVER variables in go_register_variables (probably not problematic?)

frankenphp.c Outdated
Comment on lines 832 to 845
while (true) {
  char *scriptName = go_frankenphp_on_thread_work(thread_index);

  // if the script name is NULL, the thread should exit
  if (scriptName == NULL) {
    break;
  }

  // if the script name is not empty, execute the PHP script
  if (strlen(scriptName) != 0) {
    int exit_status = frankenphp_execute_script(scriptName);
    go_frankenphp_after_thread_work(thread_index, exit_status);
  }
}
Collaborator

Running the 'thread work' loop like this seems to fix fibers.

Collaborator

Maybe calling the methods before_script_execution and after_script_execution would make more sense, then.

@withinboredom (Collaborator)

The reason fibers fail is go-C-go: fibers move the stack, which causes Go to panic. If you can remove one of those transitions, Go is quite happy.

@AlliBalliBaba (Collaborator)

So C-go-C should be fine then 👍

@withinboredom (Collaborator)

As long as it isn't possible to end up with C-go-C-go, yeah.

@withinboredom (Collaborator) left a comment

Looking pretty good. I see a lot of uses of atomics, which are usually red flags unless you are doing low-level things.

For example, it may be better to put a boolean for "isRestarting" into the worker struct that gets unconditionally set to false once a worker is started. When restarting, set it to true, then just check that boolean. Otherwise, workersAreRestarting is bound to a specific point in time. There are a couple of other cases I pointed out where you've bound the execution to time, when it would be better to bind it to the struct lifetime (threadsAreBooting and isReady).
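
A minimal sketch of that suggestion, assuming hypothetical field and method names:

type worker struct {
	mu           sync.Mutex
	isRestarting bool
}

func (w *worker) start() {
	w.mu.Lock()
	w.isRestarting = false // unconditionally cleared every time the worker starts
	w.mu.Unlock()
	// ... boot the worker ...
}

func (w *worker) restart() {
	w.mu.Lock()
	w.isRestarting = true // bound to the struct's lifetime, not to a point in time
	w.mu.Unlock()
	// ... drain and relaunch ...
}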

In all, I think this is a great step in the right direction! I've checked it out and will give it a whirl.

Resolved review threads on php_thread.go, php_threads.go and worker.go.
@withinboredom (Collaborator)

withinboredom commented Nov 16, 2024

I'll be honest, after spending a while on this branch and trying to make some changes... I don't think this is a step in the right direction. The "hooks" system makes things quite complex to debug, especially since hooks can be swapped out during execution. All the boolean flags and waitgroups make it even harder to find out where things are going wrong and why. In essence, it feels like a Rube Goldberg machine when stepping through line by line. It's pretty fun to do, but it doesn't feel very solid and feels easy to break.

Have you considered modeling this as a state machine? Maybe something like this:

type workerState int

const (
	workerStateInactive workerState = iota
	workerStateReady
	workerStateActive
	workerStateDrain
)

type workerStateMachine struct {
	currentState workerState
	booting      bool
	mu           sync.RWMutex
	subscribers  map[workerState][]chan struct{}
}

// Transition models a state machine for workers.
// A worker thread moves from inactive to ready to active to drain to inactive.
// Inactive means a thread is currently uninitialized.
// Ready means a thread has prepared to run a request.
// Active means a thread is running requests or assigned to a worker.
// Drain means a thread will not accept new requests and will return to Inactive after the current request.
func (w *workerStateMachine) Transition(nextState workerState) {
	w.mu.Lock()
	defer w.mu.Unlock()

	if w.currentState == nextState {
		return
	}

	notifySubs := func(state workerState) {
		if c, ok := w.subscribers[state]; ok {
			for _, ch := range c {
				close(ch)
			}
			delete(w.subscribers, state)
		}
	}

	switch w.currentState {
	case workerStateInactive:
		switch nextState {
		case workerStateInactive:
			return
		case workerStateActive:
			panic("worker cannot transition from inactive to active")
		case workerStateReady:
			w.currentState = workerStateReady
			notifySubs(workerStateReady)
			return
		case workerStateDrain:
			w.currentState = workerStateDrain
			notifySubs(workerStateDrain)
			return
		}
	case workerStateReady:
		switch nextState {
		case workerStateInactive:
			panic("worker cannot transition from ready to inactive")
		case workerStateActive:
			if w.booting {
				w.booting = false
			}
			w.currentState = workerStateActive
			notifySubs(workerStateActive)
			return
		case workerStateReady:
			return
		case workerStateDrain:
			w.currentState = workerStateDrain
			notifySubs(workerStateDrain)
			return
		}
	case workerStateActive:
		switch nextState {
		case workerStateInactive:
			panic("worker cannot transition from active to inactive")
		case workerStateActive:
			return
		case workerStateReady:
			panic("worker cannot transition from active to ready")
		case workerStateDrain:
			w.currentState = workerStateDrain
			notifySubs(workerStateDrain)
			return
		}
	case workerStateDrain:
		switch nextState {
		case workerStateInactive:
			w.currentState = workerStateInactive
			notifySubs(workerStateInactive)
			return
		case workerStateActive:
			panic("worker cannot transition from drain to active")
		case workerStateReady:
			panic("worker cannot transition from drain to ready")
		case workerStateDrain:
			return
		}
	}
}

func (w *workerStateMachine) CurrentState() workerState {
	w.mu.RLock()
	defer w.mu.RUnlock()
	return w.currentState
}

func (w *workerStateMachine) IsBooting() bool {
	w.mu.RLock()
	defer w.mu.RUnlock()
	return w.booting
}

// WaitForNext blocks until the state machine next transitions into the given state.
func (w *workerStateMachine) WaitForNext(state workerState) {
	w.mu.Lock()

	if w.currentState == state {
		w.mu.Unlock()
		return
	}

	if w.subscribers == nil {
		w.subscribers = make(map[workerState][]chan struct{})
	}

	if _, ok := w.subscribers[state]; !ok {
		w.subscribers[state] = []chan struct{}{}
	}

	ch := make(chan struct{})
	w.subscribers[state] = append(w.subscribers[state], ch)
	w.mu.Unlock()
	<-ch
}

// WaitFor blocks until the machine has reached at least the given state, or returns immediately if it already has.
func (w *workerStateMachine) WaitFor(state workerState) {
	w.mu.RLock()
	if w.currentState >= state {
		w.mu.RUnlock()
		return
	}
	w.mu.RUnlock()

	// todo: a race can happen in this empty space

	w.WaitForNext(state)
}

Using this would alleviate much of the need for waitgroups and booleans. For example, when initializing threads, you could simply do something like:

	ready := sync.WaitGroup{}

	for _, thread := range phpThreads {
		thread.setInactive()
		ready.Add(1)
		go func() {
			thread.currentState.WaitFor(workerStateReady)
			ready.Done()
		}()
		if !C.frankenphp_new_php_thread(C.uintptr_t(thread.threadIndex)) {
			panic(fmt.Sprintf("unable to create thread %d", thread.threadIndex))
		}
	}

	ready.Wait()

Then, in the switches, just define what needs to happen on each state change over the thread lifetime. Workers and CGI requests can be handled similarly.
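
For instance, draining a worker with the state machine above could look like this (hypothetical usage):

// Stop accepting new requests; the worker finishes its current request
// and then transitions itself back to inactive.
w.currentState.Transition(workerStateDrain)

// Block until the worker has actually returned to the inactive state.
w.currentState.WaitForNext(workerStateInactive)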

What do you think?

@AlliBalliBaba (Collaborator)

Yeah, a state machine definitely makes sense to abstract away some of the WaitGroups 👍.
This doesn't get rid of hooks, though. Hooks are still needed in order to know what to do when coming from C->go. I guess we could make php_thread an interface instead and have the different thread types implement it? (The performance impact would probably be minimal.)
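
A rough sketch of the interface idea (a hypothetical shape, not code from this branch):

// phpThread as an interface; each thread type brings its own hooks.
type phpThread interface {
	onStartup()
	onWork() bool // returns false when the thread should stop
	onShutdown()
}

// workerThread would be one implementation; regular threads and future
// types ('scheduled workers', 'task workers') would be others.
type workerThread struct{ script string }

func (t *workerThread) onStartup()   {}
func (t *workerThread) onWork() bool { return true } // run the worker script here
func (t *workerThread) onShutdown() {}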

I actually quite like atomic.Bools; they're very efficient since they're lock-free.

@withinboredom (Collaborator)

I just discovered a major issue with this implementation. Try to output a large response and you'll discover that go_ub_write is called long after go_frankenphp_after_script_execution, which will cause a segmentation fault once the thread is reset.

@withinboredom (Collaborator)

Actually, this is likely the same on main and might be why we occasionally see segfaults: it could happen with small responses too, it would just be much rarer.

@withinboredom (Collaborator)

Or maybe it is my branch of php-master. I'll check out an official branch later.

@AlliBalliBaba (Collaborator)

Hmm, that would be weird; go_ub_write should happen on the same thread. It works for me with 1 MB of text in the dev.Dockerfile, at least.

@withinboredom (Collaborator)

Yeah, I highly suspect it's an issue with my experimental build of PHP. I only saw it when the entire response wasn't written out in one go, fwiw.
