Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Narrow down Mutex locked scope #120

Closed
wants to merge 0 commits into from
Closed

Narrow down Mutex locked scope #120

wants to merge 0 commits into from

Conversation

oxzi
Copy link
Member

@oxzi oxzi commented Oct 27, 2023

While working on #112, I experienced some delay when replaying the whole Icinga 2 Objects API state after an Event Stream connection loss. After taking some pprof snapshots, some big unlocks occurred.

Thus, I minimized the scopes or code areas being mutex protected. Especially in the GetCurrent function creates a long Lock even over SQL queries.

However, in most experimental sessions the locks were insignificant, while the SQL internals required huge time slots.


For the record, I altered the Listener as follows for pprof capturing.

Listener diff
diff --git a/internal/listener/listener.go b/internal/listener/listener.go
index c8e9bf7..7854d7a 100644
--- a/internal/listener/listener.go
+++ b/internal/listener/listener.go
@@ -14,6 +14,8 @@ import (
 	"github.com/icinga/icingadb/pkg/logging"
 	"go.uber.org/zap"
 	"net/http"
+	"net/http/pprof"
+	"runtime"
 	"time"
 )
 
@@ -27,6 +29,15 @@ type Listener struct {
 	mux  http.ServeMux
 }
 
+func init() {
+	// https://github.com/golang/go/issues/42834
+	// https://cs.opensource.google/go/go/+/refs/tags/go1.21.3:src/net/http/pprof/pprof.go;l=93
+	http.DefaultServeMux = http.NewServeMux()
+
+	runtime.SetBlockProfileRate(1)
+	runtime.SetMutexProfileFraction(1)
+}
+
 func NewListener(db *icingadb.DB, configFile *config.ConfigFile, runtimeConfig *config.RuntimeConfig, logs *logging.Logging) *Listener {
 	l := &Listener{
 		configFile:    configFile,
@@ -38,6 +49,13 @@ func NewListener(db *icingadb.DB, configFile *config.ConfigFile, runtimeConfig *
 	l.mux.HandleFunc("/process-event", l.ProcessEvent)
 	l.mux.HandleFunc("/dump-config", l.DumpConfig)
 	l.mux.HandleFunc("/dump-incidents", l.DumpIncidents)
+
+	l.mux.HandleFunc("/debug/pprof/", pprof.Index)
+	l.mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
+	l.mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
+	l.mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
+	l.mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
+
 	return l
 }

Relevant inspections can be performed by

$ go tool pprof http://localhost:5680/debug/pprof/mutex?seconds=10
> svg

and

$ wget -O trace http://localhost:5680/debug/pprof/trace?seconds=10
$ go tool trace trace

@cla-bot cla-bot bot added the cla/signed CLA is signed by all contributors of a PR label Oct 27, 2023
@julianbrost
Copy link
Collaborator

Especially in the GetCurrent function creates a long Lock even over SQL queries.

That's not ideal but somewhat intentional right now (or more of a leftover from quickly adding the database connectivity to my previously in-memory only prototype, the locking structure is pretty much the same as it was then): so far, the purpose of these mutexes is not only protecting the corresponding map, but also ensuring the constraint that there's only one open incident per object.

To improve this, it will probably need some per-object locking, as currently (i.e. current main), if an incident for some object is loaded, no other incident may be loaded, even though it would be sufficient if no other attempt was made to load the incident of the same object. Maybe this could be addresses with a magic "this incident is currently being loaded" value in the map and a sync.Cond, maybe also with some bigger restructuring. (Maybe #68 loading all open incidents anyways may allow saving some database queries.)

@julianbrost julianbrost requested review from julianbrost and removed request for julianbrost October 27, 2023 14:07
@oxzi oxzi force-pushed the mutex-scope branch 3 times, most recently from 4decb6f to 9393add Compare November 29, 2023 13:31
@julianbrost
Copy link
Collaborator

Were there any changes that address my previous comment?

Anyways, that might even be a use case for golang.org/x/sync/singleflight.

@oxzi oxzi force-pushed the mutex-scope branch 2 times, most recently from b8565c9 to 34bc344 Compare December 7, 2023 15:11
@oxzi
Copy link
Member Author

oxzi commented Dec 13, 2023

I tried some variations of the PR's initial idea, also with using each Incident's own Mutex, but this does not really seem to make a measurable effect. As there is now #129 and, as already written in the initial comment, the changes do not have a real measurable impact, I am not even sure if we should proceed with this PR.

@oxzi oxzi closed this Jan 2, 2024
@yhabteab yhabteab deleted the mutex-scope branch January 15, 2024 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla/signed CLA is signed by all contributors of a PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants