Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible deadlock on startup #635

Closed
Dohbedoh opened this issue Mar 7, 2024 · 6 comments · Fixed by #637, #644 or #682
Closed

Possible deadlock on startup #635

Dohbedoh opened this issue Mar 7, 2024 · 6 comments · Fixed by #637, #644 or #682
Labels

Comments

@Dohbedoh
Copy link

Dohbedoh commented Mar 7, 2024

Jenkins and plugins versions report

The initialization of the Prometheus plugin may cause a deadlock on startup. In a scenario where a plugin / process causes the initialization of the DefaultPrometheusMetrics in the boot up org.jvnet.hudson.reactor.Reactor.runTask, if there are pipelines running (for example resumed pipelines) then there is a chance to hit it:

Deadlock details:

Found one Java-level deadlock:
=============================
"Periodic background build discarder thread":
  waiting to lock monitor 0x00007fd8800058a8 (object 0x000000064782f100, a hudson.ExtensionList$Lock),
  which is held by "GroovyInitScript.init"
"GroovyInitScript.init":
  waiting to lock monitor 0x00007fd86a766828 (object 0x0000000631c002a0, a org.jenkinsci.plugins.workflow.cps.CpsFlowExecution),
  which is held by "Running CpsFlowExecution[Owner[mypipeline #123]]"
"Running CpsFlowExecution[Owner[mypipeline #123]]":
  waiting to lock monitor 0x00007fd8800058a8 (object 0x000000064782f100, a hudson.ExtensionList$Lock),
  which is held by "GroovyInitScript.init"

Java stack information for the threads listed above:
===================================================
"Periodic background build discarder thread":
	at hudson.ExtensionList.ensureLoaded(ExtensionList.java:315)
	- waiting to lock <0x000000064782f100> (a hudson.ExtensionList$Lock)
	at hudson.ExtensionList.size(ExtensionList.java:191)
	at hudson.ExtensionList.lookupSingleton(ExtensionList.java:450)
	at jenkins.model.GlobalBuildDiscarderConfiguration.get(GlobalBuildDiscarderConfiguration.java:48)
	at jenkins.model.BackgroundGlobalBuildDiscarder.processJob(BackgroundGlobalBuildDiscarder.java:61)
	at jenkins.model.BackgroundGlobalBuildDiscarder.execute(BackgroundGlobalBuildDiscarder.java:55)
	at hudson.model.AsyncPeriodicWork.lambda$doRun$0(AsyncPeriodicWork.java:100)
	at hudson.model.AsyncPeriodicWork$$Lambda$360/234513123.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:748)
"GroovyInitScript.init":
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getCurrentHeads(CpsFlowExecution.java:981)
	- waiting to lock <0x0000000631c002a0> (a org.jenkinsci.plugins.workflow.cps.CpsFlowExecution)
	at org.jenkinsci.plugins.workflow.flow.FlowExecution.isComplete(FlowExecution.java:208)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.isComplete(CpsFlowExecution.java:1238)
	at org.jenkinsci.plugins.workflow.cps.RunningFlowActions.createFor(RunningFlowActions.java:52)
	at org.jenkinsci.plugins.workflow.cps.RunningFlowActions.createFor(RunningFlowActions.java:42)
	at hudson.model.Actionable.createFor(Actionable.java:114)
	at hudson.model.Actionable.getAction(Actionable.java:335)
	at org.jenkinsci.plugins.prometheus.JobCollector.hasTestResults(JobCollector.java:395)
	at org.jenkinsci.plugins.prometheus.JobCollector.appendJobMetrics(JobCollector.java:296)
	at org.jenkinsci.plugins.prometheus.JobCollector.lambda$collect$0(JobCollector.java:220)
	at org.jenkinsci.plugins.prometheus.JobCollector$$Lambda$270/321878244.accept(Unknown Source)
	at org.jenkinsci.plugins.prometheus.util.Jobs.forEachJob(Jobs.java:14)
	at org.jenkinsci.plugins.prometheus.JobCollector.collect(JobCollector.java:214)
	at io.prometheus.client.CollectorRegistry.collectorNames(CollectorRegistry.java:100)
	at io.prometheus.client.CollectorRegistry.register(CollectorRegistry.java:50)
	at org.jenkinsci.plugins.prometheus.service.DefaultPrometheusMetrics.<init>(DefaultPrometheusMetrics.java:29)
	at org.jenkinsci.plugins.prometheus.service.DefaultPrometheusMetrics$$FastClassByGuice$$af5295cb.newInstance(<generated>)
	at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
	at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:61)
	at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:105)
	at com.google.inject.internal.ConstructorInjector.access$000(ConstructorInjector.java:32)
	at com.google.inject.internal.ConstructorInjector$1.call(ConstructorInjector.java:89)
	at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:115)
	at hudson.ExtensionFinder$GuiceFinder$SezpozModule.onProvision(ExtensionFinder.java:566)
	at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:126)
	at com.google.inject.internal.ProvisionListenerStackCallback.provision(ProvisionListenerStackCallback.java:68)
	at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:87)
	at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:267)
	at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56)
	at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
	at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1103)
	at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
	at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:145)
	at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
	at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
	at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
	at com.google.inject.internal.SingleMethodInjector.inject(SingleMethodInjector.java:84)
	at com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:132)
	at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:114)
	at com.google.inject.internal.ConstructorInjector.access$000(ConstructorInjector.java:32)
	at com.google.inject.internal.ConstructorInjector$1.call(ConstructorInjector.java:89)
	at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:115)
	at hudson.ExtensionFinder$GuiceFinder$SezpozModule.onProvision(ExtensionFinder.java:566)
	at com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:126)
	at com.google.inject.internal.ProvisionListenerStackCallback.provision(ProvisionListenerStackCallback.java:68)
	at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:87)
	at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:267)
	at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
	at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1103)
	at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
	at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:145)
	at hudson.ExtensionFinder$GuiceFinder$FaultTolerantScope$1.get(ExtensionFinder.java:440)
	at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
	at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1016)
	at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
	at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1012)
	at hudson.ExtensionFinder$GuiceFinder._find(ExtensionFinder.java:402)
	at hudson.ExtensionFinder$GuiceFinder.find(ExtensionFinder.java:393)
	at hudson.ClassicPluginStrategy.findComponents(ClassicPluginStrategy.java:349)
	at hudson.ExtensionList.load(ExtensionList.java:381)
	at hudson.ExtensionList.ensureLoaded(ExtensionList.java:317)
	- locked <0x000000064782f100> (a hudson.ExtensionList$Lock)
	at hudson.ExtensionList.iterator(ExtensionList.java:172)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2025)
	at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2066)
	at org.codehaus.groovy.runtime.dgm$163.invoke(Unknown Source)
	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274)
	at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
	at cli-shutdown$_run_closure1.doCall(cli-shutdown.groovy:4)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
	at cli-shutdown.run(cli-shutdown.groovy:8)
	at groovy.lang.GroovyShell.evaluate(GroovyShell.java:585)
	at jenkins.util.groovy.GroovyHookScript.execute(GroovyHookScript.java:138)
	at jenkins.util.groovy.GroovyHookScript.execute(GroovyHookScript.java:129)
	at jenkins.util.groovy.GroovyHookScript.run(GroovyHookScript.java:112)
	at hudson.init.impl.GroovyInitScript.init(GroovyInitScript.java:42)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at hudson.init.TaskMethodFinder.invoke(TaskMethodFinder.java:104)
	at hudson.init.TaskMethodFinder$TaskImpl.run(TaskMethodFinder.java:175)
	at org.jvnet.hudson.reactor.Reactor.runTask(Reactor.java:296)
	at jenkins.model.Jenkins$5.runTask(Jenkins.java:1131)
	at org.jvnet.hudson.reactor.Reactor$2.run(Reactor.java:214)
	at org.jvnet.hudson.reactor.Reactor$Node.run(Reactor.java:117)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
"Running CpsFlowExecution[Owner[mypipeline #123]]":
	at hudson.ExtensionList.ensureLoaded(ExtensionList.java:315)
	- waiting to lock <0x000000064782f100> (a hudson.ExtensionList$Lock)
	at hudson.ExtensionList.size(ExtensionList.java:191)
	at java.util.AbstractCollection.toArray(AbstractCollection.java:136)
	at java.util.ArrayList.addAll(ArrayList.java:583)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.getListenersToRun(CpsFlowExecution.java:1449)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.notifyListeners(CpsFlowExecution.java:1455)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.notifyNewHead(CpsThreadGroup.java:472)
	at org.jenkinsci.plugins.workflow.cps.FlowHead.setNewHead(FlowHead.java:157)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.onProgramEnd(CpsFlowExecution.java:1255)
	- locked <0x0000000631c002a0> (a org.jenkinsci.plugins.workflow.cps.CpsFlowExecution)
	at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.croak(CpsFlowExecution.java:882)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService.reportProblem(CpsVmExecutorService.java:57)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService.access$100(CpsVmExecutorService.java:23)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:69)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

What Operating System are you using (both controller, and any agents involved in the problem)?

Unix

Reproduction steps

It will not always reproduce the problem, since it is a race condition, but there is a chance that it does.

Expected Results

Jenkins starts, always.

Actual Results

Jenkins sometimes nevers finish starting up because of a deadlock.

Anything else?

The DefaultPrometheusMetrics should maybe be initialized lazily.

Are you interested in contributing a fix?

Sure.

@Dohbedoh Dohbedoh added the bug label Mar 7, 2024
@Waschndolos
Copy link

Nice observation thank you! Since you're interested in contributing a fix I guess you're already working on it?

@Dohbedoh
Copy link
Author

Dohbedoh commented Mar 8, 2024

I proposed something #637

@Waschndolos
Copy link

Waschndolos commented Mar 13, 2024

@Dohbedoh. I've just found an interesting issue here. The constructor of org.jenkinsci.plugins.prometheus.service.DefaultPrometheusMetrics is called twice by the guice framework. Although it's marked as a Singleton oO (org.jenkinsci.plugins.prometheus.context.Context). When you run it with hpi:run the first Constructor call won't trigger the init method, the 2nd does but then the 1st Object as you will will be used for the org.jenkinsci.plugins.prometheus.service.DefaultPrometheusMetrics#collectMetrics method. Then the if(!initialized) always hits. Do you have an idea?

Waschndolos pushed a commit that referenced this issue Mar 22, 2024
…uice usage to avoid duplicate instantiation of DefaultPrometheusMetrics class.
Waschndolos pushed a commit that referenced this issue Mar 22, 2024
…uice usage to avoid duplicate instantiation of DefaultPrometheusMetrics class. (#644)
Waschndolos pushed a commit that referenced this issue Mar 28, 2024
…plicit guice usage to avoid duplicate instantiation of DefaultPrometheusMetrics class. (#644)"

This reverts commit 9401c6e.
Waschndolos pushed a commit that referenced this issue Mar 29, 2024
…ances (#650)

* Revert "Fixes #635: Changing creation of service classes away from explicit guice usage to avoid duplicate instantiation of DefaultPrometheusMetrics class. (#644)"

This reverts commit 9401c6e.

* Revert "Fix initilization potential deadlock (#637)"

This reverts commit d79999a.
@Dohbedoh
Copy link
Author

@Waschndolos Should we maybe re-open this ? I still saw evidence of that problem in some instances running with latest versions of the plugin. The possibility that Collector kick during the initialization of a components is problematic because those collector eventually use extensions.

@Waschndolos
Copy link

@Dohbedoh I'm not sure how to solve it but to track it I'll reopen it.

@Dohbedoh
Copy link
Author

Proposed #682.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment