Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout while evaluating FedX-Query #4785

Open
keiligch opened this issue Sep 28, 2023 · 12 comments · May be fixed by #4828
Open

Timeout while evaluating FedX-Query #4785

keiligch opened this issue Sep 28, 2023 · 12 comments · May be fixed by #4828
Labels
🐞 bug issue is a bug

Comments

@keiligch
Copy link

keiligch commented Sep 28, 2023

Current Behavior

When using a FedX-Repository:
Specific query and graph (see repo linked in "Steps To Repoduce") and
((FedXTupleQuery) query).evaluate().stream().collect(Collectors.toList());
(or ((FedXTupleQuery) query).evaluate().hasNext();)
throws
org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout caused by java.util.concurrent.TimeoutException

Expected Behavior

providing a correct query result with a List<BindingSet>

Steps To Reproduce

  1. clone FedX-Query-Test-Repo

  2. run Main.main()

  3. get Exception at line 101

  4. (delete lines 8-48 in db.ttl, then the bug will not occur)

Version

4.3.6

Are you interested in contributing a solution yourself?

None

Anything else?

	at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
	at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
	at org.eclipse.rdf4j.federated.algebra.StatementSourcePattern.evaluate(StatementSourcePattern.java:109)
	at org.eclipse.rdf4j.federated.evaluation.FederationEvalStrategy.evaluate(FederationEvalStrategy.java:323)
	at org.eclipse.rdf4j.query.algebra.evaluation.QueryEvaluationStep.lambda$minimal$1(QueryEvaluationStep.java:63)
	at org.eclipse.rdf4j.federated.evaluation.union.ParallelUnionOperatorTask.performTaskInternal(ParallelUnionOperatorTask.java:46)
	at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelTaskBase.performTask(ParallelTaskBase.java:46)
	at org.eclipse.rdf4j.federated.evaluation.concurrent.ControlledWorkerScheduler$WorkerRunnable.run(ControlledWorkerScheduler.java:229)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
	Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
		... 13 more
		Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
			... 13 more
			Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
				... 13 more
				Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
					at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
					at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
					... 3 more
					Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
						at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
						at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
						at org.eclipse.rdf4j.federated.algebra.StatementSourcePattern.evaluate(StatementSourcePattern.java:109)
						at org.eclipse.rdf4j.federated.evaluation.FederationEvalStrategy.evaluate(FederationEvalStrategy.java:323)
						at org.eclipse.rdf4j.query.algebra.evaluation.QueryEvaluationStep.lambda$minimal$1(QueryEvaluationStep.java:63)
						at org.eclipse.rdf4j.federated.evaluation.union.ParallelUnionOperatorTask.performTaskInternal(ParallelUnionOperatorTask.java:46)
						at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelTaskBase.performTask(ParallelTaskBase.java:46)
						at org.eclipse.rdf4j.federated.evaluation.concurrent.ControlledWorkerScheduler$WorkerRunnable.run(ControlledWorkerScheduler.java:229)
						at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
						at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
						... 3 more
						Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
							... 13 more
							Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
								at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
								at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
								... 3 more
								Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
									at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
									at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
									at org.eclipse.rdf4j.federated.algebra.StatementSourcePattern.evaluate(StatementSourcePattern.java:109)
									at org.eclipse.rdf4j.federated.evaluation.FederationEvalStrategy.evaluate(FederationEvalStrategy.java:323)
									at org.eclipse.rdf4j.query.algebra.evaluation.QueryEvaluationStep.lambda$minimal$1(QueryEvaluationStep.java:63)
									at org.eclipse.rdf4j.federated.evaluation.union.ParallelUnionOperatorTask.performTaskInternal(ParallelUnionOperatorTask.java:46)
									at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelTaskBase.performTask(ParallelTaskBase.java:46)
									at org.eclipse.rdf4j.federated.evaluation.concurrent.ControlledWorkerScheduler$WorkerRunnable.run(ControlledWorkerScheduler.java:229)
									at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
									at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
									... 3 more
									Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
										... 13 more
										Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
											... 13 more
											Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
												... 13 more
												Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
													... 13 more
													Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
														... 13 more
														Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
															at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
															at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
															... 3 more
															Suppressed: org.eclipse.rdf4j.query.QueryInterruptedException: Query evaluation has run into a timeout.
																at org.eclipse.rdf4j.federated.exception.ExceptionUtil.toException(ExceptionUtil.java:176)
																at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:88)
																at org.eclipse.rdf4j.federated.algebra.StatementSourcePattern.evaluate(StatementSourcePattern.java:109)
																at org.eclipse.rdf4j.federated.evaluation.FederationEvalStrategy.evaluate(FederationEvalStrategy.java:323)
																at org.eclipse.rdf4j.query.algebra.evaluation.QueryEvaluationStep.lambda$minimal$1(QueryEvaluationStep.java:63)
																at org.eclipse.rdf4j.federated.evaluation.union.ParallelUnionOperatorTask.performTaskInternal(ParallelUnionOperatorTask.java:46)
																at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelTaskBase.performTask(ParallelTaskBase.java:46)
																at org.eclipse.rdf4j.federated.evaluation.concurrent.ControlledWorkerScheduler$WorkerRunnable.run(ControlledWorkerScheduler.java:229)
																at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
																at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
																... 3 more
															Caused by: java.util.concurrent.TimeoutException
																at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
																at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
																at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
																at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
																... 11 more
														Caused by: java.util.concurrent.TimeoutException
															at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
															at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
															at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
															at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
															... 3 more
													Caused by: java.util.concurrent.TimeoutException
														at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
														at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
														at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
														at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
														... 11 more
												Caused by: java.util.concurrent.TimeoutException
													at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
													at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
													at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
													at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
													... 11 more
											Caused by: java.util.concurrent.TimeoutException
												at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
												at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
												at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
												at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
												... 11 more
										Caused by: java.util.concurrent.TimeoutException
											at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
											at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
											at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
											at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
											... 11 more
									Caused by: java.util.concurrent.TimeoutException
										at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
										at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
										at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
										at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
										... 11 more
								Caused by: java.util.concurrent.TimeoutException
									at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
									at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
									at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
									at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
									... 11 more
							Caused by: java.util.concurrent.TimeoutException
								at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
								at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
								at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
								at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
								... 3 more
						Caused by: java.util.concurrent.TimeoutException
							at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
							at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
							at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
							at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
							... 11 more
					Caused by: java.util.concurrent.TimeoutException
						at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
						at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
						at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
						at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
						... 11 more
				Caused by: java.util.concurrent.TimeoutException
					at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
					at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
					at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
					at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
					... 3 more
			Caused by: java.util.concurrent.TimeoutException
				at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
				at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
				at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
				at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
				... 11 more
		Caused by: java.util.concurrent.TimeoutException
			at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
			at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
			at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
			at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
			... 11 more
	Caused by: java.util.concurrent.TimeoutException
		at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
		at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
		at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
		at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
		... 11 more
Caused by: java.util.concurrent.TimeoutException
	at java.base/java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:795)
	at org.eclipse.rdf4j.federated.evaluation.union.ControlledWorkerUnion.union(ControlledWorkerUnion.java:50)
	at org.eclipse.rdf4j.federated.evaluation.union.UnionExecutorBase.performExecution(UnionExecutorBase.java:34)
	at org.eclipse.rdf4j.federated.evaluation.concurrent.ParallelExecutorBase.run(ParallelExecutorBase.java:76)
	... 11 more

Process finished with exit code 1
@keiligch keiligch added the 🐞 bug issue is a bug label Sep 28, 2023
@keiligch keiligch changed the title Stuck in evaluating FedX-Query Timeout while evaluating FedX-Query Sep 28, 2023
@hmottestad
Copy link
Contributor

Did you try adjusting the timeout?

https://rdf4j.org/documentation/programming/federation/

@keiligch
Copy link
Author

Yes, but 120 seconds timeout seems more than sufficient for the small graph and relatively simple query to evaluate. The other queries were evaluated in fractions of a second.

@hmottestad
Copy link
Contributor

Does it still timeout when setting it to 0?

@keiligch
Copy link
Author

Yes, it throws after about 20s the TimeoutException and the QueryInterruptedException

@kenwenzel
Copy link
Contributor

Does ist also happen with NativeStore or LmdbStore?

@keiligch
Copy link
Author

Yes, with both. No difference

@hmottestad
Copy link
Contributor

Try setting the timeout directly on the Query object. Like this query.setMaxExecutionTime(0);.

Can you also update your example code to show how you are setting the timeout on the FedX repo?

@keiligch
Copy link
Author

I had the timeout setting as you wrote on the Query object and did not change it globally.
Also the example code repo is updated

@keiligch
Copy link
Author

were you able to reproduce with my example code?

@hmottestad
Copy link
Contributor

I haven't had a chance to test it out yet. Might be a little while before I have time.

kenwenzel added a commit to kenwenzel/rdf4j that referenced this issue Oct 25, 2023
@kenwenzel
Copy link
Contributor

I've created a test to show that the issue only happens against remote endpoints. If EndpointClassification.Local is used then the queries are working as expected.

I also added some "debug" output to ControlledWorkerUnion but I'm not sure if this helps to fix the problem.

@aschwarte10 do you have some time to look into this issue? I think it is critical as it leads to deadlocks when used with remote endpoints.

aschwarte10 added a commit that referenced this issue Oct 31, 2023
This commit provides a unit test that reproduces a deadlock scenario.
The issue is somewhere caused in the join with NUnion, and specifically
the union parts having relevant statements in multiple sources.

Note that the dataset is really small (42 triples, 40 relevant ones
contributing to the result)
@aschwarte10 aschwarte10 linked a pull request Oct 31, 2023 that will close this issue
5 tasks
@aschwarte10
Copy link
Contributor

Thanks for reporting the issue. It looks like you have stumbled indeed into a deadlock scenario (which fortunately at least recovers through timeout exceptions)

Unfortunately I only have very limited time available to work on RDF4J.

However, I tried to do a bit of an assessment, and have some findings to nail it down (but do not yet have an understanding of the cause). I have also pushed a PR to reproduce it with just 40 triples, so ideally this allows to debug it easier.

Some insights to share:

The issue occurs in a join with an NUnion node, where the statements of the union themselves are relevant at multiple sources.

An example FedX query plan looks like this:

QueryRoot
   Projection
      ProjectionElemList
         ProjectionElem "person"
         ProjectionElem "label"
      NJoin
         ExclusiveStatement
            Var (name=person)
            Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
            Var (name=_const_e1df31e0_uri, value=http://xmlns.com/foaf/0.1/Person, anonymous)
            StatementSource (id=repo1, type=REMOTE)
         NUnion
            StatementSourcePattern
               Var (name=person)
               Var (name=_const_9285ccfc_uri, value=http://www.w3.org/2000/01/rdf-schema#label, anonymous)
               Var (name=label)
               StatementSource (id=repo1, type=REMOTE)
               StatementSource (id=repo2, type=REMOTE)
            StatementSourcePattern
               Var (name=person)
               Var (name=_const_c9f3cb8c_uri, value=http://www.w3.org/2004/02/skos/core#prefLabel, anonymous)
               Var (name=label)
               StatementSource (id=repo1, type=REMOTE)
               StatementSource (id=repo2, type=REMOTE)

Note: due to the NUnion the join is executed in ControlledWorkerJoin. (i.e. no vectored / bind join evaluation).

It looks like the scheduler executing the unions is populated in some order that causes the deadlock, i.e. the results cannot be piped through. Might also be the main executor that blocks.

Note: the design idea is that everything scheduled in the UNION scheduler can executed and does not have dependencies, while the join scheduler may have dependencies and block.

If somebody finds the time to look further, please go ahead. I will try to look further once my time allows

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐞 bug issue is a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants