Slow performance on GRAPH_GROUP #1278

Open
mhoangvslev opened this issue Apr 25, 2024 · 6 comments

Comments

@mhoangvslev

mhoangvslev commented Apr 25, 2024

  • I created this graph group:

    DB.DBA.RDF_GRAPH_GROUP_DROP('http://www.batch0.fr/', 0);
    DB.DBA.RDF_GRAPH_GROUP_CREATE('http://www.batch0.fr/',0);
    DB.DBA.RDF_GRAPH_USER_PERMS_SET ('http://www.batch0.fr/', 'nobody', 9);
    DB.DBA.RDF_GRAPH_GROUP_INS('http://www.batch0.fr/', 'http://www.vendor0.fr/');
    DB.DBA.RDF_GRAPH_GROUP_INS('http://www.batch0.fr/', 'http://www.vendor1.fr/');
    DB.DBA.RDF_GRAPH_GROUP_INS('http://www.batch0.fr/', 'http://www.vendor2.fr/');
    DB.DBA.RDF_GRAPH_GROUP_INS('http://www.batch0.fr/', 'http://www.vendor3.fr/');
    DB.DBA.RDF_GRAPH_GROUP_INS('http://www.batch0.fr/', 'http://www.vendor4.fr/');
  • I executed this query; it takes forever when it should be instantaneous:

    SELECT COUNT(*)
    FROM <http://www.batch0.fr/>
    WHERE {
        ?s ?p ?o 
    } 
@HughWilliams
Collaborator

Which Virtuoso version are you using? This works for me when querying from the SPARQL endpoint or isql with the latest develop/7 build.

@mhoangvslev
Author

I use v7.2.12
The output of virtuoso-tis:

Version 7.2.12.3239-pthreads as of Feb 13 2024 (d698f21712)
Compiled for Linux (x86_64-alpine-linux-gnu)
Copyright (C) 1998-2024 OpenLink Software

@mhoangvslev
Author

The link below contains a copy of the database I use (virtuoso.db + virtuoso.ini):
https://drive.google.com/file/d/1lAlzAkr6Vy3BZZGjf59padrTXaffDoNj/view?usp=sharing

@HughWilliams
Collaborator

HughWilliams commented Apr 26, 2024

In your test case, the graph group had only 5 graphs, with no data inserted in any of them, whereas in the database provided there are 20 graphs in the graph group, with a total of 3M+ triples across all the graphs.

Graph groups do not scale in Virtuoso Open Source: a query across a graph group is compiled as SELECT ... G IN (), which results in multiple join condition tests. Performing these tests serially on every row is very time-consuming, so it will not scale. The Virtuoso 8.x Commercial Edition implements a new invisible hash join algorithm, which compiles such queries as a hash IN join that runs in parallel and is thus more performant and scalable.
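
To make the difference concrete, here is a rough Python sketch of the two membership strategies described above. It is only an illustration of the general idea, not Virtuoso's actual code: the function names, the toy row data, and the comparison counters are all assumptions made up for this example, and it ignores the parallel execution that the commercial hash IN join is said to add.

```python
# Illustrative only: compares a serial IN-list test with a hash lookup.
# Nothing here is taken from Virtuoso's source; names and data are hypothetical.

def count_with_in_list(row_graphs, group_members):
    """Serial IN-list test: compare each row's graph with every member in turn."""
    count = 0
    comparisons = 0
    for graph in row_graphs:
        for member in group_members:
            comparisons += 1
            if graph == member:
                count += 1
                break  # stop at the first matching member
    return count, comparisons


def count_with_hash(row_graphs, group_members):
    """Hash-based test: build a set once, then do one lookup per row."""
    members = set(group_members)
    count = 0
    lookups = 0
    for graph in row_graphs:
        lookups += 1
        if graph in members:
            count += 1
    return count, lookups


if __name__ == "__main__":
    # Hypothetical data: a group of 20 graphs, rows spread over 25 graphs.
    group_members = [f"http://www.vendor{i}.fr/" for i in range(20)]
    row_graphs = [f"http://www.vendor{i % 25}.fr/" for i in range(1000)]

    print("IN list:", count_with_in_list(row_graphs, group_members))
    print("hash   :", count_with_hash(row_graphs, group_members))
```

Both functions return the same match count; the second value shows how the IN-list work grows with the number of graphs in the group, while the hash version does one lookup per row regardless of group size.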

@mhoangvslev
Author

Thank you for your insight!

A workaround would be to ingest the graphs of each group into a separate Virtuoso database and run each query against the matching database. Will the implementation be ported to Virtuoso Open Source at some point?

@HughWilliams
Collaborator

There are no plans to port the invisible hash join feature to open source; it is a commercial-only feature.


2 participants