Slow post performance on /sparql-graph-crud-auth #1246

Open
ebremer opened this issue Feb 16, 2024 · 7 comments
Comments

@ebremer

ebremer commented Feb 16, 2024

I'm getting slow POST performance when using /sparql-graph-crud-auth (latest dev version of Virtuoso) to post small graphs (300-800 triples each). Each POST takes about 15 seconds. Is there any way to get better performance? Note that bulk loading works great (the system currently holds 9 billion triples). It seems like something is choking it a bit.
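For reference, a minimal sketch of the kind of request being discussed: a Graph Store Protocol POST that appends a small Turtle payload to one named graph, with client-side timing. The host, port, credentials, and graph IRI are placeholders, not values from this thread. The `graph` query parameter is the name defined by the SPARQL 1.1 Graph Store HTTP Protocol, and HTTP Digest authentication is assumed here; check which parameter names and auth scheme your Virtuoso instance actually accepts.

```python
import time
import urllib.parse
import urllib.request

# Placeholder endpoint (assumption): adjust host and port for your instance.
ENDPOINT = "http://localhost:8890/sparql-graph-crud-auth"


def build_post_request(graph_iri, turtle_text, endpoint=ENDPOINT):
    """Build a Graph Store Protocol POST that appends triples to one named graph."""
    query = urllib.parse.urlencode({"graph": graph_iri})
    return urllib.request.Request(
        url=f"{endpoint}?{query}",
        data=turtle_text.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


def timed_post(request, user, password):
    """Send the request with HTTP Digest auth (assumed) and return (status, seconds)."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, request.full_url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(password_mgr)
    )
    start = time.perf_counter()
    with opener.open(request) as response:
        status = response.status
    return status, time.perf_counter() - start
```

Calling `timed_post(build_post_request(graph_iri, payload), user, password)` against a test instance gives a per-request wall-clock figure that can be compared across payload sizes.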

@HughWilliams
Collaborator

Are you saying that ~/sparql-graph-crud-auth endpoint performance is slow compared to a previous Virtuoso open source build you have been using? If so, what are the gitids of both builds, for comparison?

Also, if you have a test case for recreating the problem that would be ideal ...

@ebremer
Author

ebremer commented Mar 5, 2024

No, I'm not claiming any regression. To date, I've loaded all of my RDF via isql bulk loading. Now I need to delete small groups of triples and add multiple small groups of triples programmatically. This is actually the first time I'm using Virtuoso's graph management endpoint. When I add batches of triples, there seems to be a minimum fixed overhead for sending anything, however small. So the millions of small batches I needed to send would take too long. I buffered the smaller batches locally in a Jena memory model and pushed them in batches of 100,000 triples, which sped things up enough to work for me. Is there any way to improve speed for smaller batches?
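The buffering workaround described above can be sketched as follows. This is a Python analogue rather than the Jena memory model actually used, and it assumes all buffered triples target the same graph (otherwise one buffer per graph would be needed). `TripleBuffer` and the `send` callable are hypothetical names introduced for illustration; `send` stands in for whatever actually uploads a serialized batch.

```python
from typing import Callable, Iterable, List


class TripleBuffer:
    """Accumulate N-Triples lines and hand them off in large batches."""

    def __init__(self, send: Callable[[str], None], max_triples: int = 100_000):
        self._send = send            # hypothetical uploader, e.g. one HTTP POST per call
        self._max = max_triples      # flush threshold, in triples
        self._lines: List[str] = []

    def add(self, ntriples_lines: Iterable[str]) -> None:
        """Buffer one small batch; flush whenever the threshold is reached."""
        for line in ntriples_lines:
            self._lines.append(line)
            if len(self._lines) >= self._max:
                self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as one payload; a no-op when empty."""
        if self._lines:
            self._send("\n".join(self._lines) + "\n")
            self._lines = []
```

Each small batch goes through `add`, and a final `flush()` after the last batch sends the remainder. The point of the design is to pay the per-request overhead once per 100,000 triples instead of once per few hundred.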

@HughWilliams
Collaborator

HughWilliams commented Mar 5, 2024

The Virtuoso RESTful HTTP ~/sparql-graph-crud-auth endpoint will never match or come close to the performance of the Virtuoso bulk loader, which is optimised in the database engine for loading data, and there are no specific configuration params for the endpoint. You could enable Virtuoso query logging to record the database activity during such operations and see where the time is being spent.

Note that we also have a document on Transactional Bulk Loading of RDF Data into Virtuoso DBMS via the Jena RDF Framework, which you might want to review to see how this can be optimally done in Jena.

@TallTed TallTed changed the title Slow post performance on /sparql-graph-crud-auth Slow post performance on /sparql-graph-crud-auth Mar 5, 2024
@namedgraph

@HughWilliams 15 seconds for under a 1000 triples sounds ridiculous though. We want to iterate and update thousands of graphs. Haven't been able to test it yet due to /sparql-graph-crud-auth auth issues (see #1304), but if this indeed is the case then the Graph Store Protocol is unusable.

@TallTed
Collaborator

TallTed commented Jun 28, 2024

@ebremer, "latest dev version of Virtuoso" doesn't communicate well, especially over time. Please always provide the full version string, including the git_head value, as reported on the command line by virtuoso -?, or as easily retrieved via SPARQL.
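A sketch of the SPARQL route, under stated assumptions: Virtuoso exposes its built-in `sys_stat` function to SPARQL through the `bif:` prefix, and the keys `st_dbms_name`, `st_dbms_ver`, and `git_head` are used here as recalled rather than taken from this thread, so verify them against your build. The endpoint URL is a placeholder, and `version_query_url` is a hypothetical helper.

```python
import urllib.parse

# Query text is an assumption based on Virtuoso's bif:sys_stat built-in;
# the key names should be checked against your own build.
VERSION_QUERY = """
SELECT (bif:sys_stat('st_dbms_name') AS ?name)
       (bif:sys_stat('st_dbms_ver')  AS ?version)
       (bif:sys_stat('git_head')     AS ?git_head)
WHERE { ?s ?p ?o }
LIMIT 1
""".strip()


def version_query_url(sparql_endpoint="http://localhost:8890/sparql"):
    """Return a GET URL that runs the version query and asks for JSON results."""
    params = urllib.parse.urlencode(
        {"query": VERSION_QUERY, "format": "application/sparql-results+json"}
    )
    return f"{sparql_endpoint}?{params}"
```

The resulting URL can be opened in a browser or fetched with any HTTP client, and the three values pasted into the issue.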

@TallTed
Collaborator

TallTed commented Jun 28, 2024

@namedgraph — It's not clear to me whether the "15 seconds for under a 1000 triples" assessment remains accurate, presuming @ebremer followed the previously linked advice in Transactional Bulk Loading of RDF Data into Virtuoso DBMS via the Jena RDF Framework. We look forward to hearing and discussing your results, upon resolution of #1304.

@ebremer
Author

ebremer commented Jun 28, 2024

@HughWilliams Understood. I never imagined Virtuoso restful HTTP ~/sparql-graph-crud-auth endpoint performance would ever match or be close to the Virtuoso bulk loader. The bulk loader has faithfully loaded billions of triples into my triple store at a rapid rate. No complaints there!

I'll be revisiting the smaller loads via /sparql-graph-crud-auth soon, as I have updates to perform. @TallTed I wasn't intending to be uninformative or unhelpful by leaving out the exact version I was using. I merely thought there was some basic tuning I could look at first, as I wasn't trying to insinuate any bugs at that time. But I will include the version string if it helps, either for current reference or as a historical note.

@HughWilliams I'll take a look at the code you sent me before trying my update and report back what I find.

@namedgraph how many triples / named graphs are in your store? Mine holds about 20 billion triples, and another Virtuoso instance on a separate server contains 11 billion triples of bibliographic data.
