Loading vocabulary terms is extremely slow #442
My two cents regarding this issue, based on years of struggling with slow queries:
The reality is that SPARQL and related technologies are slow and not really usable for large databases, especially when complex queries are involved. Regarding the OntoGrapher case specifically, there is a lot we can do. One issue is apparent right away: the initial load of the SSP cache. The loading SELECT query returns all vocabularies, their terms, and their diagrams, and I believe much of that data is redundant. As of right now, the query returns 78K rows. If you omit diagrams, it returns 11K rows; if you omit terms, it returns 300 rows. It seems there is a cartesian product of terms and diagrams going on, and the total payload is about 80 MB, which is a lot. I believe partitioning the query would help significantly. It's pretty clear that the current approach does not scale. I think it's OK to retrieve the list of vocabularies from the SSP cache, but I'd omit both diagrams and terms from that query, since that is the kind of information that may not be needed initially.

That said, I was not able to reproduce such crazy loading times: I tried OG on the dev instance and was able to open a workspace with the 111/2009 vocabulary in a matter of seconds.
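To make the row counts above concrete, here is a small illustrative sketch (not OntoGrapher code) of why a single query that joins vocabularies, terms, and diagrams blows up into a cartesian product, and how partitioning shrinks the result set. The per-vocabulary term and diagram counts below are made-up numbers chosen so the totals roughly match the figures reported above (≈78K joined rows vs. ≈11K term rows and 300 vocabularies).

```python
# Hypothetical data: 300 vocabularies, each with 37 terms and 7 diagrams.
vocabularies = {
    f"voc-{i}": {
        "terms": [f"term-{i}-{t}" for t in range(37)],
        "diagrams": [f"diag-{i}-{d}" for d in range(7)],
    }
    for i in range(300)
}

def joined_query(vocs):
    """One query returning (vocabulary, term, diagram) rows: every term is
    paired with every diagram of the same vocabulary — a cartesian product."""
    return [
        (v, t, d)
        for v, data in vocs.items()
        for t in data["terms"]
        for d in data["diagrams"]
    ]

def partitioned_queries(vocs):
    """Three separate result sets instead: vocabularies, terms, diagrams."""
    voc_rows = list(vocs)
    term_rows = [(v, t) for v, data in vocs.items() for t in data["terms"]]
    diag_rows = [(v, d) for v, data in vocs.items() for d in data["diagrams"]]
    return voc_rows, term_rows, diag_rows

joined = joined_query(vocabularies)
vocs, terms, diags = partitioned_queries(vocabularies)
print(len(joined))                           # 300 * 37 * 7 = 77,700 rows
print(len(vocs) + len(terms) + len(diags))   # 300 + 11,100 + 2,100 = 13,500 rows
```

The joined result grows multiplicatively (terms × diagrams per vocabulary), while the partitioned results only grow additively, which is why splitting the loading query should cut the transferred data substantially.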
Despite the loading query not changing, load times have degraded over the past weeks/months to what I consider unacceptable levels. For example, loading a workspace with a single vocabulary, 111/2009, takes 2.6 minutes. This holds for both the deployed version and my local machine (4th-gen Core i7); the deployed version is faster, but not by much. Memory is not the problem: running this query consumes significant CPU time, even though neither the query itself nor the GraphDB version has changed. It would seem, then, that the culprit is a large increase in the amount of data. However, the data in the workspace vocabulary contexts is not unreasonable, nor is there any obvious bloat from OG or elsewhere. Therefore, the first line of inquiry is whether the query can be optimized.
Also, it would be helpful to display more detailed loading progress, so the user knows the application isn't stuck.