RAM usage keeps increasing when deleting and reimporting data #1240

Open
gpicciuca opened this issue Jan 19, 2024 · 8 comments

Comments

@gpicciuca

gpicciuca commented Jan 19, 2024

Hello there,
I'm experimenting a bit with Virtuoso as we're trying to figure out if it's a good choice for our project, and while running some tests I stumbled upon a weird memory-usage case.

In this particular case, the queries I run are:
DELETE FROM DB.DBA.RDF_QUAD
DB.DBA.TTLP_MT (file_to_string_output ('/path/to/ttl_file.ttl'), '', 'http://localhost:8890/XXX')

The dataset is about 5,7 MB and contains roughly 80k triples.

Every time I run the above queries, the RAM usage of Virtuoso increases by about 0,5 MB. Occasionally a few MB (1-2 MB) get reclaimed, but other than that it keeps increasing.
The test I ran lasted 40 minutes. The queries above were executed continuously at 5-second intervals, and each time the RAM usage increased by 0,3 - 0,6 MB (mostly 0,5 MB).
Virtuoso started at 99,2 MB and after the first run (delete + import) it increased to 237,1 MB.
After 7:30 min it was sitting at 302 MB.
After 10:50 min it was at 318,2 MB, and after 40 min it was above 400 MB.
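
As a rough cross-check of the figures above, the average growth per cycle can be worked out from the quoted readings. This is only a sketch: it assumes the 237,1 MB reading was taken at roughly t = 0 s and that exactly one delete + import cycle ran every 5 seconds.

```python
# Memory readings quoted in the report, as (elapsed seconds, reported MB).
# Assumption: the 237,1 MB reading (after the first run) is taken as t = 0 s.
READINGS = [(0, 237.1), (450, 302.0), (650, 318.2)]
INTERVAL_S = 5  # one delete + import cycle every 5 seconds


def growth_per_cycle(readings, interval_s):
    """Return the average MB gained per cycle between consecutive readings."""
    rates = []
    for (t0, mb0), (t1, mb1) in zip(readings, readings[1:]):
        cycles = (t1 - t0) / interval_s
        rates.append((mb1 - mb0) / cycles)
    return rates


if __name__ == "__main__":
    for rate in growth_per_cycle(READINGS, INTERVAL_S):
        print(f"{rate:.3f} MB per cycle")
```

Under those assumptions the growth works out to roughly 0,7 MB per cycle over the first 7:30 min and roughly 0,4 MB per cycle over the following 3:20 min.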

I rebuilt Virtuoso with debug symbols so that I could inspect the issue with Valgrind, and these are the findings:

==471315== LEAK SUMMARY:
==471315== definitely lost: 467,890 bytes in 26,239 blocks
==471315== indirectly lost: 955,209 bytes in 41,848 blocks
==471315== possibly lost: 206,207,327 bytes in 314,533 blocks
==471315== still reachable: 17,579,054 bytes in 131,026 blocks
==471315== suppressed: 0 bytes in 0 blocks
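
To put the categories in proportion, the summary can be parsed and totalled. The snippet below is a minimal sketch; the regular expression is written against the exact line shape shown above and is not a general Valgrind parser.

```python
import re

LEAK_SUMMARY = """\
==471315== LEAK SUMMARY:
==471315== definitely lost: 467,890 bytes in 26,239 blocks
==471315== indirectly lost: 955,209 bytes in 41,848 blocks
==471315== possibly lost: 206,207,327 bytes in 314,533 blocks
==471315== still reachable: 17,579,054 bytes in 131,026 blocks
==471315== suppressed: 0 bytes in 0 blocks
"""

# Matches lines such as "==471315== possibly lost: 206,207,327 bytes in 314,533 blocks".
LINE_RE = re.compile(r"==\d+==\s+([a-z ]+):\s+([\d,]+) bytes in ([\d,]+) blocks")


def parse_leak_summary(text):
    """Map each leak category to its byte count."""
    result = {}
    for match in LINE_RE.finditer(text):
        category = match.group(1).strip()
        result[category] = int(match.group(2).replace(",", ""))
    return result


if __name__ == "__main__":
    summary = parse_leak_summary(LEAK_SUMMARY)
    total = sum(summary.values())
    for category, size in summary.items():
        print(f"{category:16s} {size / 2**20:8.2f} MiB  {100 * size / total:5.1f} %")
```

The "possibly lost" category accounts for the bulk of the reported bytes, while "definitely lost" is under half a megabyte.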

However, I also ran the Valgrind Massif heap profiler, which shows constant allocations with occasional spikes that return to normal right afterwards:
image

I'm running the develop/7 branch locally on Ubuntu 20.04.

Configuration is unchanged, running default values.

Any idea what could be the cause? Are there any known issues that have yet to be fixed perhaps?

P.S.: I cannot share the dataset as it's confidential (company stuff).

@HughWilliams
Collaborator

So basically you are continuously deleting and loading the same RDF dataset?

What do the Linux top command and the Virtuoso status(); command (run from isql) report?

Please also provide a copy of the virtuoso.ini file in use.

@gpicciuca
Author

Correct. I'm continuously deleting and reloading the same dataset.

Top shows that Virtuoso started at 0,2 %MEM (I have 64 GB of RAM). After the first import it reaches 0,5 %.
After multiple deletions and imports it's at 0,6 % (05:35 min runtime, queries run every 5 s).
image

I was using Ubuntu's System Monitor to check the RAM usage.

status(); in iSQL says:
image
image
image

And the virtuoso.ini:
virtuoso.ini.txt

@HughWilliams
Collaborator

HughWilliams commented Jan 19, 2024

Looking at the virtuoso.ini config file, I note the following settings:

ThreadCleanupInterval		= 0
ResourcesCleanupInterval	= 0

Setting these to 0 results in no cleanup of thread and other memory resources, which can be construed as a memory leak, as detailed in the configuration parameters documentation. You can set both to 1 to force the clean up of unused threads/resources, and thereby reduce memory consumption by the Virtuoso server.
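
One way to double-check what a given INI file actually contains is to read the two settings programmatically. The sketch below uses Python's standard configparser and assumes both options live in the [Parameters] section; the sample text is illustrative and not the reporter's actual file.

```python
import configparser

# Illustrative INI fragment; assumption: both options sit under [Parameters].
SAMPLE_INI = """\
[Parameters]
ThreadCleanupInterval    = 0
ResourcesCleanupInterval = 0
"""


def cleanup_intervals(ini_text, section="Parameters"):
    """Return the two cleanup intervals, or None for any that is missing."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(ini_text)
    keys = ("ThreadCleanupInterval", "ResourcesCleanupInterval")
    return {key: parser.getint(section, key, fallback=None) for key in keys}


if __name__ == "__main__":
    for key, value in cleanup_intervals(SAMPLE_INI).items():
        note = "cleanup disabled" if value == 0 else "ok"
        print(f"{key} = {value} ({note})")
```

To check a real file, pass its contents instead of SAMPLE_INI, for example `cleanup_intervals(open("virtuoso.ini").read())`.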

@gpicciuca
Author

gpicciuca commented Jan 22, 2024

> Looking at the virtuoso.ini config file, I note the following settings:
>
> ThreadCleanupInterval		= 0
> ResourcesCleanupInterval	= 0
>
> Setting these to 0 results in no cleanup of thread and other memory resources, which can be construed as a memory leak, as detailed in the configuration parameters documentation. You can set both to 1 to force the clean up of unused threads/resources, and thereby reduce memory consumption by the Virtuoso server.

I gave it a try just now, setting both parameters to 1 in the ini file. I also confirmed via the web interface that the parameters are loaded:
image

But memory usage still only increases; nothing is being freed.
Resources are not freed either after I stop my stress-test tool and just let Virtuoso run on its own doing "nothing".

Here's a short recording:
https://github.com/openlink/virtuoso-opensource/assets/124195270/7814d00f-efdc-48ba-8f42-a1f9dac97801

The queries being executed and the dataset being loaded are the same as in the posts above.

@HughWilliams
Collaborator

HughWilliams commented Jan 22, 2024

I assume the Virtuoso instance was restarted when the INI file parameters were changed, so that they take effect? (These settings do not take effect on a running instance without a restart, though the Conductor editor will immediately show that the values have been changed in the INI file.)

Looking at your loop test case again, i.e.,

DELETE FROM DB.DBA.RDF_QUAD
DB.DBA.TTLP_MT (file_to_string_output ('/path/to/ttl_file.ttl'), '', 'http://localhost:8890/XXX')

This is a bad test case, as the RDF_QUAD table contains a number of system graphs that are used for managing the RDF Quad Store, which would have been deleted by the blanket DELETE FROM DB.DBA.RDF_QUAD query.

Even if you were to qualify it with the actual graph name being loaded, i.e., DELETE FROM DB.DBA.RDF_QUAD WHERE g = iri_to_id ('http://localhost:8890/XXX'), there are other RDF-related tables that would be touched when loading the data with the TTLP_MT() function or SPARQL insert queries, and these would not be cleaned. So, you should use the SPARQL CLEAR GRAPH <http://localhost:8890/XXX> query to remove the graph being loaded, which should clean all the required RDF-related tables.

You should probably also run the COMMIT WORK command after each iteration of the loop...

@gpicciuca
Author

gpicciuca commented Jan 22, 2024

I have restarted the server multiple times between the tests, and as noted in the screenshot above the parameters were active in the running instance.

I re-ran the tests and changed:

  • the DELETE query to SPARQL CLEAR GRAPH <http://localhost:8890/XXX>, as you suggested.

I was also already committing the transaction at the end of each cycle (with SQLEndTran(SQL_HANDLE_DBC, hdbc_, SQL_COMMIT)), but now I have additionally added a query running COMMIT WORK.

The actions on each loop iteration are now:

  1. SPARQL CLEAR GRAPH <graph name>
  2. DB.DBA.TTLP_MT(file_open(...), '', 'graph_name')
  3. SQLEndTran with SQL_COMMIT
  4. Manual Query: COMMIT WORK
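
The four steps above can be sketched as follows. This is a hypothetical illustration, not the reporter's actual test tool (which uses the ODBC C API): `connection` stands for any Python DB-API 2.0 connection to Virtuoso (for example one opened through an ODBC bridge, which is an assumption), the function names are invented, and the graph IRI and file path are trusted inputs that are interpolated directly into the statements.

```python
def load_statements(graph_iri, ttl_path):
    """Build the clear and reload statements for one cycle (steps 1 and 2)."""
    return [
        f"SPARQL CLEAR GRAPH <{graph_iri}>",
        # file_to_string_output is taken from the original report;
        # the reporter notes that file_open behaves the same.
        f"DB.DBA.TTLP_MT(file_to_string_output('{ttl_path}'), '', '{graph_iri}')",
    ]


def run_iteration(connection, graph_iri, ttl_path):
    """Run one clear + reload cycle, then commit both ways (steps 3 and 4)."""
    cursor = connection.cursor()
    for statement in load_statements(graph_iri, ttl_path):
        cursor.execute(statement)
    connection.commit()             # step 3: driver-level commit
    cursor.execute("COMMIT WORK")   # step 4: explicit COMMIT WORK query
```

Calling `run_iteration(connection, 'http://localhost:8890/XXX', '/path/to/ttl_file.ttl')` in a timed loop would approximate the test cycle described in this thread.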

Result: Memory still keeps increasing. The parameters ThreadCleanupInterval and ResourcesCleanupInterval are still set to 1 in the configuration file.

Also tested:

  • replacing file_to_string_output with file_open, and vice versa
  • replacing TTLP_MT() with TTLP()

Neither made a difference.

Edit: I downloaded and compiled the latest release, v7.2.11, to see whether the problem manifests there too, and indeed it does, just as on the current develop/7 branch.

@imitko
Collaborator

imitko commented Jan 22, 2024

@gpicciuca

Please could you check the VmSize & VmRSS process stats, e.g.,

cat /proc/{*virtuoso-pid*}/status | grep Vm

What are these statistics after continuous operation, and how do they change?
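
For sampling these values repeatedly during a long test run, the same fields can be read programmatically. A minimal sketch, assuming a Linux /proc filesystem; the function names are illustrative.

```python
def parse_vm_fields(status_text):
    """Extract the Vm* fields (values in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        if line.startswith("Vm"):
            name, _, value = line.partition(":")
            fields[name] = int(value.split()[0])  # drop the trailing "kB"
    return fields


def read_vm_fields(pid):
    """Read and parse the Vm* fields of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_fields(handle.read())
```

Logging `read_vm_fields(pid)["VmRSS"]` once per cycle gives a time series that can be compared directly against the number of delete + import iterations.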

@gpicciuca
Author

@imitko

I changed the cycle time to 1 s instead of 5 s just to speed things up a bit.

Here's a recording of how the Vm values change over a period of 6-7 minutes:
https://github.com/openlink/virtuoso-opensource/assets/124195270/a3bd7f37-af64-4f7e-a73f-01b8ca13ffea
