I am currently working on configuring QLever for a project that involves indexing a substantial dataset with nearly 5 billion triples. The objective is to maximize the speed of the indexing process on a high-performance server with ~1 TB RAM.
Given the high RAM capacity, I am seeking advice on the optimal combination of parameters to significantly speed up the index creation process. Specifically, I would like guidance on the following:
Number of Triples per Batch: What would be the ideal setting considering the server's high RAM capacity?
STXXL Memory: How can we best utilize the available 1 TB of RAM?
Any Additional Parameters: Are there other settings or parameters that can be adjusted to further enhance the indexing speed?
The speed of the indexing process is a crucial factor for us. Any insights or recommendations on how to best leverage our server's capabilities would be greatly appreciated.
Thank you in advance for your support!
@arcangelo7 Interesting question. As a baseline, can you run the index build with "num-triples-per-batch": 10000000 and STXXL_MEMORY = 10G and send us the output of qlever index-stats and maybe also attach the full index-log.txt file?
Yes, we are using NVMe SSDs. I initially set "num-triples-per-batch" to 1,000,000, but the build failed with exit code 129, which indicates a SIGHUP signal, even though I had launched qlever index with nohup. I have not yet investigated whether the cause was the number of triples per batch or some other parameter.
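For reference, the exit status 129 reported above follows the common shell convention that a process killed by signal N exits with status 128 + N. The following is only a minimal sketch for decoding such a status; the helper name `describe_exit_status` is hypothetical and not part of QLever, and signal numbering is platform dependent (on Linux, signal 1 is SIGHUP).

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell exit status using the 128 + N signal convention.

    Hypothetical helper for illustration only; not part of QLever.
    """
    if status > 128:
        signal_number = status - 128
        try:
            # signal.Signals maps a signal number to its symbolic name
            # on the current platform.
            return signal.Signals(signal_number).name
        except ValueError:
            return f"unknown signal {signal_number}"
    return f"normal exit with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(129))
```

This only explains which signal ended the process. It does not say why the signal reached a process started with nohup, which remains open in this thread.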
However, setting "num-triples-per-batch" to 100,000 resolved the issue. I also set STXXL_MEMORY to 128G. Here is the output of qlever index-stats, followed by the full index-log.txt file.
Command: index-stats
Breakdown of the time used for building the index, based on the timestamps for key lines in "oc_meta.index-log.txt"
Parse input : 31.0 min
Build vocabularies : 15.6 min
Convert to global IDs : 2.9 min
Permutation SPO & SOP : 10.1 min
Permutation OSP & OPS : 20.9 min
Permutation PSO & POS : 19.7 min
Text index : 93.2 min
TOTAL time : 193.4 min
Breakdown of the space used for building the index
Files index.* : 59.8 GB
Files vocabulary.* : 23.1 GB
Files text.* : 25.7 GB
TOTAL size : 108.6 GB
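To see where the build time goes, the time breakdown above can be converted into percentage shares. This is a small sketch that simply re-enters the reported minutes by hand; the variable names are illustrative, and nothing here reads the actual index-log.txt.

```python
# Minutes per phase, copied from the index-stats output above.
phase_minutes = {
    "Parse input": 31.0,
    "Build vocabularies": 15.6,
    "Convert to global IDs": 2.9,
    "Permutation SPO & SOP": 10.1,
    "Permutation OSP & OPS": 20.9,
    "Permutation PSO & POS": 19.7,
    "Text index": 93.2,
}

total_minutes = sum(phase_minutes.values())

# Share of the total build time per phase, in percent.
phase_shares = {
    phase: 100.0 * minutes / total_minutes
    for phase, minutes in phase_minutes.items()
}

if __name__ == "__main__":
    # Print the phases from most to least expensive.
    for phase, share in sorted(phase_shares.items(), key=lambda kv: -kv[1]):
        print(f"{phase:<24} {phase_minutes[phase]:6.1f} min  {share:5.1f} %")
    print(f"{'TOTAL':<24} {total_minutes:6.1f} min")
```

With these numbers, the text index accounts for roughly 48 % of the total and the three permutation steps together for roughly 26 %, so the text index is the most expensive single phase in this run.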
The content of the index-log.txt file is attached.
I am genuinely very satisfied with being able to recreate the entire index, including the text index, in a little over three hours. If I achieve better results, I will notify you in this issue.