GraphRAG performance enhancements #924

rbrugaro · 2024-11-20T19:44:06Z

Issue: When the property graph store fills up (~12K nodes, 15K relationships), insertion time in dataprep becomes slow.
Extraction + insertion starts at ~30 sec and grows to ~800 sec once the store reaches that size (~12K nodes, 15K relationships).
Perf bottleneck: this Cypher call in llama-index, which does the node upsert:
https://github.com/run-llama/llama_index/blob/795bebc2bad31db51b854a5c062bedca42397630/llama-index-integrations/graph_stores/llama-index-graph-stores-neo4j/llama_index/graph_stores/neo4j/neo4j_property_graph.py#L334

WIP solution:

Move initialization out of the dataprep and retrieve functions so it is only performed once
...
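
A minimal sketch of the first item above, assuming the expensive objects (graph store, index) can be built once and shared. All names here (`GraphResources`, `get_resources`, `dataprep`, `retrieve`) are hypothetical placeholders and not the actual code in this PR; the in-memory list stands in for the real graph store.

```python
from functools import lru_cache


class GraphResources:
    """Hypothetical stand-in for expensive objects such as the graph store and index."""

    init_count = 0  # counts how many times the expensive setup actually ran

    def __init__(self) -> None:
        GraphResources.init_count += 1
        self.documents: list[str] = []  # placeholder for the real store


@lru_cache(maxsize=1)
def get_resources() -> GraphResources:
    """Build the shared resources on first use and reuse them afterwards."""
    return GraphResources()


def dataprep(text: str) -> int:
    """Ingest one document without re-initializing the store."""
    resources = get_resources()
    resources.documents.append(text)
    return len(resources.documents)


def retrieve(query: str) -> list[str]:
    """Query the same shared resources used by dataprep."""
    resources = get_resources()
    return [doc for doc in resources.documents if query.lower() in doc.lower()]
```

With this shape, repeated calls to `dataprep` and `retrieve` share one `GraphResources` instance, so the setup cost is paid once per process rather than once per request.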

eero-t · 2024-12-18T09:57:56Z

There are a lot of dataprep backends, and neo4j/llama is not the default one used in the Docker Compose files and Helm charts.

Do the ones used by default also have a similar bottleneck?

rbrugaro and others added 8 commits November 20, 2024 19:40

move graph and index to initialization func.fix trimming

66fe26f

Signed-off-by: Rita Brugarolas <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

c34a0ac

for more information, see https://pre-commit.ci

disable schema_refresh from startup and added skip ingestion option t…

8898f57

…o directly run build communities Signed-off-by: Rita Brugarolas <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

92697aa

for more information, see https://pre-commit.ci

upgrade llama_index_graph_stores_neo4j

b336825

Signed-off-by: Rita Brugarolas <[email protected]>

fix llamaindex neo4j package dependency

1871015

Signed-off-by: Rita Brugarolas <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

c651005

for more information, see https://pre-commit.ci

extend timeout to be able to process large document at once

4920771

Signed-off-by: Rita Brugarolas <[email protected]>

rbrugaro mentioned this pull request Dec 11, 2024

[Feature] GraphRAG perf improvement #1025

Open

joshuayao linked an issue Dec 12, 2024 that may be closed by this pull request

[Feature] GraphRAG perf improvement #1025

Open

rbrugaro and others added 2 commits December 16, 2024 06:12

Switch to OpenAILike to work w vllm/tgi, added concurrency and batching

078e04f

Signed-off-by: rbrygaro <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

bc64208

for more information, see https://pre-commit.ci
