Share my config to switch to your local LLM and embedding #374
Comments
Is that OK? I would also like to switch to a local Ollama-supported model |
I also changed it to Mixtral 8x7B under LM Studio, and the embeddings to Nomic, using the config data from LM Studio. |
Finally, it works. There's a bug in create_final_community_reports: it uses the global llm settings instead of the llm defined under community_reports in settings.yaml. Llama3's context window is only 8192 tokens, which is not enough to do the summarization for create_final_community_reports, so you need a model with a larger context window (e.g. 32k); a sketch of the per-task override is shown after this comment.
|
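For reference, the settings template exposes a per-task llm override under community_reports. A minimal sketch with placeholder values follows; per the comment above, this override may currently be ignored by create_final_community_reports, so treat it as where the fix should apply rather than a guaranteed workaround:
community_reports:
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_chat
    model: "your-32k-context-model"  # placeholder for any long-context local model
    api_base: http://localhost:8000/v1  # placeholder endpoint
  max_input_length: 8000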
Ollama doesn't support an OpenAI-compatible embedding API; try using llama.cpp to serve the model. |
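For anyone trying that route, a minimal sketch of serving a GGUF embedding model with llama.cpp's OpenAI-compatible server; the binary name (llama-server vs. the older server), the embedding flag, and the model path and port below are assumptions that depend on your llama.cpp build:
# Exposes an OpenAI-style /v1/embeddings endpoint backed by a local GGUF model.
./llama-server \
  --model ./models/nomic-embed-text-v1.5.Q8_0.gguf \
  --embedding \
  --port 8001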
What model do you recommend for the task? |
Mixtral has a 32k context window.
|
I am using Moonshot and Qwen-Max. |
Are people finding that OSS models are strong enough to actually do meaningful work with the GraphRAG approach in this repository? |
Hard to tell. Even commercial models like gpt-3.5-turbo are not providing mind-blowing results when compared to something like Google's NotebookLM. A lot of the time GraphRAG fails to provide the correct answer where NotebookLM nails it. For example:
GraphRAG Global: (screenshot not included)
Same question with NotebookLM: (screenshot not included)
GraphRAG Global was totally wrong with the answer. Local was better... but NotebookLM provided a more relevant answer. Maybe if I used ChatGPT-4o it might do better... but I am not willing to pay the $ to find out. |
I can understand the cost argument for development/testing reasons. What we've found so far is that use of OSS models leads to more noise in the knowledge graph and therefore a degradation in the overall quality of the graph that is constructed. With a subpar graph, you're likely to see a wide range of issues in the query response. We encourage testing with other models, but we find that the gpt-4-turbo and gpt-4o LLMs provide the best quality in practice (at this time). Models that produce low-precision results introduce noise that can cause problems in the knowledge graph; the GPT-4 family is strongly biased toward precision, so the noise is minimal (even compared to gpt-3.5-turbo).
For better-quality knowledge graph construction, also consider taking a closer look at the prompts generated by the auto-templating process. These prompts are a vital component of the GraphRAG approach. Our docs don't currently cover this feature in detail, but you can increase the quality of your knowledge graphs by manually reviewing the auto-generated prompts and editing/tuning them to your own data (if there are clear errors) before indexing. |
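For anyone looking for the auto-templating feature mentioned above, a minimal sketch of invoking it from a project root; the module name is graphrag.prompt_tune, but the exact flags depend on the installed graphrag version, so check its --help output:
# Generate data-specific prompts into ./ragtest/prompts (flags may vary by version).
python -m graphrag.prompt_tune --root ./ragtest --output ./ragtest/prompts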
@jgbradley1 Thank you for the info. I did create the prompts for the document using the auto-generation feature of graphrag. It still performed worse than expected, probably because I used gpt-3.5-turbo for the whole process. |
What dataset have you indexed? Would be curious to run the process using GPT-4o and compare to NotebookLM (running on Gemini 1.5 Pro I believe). |
How can we modify the code to support Hugging Face embeddings too? |
Hi everyone, I'd like to share my configuration for running GraphRAG with a local LLM and embedding model. For the LLM, I use Mistral-7B-Instruct-v0.3. It has a 60K+ token input length, so it can handle create_community_reports easily. For the embedding model, I use e5-mistral-7b-instruct, as this is the best open-source sentence embedding model I found through some literature review. Both models can be served through vLLM, so you can build your local RAG system with the speed boost that vLLM provides. There is also a small issue in the query phase: since GraphRAG sends OpenAI-style requests to the LLM server, the "system" role is not supported by the Mistral chat template. However, you can supply your own custom chat template to overcome this issue. Here is the template I use:
{%- for message in messages %}
{%- if message['role'] == 'system' -%}
{{- message['content'] -}}
{%- else -%}
{%- if message['role'] == 'user' -%}
{{-'[INST] ' + message['content'].rstrip() + ' [/INST]'-}}
{%- else -%}
{{-'' + message['content'] + '</s>' -}}
{%- endif -%}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{-''-}}
{%- endif -%}
I have already run the whole local process on the novel "A Christmas Carol". I hope this message helps everyone who wants to build their own local GraphRAG 🎉. |
Thanks for sharing. I also want to use the Mistral model. Could you please paste your settings.yaml file? |
Please share your settings.yaml file. |
Hi everyone. Here is the settings.yaml I used:
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  # model: gpt-4-turbo-preview
  # model: "/data3/litian/Redemption/LLama-3/Meta-Llama-3-8B-Instruct"
  model: "/data3/litian/Redemption/generativeModel/Mistral-7B-Instruct-v0.3"
  # model: "/data3/litian/Redemption/generativeModel/Meta-Llama-3-8B-Instruct"
  model_supports_json: false # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  # api_base: https://<instance>.openai.azure.com
  api_base: http://localhost:8000/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing
async_mode: threaded # or asyncio
embeddings:
  ## parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    # model: text-embedding-3-small
    model: "/data3/litian/Redemption/embeddingModel/test/e5-mistral-7b-instruct"
    # api_base: https://<instance>.openai.azure.com
    api_base: http://localhost:8001/v1
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional
chunks:
  size: 300
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"
cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>
storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>
reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>
entity_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 0
summarize_descriptions:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500
claim_extraction:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 0
community_reports:
  ## llm: override the global llm settings for this task
  ## parallelization: override the global parallelization settings for this task
  ## async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000
cluster_graph:
  max_cluster_size: 10
embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832
umap:
  enabled: false # if true, will generate UMAP embeddings for nodes
snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false
local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # max_tokens: 12000
global_search:
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
|
Thanks for sharing! I have another question: where do I apply the chat template you pasted earlier? |
With vLLM you can use --chat-template to specify your own template. The bash script is shown as follows:
base_model="/data3/litian/Redemption/generativeModel/Mistral-7B-Instruct-v0.3"
api_key="12345"
n_gpu=1
python -m vllm.entrypoints.openai.api_server \
--model ${base_model} \
--dtype float16 \
--tensor-parallel-size ${n_gpu} \
--api-key ${api_key} \
--enforce-eager \
--chat-template=./template/mistral.jinja |
Hello, I would like to know how to use vLLM to start the embedding model. I looked at your settings.yaml.
Hi, actually vLLM supports e5-mistral-7b-instruct. I think this is the only embedding model that vLLM supports officially (if I am wrong, please correct me 😊). You can start it with the following command:
base_model="/data3/litian/Redemption/embeddingModel/test/e5-mistral-7b-instruct"
api_key="12345"
n_gpu=1
python -m vllm.entrypoints.openai.api_server --port 8001 --model ${base_model} --dtype auto --tensor-parallel-size ${n_gpu} --api-key ${api_key} |
This is a temporary solution for local Ollama. |
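One possible temporary workaround (a sketch only, not necessarily what the commenter used) is a small proxy that exposes an OpenAI-style /v1/embeddings endpoint and forwards each input to Ollama's /api/embeddings API. This assumes fastapi, uvicorn, and requests are installed and that an embedding model such as nomic-embed-text has been pulled into a local Ollama instance:
# openai_to_ollama_embeddings.py
# Run with: uvicorn openai_to_ollama_embeddings:app --port 8001
from typing import List, Union

import requests
from fastapi import FastAPI
from pydantic import BaseModel

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # default Ollama endpoint

app = FastAPI()

class EmbeddingRequest(BaseModel):
    model: str = "nomic-embed-text"
    input: Union[str, List[str]]

@app.post("/v1/embeddings")
def create_embeddings(req: EmbeddingRequest):
    texts = [req.input] if isinstance(req.input, str) else req.input
    data = []
    for i, text in enumerate(texts):
        # Ollama embeds one prompt per request and returns {"embedding": [...]}
        resp = requests.post(OLLAMA_URL, json={"model": req.model, "prompt": text})
        resp.raise_for_status()
        data.append({"object": "embedding", "index": i, "embedding": resp.json()["embedding"]})
    # Mimic the OpenAI embeddings response shape that GraphRAG's client expects.
    return {"object": "list", "data": data, "model": req.model,
            "usage": {"prompt_tokens": 0, "total_tokens": 0}}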
Text output path: ragtest/input/sample.txt |
Consolidating alternate model issues here: #657 |
Hi, |
You can change your LLM to another one with a longer context window. Also, if you run into JSON format issues, you can disable the JSON setting in the yaml file.
settings.yaml
Config the llm to llama3 on Groq, or any other model compatible with the OpenAI API.
Use llama.cpp to serve the embedding API; it is compatible with the OpenAI API.
The start command:
So the embeddings section in the settings config points at that local server.
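For illustration, a minimal sketch of what that embeddings block might look like, assuming the llama.cpp server listens on localhost:8080 (the port and model name are placeholders, not the original poster's values):
embeddings:
  async_mode: threaded
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: nomic-embed-text  # placeholder; llama.cpp serves whatever model it was started with
    api_base: http://localhost:8080/v1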
But....
⠦ GraphRAG Indexer
├── Loading Input (text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
├── create_base_entity_graph
├── create_final_entities
├── create_final_nodes
├── create_final_communities
├── join_text_units_to_entity_ids
├── create_final_relationships
├── join_text_units_to_relationship_ids
└── create_final_community_reports
❌ Errors occurred during the pipeline run, see logs for more details.