[Issue]: <title> ❌ create_final_entities None AND ❌ Errors occurred during the pipeline run, see logs for more details. #623

zw-change · 2024-07-19T07:18:47Z

Describe the issue

and will be removed in a future version. Please use 'DataFrame.transpose' instead.
return bound(*args, **kwds)
Error while fetching embedding vectors: [WinError 10061] No connection could be made because the target machine actively refused it.
❌ create_final_entities
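WinError 10061 means the TCP connection was refused, so it may be worth confirming that something is listening at the configured `api_base` before re-running the indexer. Below is a minimal sketch using only the standard library. The helper name `can_connect` is hypothetical and not part of GraphRAG.

```python
import socket
from urllib.parse import urlparse


def can_connect(api_base: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the host/port in api_base succeeds."""
    parsed = urlparse(api_base)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections (WinError 10061 on Windows), timeouts and
        # DNS failures are all OSError subclasses.
        return False
```

For the setup in this report the call would be `can_connect("http://192.168.0.17:11434/v1")`. A `False` result suggests the server is not running, is bound to a different interface, or is blocked by a firewall.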

Steps to reproduce

Use a local Ollama instance instead of OpenAI.

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat # or azure_openai_chat
  model: mistral
  model_supports_json: true # recommended if this is available for your model.
  api_base: http://192.168.0.17:11434/v1
  # max_tokens: 4000
  # request_timeout: 180.0
  # api_base: https://.openai.azure.com
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  # tokens_per_minute: 150_000 # set a leaky bucket throttle
  # requests_per_minute: 10_000 # set a leaky bucket throttle
  # max_retries: 10
  # max_retry_wait: 10.0
  # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  # concurrent_requests: 25 # the number of parallel inflight requests that may be made
  # temperature: 0 # temperature for sampling
  # top_p: 1 # top-p sampling
  # n: 1 # Number of completions to generate

parallelization:
  stagger: 0.3
  # num_threads: 50 # the number of threads to use for parallel processing

async_mode: threaded # or asyncio

embeddings:
  # parallelization: override the global parallelization settings for embeddings
  async_mode: threaded # or asyncio
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding # or azure_openai_embedding
    model: nomic-embed-text
    api_base: http://192.168.0.17:11434/v1
    # api_base: https://.openai.azure.com
    # api_version: 2024-02-15-preview
    # organization: <organization_id>
    # deployment_name: <azure_model_deployment_name>
    # tokens_per_minute: 150_000 # set a leaky bucket throttle
    # requests_per_minute: 10_000 # set a leaky bucket throttle
    # max_retries: 10
    # max_retry_wait: 10.0
    # sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
    # concurrent_requests: 25 # the number of parallel inflight requests that may be made
    # batch_size: 16 # the number of documents to send in a single request
    # batch_max_tokens: 8191 # the maximum number of tokens to send in a single request
    # target: required # or optional

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id] # by default, we don't allow chunks to cross documents

input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$"

cache:
  type: file # or blob
  base_dir: "cache"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

storage:
  type: file # or blob
  base_dir: "output/${timestamp}/artifacts"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

reporting:
  type: file # or console, blob
  base_dir: "output/${timestamp}/reports"
  # connection_string: <azure_blob_storage_connection_string>
  # container_name: <azure_blob_storage_container_name>

entity_extraction:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  prompt: "prompts/entity_extraction.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1

summarize_descriptions:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

claim_extraction:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  # enabled: true
  prompt: "prompts/claim_extraction.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  # llm: override the global llm settings for this task
  # parallelization: override the global parallelization settings for this task
  # async_mode: override the global async_mode settings for this task
  prompt: "prompts/community_report.txt"
  max_length: 2000
  max_input_length: 8000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
  # num_walks: 10
  # walk_length: 40
  # window_size: 2
  # iterations: 3
  # random_seed: 597832

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes

snapshots:
  graphml: false
  raw_entities: false
  top_level_nodes: false

local_search:
  # text_unit_prop: 0.5
  # community_prop: 0.1
  # conversation_history_max_turns: 5
  # top_k_mapped_entities: 10
  # top_k_relationships: 10
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000

global_search:
  # llm_temperature: 0 # temperature for sampling
  # llm_top_p: 1 # top-p sampling
  # llm_n: 1 # Number of completions to generate
  # max_tokens: 12000
  # data_max_tokens: 12000
  # map_max_tokens: 1000
  # reduce_max_tokens: 2000
  # concurrency: 32
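The `embeddings.llm` settings in this config use `type: openai_embedding`, so the client will presumably send OpenAI-style embedding requests to `api_base`. The sketch below builds such a request by hand, so the endpoint can be tested separately from the indexing pipeline. The function names are hypothetical. Whether the server at `api_base` actually serves an OpenAI-compatible `/embeddings` route is an assumption to verify, not something this thread confirms.

```python
import json
import urllib.request
from typing import List


def build_embedding_request(
    api_base: str, model: str, texts: List[str], api_key: str = "unused"
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style POST {api_base}/embeddings request."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/embeddings",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_embeddings(req: urllib.request.Request, timeout: float = 30.0) -> List[List[float]]:
    """Send the request and return one vector per input text.

    Assumes the response follows the OpenAI shape: {"data": [{"embedding": [...]}, ...]}.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return [item["embedding"] for item in payload["data"]]
```

With the values from this config, the request would be built as `build_embedding_request("http://192.168.0.17:11434/v1", "nomic-embed-text", ["hello"])`. If `fetch_embeddings` fails with a connection or HTTP error, the problem is likely in the endpoint setup rather than in GraphRAG.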

Logs and screenshots

type: error
data: Error executing verb "text_embed" in create_final_entities: iteration over a 0-d array
stack:
Traceback (most recent call last):
  File "C:\Users\admin\AppData\Local\pypoetry\Cache\virtualenvs\graphrag-Me9XHZ9h-py3.11\Lib\site-packages\datashaper\workflow\workflow.py", line 415, in _execute_verb
    result = await result
             ^^^^^^^^^^^^
  File "E:\pythonworkspace\Graph_RAG\graphrag\graphrag\index\verbs\text\embed\text_embed.py", line 105, in text_embed
    return await _text_embed_in_memory(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\pythonworkspace\Graph_RAG\graphrag\graphrag\index\verbs\text\embed\text_embed.py", line 130, in _text_embed_in_memory
    result = await strategy_exec(texts, callbacks, cache, strategy_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\pythonworkspace\Graph_RAG\graphrag\graphrag\index\verbs\text\embed\strategies\openai.py", line 62, in run
    embeddings = await _execute(llm, text_batches, ticker, semaphore)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\pythonworkspace\Graph_RAG\graphrag\graphrag\index\verbs\text\embed\strategies\openai.py", line 108, in _execute
    return [item for sublist in results for item in sublist]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\pythonworkspace\Graph_RAG\graphrag\graphrag\index\verbs\text\embed\strategies\openai.py", line 108, in
    return [item for sublist in results for item in sublist]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: iteration over a 0-d array
source: iteration over a 0-d array
details: null
type: error
data: Error running pipeline!
stack:
Traceback (most recent call last):
  File "E:\pythonworkspace\Graph_RAG\graphrag\graphrag\index\run.py", line 323, in run_pipeline
    result = await workflow.run(context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\AppData\Local\pypoetry\Cache\virtualenvs\graphrag-Me9XHZ9h-py3.11\Lib\site-packages\datashaper\workflow\workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\admin\AppData\Local\pypoetry\Cache\virtualenvs\graphrag-Me9XHZ9h-py3.11\Lib\site-packages\datashaper\workflow\workflow.py", line 415, in _execute_verb
    result = await result
             ^^^^^^^^^^^^
  File "E:\pythonworkspace\Graph_RAG\graphrag\graphrag\index\verbs\text\embed\text_embed.py", line 105, in text_embed
    return await _text_embed_in_memory(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\pythonworkspace\Graph_RAG\graphrag\graphrag\index\verbs\text\embed\text_embed.py", line 130, in _text_embed_in_memory
    result = await strategy_exec(texts, callbacks, cache, strategy_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\pythonworkspace\Graph_RAG\graphrag\graphrag\index\verbs\text\embed\strategies\openai.py", line 62, in run
    embeddings = await _execute(llm, text_batches, ticker, semaphore)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\pythonworkspace\Graph_RAG\graphrag\graphrag\index\verbs\text\embed\strategies\openai.py", line 108, in _execute
    return [item for sublist in results for item in sublist]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\pythonworkspace\Graph_RAG\graphrag\graphrag\index\verbs\text\embed\strategies\openai.py", line 108, in
    return [item for sublist in results for item in sublist]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: iteration over a 0-d array
source: iteration over a 0-d array
details: null
AND
14:56:36,162 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "text_embed" in create_final_entities: iteration over a 0-d array details=None
14:56:36,162 graphrag.index.run ERROR error running workflow create_final_entities
Traceback (most recent call last):
TypeError: iteration over a 0-d array
14:56:36,164 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None
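"iteration over a 0-d array" is the message NumPy raises when code iterates over a zero-dimensional array such as `np.array(None)`. One plausible reading, not confirmed in this thread, is that the refused embedding requests left at least one batch without vectors, and the flattening comprehension in the traceback then failed on it. The sketch below shows a guard that reports which batch came back empty. `flatten_checked` is a hypothetical helper, not GraphRAG code, and it detects zero-dimensional arrays by their `ndim` attribute so that it runs without NumPy installed.

```python
from typing import Any, Iterable, List


def flatten_checked(results: Iterable[Any]) -> List[Any]:
    """Flatten per-batch results, failing early with the index of any empty batch."""
    flat: List[Any] = []
    for index, batch in enumerate(results):
        # None or a zero-dimensional array (ndim == 0) cannot be iterated over.
        if batch is None or getattr(batch, "ndim", 1) == 0:
            raise ValueError(f"embedding batch {index} returned no vectors")
        flat.extend(batch)
    return flat
```

A guard like this does not fix the underlying connection failure. It only turns the opaque `TypeError` into a message that points at the failed embedding call.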

Additional Information

GraphRAG Version:
Operating System:
Python Version:
Related Issues:

natoverse · 2024-07-22T20:36:04Z

Consolidating alternate model issues here: #657

zw-change added the "triage" label (default label assignment; indicates a new issue needs review by a maintainer) on Jul 19, 2024

jgbradley1 added the "community_support" label (issue handled by community members) on Jul 22, 2024

natoverse removed the "triage" label on Jul 22, 2024

natoverse closed this as not planned (won't fix, can't repro, duplicate, stale) on Jul 22, 2024
