Commit

Merge branch 'main' into feat/optimize-community-reports
AlonsoGuevara authored Nov 30, 2024
2 parents 22ce24c + dad2176 commit a5c970a
Showing 104 changed files with 1,238 additions and 523 deletions.
4 changes: 4 additions & 0 deletions .semversioner/next-release/patch-20241126215650769602.json
@@ -0,0 +1,4 @@
{
"type": "patch",
"description": "Fix question gen."
}
4 changes: 4 additions & 0 deletions .semversioner/next-release/patch-20241127084633163555.json
@@ -0,0 +1,4 @@
{
"type": "patch",
"description": "miscellaneous code cleanup and minor changes for better alignment of style across the codebase."
}
2 changes: 2 additions & 0 deletions dictionary.txt
@@ -24,6 +24,7 @@ getcwd
fillna
noqa
dtypes
ints

# Azure
abfs
@@ -167,6 +168,7 @@ FIRUZABAD
Krohaara
KROHAARA
POKRALLY
René
Tazbah
TIRUZIA
Tiruzia
6 changes: 6 additions & 0 deletions docs/blog_posts.md
@@ -38,4 +38,10 @@

By Bryan Li, Research Intern; [Ha Trinh](https://www.microsoft.com/en-us/research/people/trinhha/), Senior Data Scientist; [Darren Edge](https://www.microsoft.com/en-us/research/people/daedge/), Senior Director; [Jonathan Larson](https://www.microsoft.com/en-us/research/people/jolarso/), Senior Principal Data Architect</h6>

- [:octicons-arrow-right-24: __LazyGraphRAG: Setting a new standard for quality and cost__](https://www.microsoft.com/en-us/research/blog/lazygraphrag-setting-a-new-standard-for-quality-and-cost/)

---
<h6>Published November 25, 2024

By [Darren Edge](https://www.microsoft.com/en-us/research/people/daedge/), Senior Director; [Ha Trinh](https://www.microsoft.com/en-us/research/people/trinhha/), Senior Data Scientist; [Jonathan Larson](https://www.microsoft.com/en-us/research/people/jolarso/), Senior Principal Data Architect</h6>
</div>
4 changes: 2 additions & 2 deletions docs/config/env_vars.md
@@ -2,7 +2,7 @@

## Text-Embeddings Customization

-By default, the GraphRAG indexer will only emit embeddings required for our query methods. However, the model has embeddings defined for all plaintext fields, and these can be generated by setting the `GRAPHRAG_EMBEDDING_TARGET` environment variable to `all`.
+By default, the GraphRAG indexer will only export embeddings required for our query methods. However, the model has embeddings defined for all plaintext fields, and these can be generated by setting the `GRAPHRAG_EMBEDDING_TARGET` environment variable to `all`.

If the embedding target is `all`, and you want to only embed a subset of these fields, you may specify which embeddings to skip using the `GRAPHRAG_EMBEDDING_SKIP` argument described below.
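
The interaction between `GRAPHRAG_EMBEDDING_TARGET` and `GRAPHRAG_EMBEDDING_SKIP` described in this hunk can be sketched roughly as follows. This is a minimal illustration, not GraphRAG's implementation: the embedding names in `ALL_EMBEDDINGS` and `REQUIRED_EMBEDDINGS`, the helper `resolve_embeddings`, and the comma-separated format assumed for the skip list are all hypothetical. The `required|all|none` values are taken from the `target` field documented in `docs/config/yaml.md`.

```python
import os

# Hypothetical embedding names, for illustration only; the real list
# is defined by GraphRAG and is not shown in this diff.
ALL_EMBEDDINGS = {
    "entity.description",
    "community.full_content",
    "document.raw_content",
}
REQUIRED_EMBEDDINGS = {"entity.description", "community.full_content"}


def resolve_embeddings(env: dict[str, str]) -> set[str]:
    """Return the embeddings to generate for a given environment mapping."""
    target = env.get("GRAPHRAG_EMBEDDING_TARGET", "required").strip().lower()
    if target == "required":
        # Default: only the embeddings the query methods need.
        return set(REQUIRED_EMBEDDINGS)
    if target == "none":
        return set()
    if target != "all":
        raise ValueError(f"unsupported embedding target: {target!r}")
    # Assumption: the skip list is a comma-separated string.
    raw_skip = env.get("GRAPHRAG_EMBEDDING_SKIP", "")
    skip = {name.strip() for name in raw_skip.split(",") if name.strip()}
    return ALL_EMBEDDINGS - skip


if __name__ == "__main__":
    print(sorted(resolve_embeddings(dict(os.environ))))
```

Passing the environment in as a plain mapping keeps the sketch easy to test; the skip list is only consulted when the target is `all`, matching the wording of the paragraph.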

@@ -152,7 +152,7 @@ These settings control the data input used by the pipeline. Any settings with a

## Storage

-This section controls the storage mechanism used by the pipeline used for emitting output tables.
+This section controls the storage mechanism used by the pipeline used for exporting output tables.

| Parameter | Description | Type | Required or Optional | Default |
| ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----- | -------------------- | ------- |
14 changes: 7 additions & 7 deletions docs/config/yaml.md
@@ -67,7 +67,7 @@ This is the base LLM configuration section. Other steps may override this config
- `async_mode` (see Async Mode top-level config)
- `batch_size` **int** - The maximum batch size to use.
- `batch_max_tokens` **int** - The maximum batch # of tokens.
-- `target` **required|all|none** - Determines which set of embeddings to emit.
+- `target` **required|all|none** - Determines which set of embeddings to export.
- `skip` **list[str]** - Which embeddings to skip. Only useful if target=all to customize the list.
- `vector_store` **dict** - The vector store to use. Configured for lancedb by default.
- `type` **str** - `lancedb` or `azure_ai_search`. Default=`lancedb`
@@ -203,7 +203,7 @@ This is the base LLM configuration section. Other steps may override this config

#### Fields

-- `max_cluster_size` **int** - The maximum cluster size to emit.
+- `max_cluster_size` **int** - The maximum cluster size to export.
- `strategy` **dict** - Fully override the cluster_graph strategy.

### embed_graph
@@ -228,11 +228,11 @@ This is the base LLM configuration section. Other steps may override this config

#### Fields

-- `embeddings` **bool** - Emit embeddings snapshots to parquet.
-- `graphml` **bool** - Emit graph snapshots to GraphML.
-- `raw_entities` **bool** - Emit raw entity snapshots to JSON.
-- `top_level_nodes` **bool** - Emit top-level-node snapshots to JSON.
-- `transient` **bool** - Emit transient workflow tables snapshots to parquet.
+- `embeddings` **bool** - Export embeddings snapshots to parquet.
+- `graphml` **bool** - Export graph snapshots to GraphML.
+- `raw_entities` **bool** - Export raw entity snapshots to JSON.
+- `top_level_nodes` **bool** - Export top-level-node snapshots to JSON.
+- `transient` **bool** - Export transient workflow tables snapshots to parquet.
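
As a rough illustration of the snapshot fields listed above, the sketch below maps each flag to the output format named in its description. The `SNAPSHOT_FORMATS` table and the `enabled_snapshots` helper are hypothetical names introduced here, and treating a missing flag as `False` is an assumption rather than a documented default.

```python
# Output format per flag, taken from the field descriptions above.
SNAPSHOT_FORMATS = {
    "embeddings": "parquet",
    "graphml": "GraphML",
    "raw_entities": "JSON",
    "top_level_nodes": "JSON",
    "transient": "parquet",
}


def enabled_snapshots(snapshots: dict[str, bool]) -> dict[str, str]:
    """Return {flag: format} for every snapshot flag that is switched on."""
    unknown = set(snapshots) - set(SNAPSHOT_FORMATS)
    if unknown:
        raise KeyError(f"unknown snapshot flags: {sorted(unknown)}")
    # Assumption: a flag that is absent from the config counts as False.
    return {
        name: fmt
        for name, fmt in SNAPSHOT_FORMATS.items()
        if snapshots.get(name, False)
    }
```

For example, a config that enables only `graphml` yields a single GraphML export, while an empty config yields none.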

### encoding_model

30 changes: 16 additions & 14 deletions docs/examples_notebooks/global_search.ipynb
@@ -75,9 +75,9 @@
"source": [
"### Load community reports as context for global search\n",
"\n",
-"- Load all community reports in the `create_final_community_reports` table from the ire-indexing engine, to be used as context data for global search.\n",
-"- Load entities from the `create_final_nodes` and `create_final_entities` tables from the ire-indexing engine, to be used for calculating community weights for context ranking. Note that this is optional (if no entities are provided, we will not calculate community weights and only use the rank attribute in the community reports table for context ranking)\n",
-"- Load all communities in the `create_final_communites` table from the ire-indexing engine, to be used to reconstruct the community graph hierarchy for dynamic community selection."
+"- Load all community reports in the `create_final_community_reports` table from the GraphRAG, to be used as context data for global search.\n",
+"- Load entities from the `create_final_nodes` and `create_final_entities` tables from the GraphRAG, to be used for calculating community weights for context ranking. Note that this is optional (if no entities are provided, we will not calculate community weights and only use the rank attribute in the community reports table for context ranking)\n",
+"- Load all communities in the `create_final_communites` table from the GraphRAG, to be used to reconstruct the community graph hierarchy for dynamic community selection."
]
},
{
@@ -379,21 +379,23 @@
"text": [
"### Overview of Cosmic Vocalization\n",
"\n",
-"Cosmic Vocalization is a phenomenon that has garnered significant attention from various individuals and groups. It is perceived as a cosmic event with potential implications for security and interstellar communication. The Paranormal Military Squad is actively engaged with Cosmic Vocalization, indicating its strategic importance in security measures [Data: Reports (6)].\n",
+"Cosmic Vocalization is a phenomenon that has garnered significant attention within the community, involving various individuals and groups. It is perceived as an interstellar event with potential implications for both communication and security.\n",
"\n",
-"### Key Perspectives and Concerns\n",
+"### Key Perspectives\n",
"\n",
-"1. **Strategic Engagement**: The Paranormal Military Squad's involvement suggests that Cosmic Vocalization is not only a subject of interest but also a matter of strategic importance. This engagement highlights the potential security implications of these cosmic phenomena [Data: Reports (6)].\n",
+"**Alex Mercer's Viewpoint**  \n",
+"Alex Mercer perceives Cosmic Vocalization as part of an interstellar duet, suggesting that it may be a responsive or communicative event. This perspective highlights the potential for Cosmic Vocalization to be part of a larger cosmic interaction or dialogue [Data: Reports (6)].\n",
"\n",
-"2. **Community Interest**: Within the community, Cosmic Vocalization is a focal point of interest. Alex Mercer, for instance, perceives it as part of an interstellar duet, which suggests a responsive and perhaps communicative approach to these cosmic events [Data: Reports (6)].\n",
+"**Taylor Cruz's Concerns**  \n",
+"Taylor Cruz raises concerns about the nature of Cosmic Vocalization, fearing it might be a homing tune. This adds a layer of urgency and potential threat, as it suggests that the vocalization could be attracting attention from unknown entities or forces [Data: Reports (6)].\n",
"\n",
-"3. **Potential Threats**: Concerns have been raised by individuals like Taylor Cruz, who fears that Cosmic Vocalization might be a homing tune. This perspective adds a layer of urgency and suggests that there may be potential threats associated with these cosmic sounds [Data: Reports (6)].\n",
+"### Involvement of the Paranormal Military Squad\n",
"\n",
-"### Metaphorical Interpretation\n",
+"The Paranormal Military Squad is actively engaged with Cosmic Vocalization, indicating its significance in security measures. Their involvement suggests that the phenomenon is not only of scientific interest but also of strategic importance, potentially impacting national or global security [Data: Reports (6)].\n",
"\n",
-"The Universe is metaphorically treated as a concert hall by the Paranormal Military Squad, which suggests a broader perspective on how cosmic events are interpreted and responded to by human entities. This metaphorical view may influence how strategies and responses are formulated in relation to Cosmic Vocalization [Data: Reports (6)].\n",
+"### Conclusion\n",
"\n",
-"In summary, Cosmic Vocalization is a complex phenomenon involving strategic, communicative, and potentially threatening elements. The involvement of the Paranormal Military Squad and the concerns raised by community members underscore its significance and the need for careful consideration of its implications.\n"
+"Cosmic Vocalization is a complex and multifaceted phenomenon that involves various stakeholders, each with their own perspectives and concerns. The involvement of both individuals like Alex Mercer and Taylor Cruz, as well as organized groups like the Paranormal Military Squad, underscores its importance and the need for further investigation and understanding.\n"
]
}
],
@@ -638,7 +640,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"LLM calls: 2. Prompt tokens: 11292. Output tokens: 606.\n"
+"LLM calls: 2. Prompt tokens: 11237. Output tokens: 483.\n"
]
}
],
@@ -652,7 +654,7 @@
],
"metadata": {
"kernelspec": {
-"display_name": "graphrag",
+"display_name": ".venv",
"language": "python",
"name": "python3"
},
@@ -666,7 +668,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.12.5"
+"version": "3.11.9"
}
},
"nbformat": 4,