Add version two updates (#1)
nfmoore authored Mar 25, 2024
1 parent 19fe1f8 commit 4d0cae5
Showing 17 changed files with 689 additions and 1,025 deletions.
Binary file modified .github/docs/images/image-01.png
31 changes: 18 additions & 13 deletions README.md
@@ -59,7 +59,7 @@ To run this project, you need to configure the following environment variables.
- `AZURE_SUBSCRIPTION_ID`: The Azure subscription ID to use for the deployment. For example, `00000000-0000-0000-0000-000000000000`.
- `AZURE_RESOURCE_GROUP_NAME`: The name of the resource group to use for the deployment. For example, `my-resource-group`.
- `AZURE_OPENAI_API_BASE`: The base URL for the Azure OpenAI API. For example, `https://my-resource.openai.azure.com/`.
- `AZURE_OPENAI_API_VERSION`: The version of the Azure OpenAI API. You must set this to `2023-09-01-preview`.
- `AZURE_OPENAI_API_VERSION`: The version of the Azure OpenAI API. You must set this to `2023-12-01-preview`.
- `AZURE_OPENAI_API_TYPE`: The type of the Azure OpenAI API. You must set this to `azure`.
- `AZURE_OPENAI_CHAT_DEPLOYMENT`: The name of the Azure OpenAI deployment to use for chat. For example, `gpt-35-turbo-16k-0613`.
- `AZURE_OPENAI_CHAT_MODEL`: The name of the Azure OpenAI model to use for chat. For example, `gpt-35-turbo-16k`.
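
As a minimal sketch (illustrative, not part of the repository), these variables could be loaded from a local `.env` file and validated before running the notebooks; only the variables visible above are checked, and the full list continues in the README:

```python
import os

import dotenv

# Load variables from a local .env file; the notebooks in this commit import dotenv the same way.
dotenv.load_dotenv(".env")

# Only the variables listed above are checked here; the README defines more.
required = [
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_RESOURCE_GROUP_NAME",
    "AZURE_OPENAI_API_BASE",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_API_TYPE",
    "AZURE_OPENAI_CHAT_DEPLOYMENT",
    "AZURE_OPENAI_CHAT_MODEL",
]

missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```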
@@ -80,15 +80,20 @@ To run this project, you need to configure the following environment variables.

### Configure the Azure AI Search service

To run this project, you need to configure the Azure AI Search service. You can do this using the Azure portal or the Azure CLI. This will populate Azure AI Search with a data source, an index, an indexer, and a skillset.

All templates are provided in the `src/search/templates` folder and values for the variables, for example `{{ AZURE_OPENAI_API_BASE }}` are populated based on the environment variables.
All templates are provided in the `src/search/templates/product-info` folder, and values for the variables (for example, `{{ AZURE_OPENAI_API_BASE }}`) are populated from the environment variables.
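
As a rough illustration of how those placeholders might be substituted (Jinja2 is pinned in `environment/requirements.txt`; the rendering code below is a sketch, not the project's actual implementation):

```python
import json
import os

from jinja2 import Template

# Collect AZURE_* environment variables, mirroring the notebook's template_variables dictionary.
template_variables = {key: value for key, value in os.environ.items() if key.startswith("AZURE")}

# Render one of the provided templates, e.g. the index definition.
with open("src/search/templates/product-info/index.json") as file:
    rendered = Template(file.read()).render(**template_variables)

index_definition = json.loads(rendered)
print(list(index_definition))
```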

To create these artifacts to configure the Azure AI Search service, you can use the following command:
To create these artifacts and configure the Azure AI Search service, you can run the notebook `notebooks/01-populate-index.ipynb`.

```bash
python -m ./src/search/main.py --search_templates_dir ./src/search/templates/
```
### Query the Azure AI Search service

This notebook illustrates two approaches to querying the Azure AI Search service:

1. Using a custom client implementing the retrieval-augmented generation (RAG) pattern.
2. Using the Azure OpenAI REST API.

To query the Azure AI Search service, you can run the notebook `notebooks/02-llm-queries.ipynb`.

### Run streamlit app

@@ -106,13 +111,13 @@ If you want to deploy this app to Azure, you can containerise it using the `Dock

## Resources

- [Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/)
- [Azure AI Search](https://learn.microsoft.com/en-us/azure/search/)
- [Azure OpenAI](https://learn.microsoft.com/azure/ai-services/openai/)
- [Azure AI Search](https://learn.microsoft.com/azure/search/)
- [Streamlit](https://streamlit.io/)
- [Azure OpenAI Service REST API reference](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference)
- [Securely use Azure OpenAI on your data](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/use-your-data-securely)
- [Introduction to prompt engineering](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/prompt-engineering)
- [Prompt engineering techniques](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-prompt-engineering?pivots=programming-language-chat-completions)
- [Azure OpenAI Service REST API reference](https://learn.microsoft.com/azure/ai-services/openai/reference)
- [Securely use Azure OpenAI on your data](https://learn.microsoft.com/azure/ai-services/openai/how-to/use-your-data-securely)
- [Introduction to prompt engineering](https://learn.microsoft.com/azure/ai-services/openai/concepts/prompt-engineering)
- [Prompt engineering techniques](https://learn.microsoft.com/azure/ai-services/openai/concepts/advanced-prompt-engineering?pivots=programming-language-chat-completions)

## License

3 changes: 2 additions & 1 deletion environment/requirements.txt
@@ -5,4 +5,5 @@ ipykernel==6.29.2
Jinja2==3.1.3
requests==2.31.0
streamlit-extras==0.4.0
azure-identity==1.15.0
nltk==3.8.1
158 changes: 158 additions & 0 deletions notebooks/01-populate-index.ipynb
@@ -0,0 +1,158 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Populate Azure AI Search Index"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import dotenv\n",
"import sys\n",
"\n",
"dotenv.load_dotenv(\".env\")\n",
"sys.path.append(os.path.join(os.getcwd(), \"..\", \"src\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Approach 1: Pull-based\n",
"\n",
"The pull model uses indexers connecting to a supported data source, automatically uploading the data into your index. This is the recommended approach for data sources that are frequently updated."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from search.utilities import SearchClient\n",
"\n",
"# Create search client\n",
"search_client = SearchClient(\n",
" search_endpoint=os.environ[\"AZURE_AI_SEARCH_ENDPOINT\"],\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Generate list of variables to be used in templates\n",
"template_variables = {\n",
" key: value for key, value in os.environ.items() if key.startswith((\"AZURE\"))\n",
"}\n",
"\n",
"# Define template paths\n",
"base_path = os.path.join(os.getcwd(), \"..\", \"src\", \"search\", \"templates\")\n",
"datasource_template_path = os.path.join(base_path, \"product-info\", \"datasource.json\")\n",
"index_template_path = os.path.join(base_path, \"product-info\", \"index.json\")\n",
"skillset_template_path = os.path.join(base_path, \"product-info\", \"skillset.json\")\n",
"indexer_template_path = os.path.join(base_path, \"product-info\", \"indexer.json\")\n",
"\n",
"# List of search assets\n",
"assets = [\n",
" {\n",
" \"type\": \"indexes\",\n",
" \"name\": os.environ[\"AZURE_AI_SEARCH_INDEX_NAME\"],\n",
" \"template_path\": index_template_path,\n",
" \"template_variables\": template_variables,\n",
" },\n",
" {\n",
" \"type\": \"datasources\",\n",
" \"name\": os.environ[\"AZURE_AI_SEARCH_DATASOURCE_NAME\"],\n",
" \"template_path\": datasource_template_path,\n",
" \"template_variables\": template_variables,\n",
" },\n",
" {\n",
" \"type\": \"skillsets\",\n",
" \"name\": os.environ[\"AZURE_AI_SEARCH_SKILLSET_NAME\"],\n",
" \"template_path\": skillset_template_path,\n",
" \"template_variables\": template_variables,\n",
" },\n",
" {\n",
" \"type\": \"indexers\",\n",
" \"name\": os.environ[\"AZURE_AI_SEARCH_INDEXER_NAME\"],\n",
" \"template_path\": indexer_template_path,\n",
" \"template_variables\": template_variables,\n",
" },\n",
"]\n",
"\n",
"# Load search asset templates\n",
"search_client.load_search_management_asset_templates(assets)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Create the index\n",
"index_response = search_client.create_search_management_asset(asset_type=\"indexes\")\n",
"\n",
"# Create the data source\n",
"datasource_response = search_client.create_search_management_asset(asset_type=\"datasources\")\n",
"\n",
"# Create skillset to enhance the indexer\n",
"skillset_response = search_client.create_search_management_asset(asset_type=\"skillsets\")\n",
"\n",
"# Create the indexer\n",
"indexer_response = search_client.create_search_management_asset(asset_type=\"indexers\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# Run the indexer\n",
"indexer_run_response = search_client.run_indexer()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# Run the indexer with reset\n",
"indexer_run_reset_response = search_client.run_indexer(reset_flag=True)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "base",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
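
Once the indexer has been triggered, a natural follow-up is to poll its status. The sketch below calls the Search REST API directly with `requests` and `azure-identity` (both pinned in `environment/requirements.txt`); the endpoint shape, api-version, and token scope are assumptions, and the repo's `SearchClient` may already expose an equivalent helper:

```python
import os

import requests
from azure.identity import DefaultAzureCredential

# Acquire a Microsoft Entra ID token for Azure AI Search (assumes RBAC is enabled on the service).
credential = DefaultAzureCredential()
token = credential.get_token("https://search.azure.com/.default").token

endpoint = os.environ["AZURE_AI_SEARCH_ENDPOINT"].rstrip("/")
indexer_name = os.environ["AZURE_AI_SEARCH_INDEXER_NAME"]

# Query the indexer status endpoint; the api-version shown here is an assumption.
response = requests.get(
    f"{endpoint}/indexers/{indexer_name}/status",
    params={"api-version": "2023-11-01"},
    headers={"Authorization": f"Bearer {token}"},
)
response.raise_for_status()

last_result = response.json().get("lastResult") or {}
print(last_result.get("status"), last_result.get("itemsProcessed"))
```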
101 changes: 36 additions & 65 deletions notebooks/rag-orchestrator.ipynb → notebooks/02-llm-queries.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Generate model response"
"# LLM Queries with Knowledge Base Integration"
]
},
{
@@ -25,7 +25,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Custom RAG Queries"
"### Approach 1: Custom Client\n",
"\n",
"This approach will use the `RetrievalAugmentedGenerationClient` class defined in `src/rag/utilities.py`. This will NOT require a Microsoft managed private endpoint for private access."
]
},
{
Expand All @@ -34,15 +36,16 @@
"metadata": {},
"outputs": [],
"source": [
"from orchestration.utilities import OrchestrationClient\n",
"from rag.utilities import RetrievalAugmentedGenerationClient\n",
"\n",
"# Create orchestration client\n",
"orchestration_client = OrchestrationClient(\n",
"rag_client = RetrievalAugmentedGenerationClient(\n",
" open_ai_endpoint=os.getenv(\"AZURE_OPENAI_API_BASE\"),\n",
" open_ai_chat_deployment=os.getenv(\"AZURE_OPENAI_CHAT_DEPLOYMENT\"),\n",
" open_ai_embedding_deployment=os.getenv(\"AZURE_OPENAI_EMBEDDING_DEPLOYMENT\"),\n",
" search_endpoint=os.getenv(\"AZURE_AI_SEARCH_ENDPOINT\"),\n",
" search_index_name=os.getenv(\"AZURE_AI_SEARCH_INDEX_NAME\"),\n",
" system_prompt_configuration_file=\"../src/rag/configuration.yaml\"\n",
")"
]
},
@@ -52,15 +55,8 @@
"metadata": {},
"outputs": [],
"source": [
"# Generate chat response from initial user query\n",
"chat_history = {\n",
" \"messages\": [\n",
" {\"role\": \"user\", \"content\": \"Which tent is the most waterproof?\"},\n",
" ]\n",
"}\n",
"\n",
"chat_history = orchestration_client.generate_chat_response(chat_history)\n",
"print(chat_history[\"messages\"][-1][\"content\"])"
"message_history = []\n",
"message_history = rag_client.get_answer(\"Which tent is the most waterproof?\", message_history=message_history)"
]
},
{
@@ -69,22 +65,38 @@
"metadata": {},
"outputs": [],
"source": [
"# Generate chat response from follow-up user query\n",
"chat_history[\"messages\"].append(\n",
" {\"role\": \"user\", \"content\": \"Tell me more about the Alpine Explorer Tent?\"}\n",
")\n",
"\n",
"chat_history = orchestration_client.generate_chat_response(chat_history)\n",
"print(chat_history[\"messages\"][-1][\"content\"])"
"for message in message_history:\n",
" content = message['content'].split(\"Sources:\")[0].strip()\n",
" print(f\"{message['role'].title()}: {content}\\n\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"message_history = rag_client.get_answer(\"Tell me more about the Alpine Explorer Tent?\", message_history=message_history)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for message in message_history:\n",
" content = message['content'].split(\"Sources:\")[0].strip()\n",
" print(f\"{message['role'].title()}: {content}\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Azure OpenAI Service REST API\n",
"### Approach 2: Azure OpenAI Service REST API\n",
"\n",
"Note: this will require public access on Azure AI Search or a Microsft managed private endpoint for private access."
"This will require public access on Azure AI Search or a Microsoft managed private endpoint for private access."
]
},
{
Expand Down Expand Up @@ -134,48 +146,7 @@
" json=request_payload,\n",
")\n",
"\n",
"print(response.json())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"text = \"Based on the information provided, both the Alpine Explorer Tent and the TrailMaster X4 Tent are waterproof. The Alpine Explorer Tent has a rainfly with a waterproof rating of 3000mm [product_info_8.md], while the TrailMaster X4 Tent has a rainfly with a waterproof rating of 2000mm [product_info_1.md]. Therefore, both tents offer reliable protection against rain and moisture.\"\n",
"\n",
"text"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"text = (\n",
" \"This document refers to [product_info_1.md] and [another_file.md]. More text here.\"\n",
")\n",
"\n",
"def replace_references(text: str) -> str:\n",
" # Regex to match references in the format [*.md]\n",
" regex = r\"\\[([^\\]]*.md)\\]\"\n",
"\n",
" # Replace matched references with modified references (appending \":blue\")\n",
" modified_text = re.sub(regex, r\"*:blue[\\1]*\", text)\n",
"\n",
" return modified_text\n",
"\n",
"# Regex to match references in the format [*.md]\n",
"regex = r\"\\[([^\\]]*.md)\\]\"\n",
"\n",
"# Replace matched references with modified references (appending \":blue\")\n",
"modified_text = re.sub(regex, r\"*:blue[\\1]*\", text)\n",
"\n",
"print(modified_text)"
"print(response.json()[\"choices\"][0][\"message\"][\"content\"])"
]
},
{
@@ -202,7 +173,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.7"
"version": "3.11.4"
}
},
"nbformat": 4,