Commit
Merge pull request #91 from cohere-ai/rag-connector
Add example RAG with quickstart connectors
Showing 11 changed files with 2,376 additions and 0 deletions.
332 changes: 332 additions & 0 deletions
examples/chat_rag_quickstart_connector/chat-rag-part-5.ipynb
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# For Service Account Authentication | ||
GDRIVE_SERVICE_ACCOUNT_INFO='YOUR_GDRIVE_SERVICE_ACCOUNT_INFO' | ||
GDRIVE_CONNECTOR_API_KEY="YOUR_GDRIVE_CONNECTOR_API_KEY" | ||
# Optional Configuration | ||
# GDRIVE_FOLDER_ID= | ||
# GDRIVE_SEARCH_LIMIT= |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
credentials.json | ||
token.json | ||
|
56 changes: 56 additions & 0 deletions
examples/chat_rag_quickstart_connector/gdrive/.openapi/api.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
openapi: 3.0.3 | ||
info: | ||
title: Search Connector API | ||
version: 0.0.1 | ||
paths: | ||
/search: | ||
post: | ||
description: >- | ||
<p>Searches the connected data source for documents related to the query and returns a set of key-value pairs representing the found documents.</p> | ||
operationId: search | ||
summary: Perform a search | ||
security: | ||
- api_key: [] | ||
requestBody: | ||
required: true | ||
content: | ||
application/json: | ||
schema: | ||
type: object | ||
required: | ||
- query | ||
properties: | ||
query: | ||
description: >- | ||
A plain-text query string to be used to search for relevant documents. | ||
type: string | ||
minLength: 1 | ||
example: | ||
query: embeddings | ||
responses: | ||
"200": | ||
description: Successful response | ||
content: | ||
application/json: | ||
schema: | ||
type: object | ||
properties: | ||
results: | ||
type: array | ||
items: | ||
type: object | ||
additionalProperties: | ||
type: string | ||
"400": | ||
description: Bad request | ||
"401": | ||
description: Unauthorized | ||
default: | ||
description: Error response | ||
|
||
components: | ||
securitySchemes: | ||
api_key: | ||
type: http | ||
scheme: bearer | ||
x-bearerInfoFunc: provider.app.apikey_auth |
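The request and response shapes in the specification above can be checked with a few lines of Python. This is a minimal hand-written sketch, not a full OpenAPI validator and not code from this commit; the helper names `is_valid_search_request` and `is_valid_search_response` are hypothetical. It assumes only what the schema states: `query` is a required string of at least one character, and `results` is an array of objects whose values are all strings.

```python
from typing import Any


def is_valid_search_request(payload: Any) -> bool:
    """Check the request body: an object with a non-empty string `query`."""
    return (
        isinstance(payload, dict)
        and isinstance(payload.get("query"), str)
        and len(payload["query"]) >= 1
    )


def is_valid_search_response(payload: Any) -> bool:
    """Check the 200 response: `results` is a list of string-to-string objects."""
    if not isinstance(payload, dict):
        return False
    # `results` is not marked as required in the schema, so default to empty.
    results = payload.get("results", [])
    if not isinstance(results, list):
        return False
    return all(
        isinstance(doc, dict)
        and all(isinstance(k, str) and isinstance(v, str) for k, v in doc.items())
        for doc in results
    )
```

A check like this can be useful in tests for a connector, because a document field that is not a string (for example a number or a nested object) falls outside the `additionalProperties: type: string` constraint.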
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
# Demo Google Drive Connector | ||
|
||
This is a demo Google Drive connector used in LLM University's Chat with Retrieval-Augmented Generation module. Read the [article](https://txt.cohere.com/rag-chatbot-connector/) to learn more. | ||
|
||
## Limitations | ||
|
||
Currently this connector can only search for Google Docs, Google Sheets, and Google Slides files. It does not support searching for other file types. | ||
|
||
## Authentication | ||
|
||
This connector supports two types of authentication: Service Account and OAuth. | ||
|
||
### Service Account | ||
|
||
For service account authentication this connector requires two environment variables: | ||
|
||
#### `GDRIVE_SERVICE_ACCOUNT_INFO` | ||
|
||
The `GDRIVE_SERVICE_ACCOUNT_INFO` variable should contain the JSON content of the service account credentials file. To get the credentials file, follow these steps: | ||
|
||
1. [Create a project in Google Cloud Console](https://cloud.google.com/resource-manager/docs/creating-managing-projects). | ||
2. [Create a service account](https://cloud.google.com/iam/docs/creating-managing-service-accounts) and [activate the Google Drive API](https://console.cloud.google.com/apis/api/drive.googleapis.com) in the Google Cloud Console. | ||
3. [Create a service account key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) and download the credentials file as JSON. The credentials file should look like this: | ||
|
||
```json | ||
{ | ||
"type": "service_account", | ||
"project_id": "{project id}", | ||
"private_key_id": "{private_key_id}", | ||
"private_key": "{private_key}", | ||
"client_email": "{client_email}", | ||
"client_id": "{client_id}", | ||
"auth_uri": "{auth_uri}", | ||
"token_uri": "{token_uri}", | ||
"auth_provider_x509_cert_url": "{auth_provider_x509_cert_url}", | ||
"client_x509_cert_url": "{client_x509_cert_url}", | ||
"universe_domain": "{universe_domain}" | ||
} | ||
``` | ||
|
||
4. Convert the JSON credentials to a string through `json.dumps(credentials)` and save the result in the `GDRIVE_SERVICE_ACCOUNT_INFO` environment variable. | ||
5. Make sure to [share the folder(s) you want to search with the service account email address](https://support.google.com/a/answer/7337554?hl=en). | ||
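The conversion in step 4 can be sketched as follows. This is a minimal illustration, not part of the connector: the helper name `credentials_to_env_value` is hypothetical, and the sample JSON uses placeholder values. It parses the credentials text and re-serializes it with `json.dumps`, which yields a single-line string.

```python
import json


def credentials_to_env_value(raw_json: str) -> str:
    """Return service account JSON as a compact single-line string."""
    credentials = json.loads(raw_json)
    return json.dumps(credentials)


# Placeholder content standing in for the downloaded credentials file.
sample = """{
  "type": "service_account",
  "project_id": "{project id}",
  "client_email": "{client_email}"
}"""

# Single quotes keep the JSON's double quotes intact in an env file.
print(f"GDRIVE_SERVICE_ACCOUNT_INFO='{credentials_to_env_value(sample)}'")
```

For a real key, pass the contents of the downloaded file instead of `sample`, for example `Path("credentials.json").read_text()` if that is where the file was saved.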
|
||
#### `GDRIVE_CONNECTOR_API_KEY` | ||
|
||
The `GDRIVE_CONNECTOR_API_KEY` variable should contain an API key for the connector. Every request to the connector must send this value as a bearer token in the `Authorization` header. | ||
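As an illustration, a client could attach the key like this. This is a sketch under assumptions: the `Bearer` prefix is inferred from the `scheme: bearer` entry in the connector's OpenAPI file, the base URL matches the local development server described under Development, and `build_search_request` is a hypothetical helper, not part of the connector.

```python
import json
import urllib.request


def build_search_request(base_url: str, api_key: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST /search request with the API key attached."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/search",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed format: the key is sent as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
    )


# Assumed local address and placeholder key.
request = build_search_request(
    "http://localhost:5000", "YOUR_GDRIVE_CONNECTOR_API_KEY", "embeddings"
)
```

Passing `request` to `urllib.request.urlopen` would send it to a running connector.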
|
||
### OAuth | ||
|
||
When using OAuth for authentication, the connector does not require any additional environment variables. Instead, the OAuth flow should occur outside the connector, and Cohere's API will forward the user's access token to this connector through the `Authorization` header. | ||
|
||
With OAuth the connector will be able to search any Google Drive folders that the user has access to. | ||
|
||
## Optional Configuration | ||
|
||
This connector also supports a few optional environment variables to configure the search: | ||
|
||
1. `GDRIVE_SEARCH_LIMIT` - Number of results to return. Default is 10. | ||
2. `GDRIVE_FOLDER_ID` - ID of the folder to search in. If not provided, the search will be performed in the whole drive. | ||
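The two optional settings above could be read as in the sketch below. This is illustrative only and is not the connector's actual code: `read_search_config` and `DEFAULT_SEARCH_LIMIT` are hypothetical names, and treating an empty value as unset is an assumption.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_SEARCH_LIMIT = 10  # default stated in the list above


def read_search_config(env: Mapping[str, str] = os.environ) -> Tuple[int, Optional[str]]:
    """Return (search_limit, folder_id); folder_id is None when unset."""
    raw_limit = env.get("GDRIVE_SEARCH_LIMIT", "").strip()
    limit = int(raw_limit) if raw_limit else DEFAULT_SEARCH_LIMIT
    # None stands for "search the whole drive".
    folder_id = env.get("GDRIVE_FOLDER_ID", "").strip() or None
    return limit, folder_id
```

Accepting the environment as a parameter keeps the function easy to test with a plain dictionary instead of real environment variables.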
|
||
## Development | ||
|
||
Create a virtual environment and install dependencies with Poetry. We recommend using in-project virtual environments: | ||
|
||
```bash | ||
$ poetry config virtualenvs.in-project true | ||
$ poetry install --no-root | ||
``` | ||
|
||
Next, start up the search connector server: | ||
|
||
```bash | ||
$ poetry shell | ||
$ flask --app provider --debug run --port 5000 | ||
``` | ||
|
||
Then check with curl that everything works: | ||
|
||
```bash | ||
$ curl --request POST \ | ||
--url http://localhost:5000/search \ | ||
--header 'Content-Type: application/json' \ | ||
--data '{ | ||
"query": "stainless propane griddle" | ||
}' | ||
``` | ||
|
||
Alternatively, load up the Swagger UI and try out the API from a browser: http://localhost:5000/ui/ |