Commit 02854a5
Small fixes to docs
maks-operlejn-ds committed Oct 12, 2023
1 parent e965ccd commit 02854a5
Showing 1 changed file with 4 additions and 1 deletion.
@@ -16,7 +16,7 @@
 "source": [
 "# QA with private data protection\n",
 "\n",
-"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/use_cases/question_answering/qa_privacy_protection.ipynb)\n",
+"[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/guides/privacy/presidio_data_anonymization/qa_privacy_protection.ipynb)\n",
 "\n",
 "\n",
 "In this notebook, we will look at building a basic question-answering system based on private data. Before feeding this data to the LLM, we need to protect it so that it doesn't go to an external API (e.g. OpenAI, Anthropic). Then, after receiving the model output, we would like the data to be restored to its original form. Below is an example flow of this QA system:\n",
@@ -643,6 +643,8 @@
"from langchain.vectorstores import FAISS\n",
"\n",
"# 2. Load the data: In our case data's already loaded\n",
"documents = [Document(page_content=document_content)]\n",
"\n",
"# 3. Anonymize the data before indexing\n",
"for doc in documents:\n",
" doc.page_content = anonymizer.anonymize(doc.page_content)\n",
@@ -839,6 +841,7 @@
"metadata": {},
"outputs": [],
"source": [
"documents = [Document(page_content=document_content)]\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)\n",
"chunks = text_splitter.split_documents(documents)\n",
"\n",
