[BUG]: Chat with your documents example exhibits flaky retrieval (#1203)
## Description of changes

In #1115 @BChip noticed
flaky retrieval performance.

The issue was difficult to replicate because of nondeterminism inherent
in the HNSW graph construction on loading, but I was able to track it
down through repeated testing.

The cause is that the loader ingests every empty line in the documents.
Empty lines make up 50% of the lines in each file, and they all produce
the same embedding, which sometimes makes the HNSW graph degenerate.
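
To make the scale of the problem visible, here is a minimal sketch that counts empty and repeated lines before they are embedded. The helper name `duplicate_report` and the sample lines are hypothetical and are not part of the example's code. It assumes a deterministic embedding function, so identical strings yield identical vectors.

```python
from collections import Counter


def duplicate_report(lines):
    """Summarize how many stripped lines are empty or repeated.

    Hypothetical helper for illustration; not part of load_data.py.
    """
    stripped = [line.strip() for line in lines]
    counts = Counter(stripped)
    total = len(stripped)
    empty = counts.get("", 0)
    # Each copy beyond the first of a given string would be embedded
    # to the same vector (assuming a deterministic embedding function).
    redundant = sum(n - 1 for n in counts.values() if n > 1)
    return {
        "total": total,
        "empty": empty,
        "empty_fraction": empty / total if total else 0.0,
        "redundant_copies": redundant,
    }


# Made-up sample that alternates text and blank lines
sample = ["First sentence.\n", "\n", "Second sentence.\n", "\n"]
report = duplicate_report(sample)
print(report)
```

A high `empty_fraction` on real input would suggest the same failure mode described above.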

The fix is to skip the empty lines. We should consider how we can
mitigate this in the future since this is not easy to detect after the
fact, and is likely to be something users run into.
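
The fix can be sketched as a standalone function, which also makes the behavior easy to check without loading a collection. This is an illustration under assumptions: `prepare_documents` is a hypothetical name, and the real change lives inline in the loop in `load_data.py`. Line numbers are counted before filtering, so the metadata still points at the original position in the file.

```python
def prepare_documents(lines, filename):
    """Strip each line, skip empty ones, and keep original line numbers.

    Hypothetical helper mirroring the loop body in load_data.py.
    """
    documents = []
    metadatas = []
    for line_number, line in enumerate(lines, 1):
        line = line.strip()
        # Skip empty lines so they are never embedded
        if len(line) == 0:
            continue
        documents.append(line)
        metadatas.append({"filename": filename, "line_number": line_number})
    return documents, metadatas


# Made-up input: two text lines separated by a whitespace-only line
docs, metas = prepare_documents(["alpha\n", "   \n", "beta\n"], "example.txt")
print(docs)
print(metas)
```

Keeping the filter in one place like this is one possible way to test it directly, which the manual test plan cannot do.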

## Test plan

The failures no longer occur when the example is invoked manually.

## Documentation Changes
N/A
atroyn authored Oct 4, 2023
1 parent de2f05a commit fc4c8b5
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions examples/chat_with_your_documents/load_data.py
@@ -22,6 +22,9 @@ def main(
             ):
                 # Strip whitespace and append the line to the documents list
                 line = line.strip()
+                # Skip empty lines
+                if len(line) == 0:
+                    continue
                 documents.append(line)
                 metadatas.append({"filename": filename, "line_number": line_number})
