v3.0.10: fix embedding token length bug
- an assert was erroneously comparing token length to byte count
Break query into chunks, create embeddings, find similar chunks
summary of diff --git a/v3/core/chunk.go b/v3/core/chunk.go
- Break the query into chunks using `chunksFromString` instead of directly creating embeddings from the query
- Append each chunk's text to `queryStrings` array for embedding
- Create embeddings from the `queryStrings` array
- Return early if the `embeddings` array is empty
- Calculate the average of the embeddings using `util.MeanVector`
- Use the averaged embedding to find the most similar chunks
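The averaging step above can be sketched roughly as follows. This is a minimal illustration, not the code from `chunk.go`: `meanVector` is a hypothetical stand-in for `util.MeanVector`, whose real signature and element type are not shown in this summary, and the two sample vectors are made-up placeholders for chunk embeddings.

```go
package main

import "fmt"

// meanVector returns the element-wise average of vecs.
// Hypothetical stand-in for util.MeanVector; the real function may
// use a different signature or element type.
func meanVector(vecs [][]float64) ([]float64, error) {
	if len(vecs) == 0 {
		return nil, fmt.Errorf("no vectors to average")
	}
	dim := len(vecs[0])
	mean := make([]float64, dim)
	for i, v := range vecs {
		if len(v) != dim {
			return nil, fmt.Errorf("vector %d has length %d, want %d", i, len(v), dim)
		}
		// Accumulate the per-dimension sums.
		for j, x := range v {
			mean[j] += x
		}
	}
	// Divide each sum by the number of vectors.
	for j := range mean {
		mean[j] /= float64(len(vecs))
	}
	return mean, nil
}

func main() {
	// Placeholder embeddings for two query chunks (assumed values).
	embeddings := [][]float64{{1, 2, 3}, {3, 4, 5}}
	mean, err := meanVector(embeddings)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(mean) // [2 3 4]
}
```

Averaging yields a single query vector regardless of how many chunks the query was split into, so the similarity search can stay unchanged.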
Add comments to improve Cli, reference gitea/tea, consider urfave
summary of diff --git a/v3/core/cli.go b/v3/core/cli.go
- Add comments to the Cli function noting potential improvements by referencing gitea/tea's approach and considering the use of urfave over kong
Import envi, add token count checks, enhance debugging
summary of diff --git a/v3/core/document.go b/v3/core/document.go
- Import `github.com/stevegt/envi` to use environment variables
- Verify chunk text length by token count, both before setting existing chunks and when creating new chunks
- Assert that the token count is below `g.embeddingTokenLimit` in both cases so that chunks cannot exceed the embedding limit
- Use `envi.Bool` to run these debug checks only when the `DEBUG` environment variable is set
- Enhance debugging by ensuring chunk token counts do not exceed defined limits
Update grokker.go version from 3.0.9 to 3.0.10
summary of diff --git a/v3/core/grokker.go b/v3/core/grokker.go
- Update version from 3.0.9 to 3.0.10 in grokker.go
Remove comments and enable debug logging in createEmbeddings
summary of diff --git a/v3/core/openai.go b/v3/core/openai.go
- Remove unnecessary comments about exceeding max tokens in the `createEmbeddings` function
- Enable debug logging for each text chunk embedded in the `createEmbeddings` function
Move go-diff to own block; add envi v0.2.0 to require block
summary of diff --git a/v3/go.mod b/v3/go.mod
- Move `github.com/sergi/go-diff v1.3.1` into its own require block
- Add `github.com/stevegt/envi v0.2.0` to the require block
Add envi v0.2.0 and goadapt v0.0.13 module info to go.sum
summary of diff --git a/v3/go.sum b/v3/go.sum
- Add `github.com/stevegt/envi v0.2.0` checksum and module information
- Add `github.com/stevegt/goadapt v0.0.13` module information