NumPy decoding docs, plus extra tests for llm.encode/decode
simonw committed Sep 14, 2023
1 parent d70c0db commit 356fcb7
Showing 3 changed files with 29 additions and 0 deletions.
9 changes: 9 additions & 0 deletions docs/embeddings/storage.md
@@ -20,3 +20,12 @@ def decode(binary):
```

These functions are available as `llm.encode()` and `llm.decode()`.

If you are using [NumPy](https://numpy.org/) you can decode one of these binary values like this:

```python
import numpy as np

numpy_array = np.frombuffer(value, "<f4")
```
The `<f4` format string here ensures NumPy will treat the data as a little-endian sequence of 32-bit floats.
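
As a rough illustration of the snippet above, the following sketch builds a binary value with `struct.pack` and decodes it with NumPy. It assumes the stored format is a packed sequence of little-endian 32-bit floats, as the `<f4` remark implies. `pack_floats` is a hypothetical stand-in for `llm.encode()`, not the library's own code.

```python
import struct

import numpy as np


def pack_floats(values):
    # Hypothetical stand-in for llm.encode(): little-endian 32-bit floats
    return struct.pack("<" + "f" * len(values), *values)


value = pack_floats((0.0, 1.0, 1.5))

# Each 32-bit float occupies four bytes
assert len(value) == 12

numpy_array = np.frombuffer(value, "<f4")
print(numpy_array.tolist())

# frombuffer() on a bytes object returns a read-only view of that buffer;
# copy it if a writable array is needed
writable = numpy_array.copy()
writable[0] = 2.0
```

Because `np.frombuffer()` reuses the underlying bytes rather than copying them, the resulting array is read-only when the source is an immutable `bytes` object, which is why the sketch calls `.copy()` before assigning to it.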
1 change: 1 addition & 0 deletions setup.py
@@ -51,6 +51,7 @@ def get_long_description():
    extras_require={
        "test": [
            "pytest",
            "numpy",
            "requests-mock",
            "cogapp",
            "mypy",
19 changes: 19 additions & 0 deletions tests/test_encode_decode.py
@@ -0,0 +1,19 @@
import llm
import pytest
import numpy as np


@pytest.mark.parametrize(
    "array",
    (
        (0.0, 1.0, 1.5),
        (3423.0, 222.0, -1234.5),
    ),
)
def test_roundtrip(array):
    encoded = llm.encode(array)
    decoded = llm.decode(encoded)
    assert decoded == array
    # Try with numpy as well
    numpy_decoded = np.frombuffer(encoded, "<f4")
    assert tuple(numpy_decoded.tolist()) == array
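
The parametrized values in this test are all exactly representable as 32-bit floats, which is why the strict `==` comparison holds. A value such as `0.1` would likely not compare equal after a round trip. Here is a minimal sketch of that behaviour using only the standard library, assuming the same little-endian float32 packing. The `roundtrip` helper is hypothetical and is not part of `llm`.

```python
import math
import struct


def roundtrip(values):
    # Hypothetical helper: pack as little-endian float32, then unpack again
    binary = struct.pack("<" + "f" * len(values), *values)
    return struct.unpack("<" + "f" * (len(binary) // 4), binary)


exact = roundtrip((3423.0, 222.0, -1234.5))
inexact = roundtrip((0.1,))

# Values with a short binary expansion survive unchanged
assert exact == (3423.0, 222.0, -1234.5)

# 0.1 is rounded to the nearest float32, so strict equality fails
assert inexact != (0.1,)

# A tolerance-based comparison still passes
assert math.isclose(inexact[0], 0.1, rel_tol=1e-6)
```

If further cases with arbitrary decimals were added to the test, a tolerance-based comparison such as `pytest.approx` would be one way to keep the assertions meaningful.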
