Release 0.9
Refs #192, #209, #211, #213, #215, #217, #218, #219, #222

Closes #205
simonw committed Sep 4, 2023
1 parent e6e1da3 commit 5efb300
Showing 5 changed files with 81 additions and 7 deletions.
30 changes: 30 additions & 0 deletions docs/changelog.md
@@ -1,5 +1,35 @@
# Changelog

(v0_9)=
## 0.9 (2023-09-03)

The big new feature in this release is support for **embeddings**.

{ref}`Embedding models <embeddings>` take a piece of text - a word, sentence, paragraph or even a whole article - and convert it into an array of floating point numbers. [#185](https://github.com/simonw/llm/issues/185)

This embedding vector can be thought of as representing a position in many-dimensional space, where the distance between two vectors represents how semantically similar the two pieces of content are to each other, according to the model that produced them.

Embeddings can be used to find **related documents**, and also to implement **semantic search** - where a user can search for a phrase and get back results that are semantically similar to that phrase even if they do not share any exact keywords.
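The usual measure of "distance" between two embedding vectors is cosine similarity. As a standalone sketch (this is not LLM's own implementation, and real embedding vectors have hundreds or thousands of dimensions, not three):

```python
from math import sqrt

def cosine_similarity(a, b):
    # Higher values mean the two vectors point in a more similar
    # direction, i.e. the texts they represent are more related.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" for illustration only
dog = [0.9, 0.1, 0.0]
hound = [0.8, 0.2, 0.0]
car = [0.0, 0.1, 0.9]

print(cosine_similarity(dog, hound) > cosine_similarity(dog, car))  # → True
```

"dog" and "hound" land close together in the vector space, while "car" lands far from both - that closeness is what powers related-document lookup and semantic search.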

LLM now provides both CLI and Python APIs for working with embeddings. Embedding models are defined by plugins, so you can install additional models using the {ref}`plugins mechanism <installing-plugins>`.

The first two embedding models supported by LLM are:

- OpenAI's [ada-002](https://platform.openai.com/docs/guides/embeddings) embedding model, available via an inexpensive API if you set an OpenAI key using `llm keys set openai`.
- The [sentence-transformers](https://www.sbert.net/) family of models, available via the new [llm-sentence-transformers](https://github.com/simonw/llm-sentence-transformers) plugin.

See {ref}`embeddings-cli` for detailed instructions on working with embeddings using LLM.

The new commands for working with embeddings are:

- **{ref}`llm embed <embeddings-cli-embed>`** - calculate embeddings for content and return them to the console or store them in a SQLite database.
- **{ref}`llm embed-multi <embeddings-cli-embed-multi>`** - run bulk embeddings for multiple strings, using input from a CSV, TSV or JSON file, data from a SQLite database or data found by scanning the filesystem. [#215](https://github.com/simonw/llm/issues/215)
- **{ref}`llm similar <embeddings-cli-similar>`** - run similarity searches against your stored embeddings - starting with a search phrase or finding content related to a previously stored vector. [#190](https://github.com/simonw/llm/issues/190)
- **{ref}`llm embed-models <embeddings-cli-embed-models>`** - list available embedding models.
- **{ref}`llm embed-db <help-embed-db>`** - commands for inspecting and working with the default embeddings SQLite database.
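A typical workflow combining these commands might look like the following sketch - it assumes an OpenAI key has been set with `llm keys set openai`, and the collection name `items` and file `items.csv` are illustrative:

```shell
# Embed a single string and print the resulting vector
llm embed -m ada-002 -c 'my happy hound'

# Bulk-embed rows from a CSV file into a collection called "items",
# storing the original content alongside each vector
llm embed-multi items -m ada-002 -i items.csv --store

# Search the stored embeddings for content similar to a phrase
llm similar items -c 'hound'
```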

There's also a new {ref}`llm.Collection <embeddings-python-collections>` class for creating and searching collections of embeddings from Python code, and a {ref}`llm.get_embedding_model() <embeddings-python-api>` interface for embedding strings directly. [#191](https://github.com/simonw/llm/issues/191)
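A minimal sketch of that Python API, assuming the `llm` and `sqlite-utils` packages are installed and an OpenAI key has been configured - the database and collection names here are illustrative:

```python
import llm
import sqlite_utils

# Embed a single string directly - returns a list of floats
model = llm.get_embedding_model("ada-002")
vector = model.embed("my happy hound")

# Build a collection of embeddings in a SQLite database and search it
db = sqlite_utils.Database("embeddings.db")
collection = llm.Collection("phrases", db, model=model)
collection.embed("hound", "my happy hound", store=True)
for entry in collection.similar("hound", number=3):
    print(entry.id, entry.score)
```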

(v0_8_1)=
## 0.8.1 (2023-08-31)

10 changes: 5 additions & 5 deletions docs/embeddings/cli.md
@@ -3,7 +3,7 @@

LLM provides command-line utilities for calculating and storing embeddings for pieces of content.

(embeddings-llm-embed)=
(embeddings-cli-embed)=
## llm embed

The `llm embed` command can be used to calculate embedding vectors for a string of content. These can be returned directly to the terminal, stored in a SQLite database, or both.
@@ -110,7 +110,7 @@ llm similar phrases -c 'hound'
{"id": "hound", "score": 0.8484683588631485, "content": "my happy hound", "metadata": {"name": "Hound"}}
```

(embeddings-llm-embed-multi)=
(embeddings-cli-embed-multi)=
## llm embed-multi

The `llm embed` command embeds a single string at a time.
@@ -130,7 +130,7 @@ All three mechanisms support these options:
- `--store` to store the original content in the embeddings table in addition to the embedding vector
- `--prefix` to prepend a prefix to the stored ID of each item

(embeddings-llm-embed-multi-csv-etc)=
(embeddings-cli-embed-multi-csv-etc)=
### Embedding data from a CSV, TSV or JSON file

You can embed data from a CSV, TSV or JSON file using the `-i/--input` option.
@@ -188,7 +188,7 @@ llm embed-multi items \
--store
```

(embeddings-llm-embed-multi-sqlite)=
(embeddings-cli-embed-multi-sqlite)=
### Embedding data from a SQLite database

You can embed data from a SQLite database using `--sql`, optionally combined with `--attach` to attach an additional database.
@@ -213,7 +213,7 @@ llm embed-multi docs \
-m ada-002
```

(embeddings-llm-embed-multi-directories)=
(embeddings-cli-embed-multi-directories)=
### Embedding data from files in directories

LLM can embed the content of every text file in a specified directory, using the file's path and name as the ID.
2 changes: 1 addition & 1 deletion docs/embeddings/writing-plugins.md
@@ -37,7 +37,7 @@ class SentenceTransformerModel(llm.EmbeddingModel):
results = self._model.encode(texts)
return (list(map(float, result)) for result in results)
```
Once installed, the model provided by this plugin can be used with the {ref}`llm embed <embeddings-llm-embed>` command like this:
Once installed, the model provided by this plugin can be used with the {ref}`llm embed <embeddings-cli-embed>` command like this:

```bash
cat file.txt | llm embed -m sentence-transformers/all-MiniLM-L6-v2