You may decide to combine pgai with TimescaleDB background actions: you can use the pgai extension inside a background action to process data asynchronously.
There are some security concerns when running background actions, especially because you may need to reach an external service such as the OpenAI API. Be aware of how you store your keys and who can access them. The background action should run as a user with the necessary permissions, and it also needs to read the API key either from a shared configuration or from the job payload. If you pass the key in the payload, make sure you use a secure connection and rotate your secrets frequently.
You can also set this up at the server level. For example, add a new setting to the configuration file:
echo "ai.openai_api_key=${API_KEY}" >> /path/to/postgresql.conf
If you're running as a superuser, you can set it for the current session or use ALTER SYSTEM to persist the configuration directly from your session:
SET ai.openai_api_key TO :'OPENAI_API_KEY';              -- current session only
ALTER SYSTEM SET ai.openai_api_key TO :'OPENAI_API_KEY'; -- persisted in postgresql.auto.conf
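ALTER SYSTEM writes the value to postgresql.auto.conf, so it only takes effect after a configuration reload. A minimal check, assuming you used the ai.openai_api_key setting from above:
-- Reload the configuration so the new value is picked up
SELECT pg_reload_conf();
-- Returns NULL instead of raising an error if the setting is missing
SELECT current_setting('ai.openai_api_key', true) IS NOT NULL AS api_key_configured;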
Let's imagine you're receiving documents and want to process the embeddings in the background. You can use the pgai extension to run the embedding inside a background action and store the result later.
Want to run it? Check the full script here.
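Before wiring up the job, you can sanity-check the embedding call directly in psql. This is just a quick test under the same assumptions as the job below (the OPENAI_API_KEY psql variable is set and the pgai extension is installed):
-- One-off embedding call to confirm the API key works
SELECT openai_embed('text-embedding-ada-002', 'hello world', :'OPENAI_API_KEY');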
First, install the pgvectorscale extension:
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;
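If you want to confirm what ended up installed (CASCADE pulls in pgvector as a dependency), you can list the extensions; the names below assume pgai is installed as ai and pgvector as vector:
-- Check which of the relevant extensions are present
SELECT extname, extversion FROM pg_extension WHERE extname IN ('ai', 'vector', 'vectorscale');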
This table will store the documents and their embeddings:
CREATE TABLE IF NOT EXISTS document_embedding (
id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
metadata JSONB,
contents TEXT,
embedding VECTOR(1536) -- text-embedding-ada-002 returns 1536-dimensional vectors
);
Let's also use our DiskANN index to speed up the search:
CREATE INDEX document_embedding_idx ON document_embedding
USING diskann (embedding);
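With the index in place, a nearest-neighbor query looks like the sketch below. The query text and LIMIT are placeholders, and it assumes cosine distance (pgvector's <=> operator), which the diskann index uses by default:
-- Find the 5 documents closest to an ad-hoc query string
SELECT id, contents
FROM document_embedding
ORDER BY embedding <=> openai_embed('text-embedding-ada-002', 'my search query', :'OPENAI_API_KEY')
LIMIT 5;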
Now we're going to build a background job with a TimescaleDB action that populates the embedding column with the embeddings of the contents. The API key is passed in through the job's config; if you persisted it with the ALTER SYSTEM approach, you can remove that parameter:
CREATE OR REPLACE FUNCTION populate_embedding(job_id int, config jsonb) RETURNS void AS $$
DECLARE
  r record;
  api_key text;
BEGIN
  api_key := config->>'api_key';
  -- Pick one pending document at a time; SKIP LOCKED lets concurrent runs coexist
  FOR r IN SELECT id, contents FROM document_embedding WHERE embedding IS NULL LIMIT 1 FOR UPDATE SKIP LOCKED LOOP
    UPDATE document_embedding
    SET embedding = openai_embed('text-embedding-ada-002', r.contents, api_key)
    WHERE id = r.id;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
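You can call the function by hand before scheduling it; the job_id argument isn't used by the body, so any integer works, and if there are no pending documents the loop simply does nothing:
-- Run one pass of the job logic manually
SELECT populate_embedding(0, format('{"api_key": "%s"}', :'OPENAI_API_KEY')::jsonb);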
Schedule the job to run every 10 seconds:
SELECT add_job('populate_embedding','10s', fixed_schedule => true,
config => format('{"api_key": "%s"}', :'OPENAI_API_KEY')::jsonb);
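add_job returns the new job id, and you can watch the job through TimescaleDB's informational views (available in recent TimescaleDB versions). Note that the config column in timescaledb_information.jobs exposes the payload, including the API key, which is one more reason to handle it as discussed earlier:
-- Confirm the job is registered and check how its runs are going
SELECT job_id, proc_name, schedule_interval FROM timescaledb_information.jobs WHERE proc_name = 'populate_embedding';
SELECT job_id, last_run_status, total_successes, total_failures FROM timescaledb_information.job_stats;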
Now you can insert a new document:
INSERT INTO document_embedding (contents) VALUES
('Test document embedding');
Wait at least 10 seconds and then check the table:
SELECT * FROM document_embedding;
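When you're done experimenting, you can unschedule the background job with delete_job, passing the id that add_job returned (1000 below is just a placeholder):
-- Remove the scheduled job; replace 1000 with your actual job id
SELECT delete_job(1000);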