Add point docs page
nuric committed Apr 19, 2024
1 parent dc7d487 commit bfda49f
Showing 2 changed files with 43 additions and 3 deletions.
2 changes: 1 addition & 1 deletion docs/content/docs/concepts/collection.md
@@ -1,6 +1,6 @@
---
title: Collection
weight: 1
weight: 2
---

Work in progress.
44 changes: 42 additions & 2 deletions docs/content/docs/concepts/point.md
@@ -1,6 +1,46 @@
---
title: Point
weight: 2
weight: 1
---

Working in progress.
# Point / Document

A point, or document, is the smallest unit of data in SemaDB. It is a JSON object stored as part of a [collection]({{< ref "collection" >}}). A point can have any number of fields, and the fields can be of any type. There is usually a limit on the total size of a stored point. If you are self-hosting, you can adjust this limit in the configuration file, but we recommend keeping a limit in place. Here is an example point / document:

```json
{
"city": "Edinburgh",
"country": "Scotland",
"areaCode": "0131",
"population": 500000,
"embedding": [0.1, 0.2, 0.3, 0.4, 0.5]
}
```
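
As a rough client-side check on the size limit mentioned above, the sketch below measures the JSON-encoded size of a point before inserting it. `MAX_POINT_SIZE_BYTES` and `encoded_size` are hypothetical names chosen for illustration, the limit value is made up, and the server may measure size differently, so treat the result as an approximation only.

```python
import json

# Hypothetical limit for illustration only; use the value from your own configuration.
MAX_POINT_SIZE_BYTES = 1024 * 1024


def encoded_size(point: dict) -> int:
    """Return the size in bytes of the point encoded as compact JSON."""
    return len(json.dumps(point, separators=(",", ":")).encode("utf-8"))


point = {
    "city": "Edinburgh",
    "country": "Scotland",
    "areaCode": "0131",
    "population": 500000,
    "embedding": [0.1, 0.2, 0.3, 0.4, 0.5],
}

size = encoded_size(point)
if size > MAX_POINT_SIZE_BYTES:
    raise ValueError(f"point is {size} bytes, over the assumed limit")
```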

> While fields can be of arbitrary types, any field used for [indexing]({{< ref "indexing" >}}) must be in the format the index expects. For example, if you have an integer index on the population field, the population field should be an integer. If you have a text index on the city field, the city field should be a string.

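
The note above can be checked on the client before inserting. This is a minimal sketch, assuming you keep your own mapping from indexed field names to expected Python types. `EXPECTED_TYPES` and `find_type_mismatches` are hypothetical names and not part of SemaDB, and the sketch skips fields that are absent from the point.

```python
# Hypothetical mapping from indexed field name to the Python type its index expects.
EXPECTED_TYPES = {"population": int, "city": str}


def find_type_mismatches(point: dict, expected_types: dict) -> list:
    """Return the indexed fields whose values do not have the expected type."""
    mismatches = []
    for field, expected in expected_types.items():
        if field not in point:
            continue  # absent fields are not checked in this sketch
        value = point[field]
        # bool is a subclass of int in Python, so reject it for integer fields.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            mismatches.append(field)
    return mismatches


good = {"city": "Edinburgh", "population": 500000}
bad = {"city": "Edinburgh", "population": "500000"}

print(find_type_mismatches(good, EXPECTED_TYPES))
print(find_type_mismatches(bad, EXPECTED_TYPES))
```
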
## Point ID

Each point has a [universally unique identifier](https://en.wikipedia.org/wiki/Universally_unique_identifier) (UUID) that identifies it within the collection. You can provide this ID in the `_id` field when inserting points, or let the server generate a UUID4 for you. If you provide an ID, ensure it is unique within the collection:

```json
{
"_id": "f7b3b3b4-3b7b-4b3b-8b3b-3b7b3b7b3b7b",
"...": "..."
}
```
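
If you prefer to generate the ID on the client, Python's standard `uuid` module can produce a UUID4. The helper name `with_point_id` below is hypothetical and only illustrates attaching the `_id` field described above.

```python
import uuid


def with_point_id(point: dict) -> dict:
    """Return a copy of the point, adding a client-generated UUID4 under `_id` if missing."""
    if "_id" in point:
        return dict(point)
    return {"_id": str(uuid.uuid4()), **point}


point = with_point_id({"city": "Edinburgh", "population": 500000})
print(point["_id"])
```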

**What if I have existing data with integer or other string IDs?** In this case, we recommend storing the existing ID as an integer or string field in the point itself, and storing the SemaDB point ID in the external database. This way you can easily link the data in the external database to the data in SemaDB. This creates a **cross-reference** between the external database and SemaDB:

```mermaid
graph TD
Point["Point / Document"] --> UUID["SemaDB UUID"]
UUID --> SemaDB[(SemaDB)]
Point --> External["External ID"]
UUID --> ExternalDB
External --> ExternalDB[(Database)]
External --> SemaDB
```

In this setup, you can pre-generate the UUIDs for the points and store them in the external database. When you insert the points into SemaDB, you can use the same UUIDs.
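
The cross-reference setup can be sketched as follows, using an in-memory SQLite database as a stand-in for the external database. The table `cities`, the column `semadb_id`, the field `externalId`, and the helper `assign_semadb_id` are all hypothetical names. Sending the point to SemaDB is left out because it depends on your client.

```python
import sqlite3
import uuid

# Stand-in for the external database, with an integer primary key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT NOT NULL, semadb_id TEXT UNIQUE)"
)
conn.execute("INSERT INTO cities (id, name) VALUES (?, ?)", (42, "Edinburgh"))


def assign_semadb_id(conn: sqlite3.Connection, row_id: int) -> str:
    """Pre-generate a UUID4 for a row and store it in the external database."""
    point_id = str(uuid.uuid4())
    conn.execute("UPDATE cities SET semadb_id = ? WHERE id = ?", (point_id, row_id))
    conn.commit()
    return point_id


point_id = assign_semadb_id(conn, 42)
name = conn.execute("SELECT name FROM cities WHERE id = ?", (42,)).fetchone()[0]

# The point carries both IDs: the pre-generated UUID and the external integer ID.
point = {"_id": point_id, "externalId": 42, "city": name}
```

With both IDs stored on each side, a lookup can start from either system: by `semadb_id` in the external database, or by the `externalId` field on the point.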
