Showing 2 changed files with 43 additions and 3 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
--- | ||
title: Collection | ||
weight: 1 | ||
weight: 2 | ||
--- | ||
|
||
Work in progress. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,46 @@ | ||
--- | ||
title: Point | ||
weight: 2 | ||
weight: 1 | ||
--- | ||
|
||
Working in progress. | ||
# Point / Document | ||
|
||
A point, or document, is the smallest unit of data in SemaDB. It is a JSON object that is stored as part of a [collection]({{< ref "collection" >}}). A point can have any number of fields, and the fields can be of any type. There is usually a limit on the total size of a point. If you are self-hosting, you can adjust this limit in the configuration file, but we recommend keeping a limit in place. Here is an example point / document: | ||
|
||
```json | ||
{ | ||
"city": "Edinburgh", | ||
"country": "Scotland", | ||
"areaCode": "0131", | ||
"population": 500000, | ||
"embedding": [0.1, 0.2, 0.3, 0.4, 0.5] | ||
} | ||
``` | ||
|
||
> While fields can be of arbitrary types, any field used for [indexing]({{< ref "indexing" >}}) must be in the format the index expects. For example, if you have an integer index on the population field, the population field must be an integer. If you have a text index on the city field, the city field must be a string. | ||
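
The type requirement above can be checked on the client before inserting. The following is a minimal sketch, not part of SemaDB: the `EXPECTED_TYPES` mapping, the index type names, and the `check_point` helper are all hypothetical and only illustrate the idea of comparing each indexed field against the type its index expects.

```python
# Hypothetical mapping from index type names to the Python types they accept.
# These names are illustrative and are not SemaDB's actual index identifiers.
EXPECTED_TYPES = {
    "integer": int,
    "text": str,
    "vector": list,
}

def check_point(point: dict, index_schema: dict) -> list:
    """Return a list of human-readable problems; empty if the point looks fine."""
    problems = []
    for field, index_type in index_schema.items():
        if field not in point:
            continue  # whether missing fields are allowed is not covered here
        value = point[field]
        expected = EXPECTED_TYPES[index_type]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{field}: expected {index_type}, got {type(value).__name__}"
            )
    return problems

schema = {"population": "integer", "city": "text"}
good = {"city": "Edinburgh", "population": 500000}
bad = {"city": "Edinburgh", "population": "500000"}

print(check_point(good, schema))
print(check_point(bad, schema))
```

A check like this catches the common mistake of sending a numeric value as a string before the request reaches the server.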
## Point ID | ||
|
||
Each point has a [universally unique identifier](https://en.wikipedia.org/wiki/Universally_unique_identifier) (UUID) that identifies it within the collection. You can provide this ID in the `_id` field when inserting points, or let the server generate a UUID4 for you. If you provide an ID, ensure it is unique within the collection: | ||
|
||
```json | ||
{ | ||
"_id": "f7b3b3b4-3b7b-4b3b-8b3b-3b7b3b7b3b7b", | ||
"...": "..." | ||
} | ||
``` | ||
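
As a rough illustration of supplying your own ID, the sketch below uses Python's standard `uuid` module to attach a UUID4 string to a point before it is sent. The `with_id` helper is a hypothetical name introduced here, not a SemaDB client function.

```python
import uuid

def with_id(point: dict) -> dict:
    """Return a copy of the point with a UUID4 string under `_id` if none is set."""
    if "_id" in point:
        # Raises ValueError if the supplied value is not a well-formed UUID.
        uuid.UUID(point["_id"])
        return dict(point)
    return {"_id": str(uuid.uuid4()), **point}

point = with_id({"city": "Edinburgh"})
print(point["_id"])
```

Generating the ID on the client means you know it before the insert completes, which is useful if you need to record it elsewhere.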
|
||
**What if I have existing data with integer or other string IDs?** In this case, we recommend storing the existing ID as an integer or string field in the point itself, and storing the SemaDB point ID in the external database. This creates a **cross-reference** between the external database and SemaDB, so you can easily link records in one to points in the other: | ||
|
||
```mermaid | ||
graph TD | ||
Point["Point / Document"] --> UUID["SemaDB UUID"] | ||
UUID --> SemaDB[(SemaDB)] | ||
Point --> External["External ID"] | ||
UUID --> ExternalDB | ||
External --> ExternalDB[(Database)] | ||
External --> SemaDB | ||
``` | ||
|
||
In this setup, you can pre-generate the UUIDs for the points and store them in the external database. When you insert the points into SemaDB, you can use the same UUIDs. | ||
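
The cross-reference setup can be sketched as follows. This is an illustration under stated assumptions: an in-memory SQLite database stands in for the external database, and the table name `cities`, the column `semadb_id`, and the point field `externalId` are hypothetical names chosen for the example. The actual insert request to SemaDB is not shown.

```python
import sqlite3
import uuid

# In-memory SQLite stands in for the external database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT, semadb_id TEXT UNIQUE)"
)

rows = [(1, "Edinburgh"), (2, "Glasgow")]
points = []
for external_id, name in rows:
    point_id = str(uuid.uuid4())  # pre-generated before anything reaches SemaDB
    db.execute("INSERT INTO cities VALUES (?, ?, ?)", (external_id, name, point_id))
    # The point carries the external ID as an ordinary (hypothetically named) field.
    points.append({"_id": point_id, "externalId": external_id, "city": name})
db.commit()

# `points` is what would then be sent in a SemaDB insert request (not shown).
```

With both sides holding the other's identifier, a search result from SemaDB can be mapped back to its row through `externalId`, and a row can be mapped to its point through `semadb_id`.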
|