Commit

Merge branch 'v1-28/main' into v1-28/rbac
databyjp committed Dec 11, 2024
2 parents a795750 + c68620d commit 14fe8a7
Showing 16 changed files with 201 additions and 37 deletions.
35 changes: 35 additions & 0 deletions _includes/code/howto/manage-data.collections.py
Original file line number Diff line number Diff line change
Expand Up @@ -565,6 +565,41 @@

client.close()

# ==============================================
# ===== ALL REPLICATION SETTINGS
# ==============================================

# Connect to a setup with 3 replicas
client = weaviate.connect_to_local(
    port=8180  # Port for demo setup with 3 replicas
)

# Clean slate
client.collections.delete("Article")

# START AllReplicationSettings
from weaviate.classes.config import Configure, ReplicationDeletionStrategy

client.collections.create(
    "Article",
    # highlight-start
    replication_config=Configure.replication(
        factor=3,
        async_enabled=True,  # Enable asynchronous repair
        deletion_strategy=ReplicationDeletionStrategy.TIME_BASED_RESOLUTION,  # Added in v1.28; Set the deletion conflict resolution strategy
    )
    # highlight-end
)
# END AllReplicationSettings

# Test
collection = client.collections.get("Article")
config = collection.config.get()
assert config.replication_config.async_enabled == True
assert config.replication_config.deletion_strategy == ReplicationDeletionStrategy.TIME_BASED_RESOLUTION

client.close()

# ====================
# ===== SHARDING =====
# ====================
3 changes: 2 additions & 1 deletion _includes/tokenization_definition.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -6,4 +6,5 @@
| `field` | Index the whole field after trimming whitespace characters. | `Hello, (beautiful) world` |
| `trigram` | Split the property as rolling trigrams. | `Hel`, `ell`, `llo`, `lo,`, ... |
| `gse` | Use the `gse` tokenizer to split the property. | [See `gse` docs](https://pkg.go.dev/github.com/go-ego/gse#section-readme) |
| `kagome_kr` | Use the `Kagome` tokenizer with a Korean dictionary to split the property. | [See `kagome` docs](https://github.com/ikawaha/kagome) and the [Korean dictionary](https://github.com/ikawaha/kagome-dict-ko) |
| `kagome_ja` | Use the `Kagome` tokenizer with a Japanese (IPA) dictionary to split the property. | [See `kagome` docs](https://github.com/ikawaha/kagome) and the [dictionary](https://github.com/ikawaha/kagome-dict/). |
| `kagome_kr` | Use the `Kagome` tokenizer with a Korean dictionary to split the property. | [See `kagome` docs](https://github.com/ikawaha/kagome) and the [Korean dictionary](https://github.com/ikawaha/kagome-dict-ko). |
4 changes: 3 additions & 1 deletion developers/academy/py/tokenization/200_options.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,9 @@ Weaviate provides `gse` and `trigram` (from `v1.24`) and `kagome_kr` (from `v1.2

`gse` implements the "Jieba" algorithm, which is a popular Chinese text segmentation algorithm. `trigram` splits text into all possible trigrams, which can be useful for languages like Japanese.
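
As an illustration of what "rolling trigrams" means, here is a minimal Python sketch. The helper name `rolling_trigrams` is hypothetical, and the sketch deliberately ignores any normalization (such as lowercasing or punctuation handling) that Weaviate's `trigram` tokenizer may apply. It only shows the sliding three-character window.

```python
def rolling_trigrams(text: str) -> list[str]:
    """Return every window of three consecutive characters in `text`.

    Hypothetical helper for illustration only; it is not Weaviate's
    implementation and performs no normalization of the input.
    """
    # Slide a window of width 3 across the string, one character at a time.
    return [text[i:i + 3] for i in range(len(text) - 2)]


print(rolling_trigrams("Hello,"))
print(rolling_trigrams("素早い茶色"))
```

Because the window advances one character at a time, the approach needs no dictionary or word boundaries, which is why it can help with text that is not whitespace-delimited.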

`kagome_kr` uses the [`Kagome` tokenizer](https://github.com/ikawaha/kagome?tab=readme-ov-file) with a Korean MeCab ([mecab-ko-dic](https://bitbucket.org/eunjeon/mecab-ko-dic/src/master/)) dictionary to split the property text. This is useful for Korean text.
`kagome_ja` uses the [`Kagome` tokenizer](https://github.com/ikawaha/kagome?tab=readme-ov-file) with a Japanese [MeCab IPA](https://github.com/ikawaha/kagome-dict/) dictionary to split Japanese property text.

`kagome_kr` uses the [`Kagome` tokenizer](https://github.com/ikawaha/kagome?tab=readme-ov-file) with a Korean MeCab ([mecab-ko-dic](https://bitbucket.org/eunjeon/mecab-ko-dic/src/master/)) dictionary to split Korean property text.

## Questions and feedback

Original file line number Diff line number Diff line change
Expand Up @@ -151,9 +151,61 @@ For speed, data operations on a tenant occur independently of any tenant activit

## Repairs

When Weaviate detects inconsistent data across replicas, it attempts to repair the out of sync data.
In distributed systems like Weaviate, object replicas can become inconsistent for any number of reasons, such as network issues, node failures, or timing conflicts. When Weaviate detects inconsistent data across replicas, it attempts to repair the out-of-sync data.

Starting in v1.26, Weaviate adds [async replication](#async-replication) to proactively detect inconsistencies. In earlier versions, Weaviate uses a [repair-on-read](#repair-on-read) strategy to repair inconsistencies at read time.
Weaviate uses [async replication](#async-replication), [deletion resolution](#deletion-resolution-strategies) and [repair-on-read](#repair-on-read) strategies to maintain consistency across replicas.

### Async replication

:::info Added in `v1.26`
:::

Async replication runs in the background. It uses a Merkle tree algorithm to monitor and compare the state of nodes within a cluster. If the algorithm identifies an inconsistency, it resyncs the data on the inconsistent node.

Repair-on-read works well with one or two isolated repairs. Async replication is effective in situations where there are many inconsistencies. For example, if an offline node misses a series of updates, async replication quickly restores consistency when the node returns to service.

Async replication supplements the repair-on-read mechanism. If a node becomes inconsistent between sync checks, the repair-on-read mechanism catches the problem at read time.

To activate async replication, set `asyncEnabled` to true in the [`replicationConfig` section of your collection definition](../../manage-data/collections.mdx#replication-settings).
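
The idea of comparing node state with a hash tree can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Weaviate's implementation: the function names (`bucket_of`, `build_tree`, `differing_buckets`), the four-leaf layout, and the use of SHA-256 are all hypothetical choices made for the example.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_of(obj_id: str, leaves: int) -> int:
    """Assign an object ID to a leaf bucket (hypothetical scheme)."""
    return _h(obj_id.encode())[0] % leaves


def build_tree(objects: dict[str, str], leaves: int = 4) -> list[list[bytes]]:
    """Build a hash tree over a node's objects.

    `objects` maps object ID -> version marker. `leaves` must be a power
    of two. Level 0 holds the leaf hashes; the last level holds the root.
    """
    buckets: list[list[str]] = [[] for _ in range(leaves)]
    for obj_id in sorted(objects):
        buckets[bucket_of(obj_id, leaves)].append(f"{obj_id}:{objects[obj_id]}")
    level = [_h("|".join(b).encode()) for b in buckets]
    tree = [level]
    while len(level) > 1:
        # Each parent hash covers its two children.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        tree.append(level)
    return tree


def differing_buckets(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Walk down from the root, following only subtrees whose hashes differ."""
    top = len(a) - 1
    if a[top][0] == b[top][0]:
        return []  # Roots match: the two nodes are in sync.
    candidates = [0]
    for depth in range(top, 0, -1):
        candidates = [
            child
            for i in candidates
            for child in (2 * i, 2 * i + 1)
            if a[depth - 1][child] != b[depth - 1][child]
        ]
    return candidates


node_a = {"a": "v2", "b": "v2", "c": "v2"}
node_b = {"a": "v2", "b": "v1", "c": "v2"}  # "b" is stale on this node
print(differing_buckets(build_tree(node_a), build_tree(node_b)))
```

When the roots match, a single comparison confirms that two nodes agree. When they differ, only the mismatching branches are followed, so the objects that need resyncing can be narrowed down without comparing every object.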

### Deletion resolution strategies

:::info Added in `v1.28`
:::

When an object is present on some replicas but not others, either a creation or a deletion has not yet been propagated to all replicas. The two cases call for opposite repairs, so it is important to distinguish between them.

Deletion resolution works alongside async replication and repair-on-read to ensure consistent handling of deleted objects across the cluster. For each collection, you can set one of the following deletion resolution strategies:

- `NoAutomatedResolution`
- `DeleteOnConflict`
- `TimeBasedResolution`

#### `NoAutomatedResolution`

This is the default setting, and the only setting available in Weaviate versions prior to `v1.28`. In this mode, Weaviate does not treat deletion conflicts as a special case. If an object is present on some replicas but not others, Weaviate may restore the object on the replicas where it is missing.

#### `DeleteOnConflict`

A deletion conflict in `DeleteOnConflict` is always resolved by deleting the object on all replicas.

To do so, when a replica receives a deletion request, Weaviate marks the object as deleted on that replica rather than removing all traces of the object.

#### `TimeBasedResolution`

A deletion conflict in `TimeBasedResolution` is resolved by comparing the timestamp of the deletion request with the timestamps of any other writes to the object, such as a creation or an update.

If the deletion request has the latest timestamp, the object is deleted on all replicas. If a creation or an update has a later timestamp than the deletion request, that later write is applied to all replicas.

For example:
- If an object is deleted at timestamp 100 and a recreation carries timestamp 90, the deletion wins
- If an object is deleted at timestamp 100 and then recreated at timestamp 110, the recreation wins

#### Choosing a strategy

- Use `NoAutomatedResolution` when you want maximum control and will handle conflicts manually
- Use `DeleteOnConflict` when you want to ensure deletions are always honored
- Use `TimeBasedResolution` when you want the most recent operation to take precedence
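
The difference between the three strategies can be sketched as a small decision function. This is an illustrative Python model, not Weaviate's code: the `Strategy` enum and `resolve` function are hypothetical, the time-based branch assumes that the operation with the later timestamp wins, and the tie-breaking rule is an assumption the text above does not specify.

```python
from enum import Enum


class Strategy(Enum):
    # Hypothetical local enum mirroring the strategy names in the docs.
    NO_AUTOMATED_RESOLUTION = "NoAutomatedResolution"
    DELETE_ON_CONFLICT = "DeleteOnConflict"
    TIME_BASED_RESOLUTION = "TimeBasedResolution"


def resolve(strategy: Strategy, update_ts: int, deletion_ts: int) -> str:
    """Decide the outcome when one replica holds an object written at
    `update_ts` and another replica recorded its deletion at `deletion_ts`.

    Returns "delete" or "keep".
    """
    if strategy is Strategy.DELETE_ON_CONFLICT:
        return "delete"  # Deletions are always honored.
    if strategy is Strategy.TIME_BASED_RESOLUTION:
        # Assumption: the later operation wins; ties keep the object.
        return "delete" if deletion_ts > update_ts else "keep"
    # NoAutomatedResolution: the deletion is not treated specially, so the
    # object may be restored where it is missing (modeled here as "keep").
    return "keep"


for strategy in Strategy:
    print(strategy.value, resolve(strategy, update_ts=110, deletion_ts=100))
```

With `update_ts=110` and `deletion_ts=100`, only `DeleteOnConflict` removes the object. Swapping the two timestamps makes `TimeBasedResolution` delete it as well.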

### Repair-on-read

Expand All @@ -178,19 +230,6 @@ The read repair process also depends on the read and write consistency levels us

Repairs only happen on read, so they do not create a lot of background overhead. While nodes are in an inconsistent state, read operations with consistency level of `ONE` may return stale data.

### Async replication

:::info Added in `v1.26`
:::

Async replication runs in the background. It uses a Merkle tree algorithm to monitor and compare the state of nodes within a cluster. If the algorithm identifies an inconsistency, it resyncs the data on the inconsistent node.

Repair-on-read works well with one or two isolated repairs. Async replication is effective in situations where there are many inconsistencies. For example, if an offline node misses a series of updates, async replication quickly restores consistency when the node returns to service.

Async replication supplements the repair-on-read mechanism. If a node becomes inconsistent between sync checks, the repair-on-read mechanism catches the problem at read time.

To activate async replication, set `asyncEnabled` to true in the [`replicationConfig` section of your collection definition](../../manage-data/collections.mdx#replication-settings).

## Related pages
- [API References | GraphQL | Get | Consistency Levels](../../api/graphql/get.md#consistency-levels)
- [API References | REST | Objects](/developers/weaviate/api/rest#tag/objects)
1 change: 1 addition & 0 deletions developers/weaviate/config-refs/env-vars.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ default hostname has changed and a single node cluster believes there are suppos
| `ENABLE_API_BASED_MODULES` | Enable all API-based modules. (Experimental as of `v1.26.0`) | `boolean` | `true` |
| `ENABLE_MODULES` | Specify Weaviate modules to enable | `string - comma separated names` | `text2vec-openai,generative-openai` |
| `ENABLE_TOKENIZER_GSE` | Enable the [`GSE` tokenizer](../config-refs/schema/index.md#gse-and-trigram-tokenization-methods) for use | `boolean` | `true` |
| `ENABLE_TOKENIZER_KAGOME_JA` | Enable the [`Kagome` tokenizer for Japanese](../config-refs/schema/index.md#kagome_ja-tokenization-method) for use (Experimental as of `v1.28.0`) | `boolean` | `true` |
| `ENABLE_TOKENIZER_KAGOME_KR` | Enable the [`Kagome` tokenizer for Korean](../config-refs/schema/index.md#kagome_kr-tokenization-method) for use (Experimental as of `v1.25.7`) | `boolean` | `true` |
| `GODEBUG` | Controls debugging variables within the runtime. [See official Go docs](https://pkg.go.dev/runtime). | `string - comma-separated list of name=val pairs` | `gctrace=1` |
| `GOMAXPROCS` | Set the maximum number of threads that can be executing simultaneously. If this value is set, it will be respected by `LIMIT_RESOURCES`. | `string - number` | `NUMBER_OF_CPU_CORES` |
28 changes: 25 additions & 3 deletions developers/weaviate/config-refs/schema/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -546,6 +546,25 @@ The `gse` tokenizer is not loaded by default to save resources. To use it, set t
- `"素早い茶色の狐が怠けた犬を飛び越えた"`: `["素早", "素早い", "早い", "茶色", "の", "狐", "が", "怠け", "けた", "犬", "を", "飛び", "飛び越え", "越え", "た", "素早い茶色の狐が怠けた犬を飛び越えた"]`
- `"すばやいちゃいろのきつねがなまけたいぬをとびこえた"`: `["すばや", "すばやい", "やい", "いち", "ちゃ", "ちゃい", "ちゃいろ", "いろ", "のき", "きつ", "きつね", "つね", "ねが", "がな", "なま", "なまけ", "まけ", "けた", "けたい", "たい", "いぬ", "を", "とび", "とびこえ", "こえ", "た", "すばやいちゃいろのきつねがなまけたいぬをとびこえた"]`

### `kagome_ja` tokenization method

:::caution Experimental feature
Available starting in `v1.28.0`. This is an experimental feature. Use with caution.
:::

For Japanese text, the `kagome_ja` tokenization method is also available. This uses the [`Kagome` tokenizer](https://github.com/ikawaha/kagome?tab=readme-ov-file) with a Japanese [MeCab IPA](https://github.com/ikawaha/kagome-dict/) dictionary to split the property text.

The `kagome_ja` tokenizer is not loaded by default to save resources. To use it, set the environment variable `ENABLE_TOKENIZER_KAGOME_JA` to `true` on the Weaviate instance.

`kagome_ja` tokenization examples:

- `"春の夜の夢はうつつよりもかなしき 夏の夜の夢はうつつに似たり 秋の夜の夢はうつつを超え 冬の夜の夢は心に響く 山のあなたに小さな村が見える 川の音が静かに耳に届く 風が木々を通り抜ける音 星空の下、すべてが平和である"`:
- `["春", "の", "夜", "の", "夢", "は", "うつつ", "より", "も", "かなしき", "\n\t", "夏", "の", "夜", "の", "夢", "は", "うつつ", "に", "似", "たり", "\n\t", "秋", "の", "夜", "の", "夢", "は", "うつつ", "を", "超え", "\n\t", "冬", "の", "夜", "の", "夢", "は", "心", "に", "響く", "\n\n\t", "山", "の", "あなた", "に", "小さな", "村", "が", "見える", "\n\t", "川", "の", "音", "が", "静か", "に", "耳", "に", "届く", "\n\t", "風", "が", "木々", "を", "通り抜ける", "音", "\n\t", "星空", "の", "下", "、", "すべて", "が", "平和", "で", "ある"]`
- `"素早い茶色の狐が怠けた犬を飛び越えた"`:
- `["素早い", "茶色", "の", "狐", "が", "怠け", "た", "犬", "を", "飛び越え", "た"]`
- `"すばやいちゃいろのきつねがなまけたいぬをとびこえた"`:
- `["すばやい", "ちゃ", "いろ", "の", "きつね", "が", "なまけ", "た", "いぬ", "を", "とびこえ", "た"]`

### `kagome_kr` tokenization method

:::caution Experimental feature
Expand All @@ -558,9 +577,12 @@ The `kagome_kr` tokenizer is not loaded by default to save resources. To use it,

`kagome_kr` tokenization examples:

- `"아버지가방에들어가신다"`: `["아버지", "가", "방", "에", "들어가", "신다"]`
- `"아버지가 방에 들어가신다"`: `["아버지", "가", "방", "에", "들어가", "신다"]`
- `"결정하겠다"`: `["결정", "하", "겠", "다"]`
- `"아버지가방에들어가신다"`:
- `["아버지", "가", "방", "에", "들어가", "신다"]`
- `"아버지가 방에 들어가신다"`:
- `["아버지", "가", "방", "에", "들어가", "신다"]`
- `"결정하겠다"`:
- `["결정", "하", "겠", "다"]`

### Inverted index types

1 change: 1 addition & 0 deletions developers/weaviate/configuration/authorization.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
---
title: Authorization & RBAC
title: Authorization & RBAC
sidebar_position: 35
image: og/docs/configuration.jpg
# tags: ['authorization']
69 changes: 66 additions & 3 deletions developers/weaviate/manage-data/collections.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -838,11 +838,74 @@ import RaftRFChangeWarning from '/_includes/1-25-replication-factor.mdx';

<RaftRFChangeWarning/>

Configure replication per collection.
Configure replication settings, such as [async replication](../concepts/replication-architecture/consistency.md#async-replication) and [deletion resolution strategy](../concepts/replication-architecture/consistency.md#deletion-resolution-strategies).

import ReplicationConfigWithAsyncRepair from '/_includes/code/configuration/replication-consistency.mdx';
<Tabs groupId="languages">
<TabItem value="py" label="Python Client v4">
<FilteredTextBlock
text={PyCode}
startMarker="# START AllReplicationSettings"
endMarker="# END AllReplicationSettings"
language="py"
docRefs={[
"weaviate.collections.classes.html#weaviate.collections.classes.config.Configure"
]}
/>
</TabItem>

<ReplicationConfigWithAsyncRepair />
<TabItem value="py3" label="Python Client v3">
<FilteredTextBlock
text={PyCodeV3}
startMarker="# START AsyncRepair"
endMarker="# END AsyncRepair"
language="pyv3"
/>
</TabItem>

<TabItem value="js" label="JS/TS Client v3">
<FilteredTextBlock
text={TSCode}
startMarker="// START AsyncRepair"
endMarker="// END AsyncRepair"
language="ts"
/>
</TabItem>

<TabItem value="js2" label="JS/TS Client v2">
<FilteredTextBlock
text={TSCodeLegacy}
startMarker="// START AsyncRepair"
endMarker="// END AsyncRepair"
language="tsv2"
/>
</TabItem>
<TabItem value="curl" label="cURL">

```bash
curl \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "class": "Article",
    "properties": [
      {
        "dataType": [
          "string"
        ],
        "description": "Title of the article",
        "name": "title"
      }
    ],
    "replicationConfig": {
      "factor": 3,
      "asyncEnabled": true
    }
  }' \
  http://localhost:8080/v1/schema
```

</TabItem>
</Tabs>

<details>
<summary>
4 changes: 2 additions & 2 deletions developers/weaviate/model-providers/cohere/embeddings.md
Original file line number Diff line number Diff line change
Expand Up @@ -69,8 +69,8 @@ Provide the API key to Weaviate using one of the following methods:
endMarker="# END CohereInstantiation"
language="py"
docRefs={[
"connect_to_weaviate_cloud",
"auth.Auth",
"weaviate.html#weaviate.connect_to_weaviate_cloud",
"weaviate.html#weaviate.auth.Auth",
]}
/>
</TabItem>
8 changes: 4 additions & 4 deletions src/components/Documentation/FilteredTextBlock.js
Original file line number Diff line number Diff line change
Expand Up @@ -10,14 +10,14 @@ const GITHUB_CONFIG = {
const DOC_SYSTEMS = {
py: {
baseUrl:
'https://weaviate-python-client.readthedocs.io/en/stable/weaviate.html',
constructUrl: (baseUrl, ref) => `${baseUrl}#weaviate.${ref}`,
'https://weaviate-python-client.readthedocs.io/en/stable',
constructUrl: (baseUrl, ref) => `${baseUrl}/${ref}`,
icon: '/img/site/logo-py.svg',
},
pyv3: {
baseUrl:
'https://weaviate-python-client.readthedocs.io/en/v3.26.2/weaviate.html',
constructUrl: (baseUrl, ref) => `${baseUrl}#weaviate.${ref}`,
'https://weaviate-python-client.readthedocs.io/en/v3.26.2',
constructUrl: (baseUrl, ref) => `${baseUrl}/${ref}`,
icon: '/img/site/logo-py.svg',
},
ts: {
2 changes: 1 addition & 1 deletion tests/docker-compose-anon-2.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ services:
- '8080'
- --scheme
- http
image: cr.weaviate.io/semitechnologies/weaviate:1.27.5
image: cr.weaviate.io/semitechnologies/weaviate:1.28.0-rc.0
ports:
- 8090:8080
- 50061:50051
2 changes: 1 addition & 1 deletion tests/docker-compose-anon-bind.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ services:
- '8080'
- --scheme
- http
image: cr.weaviate.io/semitechnologies/weaviate:1.27.5
image: cr.weaviate.io/semitechnologies/weaviate:1.28.0-rc.0
ports:
- 8380:8080
- 50351:50051
2 changes: 1 addition & 1 deletion tests/docker-compose-anon-clip.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ services:
- '8080'
- --scheme
- http
image: cr.weaviate.io/semitechnologies/weaviate:1.27.5
image: cr.weaviate.io/semitechnologies/weaviate:1.28.0-rc.0
ports:
- 8280:8080
- 50251:50051
2 changes: 1 addition & 1 deletion tests/docker-compose-anon.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ services:
- '8080'
- --scheme
- http
image: cr.weaviate.io/semitechnologies/weaviate:1.27.5
image: cr.weaviate.io/semitechnologies/weaviate:1.28.0-rc.0
ports:
- 8080:8080
- 50051:50051
6 changes: 3 additions & 3 deletions tests/docker-compose-three-nodes.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ services:
- '8080'
- --scheme
- http
image: cr.weaviate.io/semitechnologies/weaviate:1.27.5
image: cr.weaviate.io/semitechnologies/weaviate:1.28.0-rc.0
restart: on-failure:0
ports:
- "8180:8080"
Expand Down Expand Up @@ -39,7 +39,7 @@ services:
- '8080'
- --scheme
- http
image: cr.weaviate.io/semitechnologies/weaviate:1.27.5
image: cr.weaviate.io/semitechnologies/weaviate:1.28.0-rc.0
restart: on-failure:0
ports:
- "8181:8080"
Expand Down Expand Up @@ -71,7 +71,7 @@ services:
- '8080'
- --scheme
- http
image: cr.weaviate.io/semitechnologies/weaviate:1.27.5
image: cr.weaviate.io/semitechnologies/weaviate:1.28.0-rc.0
restart: on-failure:0
ports:
- "8182:8080"
2 changes: 1 addition & 1 deletion tests/docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ services:
- '8080'
- --scheme
- http
image: cr.weaviate.io/semitechnologies/weaviate:1.27.5
image: cr.weaviate.io/semitechnologies/weaviate:1.28.0-rc.0
ports:
- 8099:8080
- 50052:50051
