Merge branch 'main' into nw_declare
nwoolmer authored Nov 28, 2024
2 parents 8e939c4 + 8dd512a commit 31959d6
Showing 10 changed files with 1,099 additions and 339 deletions.
56 changes: 23 additions & 33 deletions documentation/concept/replication.md
@@ -58,10 +58,19 @@ The architecture consists of:
file system
- Any number of **replica** instances

<Screenshot
alt="Primary into object store. Object store into 1-n replicas."
src="images/docs/concepts/replication-basic.webp"
/>
```mermaid
graph TD
    primary[Primary]
    objectStore[Object Store]
    replica1[Replica 1]
    replica2[Replica 2]
    replicaN[Replica ..N]
    primary --> objectStore
    objectStore --> replica1
    objectStore --> replica2
    objectStore --> replicaN
```

## Supported Object Stores

@@ -75,26 +84,14 @@ HDFS support is on our roadmap.

We also plan to support other object stores such as Google Cloud Storage:

<Screenshot
alt="AWS S3, Azure, and NFS first. Then HDFS. Then Google Cloud Storage."
src="images/docs/concepts/replication-support-timeline.webp"
/>

```mermaid
timeline
    Currently : AWS S3
              : Azure Blob Store
              : NFS Filesystem
    Next-up : HDFS
    Later on : Google Cloud Storage
```

Something missing? Want to see it sooner? [Contact us](/enterprise/contact)!

@@ -122,21 +119,6 @@ examples.
src="images/docs/concepts/replication-streams.webp"
/>

<!--
The diagram above was created by https://mermaid.live/, using the Halloween Theme
Source:
graph TD
primary[(primary)]
object_store((object<br/>store))
replica_1[(replica 1)]
replica_2[(replica 2)]
replica_N[(replica ..N)]
primary - -> object_store #Note: dashes should have no spaces in between
object_store - -> replica_1
object_store - -> replica_2
object_store - -> replica_N
-->

In addition, the same object store can be re-used for multiple replication
clusters. To do so, provide a unique `DB_INSTANCE_NAME` value.

@@ -155,3 +137,11 @@ up to date manually and are generally just inserted during the database
tables setup phase.

This limitation will be lifted in the future.

## Multi-primary ingestion

[QuestDB Enterprise](/enterprise/) supports multi-primary ingestion, where
multiple primaries can write to the same database.

See the [Multi-primary ingestion](/docs/operations/multi-primary-ingestion/)
page for more information.
105 changes: 105 additions & 0 deletions documentation/operations/multi-primary-ingestion.md
@@ -0,0 +1,105 @@
---
title: Multi-primary ingestion
sidebar_label: Multi-primary ingestion
description:
  Instructions and advice on performing multi-primary ingestion within QuestDB Enterprise.
---

QuestDB Enterprise's multi-primary ingestion enables **high throughput** and **high availability** via concurrent writes across multiple primaries. Combined with **automatic failover**, it provides reliable ingestion through most failure and zero-downtime update scenarios.

Multi-primary ingestion uses multiple primary nodes to handle ingestion in parallel. Each primary node can sustain up to **5 million rows per second**, according to [Time Series Benchmark Suite (TSBS)](https://github.com/questdb/tsbs) benchmarks.

This document explains multi-primary ingestion and details the use of FoundationDB for metadata coordination, a sequencer for conflict resolution, and failover mechanisms for added reliability.

:::note

Multi-primary ingestion is coming soon to QuestDB Enterprise.

:::

## How multi-primary ingestion works

### The role of the sequencer

QuestDB Enterprise provides write consistency across multiple primary nodes via a **sequencer**. The sequencer coordinates transactions by assigning **monotonic transaction numbers** to writes.

Each primary behaves as both a primary and a replica: it writes its own data to the [object store](/concept/replication/#supported-object-stores) and reads the data written there by the other primaries to catch up with their changes.

This approach is already used by QuestDB, even in single-instance configurations, to manage writes via **WAL files**, and it has been extended to support multi-primary setups:

```mermaid
sequenceDiagram
    participant Client
    participant Sequencer
    participant Table
    Client->>Sequencer: Write Request (Data, TxID)
    Sequencer->>Table: Check Current Sequence
    alt TxID matches current sequence
        note over Sequencer,Table: Transaction is valid
        Sequencer->>Table: Write Data
        Table-->>Sequencer: Write Success
        Sequencer-->>Client: Acknowledge Success
    else TxID does not match sequence
        note over Sequencer: Transaction conflict detected
        Sequencer-->>Client: Conflict Error (Retry with updated TxID)
    end
```

Broken down, here is how it works:

* **Transaction sequencing:** Each write operation is assigned a unique transaction ID. The sequencer ensures that this ID is always **incremental** and consistent for the target table. Transaction sequencing is managed **per table**, so each table has an independent sequence.

* **Optimistic locking:** When a write request is received, the sequencer validates the transaction number. If the transaction number is lower than the current sequence for the table, the write is rejected with a conflict error.

* **Error handling:** If a transaction conflict occurs, QuestDB Enterprise notifies the client with an error message. The client can retry the transaction with updated data, as sketched below.
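
To make the sequencing and retry flow concrete, here is a minimal Python sketch of a per-table sequencer with optimistic locking and a client-side retry loop. It is illustrative only: the class and function names are invented for this example and do not reflect QuestDB's internal implementation.

```python
import threading


class SequencerConflict(Exception):
    """Raised when a write arrives with a stale transaction ID."""


class TableSequencer:
    """Assigns monotonic, per-table transaction IDs (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_txn = {}  # table name -> next expected transaction ID

    def expected_txn(self, table: str) -> int:
        with self._lock:
            return self._next_txn.get(table, 0)

    def commit(self, table: str, txn_id: int, rows: list) -> int:
        with self._lock:
            expected = self._next_txn.get(table, 0)
            if txn_id != expected:
                # Another writer already claimed this slot: reject the write
                # and let the client retry with the updated transaction ID.
                raise SequencerConflict(f"expected txn {expected}, got {txn_id}")
            # ... append rows to the table's WAL here ...
            self._next_txn[table] = expected + 1
            return expected


def write_with_retry(seq: TableSequencer, table: str, rows: list, retries: int = 5) -> int:
    """Client-side retry loop: refresh the transaction ID after each conflict."""
    for _ in range(retries):
        txn_id = seq.expected_txn(table)
        try:
            return seq.commit(table, txn_id, rows)
        except SequencerConflict:
            continue  # re-read the current sequence and try again
    raise RuntimeError("too many sequencer conflicts")


seq = TableSequencer()
write_with_retry(seq, "trades", [{"price": 42.0}])  # assigned txn 0
write_with_retry(seq, "trades", [{"price": 42.5}])  # assigned txn 1
```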

### Distributed sequencer with FoundationDB

In a multi-primary configuration, the sequencer operates in **distributed mode**, with [FoundationDB](https://www.foundationdb.org/) as the backend for storing transaction metadata. This enables synchronization across primaries.

- **Cluster-wide coordination:** The metadata about transaction IDs for each table is maintained in FoundationDB. This allows all primaries to verify and update sequence numbers without conflicts.

- **Dynamic primary selection:** Clients can send writes directly to a specific primary node or to the cluster address. When writing to the cluster address, data is automatically routed to the optimal primary node based on current load and availability.

- **Conflict resolution:** As in single-primary setups, transaction conflicts are handled by rejecting conflicting writes and informing the client to resend the data.

With FoundationDB for metadata coordination and a robust sequencer for conflict resolution, QuestDB Enterprise provides the resiliency your team needs to scale with peace of mind.
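
As a rough illustration of what per-table coordination through FoundationDB can look like, the sketch below uses the public `fdb` Python bindings to check and advance a table's sequence number inside a single FoundationDB transaction. The key layout and function names are assumptions made for this example, not QuestDB's actual schema.

```python
import fdb

fdb.api_version(710)
db = fdb.open()  # uses the default cluster file


@fdb.transactional
def claim_next_txn(tr, table: str, expected_txn: int) -> int:
    """Atomically verify and advance a per-table sequence (illustrative only).

    FoundationDB retries the function on conflicting concurrent transactions,
    so only one primary can successfully claim a given transaction slot.
    """
    key = fdb.tuple.pack(("sequencer", table))
    raw = tr[key]
    # The stored value is assumed to be an 8-byte big-endian counter.
    current = int.from_bytes(bytes(raw), "big") if raw.present() else 0
    if expected_txn != current:
        # Stale transaction ID: the caller must refresh its view and retry.
        raise ValueError(f"conflict: expected {current}, got {expected_txn}")
    tr[key] = (current + 1).to_bytes(8, "big")
    return current


# claim_next_txn(db, "trades", 0)  # claims transaction slot 0 for 'trades'
```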

## Automatic failover

QuestDB Enterprise's multi-primary architecture also supports **automatic failover** to ensure high availability in case of primary node failures. This feature minimizes downtime and maintains ingestion continuity.

### Failover workflow

1. **Primary and replica registration:**
- All **primaries** and **replicas** register with the FoundationDB coordinator on startup.
- This provides the cluster with a complete view of the topology, including all active primaries and replicas.
1. **Failure detection:**
- The cluster continuously monitors the health of primary nodes.
- If a primary becomes unresponsive due to hardware issues, a restart, or other failures, the cluster initiates a failover.
1. **Replica promotion:**
- A suitable replica is automatically promoted to **primary** based on its synchronization status and availability.
- The new primary information is propagated to all connected clients using the official QuestDB Enterprise SDK.
- If the client is connected to the cluster address, writes are seamlessly redirected to the new primary.
1. **Reintegration of the failed primary:**
- The failed primary undergoes a recovery process, catching up with the latest transactions using the **Write Ahead Log (WAL)** from the cluster.
- Once synchronized, the recovered primary is reinstated, and the promoted replica returns to its previous role as a replica.

## Configuration and topology awareness

When configuring a QuestDB Enterprise cluster with multi-primary ingestion:

- **Coordinator setup:** Both primary nodes and replicas must be configured to communicate with the FoundationDB coordinator.

- **Client awareness:** Clients using the official QuestDB Enterprise SDK receive the cluster topology on connection. This allows them to:
- Select a specific primary for writes.
- Send data to the cluster address for automatic routing.

- **Seamless failover:** During failover events, clients are automatically informed of topology changes, ensuring uninterrupted ingestion (see the sketch below).
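
As a purely hypothetical sketch of how a topology-aware client might behave during a failover, consider the following. Every name here (`ToyClusterClient`, `refresh_topology`, the addresses) is invented for illustration and is not the official QuestDB Enterprise SDK API.

```python
import time


class PrimaryUnavailable(Exception):
    """Raised when the currently selected primary cannot accept writes."""


class ToyClusterClient:
    """Invented stand-in for a topology-aware client (not the real SDK)."""

    def __init__(self, cluster_address: str):
        self.cluster_address = cluster_address
        # On connection the cluster would report its topology; hard-coded here.
        self.primaries = ["primary-1:9000", "primary-2:9000"]
        self.current = self.primaries[0]

    def refresh_topology(self):
        # A real client would re-query the cluster address after a failover;
        # here we simply rotate to the next known primary.
        idx = self.primaries.index(self.current)
        self.current = self.primaries[(idx + 1) % len(self.primaries)]

    def write(self, table: str, row: dict):
        print(f"writing to {table} via {self.current}: {row}")


def write_with_failover(client: ToyClusterClient, table: str, row: dict, attempts: int = 3):
    """Retry a write, refreshing the topology after each failed attempt."""
    for attempt in range(attempts):
        try:
            return client.write(table, row)
        except PrimaryUnavailable:
            time.sleep(0.5 * (attempt + 1))  # simple backoff
            client.refresh_topology()        # pick up the newly promoted primary
    raise RuntimeError("no primary available after failover retries")


client = ToyClusterClient("questdb-cluster.internal:9000")
write_with_failover(client, "trades", {"symbol": "BTC-USD", "price": 97000.0})
```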

## Replication

QuestDB Enterprise's multi-primary ingestion is built alongside its primary-replica replication capabilities.

See the [Replication](/docs/operations/replication/) page for more information.
8 changes: 8 additions & 0 deletions documentation/operations/replication.md
@@ -330,3 +330,11 @@ daily ones due to the computational and IO demands of applying WAL files.
For systems with high daily data ingestion, daily snapshots are recommended.
Infrequent snapshots or long snapshot periods, such as 60 days with 30-day WAL
expiration, may prevent successful database restoration.

## Multi-primary ingestion

[QuestDB Enterprise](/enterprise/) supports multi-primary ingestion, where
multiple primaries can write to the same database.

See the [Multi-primary ingestion](/docs/operations/multi-primary-ingestion/) page for
more information.
2 changes: 1 addition & 1 deletion documentation/quick-start.mdx
@@ -47,7 +47,7 @@ Choose from the following options:

### Docker

To use Docker, one must have Docker. You can installation find guides for your
To use Docker, one must have Docker. You can find installation guides for your
platform on the [official documentation](https://docs.docker.com/get-docker/).

Once Docker is installed, you will need to pull QuestDB's image from
4 changes: 2 additions & 2 deletions documentation/reference/sql/overview.md
@@ -402,7 +402,7 @@ Either:
Parquet files can be read and thus queried by QuestDB.

QuestDB is shipped with a demo Parquet file, `trades.parquet`, which can be
queried using the `parquet_read` function.
queried using the `read_parquet` function.

Example:

@@ -417,7 +417,7 @@ WHERE

The trades.parquet file is located in the `import` subdirectory inside the
QuestDB root directory. Drop your own Parquet files to the import directory and
query them using the `parquet_read()` function.
query them using the `read_parquet()` function.

You can change the allowed directory by setting the `cairo.sql.copy.root`
configuration key.
6 changes: 6 additions & 0 deletions documentation/sidebars.js
@@ -329,6 +329,12 @@ module.exports = {
id: "operations/replication",
customProps: { tag: "Enterprise" },
},
{
id: "operations/multi-primary-ingestion",
type: "doc",
label: "Multi-primary ingestion",
customProps: { tag: "Enterprise" },
},
{
type: "doc",
id: "operations/rbac",
22 changes: 22 additions & 0 deletions documentation/third-party-tools/kafka/questdb-kafka.md
@@ -398,6 +398,28 @@ Note that deduplication requires designated timestamps extracted either from
message payload or Kafka message metadata. See the
[Designated timestamps](#designated-timestamps) section for more information.


#### Dead Letter Queue

When messages cannot be processed due to non-recoverable errors, such as invalid data formats or schema mismatches, the
connector can send these failed messages to a Dead Letter Queue (DLQ). This prevents the entire connector from stopping
when it encounters problematic messages. To enable this feature, configure the Dead Letter Queue in your Kafka Connect
worker configuration using the `errors.tolerance`, `errors.deadletterqueue.topic.name`, and
`errors.deadletterqueue.topic.replication.factor` properties. For example:

```properties
errors.tolerance=all
errors.deadletterqueue.topic.name=dlq-questdb
errors.deadletterqueue.topic.replication.factor=1
```

When configured, messages that fail to be processed will be sent to the specified DLQ topic along with error details,
allowing for later inspection and troubleshooting. This is particularly useful in production environments where data
quality might vary and you need to ensure continuous operation while investigating problematic messages.
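
As a quick way to inspect what landed in the DLQ, you can point a small consumer at the topic. The sketch below uses the `confluent-kafka` Python client; the broker address and group ID are placeholders, and it assumes `errors.deadletterqueue.context.headers.enable=true` is also set so that Kafka Connect attaches error details as record headers.

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "group.id": "dlq-inspector",            # placeholder consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["dlq-questdb"])

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"consumer error: {msg.error()}")
            continue
        # With context headers enabled, Kafka Connect adds __connect.errors.*
        # headers describing why the record was rejected.
        headers = {k: (v.decode("utf-8", "replace") if v else "")
                   for k, v in (msg.headers() or [])}
        print("failed record:", msg.value())
        print("error context:", {k: v for k, v in headers.items()
                                 if k.startswith("__connect.errors")})
finally:
    consumer.close()
```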

See the [Confluent article](https://developer.confluent.io/courses/kafka-connect/error-handling-and-dead-letter-queues/) on error handling and Dead Letter Queues for more details.


### Latency considerations

The connector waits for a batch of messages to accumulate before sending them to
22 changes: 22 additions & 0 deletions docusaurus.config.js
@@ -83,6 +83,28 @@ const config = {
],
},
],
function (context, options) {
  return {
    name: 'development-redirects',
    configureWebpack(config, isServer, utils) {
      if (process.env.NODE_ENV === 'development') {
        return {
          devServer: {
            onBeforeSetupMiddleware: function (devServer) {
              devServer.app.get('*', function (req, res, next) {
                // If path doesn't start with /docs, redirect to Next.js
                if (!req.path.startsWith('/docs/')) {
                  return res.redirect(`http://localhost:3000${req.path}`)
                }
                next()
              })
            }
          }
        }
      }
    }
  }
}
].filter(Boolean),

themeConfig: {
18 changes: 9 additions & 9 deletions package.json
@@ -4,15 +4,15 @@
"private": true,
"license": "Apache-2.0",
"scripts": {
"start": "cross-env docusaurus start",
"start": "cross-env docusaurus start --port 3001",
"prebuild": "docusaurus clear && node ./scripts/cleanup-guidelines",
"build": "cross-env NO_UPDATE_NOTIFIER=true USE_SIMPLE_CSS_MINIFIER=true PWA_SW_CUSTOM= docusaurus build",
"deploy": "docusaurus deploy",
"serve": "docusaurus serve"
},
"dependencies": {
"@docusaurus/faster": "^3.6.1",
"@docusaurus/theme-mermaid": "3.6.1",
"@docusaurus/faster": "^3.6.3",
"@docusaurus/theme-mermaid": "^3.6.3",
"@headlessui/react": "^2.2.0",
"@heroicons/react": "2.2.0",
"@mdx-js/react": "3.1.0",
Expand All @@ -39,12 +39,12 @@
"unist-util-visit": "5.0.0"
},
"devDependencies": {
"@docusaurus/core": "3.6.1",
"@docusaurus/module-type-aliases": "3.6.1",
"@docusaurus/plugin-pwa": "3.6.1",
"@docusaurus/preset-classic": "3.6.1",
"@docusaurus/tsconfig": "3.6.1",
"@docusaurus/types": "3.6.1",
"@docusaurus/core": "^3.6.3",
"@docusaurus/module-type-aliases": "^3.6.3",
"@docusaurus/plugin-pwa": "^3.6.3",
"@docusaurus/preset-classic": "^3.6.3",
"@docusaurus/tsconfig": "^3.6.3",
"@docusaurus/types": "^3.6.3",
"@types/react": "18.3.12",
"@types/react-helmet": "6.1.11",
"@types/react-router-config": "5.0.11",