diff --git a/introduction.md b/introduction.md
index 4e1bd846..bee6f5f3 100644
--- a/introduction.md
+++ b/introduction.md
@@ -75,23 +75,14 @@ Greatest hits include:

 ## Benefits of QuestDB {#benefits}

-Time series data is seen increasingly in use cases across:
+High-performance ingestion is essential to keep up with ever-growing volumes
+of time series data.

-- finance
-- internet of things (IoT)
-- e-commerce
-- security
-- blockchain
-- many other emerging technical industries
+But performance is only part of the story.

-As more time bound data is generated, high performance data reception is
-essential to avoid ingestion bottlenecks.
+Efficiency measures how well a database performs relative to its available
+resources.

-The right data store greatly simplifies code costly infrastructure sprawl and
-spend.
-
-But to be _the right one_, the storage engine must be both high performance and
-efficient:
+On high-end hardware, QuestDB significantly outperforms its peers, and on less
+robust hardware the difference is even more pronounced.
+
+Even on hardware as light as a Raspberry Pi 5, QuestDB outperforms competitors
+running on stronger machines.
+
+
 Beyond performance and efficiency, with a specialized
 [time-series database](/glossary/time-series-database/), you don't need to
 worry about:

diff --git a/reference/function/finance.md b/reference/function/finance.md
index 0470248e..49ee16dd 100644
--- a/reference/function/finance.md
+++ b/reference/function/finance.md
@@ -12,8 +12,10 @@ Trade price calculation.

 `l2price(target_quantity, quantity_1, price_1, quantity_2, price_2, ..., quantity_n, price_n)`

-Consider `quantity_1`, `price_1`, `quantity_2`, `price_2`, ..., `quantity_n`, `price_n` to be either
-side of an order book with `n` price levels. Then, the return value of the function is the average trade price of a market order executed with the size of `target_quantity` against the book.
+Consider `quantity_1`, `price_1`, `quantity_2`, `price_2`, ..., `quantity_n`,
+`price_n` to be either side of an order book with `n` price levels. Then, the
+return value of the function is the average trade price of a market order
+executed with the size of `target_quantity` against the book.

 Let's take the below order book as an example.

@@ -25,8 +27,8 @@ Let's take the below order book as an example.
 | 21       | 13.70     | 15.10     | 12       |
 | 18       | 13.40     |           |          |

-A _buy market order_ with the size of 50 would wipe out the first two price levels of
-the _Ask_ side of the book, and would also trade on the third level.
+A _buy market order_ with the size of 50 would wipe out the first two price
+levels of the _Ask_ side of the book, and would also trade on the third level.

 The full price of the trade:

@@ -40,8 +42,8 @@
 $$
 \$732.60 / 50 = \$14.652
 $$

-This average trade price is the output of the function when executed with the parameters taken from
-the above example:
+This average trade price is the output of the function when executed with the
+parameters taken from the above example:

 ```questdb-sql
 select l2price(50, 14, 14.50, 16, 14.60, 23, 14.80, 12, 15.10);
 ```

 | l2price |
 | :------ |
 | 14.652  |

@@ -53,21 +55,23 @@

 ### Parameters

-The function takes a `target quantity`, and a variable number of `quantity`/`price` pairs. Each
-represents a price level of the order book.
+The function takes a `target quantity`, and a variable number of
+`quantity`/`price` pairs. Each represents a price level of the order book.

-Each parameter is expected to be a double, or convertible to double (float, long, int, short, byte).
+Each parameter is expected to be a double, or convertible to double (float,
+long, int, short, byte).

 - `target_quantity`: The size of a hypothetical market order to be filled.
-- `quantity*`: The number of instruments available at the corresponding price levels.
+- `quantity*`: The number of instruments available at the corresponding price
+  levels.
 - `price*`: Price levels of the order book.

 ### Return value

 The function returns with a `double`, representing the average trade price.

-Returns null if the price is not calculable. For example, if the target quantity cannot be filled,
-or there is incomplete data in the set (nulls).
+Returns null if the price is not calculable. For example, if the target quantity
+cannot be filled, or there is incomplete data in the set (nulls).

 ### Examples

@@ -148,10 +152,160 @@ SELECT mid(1.5760, 1.5763)
 | :------ |
 | 1.57615 |

+## regr_slope
+
+`regr_slope(y, x)` - Calculates the slope of the linear regression line for the
+given numeric columns y (dependent variable) and x (independent variable).
+
+- The function requires at least two valid (x, y) pairs to compute the slope.
+  - If fewer than two pairs are available, the function returns null.
+- Supported data types for x and y include `double`, `float`, and `integer`
+  types.
+- The `regr_slope` function can be used with other statistical aggregation
+  functions like `corr` or `covar_samp`.
+- The order of arguments in `regr_slope(y, x)` matters.
+  - Ensure that y is the dependent variable and x is the independent variable.
+
+### Calculation
+
+The slope $b_1$ of the regression line $y = b_0 + b_1 x$ is calculated using the
+formula:
+
+$$
+b_1 = \frac{N \sum (xy) - \sum x \sum y}{N \sum (x^2) - (\sum x)^2}
+$$
+
+Where:
+
+- $N$ is the number of valid data points.
+- $\sum (xy)$ is the sum of the products of $x$ and $y$.
+- $\sum x$ and $\sum y$ are the sums of $x$ and $y$ values, respectively.
+- $\sum (x^2)$ is the sum of the squares of $x$ values.
+
+### Arguments
+
+- y: A numeric column representing the dependent variable.
+- x: A numeric column representing the independent variable.
+
+### Return value
+
+Return value type is `double`.
+
+The function returns the slope of the regression line, indicating how much y
+changes for a unit change in x.
+
+### Examples
+
+#### Calculate the regression slope between two variables
+
+Suppose you have a table `measurements` with the following data:
+
+| x   | y   |
+| --- | --- |
+| 1.0 | 2.0 |
+| 2.0 | 3.0 |
+| 3.0 | 5.0 |
+| 4.0 | 4.0 |
+| 5.0 | 6.0 |
+
+You can calculate the slope of the regression line between y and x using:
+
+```questdb-sql
+SELECT regr_slope(y, x) AS slope FROM measurements;
+```
+
+Result:
+
+| slope |
+| ----- |
+| 0.9   |
+
+The slope of 0.9 indicates that for each unit increase in x, y increases by
+0.9 units on average.
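+
+To tie this back to the formula in the Calculation section, the same slope can
+be rebuilt from raw aggregates. A quick cross-check sketch, using the same
+`measurements` table:
+
+```questdb-sql
+-- recompute the slope from the sums used in the formula above
+SELECT (count() * sum(x * y) - sum(x) * sum(y)) /
+       (count() * sum(x * x) - sum(x) * sum(x)) AS manual_slope
+FROM measurements;
+```
+
+This returns the same value, 0.9, as `regr_slope(y, x)`.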
+
+#### Calculate the regression slope grouped by a category
+
+Consider a table `sales_data`:
+
+| category | advertising_spend | sales |
+| -------- | ----------------- | ----- |
+| A        | 1000              | 15000 |
+| A        | 2000              | 22000 |
+| A        | 3000              | 28000 |
+| B        | 1500              | 18000 |
+| B        | 2500              | 26000 |
+| B        | 3500              | 31000 |
+
+Calculate the regression slope of `sales` versus `advertising_spend` for each
+category:
+
+```questdb-sql
+SELECT category, regr_slope(sales, advertising_spend) AS slope FROM sales_data
+GROUP BY category;
+```
+
+Result:
+
+| category | slope |
+| -------- | ----- |
+| A        | 6.5   |
+| B        | 6.5   |
+
+In other words: in both categories, every additional unit of advertising spend
+is associated with an average sales increase of 6.5 units.
+
+#### Handling null values
+
+If your data contains null values, `regr_slope()` will ignore those rows:
+
+```questdb-sql
+SELECT regr_slope(y, x) AS slope FROM (
+  SELECT 1 AS x, 2 AS y
+  UNION ALL SELECT 2, NULL
+  UNION ALL SELECT NULL, 4
+  UNION ALL SELECT 4, 5
+);
+```
+
+Result:
+
+| slope |
+| ----- |
+| 1.0   |
+
+Only the rows where both x and y are not null are considered in the calculation.
+Here that leaves the pairs (1, 2) and (4, 5), which lie on a line of slope 1.
+
+#### Calculating beta
+
+Assuming you have a table `stock_returns` with daily returns for a specific
+stock and the market index:
+
+| date       | stock_return | market_return |
+| ---------- | ------------ | ------------- |
+| 2023-01-01 | 0.5          | 0.4           |
+| 2023-01-02 | -0.2         | -0.1          |
+| 2023-01-03 | 0.3          | 0.2           |
+| ...        | ...          | ...           |
+
+Calculate the stock's beta coefficient:
+
+```questdb-sql
+SELECT regr_slope(stock_return, market_return) AS beta FROM stock_returns;
+```
+
+| beta |
+| ---- |
+| 1.2  |
+
+A beta of 1.2 suggests the stock is 20% more volatile than the market.
+
+Remember: the order of arguments in `regr_slope(y, x)` matters. Ensure that y
+is the dependent variable and x is the independent variable.
+
 ## spread_bps

-`spread_bps(bid, ask)` - calculates the quoted bid-ask spread, based on the highest bidding price,
-and the lowest asking price.
+`spread_bps(bid, ask)` - calculates the quoted bid-ask spread, based on the
+highest bid price and the lowest ask price.

 The result is provided in basis points, and the calculation is:

@@ -185,7 +339,8 @@ SELECT spread_bps(1.5760, 1.5763)

 ## vwap

 `vwap(price, quantity)` - Calculates the volume-weighted average price (VWAP)
-based on the given price and quantity columns. This is defined by the following formula:
+based on the given price and quantity columns. This is defined by the following
+formula:

 $$
 \text{vwap} =
diff --git a/reference/function/meta.md b/reference/function/meta.md
index 6840722c..3e40198c 100644
--- a/reference/function/meta.md
+++ b/reference/function/meta.md
@@ -237,6 +237,81 @@ tables() WHERE partitionBy = 'DAY'
 | --- | -------- | ------------------- | ----------- | ------------------ | ---------- | ------------- | ----- |
 | 1   | my_table | ts                  | DAY         | 500000             | true       | my_table      | false |

+## table_storage
+
+`table_storage()` - Returns a table containing information about the storage and
+structure of all user tables in the database. It returns one row per table,
+including details on partitioning, row counts, and disk usage.
+
+- The `table_storage()` function excludes system tables; it only lists
+  user-created tables.
+- The `diskSize` value represents the total size of all files associated with
+  the table on disk, including data, index, and metadata files.
+- The `partitionBy` column indicates the partitioning strategy used for the
+  table. It can be `NONE` if the table is not partitioned.
+
+**Return values:**
+
+The function returns the following columns:
+
+- `tableName` (`string`): The name of the table.
+- `walEnabled` (`boolean`): Indicates whether Write-Ahead Logging (WAL) is
+  enabled for the table.
+- `partitionBy` (`string`): The partitioning type of the table, such as `NONE`,
+  `DAY`, `MONTH`, or `YEAR`.
+- `partitionCount` (`long`): The number of partitions the table has.
+- `rowCount` (`long`): The total number of rows in the table.
+- `diskSize` (`long`): The total disk space used by the table, in bytes.
+
+**Examples:**
+
+Retrieve storage information for all tables. The `diskSize` column shows the
+total disk space used by each table, in bytes.
+
+```questdb-sql title="Checking our demo tables" demo
+SELECT * FROM table_storage();
+```
+
+| tableName      | walEnabled | partitionBy | partitionCount | rowCount   | diskSize     |
+| -------------- | ---------- | ----------- | -------------- | ---------- | ------------ |
+| trips          | true       | MONTH       | 126            | 1634599313 | 261536158948 |
+| AAPL_orderbook | true       | HOUR        | 16             | 3024878    | 2149403527   |
+| weather        | false      | NONE        | 1              | 137627     | 9972598      |
+| trades         | true       | DAY         | 954            | 1000848308 | 32764798760  |
+| ethblocks_json | true       | DAY         | 3328           | 20688364   | 28311960478  |
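+
+You can also aggregate over the output; for example, summing `diskSize` gives
+the total on-disk footprint of all user tables. A small sketch, not part of the
+original reference:
+
+```questdb-sql title="Total disk usage of all user tables" demo
+SELECT sum(diskSize) AS total_bytes FROM table_storage();
+```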
+ +Filter tables with WAL enabled. + +```questdb-sql title="WAL only tables" demo +SELECT tableName, rowCount, diskSize +FROM table_storage() +WHERE walEnabled = true; +``` + +| tableName | rowCount | diskSize | +| -------------- | ---------- | ------------ | +| trips | 1634599313 | 261536158948 | +| AAPL_orderbook | 3024878 | 2149403527 | +| trades | 1000850255 | 32764804264 | +| ethblocks_json | 20688364 | 28311960478 | + +
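+
+Sorting by `diskSize` surfaces the heaviest tables; again a small sketch along
+the same lines:
+
+```questdb-sql title="Largest tables by disk usage" demo
+SELECT tableName, diskSize
+FROM table_storage()
+ORDER BY diskSize DESC
+LIMIT 3;
+```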
+
+Show tables partitioned by `HOUR`.
+
+```questdb-sql title="Show tables partitioned by hour" demo
+SELECT tableName, partitionCount, rowCount
+FROM table_storage()
+WHERE partitionBy = 'HOUR';
+```
+
 ## wal_tables

 `wal_tables()` returns the WAL status for all
@@ -305,7 +380,7 @@ table_columns('my_table');
 ```

 | column | type      | indexed | indexBlockCapacity | symbolCached | symbolCapacity | designated | upsertKey |
-|--------|-----------|---------|--------------------|--------------|----------------|------------|-----------|
+| ------ | --------- | ------- | ------------------ | ------------ | -------------- | ---------- | --------- |
 | symb   | SYMBOL    | true    | 1048576            | false        | 256            | false      | false     |
 | price  | DOUBLE    | false   | 0                  | false        | 0              | false      | false     |
 | ts     | TIMESTAMP | false   | 0                  | false        | 0              | true       | false     |

@@ -324,7 +399,7 @@ SELECT type, count() FROM table_columns('my_table');
 ```

 | type      | count |
-|-----------|-------|
+| --------- | ----- |
 | SYMBOL    | 1     |
 | DOUBLE    | 1     |
 | TIMESTAMP | 1     |

@@ -438,15 +513,15 @@ SELECT pg_catalog.version();
 | ------------------------------------------------------------------- |
 | PostgreSQL 12.3, compiled by Visual C++ build 1914, 64-bit, QuestDB |

-
 ## hydrate_table_metadata('table1', 'table2' ...)

-`hydrate_table_metadata' re-reads table metadata from disk to update the static metadata cache.
+`hydrate_table_metadata()` re-reads table metadata from disk to update the
+static metadata cache.

 :::warning

-This function should only be used when directed by QuestDB support. Misuse could cause corruption of the metadata
-cache, requiring the database to be restarted.
+This function should only be used when directed by QuestDB support. Misuse could
+cause corruption of the metadata cache, requiring the database to be restarted.

 :::

@@ -454,13 +529,12 @@ cache, requiring the database to be restarted.

 **Arguments:**

 A variable list of strings, corresponding to table names.

-Alternatively, a single asterisk, '*', representing all tables.
+Alternatively, a single asterisk, '\*', representing all tables.

 **Return value:**

 Returns `boolean`. `true` if successful, `false` if unsuccessful.

-
 **Examples:**

 Simply pass table names as arguments to the function.

 ```questdb-sql
 SELECT hydrate_table_metadata('trades', 'trips')
 ```

 | hydrate_table_metadata |
-|------------------------|
+| ---------------------- |
 | true                   |

-
 If you want to re-read metadata for all user tables, simply use an asterisk:

 ```
diff --git a/third-party-tools/airbyte.md b/third-party-tools/airbyte.md
new file mode 100644
index 00000000..cf429bcf
--- /dev/null
+++ b/third-party-tools/airbyte.md
@@ -0,0 +1,305 @@
+---
+title: Integrating Airbyte with QuestDB
+description: Learn how to use Airbyte to collect data from various sources and send it to QuestDB for efficient time-series data storage and analysis.
+---
+
+# Integrating Airbyte with QuestDB
+
+[Airbyte](https://airbyte.com/) is an open-source ETL platform designed to help you sync data from a wide range of sources into your desired destinations. With its extensive library of connectors, Airbyte allows users to build scalable data pipelines effortlessly. This guide will walk you through the process of integrating Airbyte with QuestDB, enabling efficient storage and querying of data in a high-performance time-series database.
+
+## Table of Contents
+
+- [Prerequisites](#prerequisites)
+- [Configuring Airbyte](#configuring-airbyte)
+  - [Adding a Source](#adding-a-source)
+  - [Adding a Destination](#adding-a-destination)
+  - [Testing the Configuration](#testing-the-configuration)
+- [Example Inputs](#example-inputs)
+  - [PostgreSQL Source Configuration](#postgresql-source-configuration)
+  - [Writing to QuestDB](#writing-to-questdb)
+  - [Running the Connection](#running-the-connection)
+  - [Monitoring Sync Status](#monitoring-sync-status)
+  - [Debugging Tips](#debugging-tips)
+- [Verifying the Integration](#verifying-the-integration)
+- [Best Practices](#best-practices)
+- [Troubleshooting](#troubleshooting)
+- [Summary](#summary)
+
+## Prerequisites
+
+Before you begin, ensure you have the following:
+
+- **QuestDB** must be running and accessible. Check out the [QuestDB quick start guide](/docs/get-started/docker/).
+- **Airbyte** installed using Docker, or locally via [Homebrew](https://brew.sh/) on macOS. For detailed installation instructions, refer to the [official Airbyte documentation](https://docs.airbyte.com/deploying-airbyte/local-deployment).
+
+## Configuring Airbyte
+
+Airbyte provides a user-friendly interface for configuration, allowing you to easily set up your sources and destinations. Once Airbyte is running, navigate to the Airbyte dashboard at `http://localhost:8000`.
+
+### Adding a source
+
+1. Click on the **Sources** tab.
+2. Select the desired source from the list of available connectors.
+3. Fill in the necessary configuration details, such as connection strings, credentials, and any other required fields.
+
+### Adding a destination
+
+1. Click on the **Destinations** tab.
+2. Select **QuestDB** as your destination.
+3. Enter the required connection details for QuestDB:
+   - **Host**: `localhost` or your QuestDB server IP
+   - **Port**: `8812`
+   - **User**: `admin`
+   - **Password**: `quest` (the default QuestDB credentials are `admin`/`quest`; change them in production)
+
+### Testing the configuration
+
+Airbyte allows you to test your configuration after entering the details. Click on **Test Connection** to ensure that the connection to the source and destination is successful. Once confirmed, you can proceed to create a connection between the source and destination.
+
+## Example inputs
+
+In this section, we will demonstrate how to set up Airbyte to extract data from a PostgreSQL source and send it to QuestDB.
+
+### PostgreSQL source configuration
+
+1. **Setting Up the Source**:
+   To set up the source connector, you will need the following configuration details:
+
+```json
+{
+  "sourceType": "postgresql",
+  "configuration": {
+    "host": "your_postgres_host",
+    "port": 5432,
+    "database": "your_database",
+    "username": "your_username",
+    "password": "your_password",
+    "ssl": false,
+    "table_name": "your_table"
+  }
+}
+```
+
+2. **Sample Data Extraction**:
+   You can run a sample query to test the connection and see the kind of data that will be extracted:
+
+```sql
+SELECT * FROM your_table LIMIT 10;
+```
+
+### Writing to QuestDB
+
+After configuring your PostgreSQL source, the next step is to configure the destination to send the extracted data to QuestDB.
+
+#### QuestDB destination configuration
+
+1. **Setting Up the Destination**:
+   Here's how to configure the destination connector for QuestDB:
+
+```json
+{
+  "destinationType": "questdb",
+  "configuration": {
+    "host": "localhost",
+    "port": 8812,
+    "database": "your_database",
+    "username": "admin",
+    "password": "quest",
+    "table": "your_table",
+    "batch_size": 1000
+  }
+}
+```
+
+2. **Data Format**:
+   QuestDB expects the data in a specific format. Here's an example of how the data might look when being sent:
+
+```json
+[
+  {
+    "column1": "value1",
+    "column2": 123,
+    "column3": "2023-10-19T12:00:00Z"
+  },
+  {
+    "column1": "value2",
+    "column2": 456,
+    "column3": "2023-10-19T12:05:00Z"
+  }
+]
+```
+
+### Running the connection
+
+Once you have both the source and destination configured, you can create a connection between them.
+
+1. **Creating the Connection**:
+   In the Airbyte dashboard, navigate to the **Connections** tab, and create a new connection with the following details:
+
+```json
+{
+  "sourceId": "your_postgres_source_id",
+  "destinationId": "your_questdb_destination_id",
+  "syncMode": "full_refresh",
+  "schedule": {
+    "units": 1,
+    "timeUnit": "hours"
+  }
+}
+```
+
+### Monitoring sync status
+
+Once the sync process is initiated, you can monitor its status directly in the Airbyte dashboard.
+
+#### Example sync status output
+
+```json
+{
+  "status": "COMPLETED",
+  "records_transferred": 150,
+  "start_time": "2023-10-19T12:00:00Z",
+  "end_time": "2023-10-19T12:10:00Z",
+  "errors": []
+}
+```
+
+### Debugging tips
+
+If you encounter issues during the sync process, consider the following debugging steps:
+
+- **Check Connection Settings**: Ensure the host, port, and authentication details are correct in both the source and destination configurations.
+- **Review Logs**: Check the Airbyte logs for any error messages. Logs can provide insight into connection failures or data format issues.
+- **Test Queries**: Use the query tools available in your PostgreSQL and QuestDB interfaces to test individual queries and see what data is being extracted or sent.
+
+## Verifying the integration
+
+1. **Access the QuestDB Web Console**:
+   Navigate to the [QuestDB Web Console](/docs/develop/web-console/) at `http://127.0.0.1:9000/`. Check the Schema Navigator in the top left: you should see the destination table from your connection, in this example `your_table`.
+
+2. **Run a Query on the Table**:
+   To verify that data is being correctly written, type the following query in the editor and click **RUN**:
+
+```sql
+SELECT * FROM your_table;
+```
+
+3. **Inspect the Results**:
+   The columns should match the schema of the source table in PostgreSQL, and the rows should match the data extracted during the sync. Spot-check a few rows against the source.
+
+4. **Compare Row Counts**:
+   To confirm that the sync transferred everything, compare the row count in QuestDB with the count reported by the source:
+
+```sql
+SELECT count() FROM your_table;
+```
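+
+If your connection runs on a schedule, you can also check that new rows keep
+arriving after each sync. A minimal sketch, assuming the destination table has
+a timestamp column named `created_at` (adjust to your schema):
+
+```sql
+-- hypothetical freshness check: when did the last synced row arrive?
+SELECT max(created_at) AS last_ingested FROM your_table;
+```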
+
+## Best practices
+
+To ensure a smooth and efficient integration between Airbyte and QuestDB, consider the following best practices:
+
+1. **Optimize Data Load Frequency**
+   - Use batch processing to reduce load on QuestDB.
+   - Implement incremental sync where possible to only load new or changed data.
+
+```yaml
+# Example Airbyte configuration for incremental sync
+sync_mode: incremental
+```
+
+2. **Data Types and Schema Alignment**
+   - Ensure data types in Airbyte's source match the destination schema in QuestDB.
+   - Pre-define tables in QuestDB before starting the sync.
+
+```sql
+-- Example SQL for creating a table in QuestDB
+-- (name is a SYMBOL column so that it can be indexed later)
+CREATE TABLE my_table (
+  id INT,
+  name SYMBOL,
+  created_at TIMESTAMP
+) timestamp(created_at);
+```
+
+3. **Use Connection Retry Logic**
+   - Configure Airbyte to retry connections in case of temporary failures.
+
+```yaml
+# Example Airbyte connection configuration with retries
+retries: 3
+```
+
+4. **Monitor Resource Utilization**
+   - Keep an eye on CPU and memory usage on both Airbyte and QuestDB.
+   - Enable logging in Airbyte to track data sync processes.
+
+5. **Query Optimization**
+   - Use indexing to speed up queries on frequently accessed columns. Note that QuestDB only supports indexes on `symbol` columns.
+
+```sql
+-- Example SQL for indexing a symbol column in QuestDB
+ALTER TABLE my_table ALTER COLUMN name ADD INDEX;
+```
+
+6. **Data Validation**
+   - Implement post-load checks to verify data integrity.
+
+```sql
+-- Example SQL for counting records after load
+SELECT COUNT(*) FROM my_table;
+```
+
+## Troubleshooting
+
+If you do not see the expected tables or data in QuestDB:
+
+- **Check Airbyte Logs**: Ensure there are no errors in the Airbyte dashboard regarding the sync process.
+- **Verify Configuration**: Revisit both the source and destination configurations to ensure they match the expected settings.
+- **Consult QuestDB Logs**: Check the QuestDB logs for any error messages indicating issues with data ingestion.
+
+## Summary
+
+This guide outlines the integration of **Airbyte** with **QuestDB**, enabling seamless data ingestion from various sources into QuestDB for efficient querying and analytics. Key topics covered include:
+
+- **Prerequisites**: Ensure both Airbyte and QuestDB are properly installed and configured.
+- **Configuring Airbyte**: Set up Airbyte to connect with your desired data sources and QuestDB as the destination.
+- **Data Validation**: Verify data integrity post-load and ensure the expected records are ingested.
+- **Best Practices**: Optimize data loads, monitor resource utilization, and use indexing for efficient querying.
+
+For further details and resources, refer to the following links:
+
+- [Airbyte Documentation](https://docs.airbyte.com/)
+- [QuestDB Web Console Guide](/docs/develop/web-console/)
+- [Airbyte GitHub Repository](https://github.com/airbytehq/airbyte)
diff --git a/third-party-tools/overview.md b/third-party-tools/overview.md
index 28b0ae4c..4199d034 100644
--- a/third-party-tools/overview.md
+++ b/third-party-tools/overview.md
@@ -35,7 +35,9 @@ integrations:
   efficiently.
 - [Apache Spark](/docs/third-party-tools/spark/): Handle complex data
   processing tasks at scale.
-
+- [Airbyte](/docs/third-party-tools/airbyte/): Perform streamlined ETL across
+  diverse data sources.
+ ## Analytics and Processing Enhance your data analysis and processing capabilities with QuestDB through