From 0a7d6bcd8ffcd23347626a8291cd4fa77dbccb8e Mon Sep 17 00:00:00 2001 From: goodroot <9484709+goodroot@users.noreply.github.com> Date: Thu, 17 Oct 2024 09:06:18 -0700 Subject: [PATCH] Add regr_slope and table_storage (#65) --- introduction.md | 32 +++--- reference/function/finance.md | 185 +++++++++++++++++++++++++++++++--- reference/function/meta.md | 93 +++++++++++++++-- 3 files changed, 271 insertions(+), 39 deletions(-) diff --git a/introduction.md b/introduction.md index 4e1bd846..bee6f5f3 100644 --- a/introduction.md +++ b/introduction.md @@ -75,23 +75,14 @@ Greatest hits include: ## Benefits of QuestDB {#benefits} -Time series data is seen increasingly in use cases across: +To avoid ingestion bottlenecks, high performance data ingestion is essential. -- finance -- internet of things (IoT) -- e-commerce -- security -- blockchain -- many other emerging technical industries +But performance is only part of the story. -As more time bound data is generated, high performance data reception is -essential to avoid ingestion bottlenecks. +Efficiency measures how well a database performs relative to its available +resources. -The right data store greatly simplifies code costly infrastructure sprawl and -spend. - -But to be _the right one_, the storage engine must be both high performance and -efficient: +QuestDB, on maximal hardware, significantly outperforms peers: +However, on less robust hardware the difference is even more pronounced, as seen +in the following benchmark. + +Even on hardware as light as a Raspberry Pi 5, QuestDB outperforms competitors +on stronger hardware: + + + Beyond performance and efficiency, with a specialized [time-series database](/glossary/time-series-database/), you don't need to worry about: diff --git a/reference/function/finance.md b/reference/function/finance.md index 0470248e..49ee16dd 100644 --- a/reference/function/finance.md +++ b/reference/function/finance.md @@ -12,8 +12,10 @@ Trade price calculation. 
`l2price(target_quantity, quantity_1, price_1, quantity_2, price_2, ..., quantity_n, price_n)` -Consider `quantity_1`, `price_1`, `quantity_2`, `price_2`, ..., `quantity_n`, `price_n` to be either -side of an order book with `n` price levels. Then, the return value of the function is the average trade price of a market order executed with the size of `target_quantity` against the book. +Consider `quantity_1`, `price_1`, `quantity_2`, `price_2`, ..., `quantity_n`, +`price_n` to be either side of an order book with `n` price levels. Then, the +return value of the function is the average trade price of a market order +executed with the size of `target_quantity` against the book. Let's take the below order book as an example. @@ -25,8 +27,8 @@ Let's take the below order book as an example. | 21 | 13.70 | 15.10 | 12 | | 18 | 13.40 | | | -A _buy market order_ with the size of 50 would wipe out the first two price levels of -the _Ask_ side of the book, and would also trade on the third level. +A _buy market order_ with the size of 50 would wipe out the first two price +levels of the _Ask_ side of the book, and would also trade on the third level. The full price of the trade: @@ -40,8 +42,8 @@ $$ \$732.60 / 50 = \$14.652 $$ -This average trade price is the output of the function when executed with the parameters taken from -the above example: +This average trade price is the output of the function when executed with the +parameters taken from the above example: ```questdb-sql select l2price(50, 14, 14.50, 16, 14.60, 23, 14.80, 12, 15.10); @@ -53,21 +55,23 @@ select l2price(50, 14, 14.50, 16, 14.60, 23, 14.80, 12, 15.10); ### Parameters -The function takes a `target quantity`, and a variable number of `quantity`/`price` pairs. Each -represents a price level of the order book. +The function takes a `target quantity`, and a variable number of +`quantity`/`price` pairs. Each represents a price level of the order book. 
-Each parameter is expected to be a double, or convertible to double (float, long, int, short, byte). +Each parameter is expected to be a double, or convertible to double (float, +long, int, short, byte). - `target_quantity`: The size of a hypothetical market order to be filled. -- `quantity*`: The number of instruments available at the corresponding price levels. +- `quantity*`: The number of instruments available at the corresponding price + levels. - `price*`: Price levels of the order book. ### Return value The function returns with a `double`, representing the average trade price. -Returns null if the price is not calculable. For example, if the target quantity cannot be filled, -or there is incomplete data in the set (nulls). +Returns null if the price is not calculable. For example, if the target quantity +cannot be filled, or there is incomplete data in the set (nulls). ### Examples @@ -148,10 +152,160 @@ SELECT mid(1.5760, 1.5763) | :------ | | 1.57615 | +## regr_slope + +`regr_slope(y, x)` - Calculates the slope of the linear regression line for the +given numeric columns y (dependent variable) and x (independent variable). + +- The function requires at least two valid (x, y) pairs to compute the slope. + - If fewer than two pairs are available, the function returns null. +- Supported data types for x and y include `double`, `float`, and `integer` + types. +- The regr_slope function can be used with other statistical aggregation + functions like `corr` or `covar_samp`. +- The order of arguments in `regr_slope(y, x)` matters. + - Ensure that y is the dependent variable and x is the independent variable. + +### Calculation + +The slope $b_1$ of the regression line $y = b_0 + b_1 x$ is calculated using the +formula: + +$$ +b_1 = \frac{N \sum (xy) - \sum x \sum y}{N \sum (x^2) - (\sum x)^2} +$$ + +Where: + +- $N$ is the number of valid data points. +- $\sum (xy)$ is the sum of the products of $x$ and $y$. 
+- $\sum x$ and $\sum y$ are the sums of $x$ and $y$ values, respectively. +- $\sum (x^2)$ is the sum of the squares of $x$ values. + +### Arguments + +- y: A numeric column representing the dependent variable. +- x: A numeric column representing the independent variable. + +### Return value + +Return value type is `double`. + +The function returns the slope of the regression line, indicating how much y +changes for a unit change in x. + +### Examples + +#### Calculate the regression slope between two variables + +Suppose you have a table `measurements` with the following data: + +| x | y | +| --- | --- | +| 1.0 | 2.0 | +| 2.0 | 3.0 | +| 3.0 | 5.0 | +| 4.0 | 4.0 | +| 5.0 | 6.0 | + +You can calculate the slope of the regression line between y and x using: + +```questdb-sql +SELECT regr_slope(y, x) AS slope FROM measurements; +``` + +Result: + +| slope | +| ----- | +| 0.9 | + +In other words, a slope of 0.9 indicates that for each unit increase in x, y +increases by 0.9 units on average. + +#### Calculate the regression slope grouped by a category + +Consider a table `sales_data`: + +| category | advertising_spend | sales | +| -------- | ----------------- | ----- | +| A | 1000 | 15000 | +| A | 2000 | 22000 | +| A | 3000 | 28000 | +| B | 1500 | 18000 | +| B | 2500 | 26000 | +| B | 3500 | 31000 | + +Calculate the regression slope of `sales` versus `advertising_spend` for each +category: + +```questdb-sql +SELECT category, regr_slope(sales, advertising_spend) AS slope FROM sales_data +GROUP BY category; +``` + +Result: + +| category | slope | +| -------- | ----- | +| A | 6.5 | +| B | 6.5 | + +In other words: + +- In category A, for every additional unit spent on advertising, sales increase + by 6.5 units on average. +- In category B, the increase is also 6.5 units per advertising unit spent.
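+
+As a worked check of the slope formula in the Calculation section, the three
+category A rows of `sales_data` can be plugged in by hand, taking $x$ as
+`advertising_spend`, $y$ as `sales`, and $N = 3$. This is a hand calculation
+from the formula, not output captured from a QuestDB instance:
+
+$$
+\sum x = 6000, \quad \sum y = 65000, \quad \sum (xy) = 143000000, \quad \sum (x^2) = 14000000
+$$
+
+$$
+b_1 = \frac{3 \cdot 143000000 - 6000 \cdot 65000}{3 \cdot 14000000 - 6000^2} = \frac{39000000}{6000000} = 6.5
+$$
+
+The same steps on the category B rows ($\sum x = 7500$, $\sum y = 75000$,
+$\sum (xy) = 200500000$, $\sum (x^2) = 20750000$) also give
+$39000000 / 6000000 = 6.5$.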
+ +#### Handling null values + +If your data contains null values, `regr_slope()` will ignore those rows: + +```questdb-sql +SELECT regr_slope(y, x) AS slope FROM ( SELECT 1 AS x, 2 AS y UNION ALL SELECT +2, NULL UNION ALL SELECT NULL, 4 UNION ALL SELECT 4, 5 ); +``` + +Result: + +| slope | +| ----- | +| 1.0 | + +Only the rows where both x and y are not null are considered in the calculation. +Here that leaves the pairs (1, 2) and (4, 5), so the slope is (5 - 2) / (4 - 1). + +#### Calculating beta + +Assuming you have a table `stock_returns` with daily returns for a specific +stock and the market index: + +| date | stock_return | market_return | +| ---------- | ------------ | ------------- | +| 2023-01-01 | 0.5 | 0.4 | +| 2023-01-02 | -0.2 | -0.1 | +| 2023-01-03 | 0.3 | 0.2 | +| ... | ... | ... | + +Calculate the stock's beta coefficient: + +```questdb-sql +SELECT regr_slope(stock_return, market_return) AS beta FROM stock_returns; +``` + +| beta | +| ---- | +| 1.2 | + +In other words, a beta of 1.2 suggests the stock is 20% more volatile than the +market. + +Remember: The order of arguments in `regr_slope(y, x)` matters. + +Ensure that y is the dependent variable and x is the independent variable. + ## spread_bps -`spread_bps(bid, ask)` - calculates the quoted bid-ask spread, based on the highest bidding price, -and the lowest asking price. +`spread_bps(bid, ask)` - calculates the quoted bid-ask spread, based on the +highest bidding price, and the lowest asking price. The result is provided in basis points, and the calculation is: @@ -185,7 +339,8 @@ SELECT spread_bps(1.5760, 1.5763) ## vwap `vwap(price, quantity)` - Calculates the volume-weighted average price (VWAP) -based on the given price and quantity columns. This is defined by the following formula: +based on the given price and quantity columns.
This is defined by the following +formula: $$ \text{vwap} = diff --git a/reference/function/meta.md b/reference/function/meta.md index 6840722c..3e40198c 100644 --- a/reference/function/meta.md +++ b/reference/function/meta.md @@ -237,6 +237,81 @@ tables() WHERE partitionBy = 'DAY' | --- | -------- | ------------------- | ----------- | ------------------ | ---------- | ------------- | ----- | | 1 | my_table | ts | DAY | 500000 | true | my_table | false | +## table_storage + +`table_storage()` - Returns a table containing information about the storage and +structure of all user tables in the database. + +Provides detailed storage information about all user tables within QuestDB. It +returns one row per table, including information about partitioning, row counts, +and disk usage. + +- The `table_storage()` function excludes system tables; it only lists + user-created tables. +- The `diskSize` value represents the total size of all files associated with + the table on disk, including data, index, and metadata files. +- The `partitionBy` column indicates the partitioning strategy used for the + table. It can be `NONE` if the table is not partitioned. + +**Return values:** + +The function returns the following columns: + +- `tableName` (`string`): The name of the table. +- `walEnabled` (`boolean`): Indicates whether Write-Ahead Logging (WAL) is + enabled for the table. +- `partitionBy` (`string`): The partitioning type of the table (e.g., NONE, DAY, + MONTH, YEAR, etc.). +- `partitionCount` (`long`): The number of partitions the table has. +- `rowCount` (`long`): The total number of rows in the table. +- `diskSize` (`long`): The total disk space used by the table, in bytes. + +**Examples:** + +Retrieve storage information for all tables. + +```questdb-sql title="Checking our demo tables" demo +SELECT * FROM table_storage(); +``` + +- The query retrieves storage details for all tables in the database. 
+- The `diskSize` column shows the total disk space used by each table in bytes. + +| tableName | walEnabled | partitionBy | partitionCount | rowCount | diskSize | +| -------------- | ---------- | ----------- | -------------- | ---------- | ------------ | +| trips | true | MONTH | 126 | 1634599313 | 261536158948 | +| AAPL_orderbook | true | HOUR | 16 | 3024878 | 2149403527 | +| weather | false | NONE | 1 | 137627 | 9972598 | +| trades | true | DAY | 954 | 1000848308 | 32764798760 | +| ethblocks_json | true | DAY | 3328 | 20688364 | 28311960478 | + +
+ +Filter tables with WAL enabled. + +```questdb-sql title="WAL only tables" demo +SELECT tableName, rowCount, diskSize +FROM table_storage() +WHERE walEnabled = true; +``` + +| tableName | rowCount | diskSize | +| -------------- | ---------- | ------------ | +| trips | 1634599313 | 261536158948 | +| AAPL_orderbook | 3024878 | 2149403527 | +| trades | 1000850255 | 32764804264 | +| ethblocks_json | 20688364 | 28311960478 | + +
+ +Show tables partitioned by `HOUR`. + +```questdb-sql title="Show tables partitioned by hour" demo +SELECT tableName, partitionCount, rowCount +FROM table_storage() +WHERE partitionBy = 'HOUR'; +``` + ## wal_tables `wal_tables()` returns the WAL status for all @@ -305,7 +380,7 @@ table_columns('my_table'); ``` | column | type | indexed | indexBlockCapacity | symbolCached | symbolCapacity | designated | upsertKey | -|--------|-----------|---------|--------------------|--------------|----------------|------------|-----------| +| ------ | --------- | ------- | ------------------ | ------------ | -------------- | ---------- | --------- | | symb | SYMBOL | true | 1048576 | false | 256 | false | false | | price | DOUBLE | false | 0 | false | 0 | false | false | | ts | TIMESTAMP | false | 0 | false | 0 | true | false | @@ -324,7 +399,7 @@ SELECT type, count() FROM table_columns('my_table'); ``` | type | count | -|-----------|-------| +| --------- | ----- | | SYMBOL | 1 | | DOUBLE | 1 | | TIMESTAMP | 1 | @@ -438,15 +513,15 @@ SELECT pg_catalog.version(); | ------------------------------------------------------------------- | | PostgreSQL 12.3, compiled by Visual C++ build 1914, 64-bit, QuestDB | - ## hydrate_table_metadata('table1', 'table2' ...) -`hydrate_table_metadata' re-reads table metadata from disk to update the static metadata cache. +`hydrate_table_metadata` re-reads table metadata from disk to update the static +metadata cache. :::warning -This function should only be used when directed by QuestDB support. Misuse could cause corruption of the metadata -cache, requiring the database to be restarted. +This function should only be used when directed by QuestDB support. Misuse could +cause corruption of the metadata cache, requiring the database to be restarted. ::: @@ -454,13 +529,12 @@ cache, requiring the database to be restarted. A variable list of strings, corresponding to table names. -Alternatively, a single asterisk, '*', representing all tables.
+Alternatively, a single asterisk, '\*', representing all tables. **Return value:** Returns `boolean`. `true` if successful, `false` if unsuccessful. - **Examples:** Simply pass table names as arguments to the function. @@ -470,10 +544,9 @@ SELECT hydrate_table_metadata('trades', 'trips') ``` | hydrate_table_metadata | -|------------------------| +| ---------------------- | | true | - If you want to re-read metadata for all user tables, simply use an asterisk: ```