From 0a7d6bcd8ffcd23347626a8291cd4fa77dbccb8e Mon Sep 17 00:00:00 2001
From: goodroot <9484709+goodroot@users.noreply.github.com>
Date: Thu, 17 Oct 2024 09:06:18 -0700
Subject: [PATCH] Add regr_slope and table_storage (#65)
---
introduction.md | 32 +++---
reference/function/finance.md | 185 +++++++++++++++++++++++++++++++---
reference/function/meta.md | 93 +++++++++++++++--
3 files changed, 271 insertions(+), 39 deletions(-)
diff --git a/introduction.md b/introduction.md
index 4e1bd846..bee6f5f3 100644
--- a/introduction.md
+++ b/introduction.md
@@ -75,23 +75,14 @@ Greatest hits include:
## Benefits of QuestDB {#benefits}
-Time series data is seen increasingly in use cases across:
+To avoid ingestion bottlenecks, high-performance data ingestion is essential.
-- finance
-- internet of things (IoT)
-- e-commerce
-- security
-- blockchain
-- many other emerging technical industries
+But performance is only part of the story.
-As more time bound data is generated, high performance data reception is
-essential to avoid ingestion bottlenecks.
+Efficiency measures how well a database performs relative to its available
+resources.
-The right data store greatly simplifies code costly infrastructure sprawl and
-spend.
-
-But to be _the right one_, the storage engine must be both high performance and
-efficient:
+QuestDB, on maximal hardware, significantly outperforms peers:
+However, on less robust hardware the difference is even more pronounced, as seen
+in the following benchmark.
+
+Even on hardware as light as a Raspberry Pi 5, QuestDB outperforms competitors
+on stronger hardware:
+
+
+
Beyond performance and efficiency, with a specialized
[time-series database](/glossary/time-series-database/), you don't need to worry
about:
diff --git a/reference/function/finance.md b/reference/function/finance.md
index 0470248e..49ee16dd 100644
--- a/reference/function/finance.md
+++ b/reference/function/finance.md
@@ -12,8 +12,10 @@ Trade price calculation.
`l2price(target_quantity, quantity_1, price_1, quantity_2, price_2, ..., quantity_n, price_n)`
-Consider `quantity_1`, `price_1`, `quantity_2`, `price_2`, ..., `quantity_n`, `price_n` to be either
-side of an order book with `n` price levels. Then, the return value of the function is the average trade price of a market order executed with the size of `target_quantity` against the book.
+Consider `quantity_1`, `price_1`, `quantity_2`, `price_2`, ..., `quantity_n`,
+`price_n` to be either side of an order book with `n` price levels. Then, the
+return value of the function is the average trade price of a market order
+executed with the size of `target_quantity` against the book.
Let's take the below order book as an example.
@@ -25,8 +27,8 @@ Let's take the below order book as an example.
| 21 | 13.70 | 15.10 | 12 |
| 18 | 13.40 | | |
-A _buy market order_ with the size of 50 would wipe out the first two price levels of
-the _Ask_ side of the book, and would also trade on the third level.
+A _buy market order_ with the size of 50 would wipe out the first two price
+levels of the _Ask_ side of the book, and would also trade on the third level.
The full price of the trade:
@@ -40,8 +42,8 @@ $$
\$732.60 / 50 = \$14.652
$$
-This average trade price is the output of the function when executed with the parameters taken from
-the above example:
+This average trade price is the output of the function when executed with the
+parameters taken from the above example:
```questdb-sql
select l2price(50, 14, 14.50, 16, 14.60, 23, 14.80, 12, 15.10);
@@ -53,21 +55,23 @@ select l2price(50, 14, 14.50, 16, 14.60, 23, 14.80, 12, 15.10);
### Parameters
-The function takes a `target quantity`, and a variable number of `quantity`/`price` pairs. Each
-represents a price level of the order book.
+The function takes a `target quantity`, and a variable number of
+`quantity`/`price` pairs. Each represents a price level of the order book.
-Each parameter is expected to be a double, or convertible to double (float, long, int, short, byte).
+Each parameter is expected to be a double, or convertible to double (float,
+long, int, short, byte).
- `target_quantity`: The size of a hypothetical market order to be filled.
-- `quantity*`: The number of instruments available at the corresponding price levels.
+- `quantity*`: The number of instruments available at the corresponding price
+ levels.
- `price*`: Price levels of the order book.
### Return value
The function returns with a `double`, representing the average trade price.
-Returns null if the price is not calculable. For example, if the target quantity cannot be filled,
-or there is incomplete data in the set (nulls).
+Returns null if the price is not calculable. For example, if the target quantity
+cannot be filled, or there is incomplete data in the set (nulls).
### Examples
@@ -148,10 +152,160 @@ SELECT mid(1.5760, 1.5763)
| :------ |
| 1.57615 |
+## regr_slope
+
+`regr_slope(y, x)` - Calculates the slope of the linear regression line for the
+given numeric columns y (dependent variable) and x (independent variable).
+
+- The function requires at least two valid (x, y) pairs to compute the slope.
+ - If fewer than two pairs are available, the function returns null.
+- Supported data types for x and y include `double`, `float`, and `integer`
+ types.
+- The `regr_slope` function can be used with other statistical aggregation
+ functions like `corr` or `covar_samp`.
+- The order of arguments in `regr_slope(y, x)` matters.
+ - Ensure that y is the dependent variable and x is the independent variable.
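+
+As a minimal sketch of that combination, the query below computes the slope
+alongside `corr` and `covar_samp` in a single query. The `measurements` table
+and its `x` and `y` columns are assumed here for illustration:
+
+```questdb-sql
+-- Assumes a table `measurements` with numeric columns x and y.
+SELECT
+  regr_slope(y, x) AS slope,
+  corr(y, x) AS correlation,
+  covar_samp(y, x) AS covariance
+FROM measurements;
+```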
+
+### Calculation
+
+The slope $b_1$ of the regression line $y = b_0 + b_1 x$ is calculated using the
+formula:
+
+$$
+b_1 = \frac{N \sum (xy) - \sum x \sum y}{N \sum (x^2) - (\sum x)^2}
+$$
+
+Where:
+
+- $N$ is the number of valid data points.
+- $\sum (xy)$ is the sum of the products of $x$ and $y$.
+- $\sum x$ and $\sum y$ are the sums of the $x$ and $y$ values, respectively.
+- $\sum (x^2)$ is the sum of the squares of $x$ values.
+
+### Arguments
+
+- y: A numeric column representing the dependent variable.
+- x: A numeric column representing the independent variable.
+
+### Return value
+
+Return value type is `double`.
+
+The function returns the slope of the regression line, indicating how much y
+changes for a unit change in x.
+
+### Examples
+
+#### Calculate the regression slope between two variables
+
+Suppose you have a table `measurements` with the following data:
+
+| x | y |
+| --- | --- |
+| 1.0 | 2.0 |
+| 2.0 | 3.0 |
+| 3.0 | 5.0 |
+| 4.0 | 4.0 |
+| 5.0 | 6.0 |
+
+You can calculate the slope of the regression line between y and x using:
+
+```questdb-sql
+SELECT regr_slope(y, x) AS slope FROM measurements;
+```
+
+Result:
+
+| slope |
+| ----- |
+| 0.9 |
+
+The slope of 0.9 indicates that for each unit increase in x, y increases by
+0.9 units on average.
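+
+As a check, applying the formula from the Calculation section to the five rows
+of the example table gives the following sums:
+
+$$
+N = 5,\quad \sum x = 15,\quad \sum y = 20,\quad \sum (xy) = 69,\quad \sum (x^2) = 55
+$$
+
+Substituting them into the slope formula:
+
+$$
+b_1 = \frac{5 \cdot 69 - 15 \cdot 20}{5 \cdot 55 - 15^2} = \frac{45}{50} = 0.9
+$$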
+
+#### Calculate the regression slope grouped by a category
+
+Consider a table `sales_data`:
+
+| category | advertising_spend | sales |
+| -------- | ----------------- | ----- |
+| A | 1000 | 15000 |
+| A | 2000 | 22000 |
+| A | 3000 | 28000 |
+| B | 1500 | 18000 |
+| B | 2500 | 26000 |
+| B | 3500 | 31000 |
+
+Calculate the regression slope of `sales` versus `advertising_spend` for each
+category:
+
+```questdb-sql
+SELECT category, regr_slope(sales, advertising_spend) AS slope FROM sales_data
+GROUP BY category;
+```
+
+Result:
+
+| category | slope |
+| -------- | ----- |
+| A | 6.5 |
+| B | 6.5 |
+
+In other words:
+
+- In category A, for every additional unit spent on advertising, sales increase
+  by 6.5 units on average.
+- In category B, the increase is also 6.5 units per advertising unit spent.
+
+#### Handling null values
+
+If your data contains null values, `regr_slope()` will ignore those rows:
+
+```questdb-sql
+SELECT regr_slope(y, x) AS slope FROM ( SELECT 1 AS x, 2 AS y UNION ALL SELECT
+2, NULL UNION ALL SELECT NULL, 4 UNION ALL SELECT 4, 5 );
+```
+
+Result:
+
+| slope |
+| ----- |
+| 1.0 |
+
+Only the rows where both x and y are not null are considered in the calculation.
+Here, those rows are (1, 2) and (4, 5), which give a slope of 1.0.
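+
+To make that filtering explicit, a sketch like the following should produce the
+same slope as the unfiltered aggregate, since it drops incomplete rows before
+aggregating. The `measurements` table is assumed for illustration:
+
+```questdb-sql
+-- Keep only rows where both values are present, then aggregate.
+SELECT regr_slope(y, x) AS slope
+FROM measurements
+WHERE x IS NOT NULL AND y IS NOT NULL;
+```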
+
+#### Calculating beta
+
+Assuming you have a table `stock_returns` with daily returns for a specific
+stock and the market index:
+
+| date | stock_return | market_return |
+| ---------- | ------------ | ------------- |
+| 2023-01-01 | 0.5 | 0.4 |
+| 2023-01-02 | -0.2 | -0.1 |
+| 2023-01-03 | 0.3 | 0.2 |
+| ... | ... | ... |
+
+Calculate the stock's beta coefficient:
+
+```questdb-sql
+SELECT regr_slope(stock_return, market_return) AS beta FROM stock_returns;
+```
+
+| beta |
+| ---- |
+| 1.2 |
+
+Or: A beta of 1.2 suggests the stock is 20% more volatile than the market.
+
+Remember: The order of arguments in `regr_slope(y, x)` matters.
+
+Ensure that y is the dependent variable and x is the independent variable.
+
## spread_bps
-`spread_bps(bid, ask)` - calculates the quoted bid-ask spread, based on the highest bidding price,
-and the lowest asking price.
+`spread_bps(bid, ask)` - calculates the quoted bid-ask spread, based on the
+highest bidding price, and the lowest asking price.
The result is provided in basis points, and the calculation is:
@@ -185,7 +339,8 @@ SELECT spread_bps(1.5760, 1.5763)
## vwap
`vwap(price, quantity)` - Calculates the volume-weighted average price (VWAP)
-based on the given price and quantity columns. This is defined by the following formula:
+based on the given price and quantity columns. This is defined by the following
+formula:
$$
\text{vwap} =
diff --git a/reference/function/meta.md b/reference/function/meta.md
index 6840722c..3e40198c 100644
--- a/reference/function/meta.md
+++ b/reference/function/meta.md
@@ -237,6 +237,81 @@ tables() WHERE partitionBy = 'DAY'
| --- | -------- | ------------------- | ----------- | ------------------ | ---------- | ------------- | ----- |
| 1 | my_table | ts | DAY | 500000 | true | my_table | false |
+## table_storage
+
+`table_storage()` - Returns a table containing information about the storage and
+structure of all user tables in the database.
+
+Provides detailed storage information about all user tables within QuestDB. It
+returns one row per table, including information about partitioning, row counts,
+and disk usage.
+
+- The `table_storage()` function excludes system tables; it only lists
+ user-created tables.
+- The `diskSize` value represents the total size of all files associated with
+ the table on disk, including data, index, and metadata files.
+- The `partitionBy` column indicates the partitioning strategy used for the
+ table. It can be `NONE` if the table is not partitioned.
+
+**Return values:**
+
+The function returns the following columns:
+
+- `tableName` (`string`): The name of the table.
+- `walEnabled` (`boolean`): Indicates whether Write-Ahead Logging (WAL) is
+ enabled for the table.
+- `partitionBy` (`string`): The partitioning type of the table (e.g., NONE, DAY,
+ MONTH, YEAR, etc.).
+- `partitionCount` (`long`): The number of partitions the table has.
+- `rowCount` (`long`): The total number of rows in the table.
+- `diskSize` (`long`): The total disk space used by the table, in bytes.
+
+**Examples:**
+
+Retrieve storage information for all tables.
+
+```questdb-sql title="Checking our demo tables" demo
+SELECT * FROM table_storage();
+```
+
+- The query retrieves storage details for all tables in the database.
+- The `diskSize` column shows the total disk space used by each table in bytes.
+
+| tableName | walEnabled | partitionBy | partitionCount | rowCount | diskSize |
+| -------------- | ---------- | ----------- | -------------- | ---------- | ------------ |
+| trips | true | MONTH | 126 | 1634599313 | 261536158948 |
+| AAPL_orderbook | true | HOUR | 16 | 3024878 | 2149403527 |
+| weather | false | NONE | 1 | 137627 | 9972598 |
+| trades | true | DAY | 954 | 1000848308 | 32764798760 |
+| ethblocks_json | true | DAY | 3328 | 20688364 | 28311960478 |
+
+
+
+Filter tables with WAL enabled.
+
+```questdb-sql title="WAL only tables" demo
+SELECT tableName, rowCount, diskSize
+FROM table_storage()
+WHERE walEnabled = true;
+```
+
+| tableName | rowCount | diskSize |
+| -------------- | ---------- | ------------ |
+| trips | 1634599313 | 261536158948 |
+| AAPL_orderbook | 3024878 | 2149403527 |
+| trades | 1000850255 | 32764804264 |
+| ethblocks_json | 20688364 | 28311960478 |
+
+
+
+Show tables partitioned by `HOUR`.
+
+```questdb-sql title="Show tables partitioned by hour" demo
+SELECT tableName, partitionCount, rowCount
+FROM table_storage()
+WHERE partitionBy = 'HOUR';
+```
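+
+Because the function returns an ordinary table, its columns can be combined in
+expressions. As a sketch using only the columns listed under return values, the
+following gives a rough estimate of the bytes stored per row and lists the
+largest tables first. The `bytesPerRow` alias is illustrative:
+
+```questdb-sql
+-- Skip empty tables to avoid dividing by zero.
+SELECT tableName, rowCount, diskSize, diskSize / rowCount AS bytesPerRow
+FROM table_storage()
+WHERE rowCount > 0
+ORDER BY diskSize DESC;
+```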
+
## wal_tables
`wal_tables()` returns the WAL status for all
@@ -305,7 +380,7 @@ table_columns('my_table');
```
| column | type | indexed | indexBlockCapacity | symbolCached | symbolCapacity | designated | upsertKey |
-|--------|-----------|---------|--------------------|--------------|----------------|------------|-----------|
+| ------ | --------- | ------- | ------------------ | ------------ | -------------- | ---------- | --------- |
| symb | SYMBOL | true | 1048576 | false | 256 | false | false |
| price | DOUBLE | false | 0 | false | 0 | false | false |
| ts | TIMESTAMP | false | 0 | false | 0 | true | false |
@@ -324,7 +399,7 @@ SELECT type, count() FROM table_columns('my_table');
```
| type | count |
-|-----------|-------|
+| --------- | ----- |
| SYMBOL | 1 |
| DOUBLE | 1 |
| TIMESTAMP | 1 |
@@ -438,15 +513,15 @@ SELECT pg_catalog.version();
| ------------------------------------------------------------------- |
| PostgreSQL 12.3, compiled by Visual C++ build 1914, 64-bit, QuestDB |
-
## hydrate_table_metadata('table1', 'table2' ...)
-`hydrate_table_metadata' re-reads table metadata from disk to update the static metadata cache.
+`hydrate_table_metadata` re-reads table metadata from disk to update the static
+metadata cache.
:::warning
-This function should only be used when directed by QuestDB support. Misuse could cause corruption of the metadata
-cache, requiring the database to be restarted.
+This function should only be used when directed by QuestDB support. Misuse could
+cause corruption of the metadata cache, requiring the database to be restarted.
:::
@@ -454,13 +529,12 @@ cache, requiring the database to be restarted.
A variable list of strings, corresponding to table names.
-Alternatively, a single asterisk, '*', representing all tables.
+Alternatively, a single asterisk, '\*', representing all tables.
**Return value:**
Returns `boolean`. `true` if successful, `false` if unsuccessful.
-
**Examples:**
Simply pass table names as arguments to the function.
@@ -470,10 +544,9 @@ SELECT hydrate_table_metadata('trades', 'trips')
```
| hydrate_table_metadata |
-|------------------------|
+| ---------------------- |
| true |
-
If you want to re-read metadata for all user tables, simply use an asterisk:
```