Changes for SAMPLE BY FROM-TO #27

Merged: 9 commits, merged on Jul 25, 2024
reference/sql/sample-by.md: 71 changes (70 additions, 1 deletion)
@@ -18,6 +18,8 @@ use of the [FILL](#fill-options) keyword to specify a fill behavior.

### SAMPLE BY keywords

todo: railroad update
Collaborator: We might be blocked here. A fix will take some thinking.

Contributor Author: Perhaps we can remove this comment and merge without changing the railroad?

Collaborator: @nwoolmer Agreed.


![Flow chart showing the syntax of the SAMPLE BY keywords](/img/docs/diagrams/sampleBy.svg)

### FILL keywords
@@ -56,6 +58,71 @@ trades per hour:
SELECT ts, count() FROM trades SAMPLE BY 1h
```


## FROM-TO

todo: blog link

:::note

This syntax extension was added in QuestDB 8.1.0 and is unavailable in earlier versions.

Please see the new blog for more information.

:::

When using `SAMPLE BY` with `FILL`, missing rows within the result set can be filled with pre-determined values.

However, this only fills rows between existing data points in the dataset. It cannot fill rows outside that range.

To fill outside the bounds of the existing data, you can specify a fill range using a `FROM-TO` clause.
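The following minimal Python sketch makes the difference concrete. It models a calendar-aligned `SAMPLE BY 1d` with `FILL(NULL)` and is not QuestDB's implementation. The helper `sample_by_day` is hypothetical: its `start` and `end` parameters stand in for the `FROM` and `TO` bounds, and UTC timestamps are assumed.

```python
from datetime import datetime, timedelta

# Hypothetical model of a calendar-aligned 1d SAMPLE BY with FILL(NULL).
# Illustration only; this is not how QuestDB implements it.
DAY = timedelta(days=1)


def floor_day(ts: datetime) -> datetime:
    """Align a timestamp to the start of its calendar day (UTC assumed)."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def sample_by_day(rows, start=None, end=None):
    """Count rows per day. Empty buckets become None, like FILL(NULL).

    Without start/end, buckets only span the first to the last observed day.
    With start/end (standing in for FROM/TO), buckets span [start, end)
    even where no data exists.
    """
    counts = {}
    for ts in rows:
        day = floor_day(ts)
        counts[day] = counts.get(day, 0) + 1
    lo = start if start is not None else min(counts)
    hi = end if end is not None else max(counts) + DAY
    out = []
    day = lo
    while day < hi:
        out.append((day, counts.get(day)))  # None marks a filled bucket
        day += DAY
    return out


rows = [datetime(2009, 1, 1, 8), datetime(2009, 1, 1, 9), datetime(2009, 1, 3, 12)]
# The gap on 2009-01-02 is filled, but nothing before 2009-01-01 appears.
print(sample_by_day(rows))
# With explicit bounds, the days before and after the data are filled too.
print(sample_by_day(rows, datetime(2008, 12, 30), datetime(2009, 1, 5)))
```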

#### Syntax

The resulting shape of the query is specified using `FROM` and `TO`:

```questdb-sql title='Pre-filling trip data' demo
SELECT pickup_datetime as t, count()
FROM trips
SAMPLE BY 1d FROM '2008-12-28' TO '2009-01-05' FILL(NULL)
```

The `trips` table has no rows before 2009, so the buckets from `2008-12-28` to `2008-12-31` are filled with `NULL`.
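As a sanity check on the example query, this small Python sketch lists the daily buckets it should return. It assumes the `TO` bound is exclusive. That assumption is inferred from the `pickup_datetime < '2009-01-05'` filter in the `WHERE` optimisation example. `bucket_starts` is a hypothetical helper, not part of QuestDB.

```python
from datetime import date, timedelta


def bucket_starts(from_day: date, to_day: date, stride=timedelta(days=1)):
    """List bucket start dates in the half-open range [from_day, to_day).

    Assumption: TO is exclusive, mirroring the `pickup_datetime < TO`
    filter that QuestDB adds for this query.
    """
    starts = []
    current = from_day
    while current < to_day:
        starts.append(current)
        current += stride
    return starts


buckets = bucket_starts(date(2008, 12, 28), date(2009, 1, 5))
# Per the text, trips has no rows before 2009, so these buckets can only
# come from the fill and therefore hold NULL.
prefilled = [d for d in buckets if d < date(2009, 1, 1)]
print(len(buckets), buckets[0], buckets[-1])
print(prefilled)
```

Under this model the query returns eight rows, from `2008-12-28` through `2009-01-04`. Four of them are pre-filled.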

`FROM-TO` is distinct from the `WHERE` clause. As a rule of thumb, `WHERE` controls what data flows in, while `FROM-TO` controls what data flows out.
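The rule of thumb can be modelled in a few lines of Python. This is a toy sketch, not QuestDB code. The hypothetical `keep` callback stands in for `WHERE`, and the `[start, end)` loop stands in for `FROM-TO`. A row removed by the filter still leaves its bucket in the output, filled with `None` (the stand-in for `NULL`).

```python
from datetime import date, timedelta

DAY = timedelta(days=1)


def sample_daily_counts(rows, keep, start, end):
    """Toy model of `WHERE keep(...) SAMPLE BY 1d FROM start TO end FILL(NULL)`.

    `rows` are (day, value) pairs. `keep` plays the role of WHERE: it decides
    which rows flow in. [start, end) plays the role of FROM-TO: it decides
    which buckets flow out, whether or not any row landed in them.
    """
    counts = {}
    for day, value in rows:
        if keep(day, value):  # WHERE: filter input rows
            counts[day] = counts.get(day, 0) + 1
    out = []
    day = start
    while day < end:  # FROM-TO: fix the output shape
        out.append((day, counts.get(day)))
        day += DAY
    return out


rows = [(date(2009, 1, 1), 5), (date(2009, 1, 2), 50), (date(2009, 1, 3), 7)]
# Hypothetical filter: drop rows whose value is 10 or more.
result = sample_daily_counts(rows, lambda d, v: v < 10, date(2009, 1, 1), date(2009, 1, 4))
print(result)
```

The `2009-01-02` row is filtered out on the way in, but its bucket is still emitted on the way out.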

`FROM` and `TO` can each be used on its own, to only pre-fill or only post-fill data. If `FROM` is omitted,
the lower bound is the start of the dataset, aligned to calendar. Likewise, if `TO` is omitted, the upper bound is the end of the dataset, aligned to calendar.
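The following minimal sketch shows how the bounds could be resolved when one side is omitted, assuming a `1d` stride and UTC timestamps. The upper-bound fallback (the end of the bucket that holds the last timestamp) is an assumption about how the omitted-`TO` rule works in practice. `fill_bounds` is a hypothetical helper.

```python
from datetime import datetime, timedelta

DAY = timedelta(days=1)


def floor_day(ts: datetime) -> datetime:
    """Align a timestamp to the start of its calendar day (UTC assumed)."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def fill_bounds(timestamps, from_=None, to=None):
    """Return the [lower, upper) bucket range for a 1d SAMPLE BY.

    Assumed model: a missing FROM falls back to the first timestamp aligned
    to its calendar day. A missing TO falls back to the end of the calendar
    bucket that contains the last timestamp.
    """
    lower = from_ if from_ is not None else floor_day(min(timestamps))
    upper = to if to is not None else floor_day(max(timestamps)) + DAY
    return lower, upper


ts = [datetime(2009, 1, 1, 6, 30), datetime(2009, 1, 3, 22, 15)]
print(fill_bounds(ts, from_=datetime(2008, 12, 28)))  # pre-fill only
print(fill_bounds(ts, to=datetime(2009, 1, 5)))  # post-fill only
```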

#### `WHERE` clause optimisation

If the query has no `WHERE` clause, or its `WHERE` clause does not filter on the designated timestamp,
QuestDB adds a timestamp filter that matches the `FROM-TO` interval.

This lets the query skip data that cannot contribute to the result, so only the relevant time range is read.

The earlier query is therefore compiled into something similar to the following:

```questdb-sql title='Pre-filling trip data with WHERE optimisation' demo
SELECT pickup_datetime as t, count()
FROM trips
WHERE pickup_datetime >= '2008-12-28'
AND pickup_datetime < '2009-01-05'
SAMPLE BY 1d FROM '2008-12-28' TO '2009-01-05' FILL(NULL)
```
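The rewrite is safe because rows outside `[FROM, TO)` can never land in an output bucket. The Python sketch below is a model, not QuestDB internals, and `sample` and `with_implied_where` are hypothetical names. It checks that pruning input rows with the implied timestamp filter leaves the result unchanged.

```python
from datetime import date, timedelta

DAY = timedelta(days=1)


def sample(rows, start, end):
    """Daily counts over [start, end), with None standing in for FILL(NULL)."""
    counts = {}
    for day in rows:
        counts[day] = counts.get(day, 0) + 1
    out, day = [], start
    while day < end:
        out.append((day, counts.get(day)))
        day += DAY
    return out


def with_implied_where(rows, start, end):
    """Same query, but first apply `ts >= start AND ts < end`, as QuestDB
    does when no WHERE clause filters on the designated timestamp."""
    pruned = [d for d in rows if start <= d < end]
    return sample(pruned, start, end)


rows = [date(2008, 6, 1), date(2009, 1, 2), date(2009, 1, 2), date(2010, 3, 9)]
start, end = date(2008, 12, 28), date(2009, 1, 5)
# Rows from 2008-06-01 and 2010-03-09 fall outside the range, so dropping
# them early changes nothing except the amount of data scanned.
assert sample(rows, start, end) == with_implied_where(rows, start, end)
```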

#### Limitations

This feature currently has the following limitations:

- This syntax is not compatible with `FILL(PREV)` or `FILL(LINEAR)`.
- This syntax is for `ALIGN TO CALENDAR` only (default alignment).
- Any specified `OFFSET` will not be considered.
- This syntax is for non-keyed `SAMPLE BY` only, i.e. queries that select only the designated timestamp and aggregate columns.
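The limitations above can be read as a checklist. The sketch below encodes them in Python. `SampleByQuery` and `from_to_issues` are hypothetical names used for illustration only, and the check says nothing about how QuestDB itself reports these cases.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SampleByQuery:
    """Hypothetical description of a SAMPLE BY query, for illustration only."""
    fill: str = "NULL"  # e.g. "NULL", "PREV", "LINEAR"
    align: str = "CALENDAR"  # "CALENDAR" (default) or "FIRST OBSERVATION"
    offset: Optional[str] = None  # e.g. "00:30"
    key_columns: List[str] = field(default_factory=list)  # non-aggregate columns


def from_to_issues(q: SampleByQuery) -> List[str]:
    """List the documented FROM-TO limitations that q runs into."""
    issues = []
    if q.fill.upper() in ("PREV", "LINEAR"):
        issues.append(f"FILL({q.fill}) is not compatible with FROM-TO")
    if q.align.upper() != "CALENDAR":
        issues.append("FROM-TO requires ALIGN TO CALENDAR")
    if q.offset is not None:
        issues.append(f"OFFSET '{q.offset}' will not be considered")
    if q.key_columns:
        issues.append("FROM-TO supports non-keyed SAMPLE BY only")
    return issues


print(from_to_issues(SampleByQuery()))  # []
print(from_to_issues(SampleByQuery(fill="PREV", key_columns=["symbol"])))
```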


## Fill options

The `FILL` keyword is optional and expects one or more `fillOption` strategies
@@ -192,7 +259,7 @@ below.

:::note

Since QuestDB v7.4.0, the default behaviour for `ALIGN TO` has changed. If you do not specify
an explicit alignment, `SAMPLE BY` expressions will use `ALIGN TO CALENDAR` behaviour.

The prior default behaviour can be retained by specifying `ALIGN TO FIRST OBSERVATION` on a `SAMPLE BY` query.
@@ -414,6 +481,7 @@ The sample then begins from `Europe/London` at `2021-10-31T02:00:00.000000Z`:
| 2021-10-31T04:00:00.000000Z | 3 |
| 2021-10-31T05:00:00.000000Z | 2 |


## Examples

Assume the following table `trades`:
@@ -476,6 +544,7 @@ SELECT ts, avg(quantity*price) FROM trades SAMPLE BY 1d ALIGN TO CALENDAR;
| 2021-05-31T00:00:00.000000Z | 1000.5 |
| 2021-06-01T00:00:00.000000Z | 8007.2 |


## See also

This section includes links to additional information such as tutorials: