Skip to content

Commit

Permalink
docs: Document expression for aggregate args
Browse files Browse the repository at this point in the history
  • Loading branch information
lutter committed Mar 1, 2024
1 parent 46dd38d commit a15b903
Showing 1 changed file with 40 additions and 10 deletions.
50 changes: 40 additions & 10 deletions docs/aggregations.md
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,11 @@ _dimensions_, and fields with the `@aggregate` directive are called
_aggregates_. A timeseries type really represents many timeseries, one for
each combination of values for the dimensions.

The same timeseries can be used for multiple aggregations. For example, the
`Stats` aggregation could also be formed by aggregating over the `TokenData`
timeseries. Since `Stats` doesn't have a `token` dimension, all aggregates
will be formed across all tokens.

Each `@aggregate` by default starts at 0 for each new bucket and therefore
just aggregates over the time interval for the bucket. The `@aggregate`
directive also accepts a boolean flag `cumulative` that indicates whether
Expand All @@ -97,10 +102,6 @@ the entire timeseries up to the end of the time interval for the bucket.
aggregations, and it doesn't seem like it used in practice, we won't
initially support it. (same for variance, stddev etc.)

**TODO** The timeseries type can be simplified for some situations if
aggregations can be done over expressions, for example over `priceUSD *
amount` to track `totalVolumeUSD`

**TODO** It might be necessary to allow `@aggregate` fields that are only
used for some intervals. We could allow that with syntax like
`@aggregate(fn: .., arg: .., interval: "day")`
Expand Down Expand Up @@ -132,7 +133,10 @@ annotation. These attributes must be of a numeric type (`Int`, `Int8`,
`BigInt`, or `BigDecimal`) The annotation must have two arguments:

- `fn`: the name of an aggregation function
- `arg`: the name of an attribute in the timeseries type
- `arg`: the name of an attribute in the timeseries type, or an expression
using only constants and attributes of the timeseries type

#### Aggregation functions

The following aggregation functions are currently supported:

Expand All @@ -149,16 +153,42 @@ The `first` and `last` aggregation function calculate the first and last
value in an interval by sorting the data by `id`; `graph-node` enforces
correctness here by automatically setting the `id` for timeseries entities.

#### Aggregation expressions

The `arg` can be the name of any attribute in the timeseries type, or an
expression using only constants and attributes of the timeseries type such
as `price * amount` or `greatest(amount0, amount1)`. Expressions use SQL
syntax and support a subset of builtin SQL functions, operators, and other
constructs.

Supported operators are `+`, `-`, `*`, `/`, `%`, `^`, `=`, `!=`, `<`, `<=`,
`>`, `>=`, `<->`, `and`, `or`, and `not`. In addition the operators `is
[not] {null|true|false}`, and `is [not] distinct from` are supported.

The supported SQL functions are the [math
functions](https://www.postgresql.org/docs/current/functions-math.html)
`abs`, `ceil`, `ceiling`, `div`, `floor`, `gcd`, `lcm`, `mod`, `power`,
`sign`, and the [conditional
functions](https://www.postgresql.org/docs/current/functions-conditional.html)
`coalesce`, `nullif`, `greatest`, and `least`.

The
[statement](https://www.postgresql.org/docs/current/functions-conditional.html#FUNCTIONS-CASE)
`case when .. else .. end` is also supported.

Some examples of valid expressions, assuming the underlying timeseries
contains the mentioned fields:

- Aggregate the value of a token: `@aggregate(fn: "sum", arg: "priceUSD * amount")`
- Aggregate the maximum positive amount of two different amounts:
`@aggregate(fn: "max", arg: "greatest(amount0, amount1, 0)")`
- Conditionally sum an amount: `@aggregate(fn: "sum", arg: "case when amount0 > amount1 then amount0 else 0 end")`

## Querying

_This section is not implemented yet, and will require a bit more thought
about details_

**TODO** As written, timeseries points like `TokenData` can be queried like
any other entity. It would be nice to restrict how these data points can be
queried, maybe even forbid it, as that would give us more latitude in how we
store that data.

We create a toplevel query field for each aggregation. That query field
accepts the following arguments:

Expand Down

0 comments on commit a15b903

Please sign in to comment.