For sequential data we currently report coherence only via auto-correlation. But we also want to measure longer temporal coherence by introducing two additional metrics / charts:
Sequences per Category: For each category of a discretized column, we calculate the share of sequences that contain that category at least once. This is displayed as a chart like a univariate (categorical) distribution. We then normalize these values to sum to 1 and again calculate half the L1 distance (= TVD) as the metric, to be consistent with the other accuracy metrics and to be bounded to [0, 1].
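A minimal sketch of how this could be computed for a single discretized column, using only the standard library. The `(sequence_id, category)` input shape and the helper names `sequences_per_category`, `normalize` and `tvd` are assumptions for illustration, not existing code. The sketch returns the distance itself; whether the report shows the distance or its complement is not specified here.

```python
from collections import defaultdict


def sequences_per_category(events):
    """Share of sequences that contain each category at least once.

    `events` is an iterable of (sequence_id, category) pairs for one
    discretized column (hypothetical input shape).
    """
    seqs_by_cat = defaultdict(set)
    all_seqs = set()
    for seq_id, cat in events:
        seqs_by_cat[cat].add(seq_id)
        all_seqs.add(seq_id)
    n_seqs = len(all_seqs)
    return {cat: len(seqs) / n_seqs for cat, seqs in seqs_by_cat.items()}


def normalize(shares):
    """Rescale the shares so that they sum to 1."""
    total = sum(shares.values())
    return {k: v / total for k, v in shares.items()} if total else dict(shares)


def tvd(p, q):
    """Half the L1 distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy data: three original and three synthetic sequences.
original = [("s1", "A"), ("s1", "B"), ("s2", "A"), ("s3", "A"), ("s3", "A")]
synthetic = [("t1", "A"), ("t2", "B"), ("t3", "B")]

orig_shares = sequences_per_category(original)   # A in 3 of 3, B in 1 of 3
syn_shares = sequences_per_category(synthetic)   # A in 1 of 3, B in 2 of 3
distance = tvd(normalize(orig_shares), normalize(syn_shares))
```

Note that the raw shares do not sum to 1, because a sequence can contain several categories. That is why the normalization step comes before the TVD.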
Categories per Sequence: For each sequence, we calculate the number of distinct (discretized) categories, divided by the total number of discretized categories. Again, we show this as a chart like a univariate (numerical) distribution. We then normalize again to sum to 1 and calculate half the L1 distance as the metric.
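A rough sketch of this second metric, again for one discretized column. How the per-sequence ratios are binned for the numerical chart is not specified here, so the ten equal-width bins on [0, 1] are purely an assumption, as are the function names and the `(sequence_id, category)` input shape.

```python
from collections import defaultdict


def categories_per_sequence(events, n_categories):
    """Distinct categories per sequence, as a fraction of all categories.

    `events` is an iterable of (sequence_id, category) pairs and
    `n_categories` is the total number of discretized categories.
    """
    cats_by_seq = defaultdict(set)
    for seq_id, cat in events:
        cats_by_seq[seq_id].add(cat)
    return {seq: len(cats) / n_categories for seq, cats in cats_by_seq.items()}


def binned_distribution(values, n_bins=10):
    """Histogram over [0, 1] with equal-width bins, normalized to sum to 1.

    The binning scheme is an assumption made for this sketch.
    """
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v * n_bins), n_bins - 1)  # a ratio of 1.0 goes to the last bin
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def tvd(p, q):
    """Half the L1 distance between two equally binned distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Toy data with four possible categories: A, B, C, D.
original = [("s1", "A"), ("s1", "B"), ("s2", "A"),
            ("s3", "A"), ("s3", "B"), ("s3", "C"), ("s3", "D")]
synthetic = [("t1", "A"), ("t2", "B"), ("t3", "A"), ("t3", "B")]

orig_ratios = categories_per_sequence(original, n_categories=4)
syn_ratios = categories_per_sequence(synthetic, n_categories=4)
distance = tvd(binned_distribution(orig_ratios.values()),
               binned_distribution(syn_ratios.values()))
```

In this toy example the synthetic data never produces a sequence that covers all four categories, which is exactly the kind of longer-range difference this metric is meant to surface.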
These charts shall be two sub-sections of the Coherence section of the report. The metrics are calculated for each attribute and displayed as part of the chart title. The column-level coherence metric in the accuracy table is then the average across auto-correlation, sequences-per-category, and categories-per-sequence. The overall coherence metric is still the average coherence across all columns.
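The aggregation described above could look roughly like this. The column names and scores are made-up placeholder values, and the function and key names are hypothetical.

```python
from statistics import mean


def column_coherence(auto_correlation, sequences_per_category, categories_per_sequence):
    """Column-level coherence: plain average of the three per-column metrics."""
    return mean([auto_correlation, sequences_per_category, categories_per_sequence])


def overall_coherence(per_column):
    """Overall coherence: average of the column-level values across all columns."""
    return mean(column_coherence(**metrics) for metrics in per_column.values())


# Placeholder scores for two hypothetical columns.
per_column = {
    "product": {
        "auto_correlation": 0.9,
        "sequences_per_category": 0.8,
        "categories_per_sequence": 0.7,
    },
    "channel": {
        "auto_correlation": 0.6,
        "sequences_per_category": 0.6,
        "categories_per_sequence": 0.6,
    },
}

product_score = column_coherence(**per_column["product"])
overall_score = overall_coherence(per_column)
```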
For the calculation of the metrics, we shall consider a maximum of 100 (randomly selected) events per sequence.
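One possible way to apply this cap before computing the metrics is sketched below. The function name `cap_events_per_sequence`, the fixed seed, and the choice to keep the sampled events in their original order are all assumptions for illustration.

```python
import random
from collections import defaultdict


def cap_events_per_sequence(events, max_events=100, seed=0):
    """Keep at most `max_events` randomly selected events per sequence.

    `events` is an iterable of (sequence_id, category) pairs. Sampled events
    stay in their original relative order (an assumption of this sketch).
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    by_seq = defaultdict(list)
    for seq_id, cat in events:
        by_seq[seq_id].append(cat)

    capped = []
    for seq_id, cats in by_seq.items():
        if len(cats) > max_events:
            # Sample positions without replacement, then restore event order.
            keep = sorted(rng.sample(range(len(cats)), max_events))
            cats = [cats[i] for i in keep]
        capped.extend((seq_id, cat) for cat in cats)
    return capped


# One long sequence (250 events) and one short sequence (3 events).
events = [("long", f"c{i % 5}") for i in range(250)] + [("short", "c0")] * 3
capped = cap_events_per_sequence(events)
n_long = sum(1 for seq_id, _ in capped if seq_id == "long")
n_short = sum(1 for seq_id, _ in capped if seq_id == "short")
```

Sequences with 100 events or fewer pass through unchanged, so the cap only affects very long sequences.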
mplatzer
changed the title
feature request: Users per Category and Categories per User as additional Coherence charts/metrics
feature request: Sequences per Category and Categories per Sequence as additional coherence charts + metrics
Dec 2, 2024