
feature request: Sequences per Category and Categories per Sequence as additional coherence charts + metrics #32

Open
mplatzer opened this issue Dec 2, 2024 · 0 comments

mplatzer commented Dec 2, 2024

For sequential data we currently report coherence only via auto-correlation. To also measure longer-range temporal coherence, we want to introduce two additional metrics / charts:

  • Sequences per Category: For each category of a discretized column, we calculate the share of sequences that contain that category at least once. These shares are displayed as a chart, like a univariate (categorical) distribution. We then normalize the values to sum to 1 and calculate half the L1-distance (= TVD) as the metric, which keeps it consistent with the other accuracy metrics and bounded to [0, 1].

  • Categories per Sequence: For each sequence, we calculate the number of distinct (discretized) categories, divided by the total number of discretized categories. Again, we show this as a chart, like a univariate (numerical) distribution. We then normalize the values to sum to 1 and calculate half the L1-distance as the metric.
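A minimal sketch of how the two proposed metrics could be computed for a single column, in plain Python. All function names are hypothetical, events are assumed to be a list of `(sequence_id, category)` pairs after discretization, and the 10 equal-width bins used for Categories per Sequence are an assumption, since the binning is not specified here.

```python
from collections import Counter, defaultdict


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()} if total else dict(weights)


def tvd(p, q):
    """Half the L1 distance between two distributions given as dicts."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def sequences_per_category(events):
    """Share of sequences that contain each category at least once."""
    seqs_by_cat = defaultdict(set)
    for seq_id, cat in events:
        seqs_by_cat[cat].add(seq_id)
    n_seqs = len({seq_id for seq_id, _ in events})
    return {cat: len(seqs) / n_seqs for cat, seqs in seqs_by_cat.items()}


def categories_per_sequence(events, n_categories, n_bins=10):
    """Histogram of the per-sequence share of distinct categories."""
    cats_by_seq = defaultdict(set)
    for seq_id, cat in events:
        cats_by_seq[seq_id].add(cat)
    shares = (len(cats) / n_categories for cats in cats_by_seq.values())
    # Assumed binning: n_bins equal-width bins on [0, 1]; a share of 1.0 goes into the last bin.
    return Counter(min(int(s * n_bins), n_bins - 1) for s in shares)


# Toy data (made up for illustration): three sequences, three categories.
trn = [("a", "x"), ("a", "y"), ("b", "x"), ("c", "z")]
syn = [("a", "x"), ("b", "x"), ("c", "x"), ("c", "y")]

spc = tvd(normalize(sequences_per_category(trn)), normalize(sequences_per_category(syn)))
cps = tvd(normalize(categories_per_sequence(trn, 3)), normalize(categories_per_sequence(syn, 3)))
```

Both values are bounded to [0, 1] because each side is normalized to sum to 1 before the distance is taken.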

These charts shall be two sub-sections of the Coherence section of the report. The metrics are calculated for each attribute and displayed as part of the chart title. The column-level coherence metric in the accuracy table is then the average across auto-correlation, sequences-per-category, and categories-per-sequence. The overall coherence metric remains the average coherence across all columns.
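The aggregation described above could look roughly like this. The function names, column names, and scores are all hypothetical and only illustrate the two averaging steps.

```python
from statistics import fmean


def column_coherence(autocorrelation, sequences_per_category, categories_per_sequence):
    """Column-level coherence: plain average of the three per-column metrics."""
    return fmean([autocorrelation, sequences_per_category, categories_per_sequence])


def overall_coherence(per_column):
    """Overall coherence: plain average of the column-level values."""
    return fmean(column_coherence(**metrics) for metrics in per_column.values())


# Made-up scores for two columns.
per_column = {
    "col_a": {"autocorrelation": 0.9, "sequences_per_category": 0.8, "categories_per_sequence": 0.7},
    "col_b": {"autocorrelation": 1.0, "sequences_per_category": 0.6, "categories_per_sequence": 0.8},
}
overall = overall_coherence(per_column)
```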

For the calculation of the metric we shall consider a maximum of 100 (randomly selected) events per sequence.
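One possible way to apply this cap, sketched in plain Python. The function name, the `(sequence_id, value)` event layout, and the choice to preserve the original event order within a sequence are assumptions, not part of the request.

```python
import random
from collections import defaultdict


def cap_events_per_sequence(events, max_events=100, seed=None):
    """Keep at most max_events randomly selected events per sequence."""
    rng = random.Random(seed)
    by_seq = defaultdict(list)
    for seq_id, value in events:
        by_seq[seq_id].append(value)

    capped = []
    for seq_id, values in by_seq.items():
        if len(values) > max_events:
            # Sample positions without replacement, then restore the original order.
            keep = sorted(rng.sample(range(len(values)), max_events))
            values = [values[i] for i in keep]
        capped.extend((seq_id, v) for v in values)
    return capped


# Toy data: one long sequence and one short one.
events = [("s1", i) for i in range(250)] + [("s2", i) for i in range(3)]
capped = cap_events_per_sequence(events, max_events=100, seed=0)
```

Sequences at or below the cap pass through unchanged, so only long sequences are affected.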

mplatzer changed the title from "feature request: Users per Category and Categories per User as additional Coherence charts/metrics" to "feature request: Sequences per Category and Categories per Sequence as additional coherence charts + metrics" on Dec 2, 2024