-
Notifications
You must be signed in to change notification settings - Fork 1
Data Tables
ERAS has 3 main tables that it pulls from. ERAS also heavily relies on Timescale's feature of Continuous Aggregates (which are materialized views of our 3 main tables bucketed by time).
ERAS also uses Composite Primary Keys to define our hypertables and in the schema. Composite primary keys are utilized to ensure the chunk table partitions of our hypertables are setup correctly https://docs.timescale.com/timescaledb/latest/quick-start/ruby/#step-3-create-a-table.
Main Tables
Column Name | Type | Description | Nullable? | Composite Primary Key? |
---|---|---|---|---|
comment_id | integer | Id of the Talk comment | No | ✔️ |
event_time | timestamp |
created_at of Talk comment |
No | ✔️ |
updated_at | timestamp |
updated_at of Talk comment |
No | |
project_id | integer |
project_id of the Talk comment (sometimes null, because some Talk comments are not about a specific project) |
Yes | |
user_id | integer |
user_id of talk comment |
Yes |
Column Name | Type | Description | Nullable? | Composite Primary Key? | Default |
---|---|---|---|---|---|
classification_id | integer | Id of classification | No | ✔️ | |
event_time | timestamp |
created_at of classification |
No | ✔️ | |
classification_updated_at | timestamp |
updated_at of classification |
No | ||
started_at | timestamp |
started_at of classification. Comes from metadata of classification, set by volunteer's machine |
Yes | ||
finished_at | timestamp |
finished_at of classification. Comes from metadata of classification, set by volunteer's machine |
Yes | ||
project_id | integer |
project_id of the classification |
Yes | ||
workflow_id | integer |
workflow_id of the classification |
Yes | ||
user_id | integer |
user_id of classification |
Yes | ||
user_group_ids | array[int] | Array of user group ids that the classifying user belongs to | Yes | [] | |
session_time | float |
finished_at - started_at time in seconds |
Yes |
Currently, continuous aggregates cannot be created on classification_events
grouping by user_group_id
, since classification_event.user_group_ids
is an array. A workaround suggested by Timescale is to flatten our data if we want to group by user_group_id
. Because of this, we created a "faux join" table that links a classification to the multiple user groups a classifying user belongs to.
We only create classification_user_groups
when a classification's metadata.user_group_ids
array is not empty.
More motivations for this "faux join" mapping table can be found in Slack thread here: https://zooniverse.slack.com/archives/C010QAPB67J/p1682390755896589?thread_ts=1682096381.364729&cid=C010QAPB67J
Column Name | Type | Description | Nullable? | Composite Primary Key? | Default |
---|---|---|---|---|---|
classification_id | integer | Id of classification | Yes | ||
event_time | timestamp |
created_at of classification |
No | ||
user_group_id | integer | id of a user_group that a classification's user belongs to | No | ||
session_time | float | Comes from a classification's metadata's finished_at - started_at time in seconds |
Yes | ||
project_id | integer |
project_id of the classification |
Yes | ||
workflow_id | integer |
workflow_id of the classification |
Yes | ||
user_id | integer |
user_id of classification |
Yes |
Are you... Ready for it?
Home
User Groups Stats Visibility Levels
Data Tables
Mappings from Source Tables to ERAS
Continuous Aggregates
Decisions Around Session Time Caps