Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1085 Ecocounter Open Data Schema+DAG+QC plots #1096

Open
wants to merge 80 commits into
base: master
Choose a base branch
from

Conversation

gabrielwol
Copy link
Collaborator

@gabrielwol gabrielwol commented Nov 8, 2024

What this pull request accomplishes:

Does not need significant review:

  • R plots

  • Misc.

    • Add counter to sites table (commonly used ID field in Ecocounter communications/dashboard)
      • updated ecocounter_pull.py, pull_data_from_api.py.
    • Add direction_main field to flows: easily group dominant/contraflow together
    • Fixed anomalous_range join conditions (Use only one of site_id and flow_id)
      • counts, counts_calibrated

Issue(s) this solves:

What, in particular, needs to reviewed:

What needs to be done by a sysadmin after this PR is merged

  • Delete counts_old (saved the old version of counts for monitoring project dependency)
  • Delete discontinuities table

@gabrielwol gabrielwol self-assigned this Nov 8, 2024
@gabrielwol gabrielwol linked an issue Nov 8, 2024 that may be closed by this pull request
@@ -106,46 +96,93 @@ def reminder_message(ds = None, **context):
wait_till_10th.doc_md = """
Wait until the 10th day of the month to export data. Alternatively mark task as success to proceed immediately.
"""

@task()
def get_years(ds=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we add this for mapped task naming? It was using ds before right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the insert_and_download_data task_group is mapped over the output of get_years. The purpose is to run the exports for last two months (which may be two separate years), since new data frequently arrives within ~30 days.
image

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to remove sensitivity_changes since it's superseded by sensitivity_history.
Not sure why flow_id = 101042942 has a stray date_range = (,) .
The readme should mention:

  • sensitivity_history is manually updated based on emails with with Eco-counter Technical Support Specialist (Derek Yates as of 2024).
  • what the numbered settings mean (least to most selective, or not standardized)
  • Eco-counter must confirm what the current and new settings are whenever a sensitivity is adjusted, along with the date the change is applied.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to remove sensitivity_changes since it's superseded by sensitivity_history.

Not sure why flow_id = 101042942 has a stray date_range = (,) .

That flow_id never had any data. Deleted the entry.

The readme should mention...

Adding!

Copy link

@A-DUYVESTYN A-DUYVESTYN Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to separate scooter flows calibration factors from bike flows? Currently we apply the same "total volume" factor to both, but we will have separate validation factors for these veh types soon. (Not a blocker for this release)
Readme should state:

  • ecocounter.calibration_factors excludes factors in ecocounter.manual_counts_info with flagged as do_not_use= true . do_not_use is entered manually on a case-by-case basis depending on which calibration study is the 'best'.
  • Caution is required when assigning do_not_use to calibration studies. Avoid as much as possible applying a new factor to data that has already been calibrated and published to Open Data.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way to separate scooter calibration factors would be to edit ecocounter.validation_results, particularly the flow_ids CTE to link different flow_ids to those studies.

@A-DUYVESTYN
Copy link

@gabrielwol, I'm not finding this file that my review was requested for: volumes/ecocounter/views/create-view-counts_calibrated.sql

@gabrielwol
Copy link
Collaborator Author

@gabrielwol, I'm not finding this file that my review was requested for: volumes/ecocounter/views/create-view-counts_calibrated.sql

Renamed to "counts" from counts_calibrated

@A-DUYVESTYN
Copy link

@gabrielwol, I'm not finding this file that my review was requested for: volumes/ecocounter/views/create-view-counts_calibrated.sql

Renamed to "counts" from counts_calibrated

In bigdata, I see both VIEWs counts_calibrated and counts. Is the intention to only have counts which would contain calibrated counts? And to get raw data, use table counts_unfiltered?

@gabrielwol
Copy link
Collaborator Author

In bigdata, I see both VIEWs counts_calibrated and counts. Is the intention to only have counts which would contain calibrated counts? And to get raw data, use table counts_unfiltered?

@A-DUYVESTYN yes, we would keep only counts and counts_unfiltered.

  • counts contains both raw and calibrated volumes, and omits anomalous ranges and unvalidated flows/sitese.
  • counts_unfiltered contains all raw data.

Will work on readmes today.

@A-DUYVESTYN
Copy link

OK. I'm not clear on process for assigning flows_unfiltered.validated and sites_unfiltered.validated. Another one for documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ecocounter: develop sensitivity/factor tables
4 participants