1085 Ecocounter Open Data Schema+DAG+QC plots #1096

gabrielwol · 2024-11-08T22:14:34Z

What this pull request accomplishes:

New Monthly DAG to insert/download Ecocounter data. Includes prompts to review data and check if data for all days are present. @chmnata
- dags/ecocounter_open_data.py
Open Data tables + functions @A-DUYVESTYN
- open_data_daily_counts + fn open_data_daily_counts_insert, open_data_raw_counts + fn open_data_raw_counts_insert, open_data_sites
Calibration factor supporting tables @A-DUYVESTYN
- volumes/ecocounter/tables/sensitivity_history.sql: format sensitivity history for ease of use. We should remove sensitivity_changes when we are satisfied with changes.
- volumes/ecocounter/views/create-view-calibration_factors.sql: merge sensitivity changes with validation studies to identify when to apply calibration factors
- volumes/ecocounter/views/create-view-counts.sql: apply calibration factors

Does not need significant review:

R plots
- Interactive anomalous range ID and QC plots: volumes/ecocounter/qc/qc_shiny_app.r + qc_graph_volumes postgres function
- Output static PDF plots: volumes/ecocounter/qc/qc_plots_pdf.r (could be a useful base in the future if we want to automatically output certain QC plots each month).
Misc.
- Add counter to sites table (commonly used ID field in Ecocounter communications/dashboard)
  - updated ecocounter_pull.py, pull_data_from_api.py.
- Add direction_main field to flows: easily group dominant/contraflow together
- Fixed anomalous_range join conditions (Use only one of site_id and flow_id)
  - counts, counts_calibrated

Issue(s) this solves:

Ecocounter: develop sensitivity/factor tables #1085

What, in particular, needs to be reviewed:

What needs to be done by a sysadmin after this PR is merged

Delete counts_old (saved the old version of counts for monitoring project dependency)
Delete discontinuities table

chmnata · 2024-11-25T22:17:42Z

dags/ecocounter_open_data.py

@@ -106,46 +96,93 @@ def reminder_message(ds = None, **context):
    wait_till_10th.doc_md = """
    Wait until the 10th day of the month to export data. Alternatively mark task as success to proceed immediately.
    """
+
+    @task()
+    def get_years(ds=None):


Did we add this for mapped task naming? It was using ds before, right?

Yes, the insert_and_download_data task_group is mapped over the output of get_years. The purpose is to run the exports for the last two months (which may fall in two separate years), since new data frequently arrives within ~30 days.
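As a rough illustration of the reply above, here is a minimal sketch of how the years covered by the last two months could be derived. It is not the code in dags/ecocounter_open_data.py: the function name `years_for_last_two_months` is hypothetical, `ds` is assumed to be a `YYYY-MM-DD` string, and "last two months" is assumed to mean the two calendar months before `ds`.

```python
from datetime import date


def years_for_last_two_months(ds: str) -> list[int]:
    """Return the distinct calendar years touched by the two months before `ds`.

    Hypothetical helper: the real get_years task may define the window differently.
    """
    run_date = date.fromisoformat(ds)
    year, month = run_date.year, run_date.month
    years = set()
    for _ in range(2):
        # Step back one calendar month, rolling over the year boundary.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        years.add(year)
    # Integers in ascending order, so a mapped task group gets one entry per year.
    return sorted(years)
```

A run early in February would yield two years (December and January), while most other runs yield one, which is why mapping over the result covers both cases.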

A-DUYVESTYN · 2024-11-25T22:21:41Z

volumes/ecocounter/tables/sensitivity_history.sql

OK to remove sensitivity_changes since it's superseded by sensitivity_history.
Not sure why flow_id = 101042942 has a stray date_range = (,).
The readme should mention:

- sensitivity_history is manually updated based on emails with the Eco-counter Technical Support Specialist (Derek Yates as of 2024).
- What the numbered settings mean (least to most selective, or not standardized).
- Eco-counter must confirm what the current and new settings are whenever a sensitivity is adjusted, along with the date the change is applied.

> OK to remove sensitivity_changes since it's superseded by sensitivity_history.

✅

> Not sure why flow_id = 101042942 has a stray date_range = (,).

That flow_id never had any data. Deleted the entry.

> The readme should mention...

Adding!

A-DUYVESTYN · 2024-11-25T22:33:31Z

volumes/ecocounter/views/create-view-calibration_factors.sql

How do we separate calibration factors for scooter flows from those for bike flows? Currently we apply the same "total volume" factor to both, but we will have separate validation factors for these vehicle types soon. (Not a blocker for this release.)
Readme should state:

- ecocounter.calibration_factors excludes factors in ecocounter.manual_counts_info that are flagged do_not_use = true. do_not_use is entered manually on a case-by-case basis depending on which calibration study is the 'best'.
- Caution is required when assigning do_not_use to calibration studies. Avoid, as much as possible, applying a new factor to data that has already been calibrated and published to Open Data.

The way to separate scooter calibration factors would be to edit ecocounter.validation_results, particularly the flow_ids CTE to link different flow_ids to those studies.
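For readers of this thread, the do_not_use exclusion can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SQL in create-view-calibration_factors.sql: the `Study` class and both function names are hypothetical, the date ranges from sensitivity_history are ignored for brevity, and a factor is assumed to be applied by multiplication.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Study:
    """Hypothetical stand-in for one validation study row."""
    flow_id: int
    factor: float
    do_not_use: bool


def usable_factor(studies: list[Study], flow_id: int) -> Optional[float]:
    """Return the factor of the first study for `flow_id` not flagged do_not_use."""
    for study in studies:
        if study.flow_id == flow_id and not study.do_not_use:
            return study.factor
    return None


def calibrate(raw_volume: int, factor: Optional[float]) -> Optional[float]:
    """Scale a raw volume (assumed multiplicative); None when no factor is usable."""
    return None if factor is None else raw_volume * factor
```

Separate scooter and bike factors would fit the same shape, provided each study is linked to the specific flow_ids it validates.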

A-DUYVESTYN · 2024-11-25T22:35:53Z

@gabrielwol, I'm not finding this file that my review was requested for: volumes/ecocounter/views/create-view-counts_calibrated.sql

gabrielwol · 2024-11-25T22:45:04Z

> @gabrielwol, I'm not finding this file that my review was requested for: volumes/ecocounter/views/create-view-counts_calibrated.sql

Renamed to "counts" from counts_calibrated

A-DUYVESTYN · 2024-11-26T14:54:37Z

> > @gabrielwol, I'm not finding this file that my review was requested for: volumes/ecocounter/views/create-view-counts_calibrated.sql
>
> Renamed to "counts" from counts_calibrated

In bigdata, I see both VIEWs counts_calibrated and counts. Is the intention to only have counts which would contain calibrated counts? And to get raw data, use table counts_unfiltered?

gabrielwol · 2024-11-26T14:56:55Z

> In bigdata, I see both VIEWs counts_calibrated and counts. Is the intention to only have counts which would contain calibrated counts? And to get raw data, use table counts_unfiltered?

@A-DUYVESTYN yes, we would keep only counts and counts_unfiltered.

counts contains both raw and calibrated volumes, and omits anomalous ranges and unvalidated flows/sites.
counts_unfiltered contains all raw data.

Will work on readmes today.
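The difference between counts and counts_unfiltered described above could be sketched roughly as follows. This is an illustration only, not the view definition: the `Count` and `AnomalousRange` classes and the `filtered_counts` function are hypothetical, and the rule that a range matches on flow_id when it is set and on site_id otherwise is one assumed reading of "use only one of site_id and flow_id".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Count:
    site_id: int
    flow_id: int
    ts: datetime
    volume: int


@dataclass
class AnomalousRange:
    site_id: Optional[int]
    flow_id: Optional[int]  # assumption: when set, matching uses flow_id only
    start: datetime
    end: datetime  # treated as exclusive


def in_range(count: Count, rng: AnomalousRange) -> bool:
    """True if the count falls inside the anomalous range."""
    if not (rng.start <= count.ts < rng.end):
        return False
    if rng.flow_id is not None:
        return count.flow_id == rng.flow_id
    return count.site_id == rng.site_id


def filtered_counts(counts, ranges, validated_sites, validated_flows):
    """Keep counts from validated sites/flows that fall outside every anomalous range."""
    return [
        c for c in counts
        if c.site_id in validated_sites
        and c.flow_id in validated_flows
        and not any(in_range(c, r) for r in ranges)
    ]
```

Under this reading, a range carrying both IDs excludes only its own flow, while a site-level range excludes every flow at that site.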

A-DUYVESTYN · 2024-11-26T15:34:50Z

OK. I'm not clear on the process for assigning flows_unfiltered.validated and sites_unfiltered.validated. Another one for documentation.

gabrielwol added 23 commits October 21, 2024 18:02

#1085 sensitivity_history table

b4e359b

#1085 add counter variable to sites tables (used by ecocounter in communications)

dcf802d

#1085 script used to populate sensitivity history

9f9440c

#1085 transform sensitivity_history and validation_results into correction_factors

01697e4

#1085 counts_corrected view

8318257

#1085 open data daily view

15da0e1

#1085 update sensitivity history, fix groupby bug in opendata view

7312c2b

#1085 sensitivity -> setting

3f6a837

#1085 add qc plots

49bfa9f

#1085 add qc shiny app

19fdfef

#1085 ecocounter_graph_volumes

ac7740b

#1085 shiny app improvements; zoom, faster render times, dynamic labels

9b19644

#1085 shiny app; add flow_id

581cd00

#1085 improve export feature, fix scaled volume bug

6e8ba63

#1085 add MOVE open data sync DAG

59b4db0

#1085 add "direction_main" for easier grouping of flows

6e2562f

#1085 fix anomalous_range join conditions for new shiny ranges (contain both site and flow ids)

9adc3a9

#1085 open data views (filtered + calibrated & raw, daily).

39ec9c3

#1085 shiny; ability to view validated counts only

e9ccdb8

#1085 don't pull for decommissioned sites/flows

a82948d

#1085 base of ecocounter_open_data dag

232c1c3

#1085 add task.bash to download data

6595b4f

#1085 fix permissions error

511ea73

gabrielwol added Ecocounter open data labels Nov 8, 2024

gabrielwol self-assigned this Nov 8, 2024

gabrielwol linked an issue Nov 8, 2024 that may be closed by this pull request

Ecocounter: develop sensitivity/factor tables #1085

Open

gabrielwol added 3 commits November 12, 2024 18:29

#1085 switch open data views to tables + add insert functions, adjust DAG

2e8a97d

#1085 test_dags add connection

e68b8c1

#1085 sort sites, add daily tick marks

1930137

#1085 map data pulls over years (last 2 months)

636844b

chmnata reviewed Nov 25, 2024

A-DUYVESTYN reviewed Nov 25, 2024

gabrielwol added 21 commits November 26, 2024 21:07

#1085 remove discontinuities table, replace with sensitivity_history

c9f211e

#1085 readme updates

1275087

#1085 create open_data schema views for unified permanent cycling volumes

e66b0d4

#1085 update DAG to pull from open_data schema

1b4ddfb

#1085 output readme with pandoc

2c1dcb1

#1085 data check bug fix

8df8acf

#1085 lat,lng to latitude,longitude

c3f3050

#1085 doc updates

b3923a5

#1085 bash script for re-exporting historical ecocounter data

90782fb

#1085 use variable for EXPORT_PATH

54669ac

#1085 add location_dir_id, extra columns to summary table

b0b7c3f

#1085 fix for data-availability check

c8cc228

#1085 fix: get_years -> int, minor doc changes

d89182f

#1085 remove name columns from 15 min data

808cc2e

#1085 add zeros to 15 min data

086d95f

#1085 minor doc update

b2ec2f8

#1085 remove toplevel database connections (#1112), remove deprecated airflow imports (#1115)

d5dc31d

#1085 don't try to set os.environ['PGPASSWORD']

bfb52c9

#1085 update year breaks

ae5c566

#1085 use templating for conn details

d00e15a

#1085 pandoc! + update readme pagebreaks

3b9dd0b
