Commit
accepted James suggestion, updated ymal file and final for review and merge
mmmiah committed Sep 13, 2024
1 parent 8b66601 commit 194b683
Showing 3 changed files with 89 additions and 75 deletions.
74 changes: 71 additions & 3 deletions transform/models/marts/diagnostics/_diagnostics.yml
Original file line number Diff line number Diff line change
@@ -111,10 +111,78 @@ models:
- name: longitude
description: The longitude of the station.
- name: good_detector_count
description: The number of good detectors per day and station
description: The number of good detectors per day and station.
- name: bad_detector_count
description: The number of good detectors per day and station
description: The number of bad detectors per day and station.
- name: average_sample_count
description: |
The average number of samples from all detectors reported from
The average number of samples reported from all detectors
per station and day.
- name: down_or_no_data_count
description: The number of bad detectors per day and station that are down or have no data.
- name: insufficient_data_count
description: The number of bad detectors per day and station that have insufficient data.
- name: card_off_count
description: The number of bad detectors per day and station whose card was off.
- name: high_val_count
description: The number of bad detectors per day and station that had extremely high values.
- name: intermittent_count
description: The number of bad detectors per day and station that had intermittent count.
- name: constant_count
description: The number of bad detectors per day and station that had constant count.
- name: diagnostics__detector_monthly_by_station
description: |
This file contains detector status data aggregated to the station level at monthly temporal resolution.
columns:
- name: station_id
description: The unique ID of the station.
tests:
- not_null
- name: sample_month
description: The month associated with raw data samples being counted.
tests:
- not_null
- name: station_type
description: The type of the station.
- name: district
description: The Caltrans district for the station.
- name: county
description: The county FIPS code in which the station is installed.
- name: city
description: The city FIPS code in which the station is installed.
- name: freeway
description: The freeway on which the station is installed.
- name: direction
description: The direction of travel for the freeway on which the station is installed.
- name: physical_lanes
description: The number of lanes in the station in that time period.
- name: state_postmile
description: The State postmile for the station in that time period.
- name: absolute_postmile
description: The absolute postmile for the station.
- name: latitude
description: The latitude of the station.
- name: longitude
description: The longitude of the station.
- name: monthly_detector_count
description: The total number of good and bad detectors in a month per station.
- name: monthly_good_detector_count
description: The number of good detectors per month and station.
- name: monthly_bad_detector_count
description: The number of bad detectors per month and station.
- name: monthly_sample_count
description: |
The total number of samples reported from all detectors
per station for a month.
- name: down_or_no_data_count
description: The number of bad detectors per month and station that are down or have no data.
- name: insufficient_data_count
description: The number of bad detectors per month and station that have insufficient data.
- name: card_off_count
description: The number of bad detectors per month and station whose card was off.
- name: high_val_count
description: The number of bad detectors per month and station that had extremely high values.
- name: intermittent_count
description: The number of bad detectors per month and station that had intermittent count.
- name: constant_count
description: The number of bad detectors per month and station that had constant count.
Original file line number Diff line number Diff line change
@@ -19,7 +19,6 @@ detector_status_with_count as (
sample_ct,
count_if(status = 'Good') as good_detector,
count_if(status != 'Good') as bad_detector,
count_if(status = 'Good') as good,
count_if(status = 'Down/No Data') as down_or_no_data,
count_if(status = 'Insufficient Data') as insufficient_data,
count_if(status = 'Card Off') as card_off,
@@ -40,7 +39,6 @@ detector_status_by_station as (
round(avg(sample_ct)) as average_sample_count,
sum(good_detector) as good_detector_count,
sum(bad_detector) as bad_detector_count,
sum(good) as good_count,
sum(down_or_no_data) as down_or_no_data_count,
sum(insufficient_data) as insufficient_data_count,
sum(card_off) as card_off_count,
Original file line number Diff line number Diff line change
@@ -6,50 +6,31 @@
with detector_daily_status as (
select
*,
DATE_TRUNC(month, sample_date) as sample_month
DATE_TRUNC(month, sample_date) as sample_month,
ROW_NUMBER() over (partition by sample_month, station_id order by sample_date desc) as rn
from {{ ref('diagnostics__detector_daily_by_station') }}
),

recent_data as (
select
district,
station_id,
station_type,
sample_month,
county,
city,
freeway,
direction,
latitude,
longitude,
length,
state_postmile,
absolute_postmile,
ROW_NUMBER() over (
partition by
district, station_id, station_type, sample_month, county, city, freeway, direction, latitude, longitude
order by sample_date desc
) as rn
from
detector_daily_status
),

detector_monthly_status_by_station as (
select
district,
station_id,
station_type,
sample_month,
county,
city,
freeway,
direction,
latitude,
longitude,
station_id,
MAX(case when rn = 1 then district end) as district,
MAX(case when rn = 1 then state_postmile end) as state_postmile,
MAX(case when rn = 1 then absolute_postmile end) as absolute_postmile,
MAX(case when rn = 1 then latitude end) as latitude,
MAX(case when rn = 1 then longitude end) as longitude,
MAX(case when rn = 1 then physical_lanes end) as physical_lanes,
MAX(case when rn = 1 then station_type end) as station_type,
MAX(case when rn = 1 then county end) as county,
MAX(case when rn = 1 then city end) as city,
MAX(case when rn = 1 then freeway end) as freeway,
MAX(case when rn = 1 then direction end) as direction,
MAX(case when rn = 1 then length end) as length,
SUM(detector_count) as monthly_detector_count,
ROUND(SUM(average_sample_count)) as monthly_sample_count,
SUM(good_detector_count) as monthly_good_detector_count,
SUM(good_count) as good_count,
SUM(bad_detector_count) as monthly_bad_detector_count,
SUM(down_or_no_data_count) as down_or_no_data_count,
SUM(insufficient_data_count) as insufficient_data_count,
SUM(card_off_count) as card_off_count,
@@ -59,41 +40,8 @@ detector_monthly_status_by_station as (
from
detector_daily_status
group by
district,
station_id,
station_type,
sample_month,
county,
city,
freeway,
direction,
latitude,
longitude
),

monthly_station_health as (
select
dms.*,
recent.length,
recent.state_postmile,
recent.absolute_postmile
from
detector_monthly_status_by_station as dms
left join
recent_data
as recent
on
dms.district = recent.district
and dms.station_id = recent.station_id
and dms.station_type = recent.station_type
and dms.sample_month = recent.sample_month
and dms.county = recent.county
and dms.city = recent.city
and dms.freeway = recent.freeway
and dms.direction = recent.direction
and dms.latitude = recent.latitude
and dms.longitude = recent.longitude
and recent.rn = 1
sample_month
)

select * from monthly_station_health
select * from detector_monthly_status_by_station
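
The monthly model above keeps each station's most recent metadata by ranking daily rows within a month with `ROW_NUMBER()` and then reading only the `rn = 1` row inside the aggregate, which replaces the earlier self-join. A minimal sketch of that pattern follows, run against Python's built-in `sqlite3` rather than the warehouse the dbt project targets. The `daily` table, its sample rows, and the `substr`-based month key are illustrative assumptions and not part of the commit; it also assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical daily rows: (station_id, sample_date, district, detector_count).
# Station 1 changes district late in the month, so the latest value (4)
# differs from the largest value (7).
rows = [
    (1, "2024-08-01", 7, 3),
    (1, "2024-08-15", 7, 3),
    (1, "2024-08-31", 4, 4),
    (2, "2024-08-10", 5, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table daily "
    "(station_id int, sample_date text, district int, detector_count int)"
)
conn.executemany("insert into daily values (?, ?, ?, ?)", rows)

query = """
with ranked as (
    select
        *,
        substr(sample_date, 1, 7) as sample_month,
        -- SQLite cannot reference the sample_month alias here,
        -- so the month expression is repeated in the partition.
        row_number() over (
            partition by substr(sample_date, 1, 7), station_id
            order by sample_date desc
        ) as rn
    from daily
)
select
    station_id,
    sample_month,
    -- Only the rn = 1 row is non-null, so max() returns the latest value.
    max(case when rn = 1 then district end) as district,
    sum(detector_count) as monthly_detector_count
from ranked
group by station_id, sample_month
order by station_id
"""

result = conn.execute(query).fetchall()
print(result)  # → [(1, '2024-08', 4, 10), (2, '2024-08', 5, 2)]
```

Station 1 reports district 4 (its value on the latest sample date) rather than 7, while the detector counts are still summed over every day in the month. Grouping only by the station and month keys also means a mid-month metadata change no longer splits a station into several monthly rows.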
