2.7 Timeseries padder: variable vs. constant
Some QC tests (e.g., spike, persistence) evaluate a window of data rather than a single data point (e.g., range). The QAQC module therefore needs data from before and/or after the time window of interest in order to run the full suite of QC tests, and the `timeseries_padder` module supplies this padding. Because the NEON processing pipelines operate on daily time blocks, typically at least one day before and one day after the target day must be accessed.
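The following Python sketch shows the date arithmetic for daily blocks. It lists the days that would need to be read for a given target day and a pad of `pad_days` whole days on each side. The function `padded_days` is hypothetical and is not part of the `timeseries_padder` module.

```python
from __future__ import annotations

from datetime import date, timedelta


def padded_days(target: date, pad_days: int = 1) -> list[date]:
    """Days to read so that `target` has `pad_days` whole days of data on each side."""
    if pad_days < 0:
        raise ValueError("pad_days must be non-negative")
    return [target + timedelta(days=offset) for offset in range(-pad_days, pad_days + 1)]


if __name__ == "__main__":
    for day in padded_days(date(2019, 5, 21)):
        print(day.isoformat())  # 2019-05-20, then 2019-05-21, then 2019-05-22
```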
Data may be padded by a constant amount, specified once for all data on which the module operates, or by a variable amount determined from the expected data rate of the named location (found in the location file) and the QC thresholds for that named location (found in the thresholds file). The variable approach is preferred because it automatically adjusts when changes to the threshold parameters or the data rate require a larger or smaller window of data to perform QAQC.
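The following code sketches one way to derive a variable pad. It is not the module's actual rule. It makes several assumptions:

- The window length of each windowed QC test is known as a number of data points, taken from the thresholds file.
- The expected data rate is known in Hz, taken from the location file.
- The pad covers the full width of the widest window on each side, which is conservative.
- The pad is rounded up to whole days to match the daily blocks.

The function `variable_pad_days`, its inputs, and the window sizes in the tests are hypothetical and illustrative.

```python
from __future__ import annotations

import math

SECONDS_PER_DAY = 86_400


def variable_pad_days(data_rate_hz: float, window_points: dict[str, int]) -> int:
    """Whole days of padding needed on each side of the target day (hypothetical rule).

    data_rate_hz:  expected data rate of the named location, in Hz (assumed input).
    window_points: window length in data points per windowed QC test (assumed input).
    """
    if data_rate_hz <= 0:
        raise ValueError("data_rate_hz must be positive")
    if not window_points:
        return 0  # no windowed tests, so no padding is needed
    # The widest window sets how far beyond the target day data must be read.
    widest_seconds = max(window_points.values()) / data_rate_hz
    # Round up to whole days because the pipeline reads daily blocks.
    return math.ceil(widest_seconds / SECONDS_PER_DAY)
```

Consider 1 Hz data with a 3600-point persistence window: the sketch returns 1 day. If that window grows to 200,000 points, the sketch returns 3 days without any change to the pipeline YAML. This automatic adjustment is the advantage of a variable pad over a fixed one.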
The constant timeseries padder Python module, `timeseries_padder.timeseries_padder.constant_pad_main`, takes its arguments from variables designated under `env:` (e.g., `OUT_PATH`, `WINDOW_SIZE`, `YEAR_INDEX`). The example below shows how `env:` is designated for the constant timeseries padder:
```yaml
transform:
  image_pull_secrets:
  - battelleecology-quay-read-all-pull-secret
  image: quay.io/battelleecology/timeseries_padder:26
  cmd:
  - "/bin/bash"
  stdin:
  - "#!/bin/bash"
  - python3 -m timeseries_padder.timeseries_padder.constant_pad_main
  env:
    OUT_PATH: /pfs/out
    WINDOW_SIZE: '1'
    LOG_LEVEL: INFO
    RELATIVE_PATH_INDEX: '3'
    YEAR_INDEX: '4'
    MONTH_INDEX: '5'
    DAY_INDEX: '6'
    LOCATION_INDEX: '7'
    DATA_TYPE_INDEX: '8'
```
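The variables under `env:` reach the container as ordinary environment variables, so a module can read them with `os.environ`. The sketch below shows one way to apply the index settings, using the hypothetical helpers `read_settings`, `split_input_path`, and `output_path`.

The sketch assumes the following, chosen to be consistent with the index values above rather than taken from the module's source:

- The input path is split on `/`, so the leading `/` produces an empty element at index 0.
- Files follow a layout such as `/pfs/<repo>/<source-type>/<year>/<month>/<day>/<location>/<data-type>/<file>`.
- `RELATIVE_PATH_INDEX` marks where the input path is re-rooted under `OUT_PATH`.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Index variable names, copied from the env: block of the constant padder.
INDEX_NAMES = ("RELATIVE_PATH_INDEX", "YEAR_INDEX", "MONTH_INDEX",
               "DAY_INDEX", "LOCATION_INDEX", "DATA_TYPE_INDEX")


def read_settings(environ: Mapping[str, str] = os.environ) -> tuple[str, int, dict[str, int]]:
    """Return OUT_PATH, WINDOW_SIZE and the path indices, converting numbers to int."""
    out_path = environ["OUT_PATH"]
    window_size = int(environ["WINDOW_SIZE"])  # environment variables are always strings
    indices = {name: int(environ[name]) for name in INDEX_NAMES}
    return out_path, window_size, indices


def split_input_path(path: str, indices: dict[str, int]) -> dict[str, str]:
    """Pick year, month, day, location and data type out of an input path (assumed layout)."""
    parts = path.split("/")  # a leading '/' yields '' at index 0
    return {
        "year": parts[indices["YEAR_INDEX"]],
        "month": parts[indices["MONTH_INDEX"]],
        "day": parts[indices["DAY_INDEX"]],
        "location": parts[indices["LOCATION_INDEX"]],
        "data_type": parts[indices["DATA_TYPE_INDEX"]],
    }


def output_path(path: str, out_path: str, relative_index: int) -> str:
    """Re-root the input path under OUT_PATH, keeping the parts from relative_index on."""
    parts = path.split("/")
    return "/".join([out_path.rstrip("/"), *parts[relative_index:]])
```

Take the made-up path `/pfs/DATA_REPO/prt/2019/05/21/CFGLOC101/data/file.parquet` with the settings above. It would yield year `2019`, month `05`, day `21`, location `CFGLOC101`, and data type `data`. The file would be written to `/pfs/out/prt/2019/05/21/CFGLOC101/data/file.parquet`.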
The variable timeseries padder Python module does not take its path indices from `env:` in the YAML file. Instead, they are passed as arguments on the `python3` command line and parsed with the `argparse` Python package; only `OUT_PATH` and `LOG_LEVEL` remain under `env:`. The same approach is used in `[SHORT-NAME]_egress.yaml`. The following example shows the corresponding variable timeseries padder as employed in `[SHORT-NAME]_timeseries_padder.yaml`. Note that `timeseries_padder.timeseries_padder.variable_pad_main` is now called, followed by the arguments that are parsed in lieu of being specified in `env:`.
```yaml
transform:
  image_pull_secrets:
  - battelleecology-quay-read-all-pull-secret
  image: quay.io/battelleecology/timeseries_padder:31
  cmd:
  - "/bin/bash"
  stdin:
  - "#!/bin/bash"
  - python3 -m timeseries_padder.timeseries_padder.variable_pad_main --yearindex 4 --monthindex 5 --dayindex 6 --locindex 7 --subdirindex 8
  env:
    OUT_PATH: /pfs/out
    LOG_LEVEL: INFO
```
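The command-line flags in the variable padder's `stdin` line could be parsed with the standard-library `argparse` package roughly as follows. This is a hypothetical sketch, not the module's actual parser. The flag names are copied from the example above. The names `build_parser` and `main`, the `int` types, and `required=True` are assumptions. `OUT_PATH` and `LOG_LEVEL` are still read from the environment because they remain under `env:`.

```python
from __future__ import annotations

import argparse
import logging
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags passed to variable_pad_main."""
    parser = argparse.ArgumentParser(description="Variable timeseries padder (sketch)")
    # Each flag gives the position of a path component after splitting on '/'.
    for flag in ("--yearindex", "--monthindex", "--dayindex", "--locindex", "--subdirindex"):
        parser.add_argument(flag, type=int, required=True)
    return parser


def main(argv: list[str] | None = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)  # argv=None falls back to sys.argv[1:]
    # These two settings still come from the env: block, not from argparse.
    out_path = os.environ.get("OUT_PATH", "/pfs/out")
    logging.basicConfig(level=os.environ.get("LOG_LEVEL", "INFO"))
    logging.info("writing to %s with indices %s", out_path, vars(args))
    return args


if __name__ == "__main__":
    # Demo: parse the same arguments as the pipeline's stdin line.
    demo = "--yearindex 4 --monthindex 5 --dayindex 6 --locindex 7 --subdirindex 8"
    parsed = main(demo.split())
    print(parsed.yearindex, parsed.subdirindex)  # 4 8
```

In this example, `--subdirindex 8` points at the same path position as `DATA_TYPE_INDEX: '8'` in the constant padder example.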