[BUG]: DFP `file_batcher` module always uses default date regex pattern #1576
Labels
bug
Something isn't working
Version
23.11
Which installation method(s) does this occur on?
Conda
Describe the bug.
This line in the `file_batcher` module looks for `batch_iso_date_regex_pattern` in the module's config. That key does not exist in the config, so the module always uses `DEFAULT_ISO_DATE_REGEX_PATTERN`.
As a result, the default pattern is applied even when it doesn't match the input filenames. When no filename matches, the program falls back to the last-modified time for every file, which can cause an entire dataset to be loaded at once and lead to cuDF overflow errors for large files.
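The failure mode can be sketched in a few lines of plain Python. This is an illustration, not the actual module code: the default pattern below is a simplified stand-in, the config key `iso_date_regex_pattern` is an assumed name for whatever key the user actually sets, and `date_from_name` is a hypothetical helper.

```python
import re

# Simplified stand-in; the real DEFAULT_ISO_DATE_REGEX_PATTERN may differ.
DEFAULT_ISO_DATE_REGEX_PATTERN = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

# Hypothetical user config. The key name "iso_date_regex_pattern" is an assumption.
module_config = {
    "iso_date_regex_pattern": r"(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})",
}

# The lookup described in this issue: the key is absent, so the default is returned.
pattern = module_config.get("batch_iso_date_regex_pattern", DEFAULT_ISO_DATE_REGEX_PATTERN)


def date_from_name(filename, regex):
    """Return (year, month, day) parsed from the filename, or None if there is no match."""
    match = re.search(regex, filename)
    if match is None:
        # A caller would fall back to the file's last-modified time here.
        return None
    return tuple(int(match.group(key)) for key in ("year", "month", "day"))


used_default = pattern == DEFAULT_ISO_DATE_REGEX_PATTERN
parsed = date_from_name("auth_20231105.json", pattern)
```

Here `used_default` is true and `parsed` is `None`, even though the user-supplied pattern would have matched the filename. Reading the key that the config actually provides would avoid the silent fallback.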
Minimum reproducible example
Run the DFP pipeline on input files whose names don't match the default regex pattern.
Relevant log output
Click here to see error details
Full env printout
Click here to see environment details
Other/Misc.
No response