Releases: dathere/datapusher-plus
0.7.0
0.6.0
- validate excel file exported CSVs as well, as they can potentially be invalid CSVs (e.g. differing column counts per row)
- support negative values for PREVIEW_ROWS to start previewing from the end of a file (e.g. -1000 = last 1000 rows)
- if an Excel file is invalid or password-protected, show additional file metadata by using the `file` command
- remove obsolete CHUNK_INSERT_ROWS setting as we now do Postgres COPY
- add PREFER_DMY setting for parsing dates and doing column date inferencing (otherwise, the default is YMD)
- add logic to DROP VIEWS if ALIAS_UNIQUE is false, and show warning on datastore log
- implement smart auto-indexing which is controlled by AUTO_INDEX_THRESHOLD (default: 3) and AUTO_INDEX_DATES (default: true)
- improved log messages (comma-separated formatting for numbers, context-sensitive normalizing/transcoding messages, etc.)
- applied Black formatter to jobs.py
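The negative PREVIEW_ROWS behavior noted above can be pictured with a small sketch. This is not the DP+ implementation: `preview_slice` is a hypothetical helper, and treating 0 as "use the whole file" is an assumption made for the example.

```python
from typing import List, Sequence


def preview_slice(rows: Sequence[str], preview_rows: int) -> List[str]:
    """Return the rows a preview would keep (hypothetical helper, not DP+ code).

    preview_rows > 0 keeps the first N rows, preview_rows < 0 keeps the
    last N rows, and 0 is assumed here to mean "no preview, keep everything".
    """
    if preview_rows > 0:
        return list(rows[:preview_rows])
    if preview_rows < 0:
        # A negative slice start counts from the end of the sequence.
        return list(rows[preview_rows:])
    return list(rows)


rows = [f"row {i}" for i in range(1, 11)]
print(preview_slice(rows, 3))   # first three rows
print(preview_slice(rows, -3))  # last three rows
```

If the file has fewer rows than the absolute value of the setting, the slice simply returns every row.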
Full Changelog: 0.5.1...0.6.0
0.5.1
- Fixed #39 - no "data rows" bug
- added more implementation comments and TODOS
Full Changelog: 0.5.0...0.5.1
0.5.0
- new AUTO_ALIAS_UNIQUE setting with a default of false. This ensures the alias is stable if the resource is updated.
- updated deployment instructions
- two-stage normalization/validation of incoming files, ensuring that we can gracefully handle corrupt files
- ensure column names are "safe" (e.g. valid postgresql column identifiers), modifying them as required - while still retaining the original "unsafe" name in the data dictionary
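As a rough illustration of the "safe" column name idea, the sketch below rewrites headers into names that are valid as unquoted PostgreSQL identifiers while keeping the original name alongside. The rules used here (lowercasing, underscore replacement, numeric suffixes for duplicates) and the helper `safe_column_names` are assumptions for the example, not the rules DP+ actually applies. Reserved words are not handled.

```python
import re
from typing import List, Tuple

MAX_IDENTIFIER_LEN = 63  # PostgreSQL's default identifier length limit


def safe_column_names(headers: List[str]) -> List[Tuple[str, str]]:
    """Return (original, safe) pairs; illustrative rules only, not DP+ code."""
    pairs: List[Tuple[str, str]] = []
    used = set()
    for original in headers:
        # Keep only lowercase ASCII letters, digits and underscores.
        name = re.sub(r"[^a-z0-9_]", "_", original.strip().lower())
        # Identifiers must start with a letter or an underscore.
        if not name or name[0].isdigit():
            name = "_" + name
        name = name[:MAX_IDENTIFIER_LEN]
        # Resolve collisions by appending a numeric suffix.
        candidate, n = name, 1
        while candidate in used:
            n += 1
            suffix = f"_{n}"
            candidate = name[: MAX_IDENTIFIER_LEN - len(suffix)] + suffix
        used.add(candidate)
        pairs.append((original, candidate))
    return pairs


for original, safe in safe_column_names(["First Name", "2021 total", "first-name"]):
    print(f"{original!r} -> {safe!r}")
```

Keeping the pairs together is what allows the original "unsafe" name to be stored in the data dictionary.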
Full Changelog: 0.4.0...0.5.0
0.4.0
What's Changed
- smart data dictionary
- "safe" column names handling
- uwsgi deployment fixed
- send the env file explicitly by @TomeCirun in #45
More detailed changelog notes forthcoming.
Full Changelog: 0.3.1...0.4.0
0.3.1
Changed
- refactored log message right before qsv preprocessing starts e45d607
Full Changelog: 0.3.0...0.3.1
0.3.0
Changed
- spreadsheet files that are added as a link are parsed properly so long as the resource format is set
- header names are sanitized so they are valid Postgres column identifiers
Fixed
- wsgi deployment fixed
Full Changelog: 0.2.0...0.3.0
0.2.0
What's Changed
- fix UnboundLocalError by @TomeCirun in #40
- Add datapusherplus config by @TomeCirun in #41
- fix resource download by @TomeCirun in #42
- delete settings.py by @TomeCirun in #43
New Contributors
- @TomeCirun made their first contribution in #40
Full Changelog: 0.1.0...0.2.0
0.1.0
Added
- smarter data type mapping to Postgres data types is now available. By looking at the min/max values of a column, we can infer the best Postgres data type (integer, bigint or numeric), instead of using the numeric Postgres type for all integers. This is done by changing the TYPE_MAPPING of `Integer` from `numeric` to `smartint`. #37
- Add resource preview metadata fields:
  - `preview` - if the resource is a preview, and not the entire file, containing only the first PREVIEW_ROWS of the file (boolean)
  - `preview_rows` - the number of rows of the preview
  - `total_record_count` - the actual number of rows of the file
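The min/max inference behind `smartint` can be sketched as follows. The range limits are PostgreSQL's documented bounds for `integer` and `bigint`. The function name `smart_int_type` is hypothetical, and the actual DP+ logic may differ in detail.

```python
# Documented PostgreSQL ranges for 4-byte and 8-byte integers.
INT4_MIN, INT4_MAX = -2_147_483_648, 2_147_483_647
INT8_MIN, INT8_MAX = -9_223_372_036_854_775_808, 9_223_372_036_854_775_807


def smart_int_type(min_value: int, max_value: int) -> str:
    """Pick the narrowest Postgres type that holds a column's min/max values."""
    if INT4_MIN <= min_value and max_value <= INT4_MAX:
        return "integer"
    if INT8_MIN <= min_value and max_value <= INT8_MAX:
        return "bigint"
    # Values outside the bigint range still fit in arbitrary-precision numeric.
    return "numeric"


print(smart_int_type(0, 100))
print(smart_int_type(0, 3_000_000_000))
print(smart_int_type(0, 2**63))
```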
Changed
- change mapping of inferred Date fields to the Postgres `date` data type, instead of using the Postgres `timestamp` data type for both Date (YYYY-MM-DD) and Datetime (YYYY-MM-DD HH:MM:SS TZ) columns.
- warn when duplicates are found, instead of info
- decreased default preview to 1,000 rows
- better error handling when calling qsv binary
- update instructions to use the latest qsv binary - qsv 0.67.0
Fixed
- trimmed header and column values when processing spreadsheets. As spreadsheets are, more often than not, manually curated, there is often invisible whitespace that "looks" right but may cause invalid CSVs, e.g. column names with leading/trailing whitespace that cause Postgres errors when columns are created using the Excel column name.
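A minimal sketch of the trimming step, assuming the spreadsheet has already been read into a list of string rows with the header as the first row. `trim_rows` is a hypothetical helper and not the code DP+ uses.

```python
from typing import List


def trim_rows(rows: List[List[str]]) -> List[List[str]]:
    """Strip leading/trailing whitespace from every cell, header included."""
    return [[cell.strip() for cell in row] for row in rows]


raw = [[" name ", "age\t"], [" Ann", "30 "]]
print(trim_rows(raw))
```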
Full Changelog: 0.0.23...0.1.0
0.0.23
Changed
- use `psycopg2-binary` instead of `psycopg2` to ease installation and eliminate need to have postgres dev files
- made logging messages auto-dedup aware if dupes are detected, by adding "unique" qualifier to record count
- pointed to the latest qsv version (0.46.1) with the excel off by 1 fix
- added note about nightly builds of qsv for maximum performance
- added note about additional DP+ supported Excel and TSV subformats
- use JOB_CONFIG consistently for setting DP+ settings
- made qsvdp the default QSV_BIN
- added note about how to install python 3.7 and above in DP+ virtual environment
Removed
- removed Hitchhiker's guide quote from setup.py epilog
- removed `six` as DP+ requires at least python 3.7
- removed `pytest` step in Development installation until the tests are adapted to DP+
Fixed
- fixed development installation procedure, so no assumptions are made
- fixed production deployment procedure and made it more detailed
- fixed off by 1 error in `excel` export message in qsv
Full Changelog: 0.0.21...0.0.23