Releases · dathere/datapusher-plus

validate CSVs exported from Excel files as well, since they can also be invalid (e.g. differing column counts per row)
support negative values for PREVIEW_ROWS to start previewing from the end of a file (e.g. -1000 = last 1000 rows)
if an Excel file is invalid or password-protected, show additional file metadata by using the file command
remove obsolete CHUNK_INSERT_ROWS setting as we now do Postgres COPY
add PREFER_DMY setting for parsing dates and doing column date inferencing (otherwise, the default is YMD)
add logic to DROP VIEWS if ALIAS_UNIQUE is false, and show warning on datastore log
implement smart auto-indexing which is controlled by AUTO_INDEX_THRESHOLD (default: 3) and AUTO_INDEX_DATES (default: true)
improved log messages (comma-separated formatting for numbers, context-sensitive normalizing/transcoding messages, etc.)
applied Black formatter to jobs.py
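
The negative PREVIEW_ROWS behavior described above can be illustrated with a minimal sketch. This is not the DP+ implementation; `select_preview` is a hypothetical helper, and treating 0 as "no preview, keep every row" is an assumption made for the example.

```python
from collections import deque
from itertools import islice
from typing import Iterable, List


def select_preview(rows: Iterable[str], preview_rows: int) -> List[str]:
    """Return the preview slice of the data rows (header excluded).

    Hypothetical helper for illustration only:
    - positive value: the first N rows
    - negative value: the last N rows
    - zero: all rows (assumption, not stated in the release notes)
    """
    if preview_rows > 0:
        # Stop reading as soon as N rows have been collected.
        return list(islice(rows, preview_rows))
    if preview_rows < 0:
        # A bounded deque keeps only the final N rows while streaming.
        return list(deque(rows, maxlen=-preview_rows))
    return list(rows)


rows = ["r1", "r2", "r3", "r4", "r5"]
print(select_preview(rows, 2))   # first two rows
print(select_preview(rows, -2))  # last two rows
```

The bounded deque means the tail case still reads the input once and never holds more than N rows in memory.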

Full Changelog: 0.5.1...0.6.0


05 Jan 18:42

jqnatividad

0.5.1

0541390

0.5.1

Fixed #39 - no "data rows" bug
added more implementation comments and TODOs

Full Changelog: 0.5.0...0.5.1


04 Jan 17:50

jqnatividad

0.5.0

fd53e7f

0.5.0

new AUTO_ALIAS_UNIQUE setting with a default of false. This ensures the alias is stable if the resource is updated.
updated deployment instructions
two-stage normalization/validation of incoming files, ensuring that we can gracefully handle corrupt files
ensure column names are "safe" (e.g. valid postgresql column identifiers), modifying them as required - while still retaining the original "unsafe" name in the data dictionary
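
As a rough illustration of the "safe" column name idea, the sketch below lowercases each header, replaces characters that are not valid in an unquoted Postgres identifier, and keeps the original name alongside the safe one. It is an assumption-laden example, not the logic DP+ actually uses: `safe_column_names`, the `col_N` fallback, and the `_2` de-duplication suffix are all hypothetical.

```python
import re
from typing import List, Tuple

# Postgres truncates identifiers longer than 63 bytes by default.
MAX_IDENTIFIER_LEN = 63


def safe_column_names(headers: List[str]) -> List[Tuple[str, str]]:
    """Return (safe_name, original_name) pairs, one per header.

    Hypothetical helper: the naming rules here are illustrative only.
    """
    seen = set()
    result = []
    for position, original in enumerate(headers, start=1):
        # Collapse every run of non [a-z0-9] characters into one underscore.
        name = re.sub(r"[^a-z0-9]+", "_", original.strip().lower()).strip("_")
        if not name:
            name = f"col_{position}"  # fallback for empty headers (assumption)
        if name[0].isdigit():
            name = f"_{name}"  # identifiers cannot start with a digit
        name = name[:MAX_IDENTIFIER_LEN]

        # Resolve collisions by appending a numeric suffix.
        candidate, n = name, 2
        while candidate in seen:
            suffix = f"_{n}"
            candidate = name[: MAX_IDENTIFIER_LEN - len(suffix)] + suffix
            n += 1
        seen.add(candidate)
        result.append((candidate, original))
    return result


for safe, original in safe_column_names([" Total Cost ($)", "2021 sales", "total cost ($)", ""]):
    print(f"{original!r} -> {safe!r}")
```

Returning the pairs rather than only the safe names mirrors the note above: the original "unsafe" name stays available for the data dictionary.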

Full Changelog: 0.4.0...0.5.0


13 Dec 14:38

jqnatividad

0.4.0

c31e045

0.4.0

What's Changed

smart data dictionary
"safe" column names handling
uwsgi deployment fixed
send the env file explicitly by @TomeCirun in #45

More detailed changelog notes forthcoming.

Full Changelog: 0.3.1...0.4.0

Contributors

TomeCirun


09 Dec 15:46

jqnatividad

0.3.1

ad4bb59

0.3.1

Changed

refactored log message right before qsv preprocessing starts e45d607

Full Changelog: 0.3.0...0.3.1


09 Dec 15:21

jqnatividad

0.3.0

9e4fb71

0.3.0

Changed

spreadsheet files that are added as a link are parsed properly so long as the resource format is set
header names are sanitized so they are valid Postgres column identifiers

Fixed

wsgi deployment fixed

Full Changelog: 0.2.0...0.3.0


07 Dec 14:29

jqnatividad

0.2.0

58b4ad7

0.2.0

What's Changed

fix UnboundLocalError by @TomeCirun in #40
Add datapusherplus config by @TomeCirun in #41
fix resource download by @TomeCirun in #42
delete settings.py by @TomeCirun in #43

New Contributors

@TomeCirun made their first contribution in #40

Full Changelog: 0.1.0...0.2.0

Contributors

TomeCirun


09 Sep 19:12

jqnatividad

0.1.0

a02034a

0.1.0

Added

smarter data type mapping to Postgres data types is now available. By looking at the min/max values of a column, we can infer the best Postgres data type (integer, bigint or numeric) instead of using the numeric Postgres type for all integers. This is done by changing the TYPE_MAPPING of Integer from numeric to smartint. #37
Add resource preview metadata fields:
- preview - if the resource is a preview, and not the entire file, containing only the first PREVIEW_ROWS of the file (boolean)
- preview_rows - the number of rows of the preview
- total_record_count - the actual number of rows of the file
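
The smartint idea above can be sketched as a range check against the documented bounds of the Postgres integer (4-byte) and bigint (8-byte) types. The function name `smartint_type` is hypothetical, and this is only an illustration of the technique, not the DP+ code.

```python
# Documented value ranges of the Postgres integer and bigint types.
INT4_MIN, INT4_MAX = -2**31, 2**31 - 1
INT8_MIN, INT8_MAX = -2**63, 2**63 - 1


def smartint_type(min_value: int, max_value: int) -> str:
    """Pick the narrowest Postgres type that can hold the column's range.

    Hypothetical helper: min_value/max_value are assumed to come from
    column statistics gathered before the table is created.
    """
    if INT4_MIN <= min_value and max_value <= INT4_MAX:
        return "integer"
    if INT8_MIN <= min_value and max_value <= INT8_MAX:
        return "bigint"
    # Anything wider than 8 bytes falls back to arbitrary-precision numeric.
    return "numeric"


print(smartint_type(0, 100))
print(smartint_type(0, 2**31))
print(smartint_type(0, 2**63))
```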

Changed

change mapping of inferred Date fields to the Postgres date data type, instead of using the Postgres timestamp data type for both Date (YYYY-MM-DD) and Datetime (YYYY-MM-DD HH:MM:SS TZ) columns.
warn when duplicates are found, instead of info
decreased default preview to 1,000 rows
better error handling when calling qsv binary
update instructions to use the latest qsv binary - qsv 0.67.0

Fixed

trimmed header and column values when processing spreadsheets. As spreadsheets are, more often than not, manually curated, they often contain invisible whitespace that "looks" right but may cause invalid CSVs, e.g. column names with leading/trailing whitespace that cause Postgres errors when columns are created using the Excel column name.
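
A minimal sketch of the trimming step, assuming the spreadsheet has already been read into lists of string cells. `trim_rows` is a hypothetical name, and the actual fix may be implemented differently.

```python
from typing import Iterable, Iterator, List


def trim_rows(rows: Iterable[List[str]]) -> Iterator[List[str]]:
    """Strip leading/trailing whitespace from every cell, header included.

    str.strip() removes Unicode whitespace, so non-breaking spaces
    pasted in from spreadsheets are trimmed as well.
    """
    for row in rows:
        yield [cell.strip() for cell in row]


sheet = [["\u00a0Name ", " Amount\t"], ["Ann ", " 10"]]
for row in trim_rows(sheet):
    print(row)
```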

Full Changelog: 0.0.23...0.1.0


09 May 15:29

jqnatividad

0.0.23

8abedaf

0.0.23

Changed

use psycopg2-binary instead of psycopg2 to ease installation and eliminate need to have postgres dev files
made logging messages auto-dedup aware if dupes are detected, by adding "unique" qualifier to record count
pointed to the latest qsv version (0.46.1) with the Excel off-by-1 fix
added note about nightly builds of qsv for maximum performance
added note about additional DP+ supported Excel and TSV subformats
use JOB_CONFIG consistently for setting DP+ settings
made qsvdp the default QSV_BIN
added note about how to install python 3.7 and above in DP+ virtual environment

Removed

removed Hitchhiker's Guide quote from setup.py epilog
removed six as DP+ requires at least python 3.7
removed pytest step in Development installation until the tests are adapted to DP+

Fixed

fixed development installation procedure, so no assumptions are made
fixed production deployment procedure and made it more detailed
fixed off-by-1 error in Excel export message in qsv

Full Changelog: 0.0.21...0.0.23

