Releases: dathere/datapusher-plus

0.7.0

17 Jan 12:55

More detailed release notes forthcoming...

What's Changed

  • fix import of MutableMapping from collections.abc by @ctrepka in #49

Full Changelog: 0.6.0...0.7.0

0.6.0

06 Jan 18:09
  • validate CSVs exported from Excel files as well, as they can also be invalid (e.g. differing column counts per row)
  • support negative values for PREVIEW_ROWS to start previewing from the end of a file (e.g. -1000 = last 1000 rows)
  • if an Excel file is invalid or password-protected, show additional file metadata by using the file command
  • remove obsolete CHUNK_INSERT_ROWS setting as we now do Postgres COPY
  • add PREFER_DMY setting for parsing dates and doing column date inferencing (otherwise, the default is YMD)
  • add logic to DROP VIEWS if ALIAS_UNIQUE is false, and show warning on datastore log
  • implement smart auto-indexing which is controlled by AUTO_INDEX_THRESHOLD (default: 3) and AUTO_INDEX_DATES (default: true)
  • improved log messages (comma-separated formatting for numbers, context-sensitive normalizing/transcoding messages, etc.)
  • applied Black formatter to jobs.py
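
The negative PREVIEW_ROWS behaviour described above can be sketched roughly as follows. This is only an illustration, not the code in jobs.py: the `select_preview` helper is hypothetical, and treating 0 as "no preview, keep every row" is an assumption.

```python
from typing import List, Sequence


def select_preview(rows: Sequence[List[str]], preview_rows: int) -> List[List[str]]:
    """Hypothetical helper: pick the data rows to keep for a preview.

    `rows` holds the data rows only (the header is handled elsewhere).
    """
    if preview_rows == 0:
        # Assumption: 0 disables previewing, so every row is kept.
        return list(rows)
    if preview_rows > 0:
        # Positive value: keep the first N rows.
        return list(rows[:preview_rows])
    # Negative value: keep the last N rows (e.g. -1000 = last 1000 rows).
    return list(rows[preview_rows:])


if __name__ == "__main__":
    data = [[str(i)] for i in range(10)]
    print(select_preview(data, 3))
    print(select_preview(data, -2))
```

Python's negative slicing maps directly onto the "start from the end of the file" semantics, which is why the sketch needs no special arithmetic for the negative case.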

Full Changelog: 0.5.1...0.6.0

0.5.1

05 Jan 18:42
  • Fixed #39 - no "data rows" bug
  • added more implementation comments and TODOs

Full Changelog: 0.5.0...0.5.1

0.5.0

04 Jan 17:50
  • new AUTO_ALIAS_UNIQUE setting with a default of false. This ensures the alias is stable if the resource is updated.
  • updated deployment instructions
  • two-stage normalization/validation of incoming files, ensuring that we can gracefully handle corrupt files
  • ensure column names are "safe" (i.e. valid PostgreSQL column identifiers), modifying them as required, while still retaining the original "unsafe" name in the data dictionary
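
One way the "safe" column name handling could work is sketched below. The `safe_column_names` function and its exact rules (lowercasing, underscore replacement, `unnamed_N` placeholders, numeric suffixes for duplicates) are hypothetical and are not taken from the DP+ source; the 63-character limit is the default PostgreSQL identifier length. Reserved words are not handled here.

```python
import re
from typing import Dict, List, Tuple

MAX_IDENTIFIER_LEN = 63  # default PostgreSQL identifier length limit


def safe_column_names(headers: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical sketch: return safe names plus a safe -> original mapping."""
    safe_names: List[str] = []
    original_by_safe: Dict[str, str] = {}
    for position, original in enumerate(headers, start=1):
        # Replace every run of characters outside [a-z0-9_] with one underscore.
        name = re.sub(r"[^a-z0-9_]+", "_", original.strip().lower()).strip("_")
        if not name:
            name = f"unnamed_{position}"
        elif name[0].isdigit():
            # Unquoted identifiers cannot start with a digit.
            name = f"_{name}"
        name = name[:MAX_IDENTIFIER_LEN]
        # Disambiguate duplicates with a numeric suffix, staying within the limit.
        candidate, suffix = name, 1
        while candidate in original_by_safe:
            suffix += 1
            tail = f"_{suffix}"
            candidate = f"{name[:MAX_IDENTIFIER_LEN - len(tail)]}{tail}"
        safe_names.append(candidate)
        # The original "unsafe" name is kept, e.g. for a data dictionary.
        original_by_safe[candidate] = original
    return safe_names, original_by_safe
```

Returning the mapping alongside the safe names mirrors the idea of retaining the original header for display while the database only ever sees the sanitized identifier.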

Full Changelog: 0.4.0...0.5.0

0.4.0

13 Dec 14:38

What's Changed

  • smart data dictionary
  • "safe" column names handling
  • uwsgi deployment fixed
  • send the env file explicitly by @TomeCirun in #45

More detailed changelog notes forthcoming.

Full Changelog: 0.3.1...0.4.0

0.3.1

09 Dec 15:46

Changed

  • refactored log message right before qsv preprocessing starts e45d607

Full Changelog: 0.3.0...0.3.1

0.3.0

09 Dec 15:21

Changed

  • spreadsheet files that are added as a link are parsed properly so long as the resource format is set
  • header names are sanitized so they are valid Postgres column identifiers

Fixed

  • wsgi deployment fixed

Full Changelog: 0.2.0...0.3.0

0.2.0

07 Dec 14:29

Full Changelog: 0.1.0...0.2.0

0.1.0

09 Sep 19:12

Added

  • smarter data type mapping to Postgres data types is now available. By looking at the min/max values of a column,
    we can infer the best Postgres data type (integer, bigint or numeric) instead of using the numeric Postgres type for all integers.
    This is enabled by changing the TYPE_MAPPING of Integer from numeric to smartint. #37
  • Add resource preview metadata fields:
    • preview - whether the resource is a preview containing only the first PREVIEW_ROWS rows of the file, rather than the entire file (boolean)
    • preview_rows - the number of rows of the preview
    • total_record_count - the actual number of rows of the file
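
The smartint inference described above can be illustrated with a short sketch. The `smartint_type` function is hypothetical and not the DP+ implementation; the range limits are the documented bounds of the Postgres integer (4-byte) and bigint (8-byte) types.

```python
# Documented value ranges of the Postgres integer and bigint types.
PG_INTEGER_MIN, PG_INTEGER_MAX = -2**31, 2**31 - 1
PG_BIGINT_MIN, PG_BIGINT_MAX = -2**63, 2**63 - 1


def smartint_type(min_value: int, max_value: int) -> str:
    """Hypothetical sketch: choose the narrowest Postgres type for a column.

    `min_value` and `max_value` are the smallest and largest integers
    observed in the column.
    """
    if PG_INTEGER_MIN <= min_value and max_value <= PG_INTEGER_MAX:
        return "integer"
    if PG_BIGINT_MIN <= min_value and max_value <= PG_BIGINT_MAX:
        return "bigint"
    # Values outside the bigint range need arbitrary-precision numeric.
    return "numeric"
```

Because only the column's minimum and maximum are needed, the decision can be made from summary statistics without rescanning the data.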

Changed

  • change mapping of inferred Date fields to the Postgres date data type, instead of using Postgres timestamp data type for
    both Date (YYYY-MM-DD) and Datetime (YYYY-MM-DD HH:MM:SS TZ) columns.
  • warn when duplicates are found, instead of info
  • decreased default preview to 1,000 rows
  • better error handling when calling qsv binary
  • update instructions to use the latest qsv binary - qsv 0.67.0

Fixed

  • trimmed header and column values when processing spreadsheets. As spreadsheets are, more often than not, manually curated,
    they often contain invisible whitespace that "looks" right but may produce invalid CSVs - e.g. column names with leading/trailing whitespace
    that cause Postgres errors when columns are created using the Excel column name.
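
As a rough illustration of the trimming fix, the sketch below strips leading and trailing whitespace from every header and cell of a CSV using only the Python standard library. The `trim_csv` function is hypothetical; DP+ itself relies on the qsv binary for this kind of processing.

```python
import csv
import io


def trim_csv(text: str) -> str:
    """Hypothetical sketch: strip whitespace around every header and cell."""
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # The header row is trimmed the same way as data rows.
        writer.writerow([cell.strip() for cell in row])
    return out.getvalue()


if __name__ == "__main__":
    print(trim_csv(" name ,age \n Ada , 36\n"))
```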

Full Changelog: 0.0.23...0.1.0

0.0.23

09 May 15:29

Changed

  • use psycopg2-binary instead of psycopg2 to ease installation and eliminate need to have postgres dev files
  • made logging messages auto-dedup aware if dupes are detected, by adding "unique" qualifier to record count
  • pointed to the latest qsv version (0.46.1) with the Excel off-by-one fix
  • added note about nightly builds of qsv for maximum performance
  • added note about additional DP+ supported Excel and TSV subformats
  • use JOB_CONFIG consistently for setting DP+ settings
  • made qsvdp the default QSV_BIN
  • added note about how to install python 3.7 and above in DP+ virtual environment

Removed

  • removed Hitchhiker's Guide quote from setup.py epilog
  • removed six as DP+ requires at least python 3.7
  • removed pytest step in Development installation until the tests are adapted to DP+

Fixed

  • fixed development installation procedure, so no assumptions are made
  • fixed production deployment procedure and made it more detailed
  • fixed off-by-one error in Excel export message in qsv

Full Changelog: 0.0.21...0.0.23