PostgreSQL type conversion should default to NATIONAL CHARACTER #574

daniel-skovenborg · 2024-01-16T10:45:51Z

PostgreSQL type conversion in PostgreSQLJDBCDatatypeImporter imports varchar as CHARACTER VARYING and text as CHARACTER LARGE OBJECT. However, because SQL:1999 distinguishes between CHARACTER and NATIONAL CHARACTER, and PostgreSQL does not, the conversion should default to NATIONAL CHARACTER.
I haven't tried, but I believe this could break migration of SIARD archives from PostgreSQL databases to databases that distinguish between VARCHAR and NVARCHAR if cells contain non-ASCII characters.

Of course, NATIONAL CHARACTER is not always what you'll want, e.g. if the database encoding is not a Unicode type or the column is just an enum.
I suggest that the type conversion methods should take the schema, table, and column as arguments, and that the PostgreSQL importer should have an option to make the following query to determine if a text column holds national characters:

select exists(select from SCHEMA_NAME.TABLE_NAME where COLUMN_NAME::text ~ '[^\x01-\x7F]');

This will of course slow down the import considerably and should probably be opt-in.
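
A minimal sketch of how the opt-in check could fit together, under stated assumptions: the class `NationalCharacterProbe` and its methods are hypothetical names for illustration and are not part of the existing importer. It builds the probe query above with quoted identifiers and maps the result to the SQL:1999 type names mentioned in this issue. Running the query over JDBC and reading the boolean result is left out.

```java
// Hypothetical helper, not existing project code: all names here are
// assumptions made for illustration.
final class NationalCharacterProbe {

    private NationalCharacterProbe() {
    }

    // Quote a PostgreSQL identifier, doubling any embedded double quotes.
    static String quoteIdentifier(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    // Build the probe query from the issue for one column. The regular
    // expression matches any character outside the range \x01-\x7F.
    static String buildProbeQuery(String schema, String table, String column) {
        return "select exists(select from "
            + quoteIdentifier(schema) + "." + quoteIdentifier(table)
            + " where " + quoteIdentifier(column) + "::text ~ '[^\\x01-\\x7F]')";
    }

    // Pick the SQL:1999 type name for a PostgreSQL character type, given
    // whether the probe found non-ASCII characters in the column.
    static String chooseSql99Type(String postgresType, boolean holdsNonAscii) {
        String prefix = holdsNonAscii ? "NATIONAL " : "";
        switch (postgresType) {
            case "varchar":
                return prefix + "CHARACTER VARYING";
            case "text":
                return prefix + "CHARACTER LARGE OBJECT";
            default:
                throw new IllegalArgumentException(
                    "Not a character type handled by this sketch: " + postgresType);
        }
    }
}
```

With the option disabled, the importer could skip the query and pass `true` for `holdsNonAscii`, which gives the NATIONAL CHARACTER default suggested above.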
