PostgreSQL type conversion should default to NATIONAL CHARACTER #574

Open
daniel-skovenborg opened this issue Jan 16, 2024 · 0 comments

PostgreSQL type conversion in PostgreSQLJDBCDatatypeImporter imports varchar as CHARACTER VARYING and text as CHARACTER LARGE OBJECT. However, because SQL:1999 distinguishes between CHARACTER and NATIONAL CHARACTER and PostgreSQL does not, the conversion should default to NATIONAL CHARACTER.
I haven't tried it, but I believe this could break migration of SIARD archives from PostgreSQL databases to databases that distinguish between VARCHAR and NVARCHAR if cells contain non-ASCII characters.

Of course, NATIONAL CHARACTER is not always what you'll want, e.g. if the database encoding is not a Unicode encoding or the column is just an enum.
I suggest that the type conversion methods take the schema, table, and column as arguments, and that the PostgreSQL importer have an option to run the following query to determine whether a text column holds national characters:

select exists(select from SCHEMA_NAME.TABLE_NAME where COLUMN_NAME::text ~ '[^\x01-\x7F]');

This would of course slow the import down considerably, so it should probably be opt-in.
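
To make the suggestion concrete, here is a rough Java sketch of what such an opt-in check might look like. The class and method names (`NationalCharacterProbe`, `buildProbeQuery`, `holdsNationalCharacters`, `chooseVaryingType`) are hypothetical and not part of the existing importer API; only the standard `java.sql` calls are real. The fallback rule (use plain CHARACTER VARYING only when the probe is enabled and finds no non-ASCII data) is my reading of the proposal, not settled behaviour.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not an existing class in the importer.
final class NationalCharacterProbe {
  private NationalCharacterProbe() {
  }

  // Quote an identifier for PostgreSQL by doubling embedded double quotes.
  static String quoteIdentifier(String identifier) {
    return "\"" + identifier.replace("\"", "\"\"") + "\"";
  }

  // Build the probe query from the issue for one column.
  // Assumes standard_conforming_strings is on, so the backslashes
  // reach the regular expression engine unchanged.
  static String buildProbeQuery(String schema, String table, String column) {
    return "select exists(select from " + quoteIdentifier(schema) + "." + quoteIdentifier(table)
      + " where " + quoteIdentifier(column) + "::text ~ '[^\\x01-\\x7F]')";
  }

  // Run the probe; true means at least one cell matched the non-ASCII pattern.
  static boolean holdsNationalCharacters(Connection connection, String schema, String table,
    String column) throws SQLException {
    try (Statement statement = connection.createStatement();
      ResultSet result = statement.executeQuery(buildProbeQuery(schema, table, column))) {
      return result.next() && result.getBoolean(1);
    }
  }

  // Default to the NATIONAL type; downgrade only when the opt-in probe
  // ran and found nothing outside ASCII.
  static String chooseVaryingType(boolean probeEnabled, boolean foundNonAscii) {
    if (probeEnabled && !foundNonAscii) {
      return "CHARACTER VARYING";
    }
    return "NATIONAL CHARACTER VARYING";
  }
}
```

Since the probe scans the whole table for every text column, the importer would only call `holdsNationalCharacters` when the option is set, and would otherwise go straight to the NATIONAL default.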
