Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redshift Loader: migrate column if enum field's max length changes #1158

Open
dilyand opened this issue Dec 15, 2022 · 2 comments
Open

Redshift Loader: migrate column if enum field's max length changes #1158

dilyand opened this issue Dec 15, 2022 · 2 comments

Comments

@dilyand
Copy link
Contributor

dilyand commented Dec 15, 2022

If we have a schema with a property that is an enum, eg:

"name": {
 "type": "string",
 "enum": [
    "abc",
    "def",
    "ghi",
    "jkl"
  ],
  "maxLength": 256
}

currently the column will not be created as VARCHAR(256) but with the size of the longest possible enum value, ie VARCHAR(3).

If a new, longer value is added to the enum, the column will not be migrated to a new size. Instead, any input values will be truncated so they can be loaded.

The loader should probably:

  • always respect the maxLength setting if there is one;
  • if there is no maxLength, keep track of whether the max possible value of the enum has changed and migrate the column.
@jfjalburquerque
Copy link

I have same issue, after update an schema to insert a new enum value if the size of this new value is higher than older ones redshift varchar is not resized and value is truncated.

Is there any solution for this ?

@jethron
Copy link

jethron commented Jun 23, 2023

There looks to be some fun history here!

Support for this seems originally added in snowplow/schema-ddl#83, but then disabled in snowplow/schema-ddl#121 because it broke snowplow/schema-ddl#36 which also lead to snowplow/schema-ddl#110.

So maxLength is ignored for enums again, and the length is based purely on the enum values. Unless the longest value is 256 chars or longer, text255 encoding is used for the column. Then because text255 can't be resized, the ALTER statement never runs (because it can't due to a Redshift limitation). If the max length of a value in the original enum is > 255 chars and you want to resize to something even larger it should work, but in no other cases. This is probably pretty rare.

These days Redshift has ALTER TABLE column_name ENCODE ... so as a manual workaround you can probably manually alter the column's encoding to ZSTD or something, alter again to resize the length to the correct value, and then optionally alter the encoding back to text255 (if the new length is still <=255 characters long). Subsequent loads should then work without truncation.

I think these days we're better off just encoding as ZSTD from the get-go, ignoring Text255 for enums, and re-enabling maxLength support for enum columns. But I think for rdb-loader to be able to automatically handle the migration properly schema-ddl would need specific support added for updating ENCODE which I'm not sure is a small amount of work. It would be nice though if rdb-loader/schema-ddl gave a warning or something if it skipped a resize due to the column encoding, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants