-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AVRO-4026: [c++] Allow non-string custom attributes in schemas #3069
base: main
Are you sure you want to change the base?
Conversation
…m attributes are JSON string to avoid ambiguity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I would be happy to see this PR revived and eventually merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: format the json over more lines for better readability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
"\"mapField\": \"{\\\"key1\\\":\\\"value1\\\", \\\"key2\\\":\\\"value2\\\"}\", " | ||
"\"nullField\": \"null\", " | ||
"\"numberField\": \"1.23\", " | ||
"\"arrayField\": [1], " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at these changes I think this change may break someone's application.
Do we want to support a backward compatible encoding via some property/setting/... ?
This particular case isn't much of a problem. The current behavior of the
C++ library is to throw an exception if you attempt to read a schema with
non-string custom attributes, so this is unlikely in the wild. This test
creates the schema programmatically bypassing the reading issue.
The compatibility issue with this PR is existing code that interrogates the
value of a custom attribute that is a string type. With this PR, the value
is JSON encoded text, so the returned value will contain a quoted string
instead of the raw string in that case.
There is another PR 3604 that avoids this compatibility issue, but at the
cost of creating ambiguity.
The API was recently changed, remaining CustomFields to CustomAttributes,
signaling a breaking change. Perhaps renaming the function
CustomAttributes::getAttributes to CustomAttributes::getAttributesJSON
would do the same here.
…On Thu, Sep 26, 2024, 7:13 AM Martin Grigorov ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In lang/c++/test/unittest.cc
<#3069 (comment)>:
> @@ -457,11 +457,11 @@ struct TestSchema {
std::string expectedJsonWithCustomAttribute =
"{\"type\": \"record\", \"name\": \"Test\",\"fields\": "
"[{\"name\": \"f1\", \"type\": \"long\", "
- "\"arrayField\": \"[1]\", "
- "\"booleanField\": \"true\", "
- "\"mapField\": \"{\\\"key1\\\":\\\"value1\\\", \\\"key2\\\":\\\"value2\\\"}\", "
- "\"nullField\": \"null\", "
- "\"numberField\": \"1.23\", "
+ "\"arrayField\": [1], "
Looking at these changes I think this change may break someone's
application.
Do we want to support a backward compatible encoding via some
property/setting/... ?
—
Reply to this email directly, view it on GitHub
<#3069 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/BKKFEFOTN2DUMQW6AUKEGT3ZYPT7DAVCNFSM6AAAAABMDDFJFGVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDGMZQHA4DSNRZGE>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Perhaps this would be the best! |
It seems that everyone involved in this discussion agrees with @jmd-z's proposal, and the PR could be merged once the change is implemented. @jmd--z, do you have time to make the small adjustment? |
looking forward for this fix to be merged! |
second this. |
Do we wait for an improvement here (deprecate the old methods and introduce new ones) ? |
Help is always welcome! |
Before I stumbled upon this issue (and the associated JIRA issue), I had made a patch similar to what's proposed -- but instead of just renaming methods to make it more clear that they are JSON-encoded, I had actually added new methods. I just opened a PR with that change (#3266), however I'm more than happy to close that PR if the change in this one is preferred. Is this PR's merge being held up only for renaming the methods to have a "JSON" suffix? If we wanted to keep the old ones and just add new ones, that ends up looking a lot like the PR I just opened. I'd just need to mark the old methods deprecated in my branch. |
I didn't compare this PR against #3266 but yes, the requested change here is to keep the old API as deprecated and introduce new methods which behave better. |
@martin-g, I would happily see @jmd-z's change (this PR) merge, since he put the effort into it first. @jmd-z, do you plan to make these changes? It's been a few months since your last comment. In the meantime, since we cannot push a commit to their branch to help push this forward, I have made changes in my own branch. Perhaps my PR (#3266) could be considered as an alternative if we don't hear back from @jmd-z? It deprecates all of the old string-only methods ( |
Custom attributes are encoded as JSON to avoid ambiguity with simple string values.
What is the purpose of the change
Fixes exception from C++ library when compiling schemas with non-string custom attributes.
Verifying this change
This change added tests and can be verified as follows:
Documentation