-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
escape non-breaking space as \20, as required in RFC 7613, section 7.3 #21
base: master
Are you sure you want to change the base?
escape non-breaking space as \20, as required in RFC 7613, section 7.3 #21
Conversation
Thanks for your contribution. :) Unfortunately I am not sure if this is correct: |
Hi @Flowdalic thank you for this feedback. |
I did dig deeper in the specs and found a few references...
To be honest, I find it very difficult to understand the various cross-references about characters classes, e.g. spaces. Also, I don't understand what is actually meant in https://tools.ietf.org/html/rfc7564#section-9.14 ... there is no reference, what is meant by What I found as well in the above-cited RFC 7564 https://tools.ietf.org/html/rfc7564#section-5.3 "A note about spaces" ... they explicitly mention that the identifier part should not contain spaces. I tend to say, that when the previous implementation used "Character.isWhitespace()" they already included the general idea about excluding whitespaces from localpart. But a few specific ones were not yet included, but falling IMHO clearly into the spaces category. (clarification question:) |
From RFC 7563: "…core properties defined by the Unicode Standard". See also https://unicode.org/reports/tr44/
Right, but if you simply map them to Could you elaborate a bit about the motivation for this PR? What was the trigger?
I am not sure if it is correct or not. It is underspecified in XEP-0106 if '<space>' means U+0020, or all of whitespace. Since it is escaped to
I'd just escape U+0020. Because even after thinking a little while about it, I could not come up with a reason to escape more whitespace characters to |
👍 thanks. IMHO, this reference to Unicodes {Zs} "feels" a bit unsharp, considering that these categories are prone to change...
Very true. Information gets lost and reverse-transformation/decoding produces different results.
I'm participating 'Hacktoberfest' and stumbled upon jxmpp. Frankly said, my original intent was just to add some tests and increase code coverage. Then I found this issue.
Fair point. |
@Flowdalic may I ask, if you have any opinion on my previous comment? |
Right
If "towards this" means replacing the isWhitespace check with |
The method XmppStringUtils.escapeLocalpart(String localpart)
does not escape non-breaking spaces.
But RF7613 section 7.3 is explicitly mentioning, that all spaces should be escaped as \20.
This PR adds tests and incl. the fix.
Any feedback welcome.