Commit
Fixing URL shortening for URLs with Arabic characters
When URLs have unescaped Arabic characters, they are not extracted correctly. This happens not only with the library we use (`twitter-text`), but also with Ruby's `uri` library and the `postrank-uri` gem:

```
irb(main):009:0> PostRank::URI.extract('https://fatabyyano.net/هذا-المقطع-ليس-لاشتباكات-حديثة-بين-الج/')
=> ["https://fatabyyano.net/"]
irb(main):010:0> URI.extract('https://fatabyyano.net/هذا-المقطع-ليس-لاشتباكات-حديثة-بين-الج/')
=> ["https://fatabyyano.net/"]
irb(main):011:0> Twitter::TwitterText::Extractor.extract_urls('https://fatabyyano.net/هذا-المقطع-ليس-لاشتباكات-حديثة-بين-الج/')
=> ["https://fatabyyano.net/"]
```

So, the fix here is to first escape URLs that contain Arabic characters before sending them to the URL extraction method when shortening URLs.

Fixes CV2-3690.
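The escaping step can be sketched roughly as follows. This is not the code from this commit: `escape_non_ascii` is a hypothetical helper name, and it percent-encodes every non-ASCII character (not only Arabic ones) as UTF-8 bytes, leaving ASCII untouched so that existing `%XX` escapes and URL delimiters survive.

```ruby
# Hypothetical helper (not part of the commit): percent-encode each run of
# non-ASCII characters as UTF-8 bytes, leaving all ASCII characters as-is.
def escape_non_ascii(text)
  text.gsub(/[^[:ascii:]]+/) do |run|
    run.bytes.map { |byte| format('%%%02X', byte) }.join
  end
end

# Example input is an assumption for illustration only.
escaped = escape_non_ascii('https://example.com/هذا/')
puts escaped
```

The escaped string contains only ASCII characters, so it could then be passed to the extraction method, which should no longer stop at the first Arabic character.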