-
-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
open() utf-8 standardization #75
Comments
Yeah, it seems to be technically correct for every I don't think (unless maybe PEP 597 lands) there's any way to automatically enforce that practice, so it's likely to stay more a fix-it-if-someone-complains thing in this and other projects. Which in turn means, I'm guessing, Windows python users must be mostly setting I don't think we (or any given library) should be testing the locale and complaining at the start. Maybe Python itself should change the default encoding and warn people about it, but users don't need random libraries complaining about their locale setting. Oh, actually a larger fix might be to get the CI tests running on Windows -- I think Github Actions supports that. |
I agree with everything Jack says, per usual. The only other thought I have is that semgrep helps a lot with these kinds of things. They don't currently have a rule requiring an encoding parameter to open(), but I bet they'd be happy to have one. They have a bunch of rules categorized as "python/lang/best-practice" (including one to make sure open() commands are closed): https://github.com/returntocorp/semgrep-rules/tree/develop/python/lang/best-practice If you got such a rule put together in semgrep, we could turn it on for this repo (we use it elsewhere already), and it would throw an error in PR's that don't include the There's a guide here for creating new rules: https://semgrep.dev/docs/writing-rules/overview/ I contributed one a bit back, it was pretty painless. Yaml and Python. |
Awesome, adding it to semgrep and including Windows in the CI both sound good. Although the uncommon few who experience it can just do what Jack mentioned, for now I'll still take a look at adding the rule to semgrep. I have been learning a lot just with this one issue - thank you both! |
How did this go, @munyanjacob ? Any progress? |
I finished writing the rule but have not been able to commit it (appears to also be because of a Windows issue, this time with Windows marking files as executable and Currently downloading WSL to see if using that can help. Windows does not feel good. |
Windows, as you're learning, may be an uphill fight. :) Usually, pre-commit will fix most errors for you. I usually run it once (it fails and fixes things), and then I run it again (and it passes). |
The rule just merged but with severity of warning instead of error. Is a warning going to be enough for requiring an encoding in new PRs? |
Do you guys generally run Linux, Mac, or WSL? I knew it was common for people to use linux/mac for dev, but I thought there would be more Windows users running into problems like mine lol |
I think the severity should be enough. I just enabled semgrep on this repo, so I think we'll see as soon as somebody does a PR with a bad open in it. Good stuff. I also enabled it for our other repos that use semgrep. I run Linux. I don't know what Jack's using. Nobody around here uses Windows that I have noticed in awhile. I usually only find out they do because something is broken for them (like this). I like Ubuntu, FWIW! |
Extending the discussion from #74 to a new issue:
I do think that testing to make sure people are using utf-8 makes sense, because I also think it's likely for locale encoder differences to be a future problem if locale encoders aren't ensured to be utf-8 or every
open()
is not specified to use utf-8.Because the page linked said it would be best practice to specify, is this something we should update other instances of
open()
to have & require in the future? It sounds simpler to test for utf-8 once at the start, but best practices are best practices so idk which way to proceedOriginally posted by @munyanjacob in #74 (comment)
The text was updated successfully, but these errors were encountered: