premailer doesn't appear to work on m1 macs #249

Open
davidfwatson opened this issue Feb 23, 2021 · 18 comments

@davidfwatson

Running premailer on fairly standard HTML content works correctly on my 2019 iMac, but on my M1 Mac it produces garbage output:

❯ python -m premailer -f example_html.html

h t m l l a n g = " e n " >

%
@gdvalderrama

@davidfwatson could you add the content of example_html.html or some other file that reproduces the problem?

@davidfwatson
Author

I think the issue is unicode characters! I've created an example, and attached it.
example_html.html.zip

@peterbe
Owner

peterbe commented Jun 21, 2021

I think the issue is unicode characters! I've created an example, and attached it.
example_html.html.zip

import premailer

with open('/Users/peterbe/Downloads/example_html.html/example_html.html') as f:
    html = f.read()

out = premailer.transform(html)
print(out)

outputs:

<html lang="en">
    <head>
        <meta http-equiv="Content-Type" content="text/html charset=UTF-8">
        <title>Example title</title>
    </head>
    <body>
🌐
    </body>
</html>

By the way, it's supposed to be:

-<meta http-equiv="Content-Type" content="text/html charset=UTF-8" />
+<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

in case that matters to your other tooling.
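
To illustrate why the semicolon matters, here is a small sketch that parses both `content` values as MIME Content-Type strings with Python's standard `email.message.Message`. This is only an illustration with a stdlib parser: lxml and browsers have their own handling of the meta tag and may be more lenient. `charset_of` is a hypothetical helper name.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# With the semicolon, charset is a separate parameter.
print(charset_of("text/html; charset=UTF-8"))
# Without it, the parser sees no parameter named "charset".
print(charset_of("text/html charset=UTF-8"))
```

A strict parser therefore finds no charset at all in the version without the semicolon.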

@davidfwatson
Author

So, on my M1 Mac, I don't get that output; I get this:

❯ python3 -m premailer -f example_html.html
<html><head></head><body><p>h   t   m   l       l   a   n   g   =   "   e   n   "   &gt;
                   </p></body></html>%

@peterbe
Owner

peterbe commented Jun 23, 2021

Here's what I get:

▶ python3 -m premailer -f /Users/peterbe/Downloads/example_html.html/example_html.html ; echo
<html lang="en">
    <head>
        <meta http-equiv="Content-Type" content="text/html charset=UTF-8">
        <title>Example title</title>
    </head>
    <body>
🌐
    </body>
</html>

I have...:

▶ python3 --version
Python 3.8.1
▶ pip list | rg lxml
lxml              4.5.0

What do you have?

And do you have the file program to check the encoding of the file? This is what I get:

▶ file /Users/peterbe/Downloads/example_html.html/example_html.html
/Users/peterbe/Downloads/example_html.html/example_html.html: HTML document text, UTF-8 Unicode text

@davidfwatson
Author

❯ python3 --version
Python 3.9.2
❯ pip list | grep lxml
lxml                     4.6.2
❯ file example_html.html
example_html.html: HTML document text, UTF-8 Unicode text
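
The versions above were compared by hand. A small sketch like the following could collect the same details in one go, together with the CPU architecture, which appears to be the main difference between the working and failing machines. It uses only the standard library; `package_version` and `environment_report` are hypothetical helper names, not part of premailer or lxml.

```python
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_report():
    """Collect the details that were compared manually in this thread."""
    return {
        "python": platform.python_version(),
        # Typically 'arm64' on Apple Silicon and 'x86_64' on Intel Macs.
        "machine": platform.machine(),
        "system": platform.system(),
        "lxml": package_version("lxml"),
        "premailer": package_version("premailer"),
        "default_encoding": sys.getdefaultencoding(),
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Note that a Python binary running under Rosetta would likely report `x86_64` even on an M1 machine, so the `machine` value describes the interpreter rather than the hardware.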

@peterbe
Owner

peterbe commented Jun 24, 2021

@davidfwatson Just for sanity checking, what do you get when you run:

cat example_html.html

@davidfwatson
Author

❯ cat example_html.html
<html lang="en">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
        <title>Example title</title>
    </head>
    <body>
🌐
    </body>
</html>%

@peterbe
Owner

peterbe commented Jun 25, 2021

I'm at a loss then. I don't know what could be going on.

I have seen really strange behavior from lxml before when emojis are involved.
It would be nice to understand at what point that HTML string, after being read in, becomes garbled, and whether that happens in premailer or somewhere else.
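
As a first step in narrowing that down, a sketch like this could confirm that the file really decodes as UTF-8 and list the characters outside ASCII, before premailer or lxml are involved at all. It is stdlib-only, and `non_ascii_report` is a hypothetical helper name.

```python
import unicodedata


def non_ascii_report(raw):
    """Decode bytes as UTF-8 and list every character outside ASCII.

    Returns a list of (offset, character, code point, name) tuples.
    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    text = raw.decode("utf-8")
    return [
        (i, ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]


# In-memory sample standing in for the attached file.
sample = "<p>\U0001F310</p>".encode("utf-8")
print(non_ascii_report(sample))
```

For the attached example, the bytes would come from `open("example_html.html", "rb").read()`. If the report looks sane on both machines, the garbling presumably happens later, inside the parser.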

@davidfwatson
Author

davidfwatson commented Jun 25, 2021 via email

@peterbe
Owner

peterbe commented Jun 28, 2021 via email

@davidfwatson
Author

I can give that a shot, but just to be clear, I ran it on my intel mac with identical versions and it did work. I'll try to find time to match your versions and rerun today, but I suspect the result will be the same.

@securibee

securibee commented Jul 5, 2021

I'm running into this identical issue on an M1 Mac as well.

python3 --version
Python 3.8.2
pip list | grep lxml
lxml 4.6.3

@peterbe
Owner

peterbe commented Jul 6, 2021

Can we try to figure out if premailer is using lxml in a way that can be fixed for m1 macs? Or is it a hard bug in lxml and if so do we have a tracker URL for that?

@securibee

I downgraded lxml and it's working for me with version 4.5.0. Give that a go, @davidfwatson.

@davidfwatson
Author

Sorry to report, but it doesn't appear to have made a difference for me:

❯ pip3 list | grep lxml

lxml       4.5.0

❯ cat example_html.html
<html lang="en">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
        <title>Example title</title>
    </head>
    <body>
🌐
    </body>
</html>%

❯ python3 -m premailer -f example_html.html
<html><head></head><body><p>h   t   m   l       l   a   n   g   =   "   e   n   "   &gt;
                   </p></body></html>%

@laurikari

This does appear to be a bug in lxml, and the most recent version 4.8.0 is still affected.

Here's the lxml bug: https://bugs.launchpad.net/lxml/+bug/1949271

The bug report reveals that a workaround is to use UTF-16 or UTF-32 instead of UTF-8.

Instead of

html = transform(html)

this works for me:

from lxml import etree
from premailer import transform

parsed = etree.fromstring(html.strip().encode('utf-32'), etree.HTMLParser())
html = etree.tostring(transform(parsed), method='html', encoding='utf-8').decode()

It ain't pretty, but at least it works.

@medmunds

I'm able to use HTML entity encoding as a workaround. (I guess anything that avoids lxml having to deal with UTF-8 input...)

>>> from premailer import transform
>>> html = "<p>🌐</p>"

>>> transform(html)
Traceback (most recent call last):
  ...
  File ".../python3.11/site-packages/premailer/premailer.py", line 353, in transform
    tree = etree.fromstring(stripped, parser).getroottree()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'getroottree'

>>> html.encode("ascii", "xmlcharrefreplace").decode("ascii")
'<p>&#127760;</p>'

>>> transform(html.encode("ascii", "xmlcharrefreplace").decode("ascii"))
'<html>\n<head></head>\n<body><p>🌐</p></body>\n</html>\n'

Premailer 3.10.0, lxml 4.9.2, Python 3.11.1
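
The entity-encoding workaround above could be wrapped up roughly like this. It is a minimal sketch: `to_ascii_entities` and `transform_ascii_safe` are hypothetical helper names, and the transform function is passed in as an argument (for example `premailer.transform`) so the snippet itself has no third-party imports.

```python
def to_ascii_entities(html):
    """Replace every non-ASCII character with a numeric character reference."""
    return html.encode("ascii", "xmlcharrefreplace").decode("ascii")


def transform_ascii_safe(html, transform):
    """Run `transform` on an ASCII-only copy of `html`.

    `transform` is any callable taking and returning an HTML string,
    such as premailer.transform.
    """
    return transform(to_ascii_entities(html))


print(to_ascii_entities("<p>🌐</p>"))  # <p>&#127760;</p>
```

Usage would be `transform_ascii_safe(html, premailer.transform)`. One caveat: this also rewrites non-ASCII characters inside `<style>` and `<script>` blocks, where character references are not interpreted, so it is probably safest for documents whose non-ASCII text lives in ordinary markup.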


6 participants