Decode stderr bytes before logging #33

goodboy · 2017-12-04T15:54:11Z

On py3 we get bytes back from the stderr output. Decode to a native str before logging to keep a readable output.

wdoekes · 2017-12-11T13:39:56Z

pysipp/report.py

+        # log stderr
+        stderr = proc.streams.stderr  # bytes in py3
+        stderr = stderr.decode() if isinstance(stderr, bytes) else stderr
+        emit("stderr for '{}' @ {}\n{}\n".format(ua.name, ua.srcaddr, stderr))


It feels wrong.

stderr was also bytes in py2 (non-unicode-str; there is no future import unicode_literals). But once you decode it, it would be unicode, and conflict with the emit(bytestring.format())

$ cat 1.py from subprocess import Popen, PIPE p = Popen(r"/usr/bin/printf '\xc3\xa1' >&2", shell=True, stderr=PIPE, stdout=PIPE) out, err = p.communicate() print(repr(err)) stderr = err print(repr(stderr.decode()))

$ python 1.py '\xc3\xa1' Traceback (most recent call last): File "1.py", line 6, in <module> print(repr(stderr.decode())) UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

$ python3 1.py b'\xc3\xa1' 'á'

But this should work:

stderr = stderr.decode() if not isinstance(stderr, str) else stderr

^-- py2: str==bytes=>noop, py3: str!=bytes=>decode

And the bytestring.format() doesn't choke either.

Howver. Always decoding a bytestring as utf-8 is not so safe. Better do decode with 'replace':

>>> bytes([0x41, 0x81, 0xc3, 0xa1]).decode() Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 1: invalid start byte >>> bytes([0x41, 0x81, 0xc3, 0xa1]).decode('utf-8', 'replace') 'A�á'

This is because the default encoding changed from ascii to utf-8 with python3.

To go back to your code snippet above, this works on both py2 and py3:

from subprocess import Popen, PIPE p = Popen(r"/usr/bin/printf '\xc3\xa1' >&2", shell=True, stderr=PIPE, stdout=PIPE) out, err = p.communicate() print(repr(err)) stderr = err print(repr(stderr.decode('utf-8', 'replace')))

When run:

$ python2 ./foo.py '\xc3\xa1' u'\xe1' $ python3 ./foo.py b'\xc3\xa1' 'á'

So you don't even need the isinstance check.

Errata: its utf-8 since python3.5.

And probably more future proof too since python 3.7 is going to lean on utf-8 more:

https://www.python.org/dev/peps/pep-0538/

https://www.python.org/dev/peps/pep-0540/

I mean, its my humble opinion that if your system isn't set up to support utf8 these days, you're running a misconfigured system 😉

So you don't even need the isinstance check.

Well, we still do, because:

print('{}'.format(stderr.decode('utf-8', 'replace')))

Traceback (most recent call last): File "1.py", line 6, in <module> print('{}'.format(stderr.decode('utf-8', 'replace'))) UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 0: ordinal not in range(128)

The emit() gets a bytestring in py2, which doesn't do unicode.

I mean, its my humble opinion that if your system isn't set up to support utf8 these days, you're running a misconfigured system

Absolutely, but we don't want to error out because an application sends invalid utf-8.

That's because you can't embed a unicode string into a ascii format string (well, implicit encoding happens). Adding a u to the format string also makes it work on both:

print(u'{}'.format(stderr.decode('utf-8', 'replace')))

My point exactly :)

Although I did not dive into emit() to see if it takes an unicodestring in py2.

Yeah, so I think we're on the same page.

Just personally I think I'd rather see the underlying format string adjusted than a if isinstance(...) expression.

Decode stderr bytes before logging

adccf18

goodboy requested review from vodik and wdoekes December 4, 2017 15:54

wdoekes reviewed Dec 11, 2017

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decode stderr bytes before logging #33

Decode stderr bytes before logging #33

goodboy commented Dec 4, 2017

wdoekes Dec 11, 2017

vodik Dec 11, 2017

vodik Dec 11, 2017 •

edited

Loading

wdoekes Dec 12, 2017

vodik Dec 12, 2017 •

edited

Loading

wdoekes Dec 13, 2017

vodik Dec 13, 2017

Decode stderr bytes before logging #33

Are you sure you want to change the base?

Decode stderr bytes before logging #33

Conversation

goodboy commented Dec 4, 2017

wdoekes Dec 11, 2017

Choose a reason for hiding this comment

vodik Dec 11, 2017

Choose a reason for hiding this comment

vodik Dec 11, 2017 • edited Loading

Choose a reason for hiding this comment

wdoekes Dec 12, 2017

Choose a reason for hiding this comment

vodik Dec 12, 2017 • edited Loading

Choose a reason for hiding this comment

wdoekes Dec 13, 2017

Choose a reason for hiding this comment

vodik Dec 13, 2017

Choose a reason for hiding this comment

vodik Dec 11, 2017 •

edited

Loading

vodik Dec 12, 2017 •

edited

Loading