Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decode stderr bytes before logging #33

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 5 additions & 3 deletions pysipp/report.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,9 +47,11 @@ def emit_logfiles(agents2procs, level='warn', max_lines=100):
emit = getattr(log, level)
for ua, proc in agents2procs:

# print stderr
emit("stderr for '{}' @ {}\n{}\n".format(
ua.name, ua.srcaddr, proc.streams.stderr))
# log stderr
stderr = proc.streams.stderr # bytes in py3
stderr = stderr.decode() if isinstance(stderr, bytes) else stderr
emit("stderr for '{}' @ {}\n{}\n".format(ua.name, ua.srcaddr, stderr))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels wrong.

stderr was also bytes in py2 (non-unicode-str; there is no future import unicode_literals). But once you decode it, it would be unicode, and conflict with the emit(bytestring.format())

$ cat 1.py 
from subprocess import Popen, PIPE
p = Popen(r"/usr/bin/printf '\xc3\xa1' >&2", shell=True, stderr=PIPE, stdout=PIPE)
out, err = p.communicate()
print(repr(err))
stderr = err
print(repr(stderr.decode()))
$ python 1.py 
'\xc3\xa1'
Traceback (most recent call last):
  File "1.py", line 6, in <module>
    print(repr(stderr.decode()))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
$ python3 1.py 
b'\xc3\xa1'
'á'

But this should work:

stderr = stderr.decode() if not isinstance(stderr, str) else stderr

^-- py2: str==bytes=>noop, py3: str!=bytes=>decode

And the bytestring.format() doesn't choke either.

Howver. Always decoding a bytestring as utf-8 is not so safe. Better do decode with 'replace':

>>> bytes([0x41, 0x81, 0xc3, 0xa1]).decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 1: invalid start byte
>>> bytes([0x41, 0x81, 0xc3, 0xa1]).decode('utf-8', 'replace')
'A�á'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because the default encoding changed from ascii to utf-8 with python3.

To go back to your code snippet above, this works on both py2 and py3:

from subprocess import Popen, PIPE
p = Popen(r"/usr/bin/printf '\xc3\xa1' >&2", shell=True, stderr=PIPE, stdout=PIPE)
out, err = p.communicate()
print(repr(err))
stderr = err
print(repr(stderr.decode('utf-8', 'replace')))

When run:

$ python2 ./foo.py 
'\xc3\xa1'
u'\xe1'
$ python3 ./foo.py 
b'\xc3\xa1'
'á'

So you don't even need the isinstance check.

Copy link
Member

@vodik vodik Dec 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errata: its utf-8 since python3.5.

And probably more future proof too since python 3.7 is going to lean on utf-8 more:

I mean, its my humble opinion that if your system isn't set up to support utf8 these days, you're running a misconfigured system 😉

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you don't even need the isinstance check.

Well, we still do, because:

print('{}'.format(stderr.decode('utf-8', 'replace')))
Traceback (most recent call last):
  File "1.py", line 6, in <module>
    print('{}'.format(stderr.decode('utf-8', 'replace')))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 0: ordinal not in range(128)

The emit() gets a bytestring in py2, which doesn't do unicode.

I mean, its my humble opinion that if your system isn't set up to support utf8 these days, you're running a misconfigured system

Absolutely, but we don't want to error out because an application sends invalid utf-8.

Copy link
Member

@vodik vodik Dec 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because you can't embed a unicode string into a ascii format string (well, implicit encoding happens). Adding a u to the format string also makes it work on both:

print(u'{}'.format(stderr.decode('utf-8', 'replace')))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point exactly :)

Although I did not dive into emit() to see if it takes an unicodestring in py2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so I think we're on the same page.

Just personally I think I'd rather see the underlying format string adjusted than a if isinstance(...) expression.


# FIXME: no idea, but some logs are not being printed without this
# logging mod bug?
time.sleep(0.01)
Expand Down