Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only show stack trace in debug mode #1860

Merged
merged 43 commits into from
Dec 8, 2023
Merged

Conversation

aaronatp
Copy link
Contributor

@aaronatp aaronatp commented Nov 18, 2023

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

This PR creates a new function last_resort_exception_handler to be a last line of defense for exceptions that are not otherwise handled. It will show the stack trace in dev mode (if ran with the -d or --debug flags).

EDIT: I'll work on making this code pass the tests!

capa/main.py Outdated Show resolved Hide resolved
capa/main.py Outdated Show resolved Hide resolved
@aaronatp
Copy link
Contributor Author

aaronatp commented Nov 19, 2023

Hi @williballenthin thank you for these comments! I certainly see your point about avoiding global variables, and have rewritten the code to avoid them and to make the decision logic in the exception handler depend on argparse. I have pushed a new draft of the code, but I have some questions about it:

  1. In this new draft, I have delayed making the sys.excepthook statement until after the parser is instantiated. One concern - would exceptions raised before the sys.excepthook statement not be covered by the custom handler?

  2. In most uses of sys.excepthook that I have seen, a function object is passed to sys.excepthook (as opposed to a function call). I don't think it would make a difference passing parameters to the last_resort_exception_handler function before assigning it to sys.excepthook right? The things I have read indicate that this shouldn't be an issue but I am not 100% sure.

  3. The last_resort_exception_handler definition contains four parameters. When last_resort_exception_handler is assigned to sys.excepthook, it contains one variable (args) and three variables that have not yet been instantiated (the exception type, exception instance, and the traceback). As a result, I thought I should use the *args special syntax to pass these not-yet-instantiated variables. One concern is that I use another term exception_information which is non-conventional (because 'args' is used elsewhere in this program). Another concern is that (as I understand it) *args is usually reserved for variable-length lists, but (if the exception handler is ever invoked) we know it will contain 3 variables. As a result, is there a better way to write this?

EDIT: I have learned that *exception_information cannot be used as a placeholder for the Exception type, Exception instance, and Traceback object, before those objects are initialized. One option would be to initialize a global variable containing those three values to (None, None, None) and update it whenever the last_resort_exception_handler is called, but this uses the very code construct (global variables) it aims to avoid. This would look like this:

def last_resort_exception_handler(exctype, value, traceback):
    global exception_information
    exception_information = (exctype, value, traceback)
    
    if "-d" or "--debug" in args:
        sys.__excepthook__(*exception_information)  
    else:
        print(f"Unexpected exception raised: {exception_information[0]}. Please run capa in Python development mode "
            "to see the stack trace (https://docs.python.org/3/library/devmode.html#devmode). "
             "Please also report your issue on the capa GitHub page so we can improve the code! "
             "(https://github.com/mandiant/capa/issues)")


exception_information = (None, None, None)
sys.excepthook = last_resort_exception_handler(args, *exception_information)

However, using a class may help to store and set the exception_information values. I'll draft a prototype and see if this can avoid using global variables here.

@aaronatp
Copy link
Contributor Author

aaronatp commented Nov 23, 2023

Hi @williballenthin what do you think of this class-based replacement for sys.excepthook?

class LastResortExceptionHandler:
    def __init__(self, cmd_line_args):
        self.exctype = None
        self.value = None
        self.traceback = None
        self.cmd_line_args = cmd_line_args
  
    @property
    def update_exception_traceback_details(self):
        return (self.exctype, self.value, self.traceback)
  
    @update_exception_traceback_details.setter
    def update_exception_traceback_details(self, exctype, value, traceback)
        self.exctype = exctype
        self.value = value
        self.traceback = traceback
  
    def __call__(self, exctype, value, traceback):
        self.update_exception_traceback_details(exctype, value, traceback)
        if "-d" or "--debug" in self.cmd_line_args:
            sys.__excepthook__(self.exctype, self.value, self.traceback)
        else:
            print(f"Unexpected exception raised: {self.exctype}. Please run capa in Python development mode "
            "to see the stack trace (https://docs.python.org/3/library/devmode.html#devmode). "
             "Please also report your issue on the capa GitHub page so we can improve the code! "
             "(https://github.com/mandiant/capa/issues)")
  
  
sys.excepthook = LastResortExceptionHandler(args) # Line 1131 of main.py -> args = parser.parse_args(args=argv)

The __init__ method is supposed to initialize the sys.excepthook parameters. According to the documentation, sys.excepthook must take a callable function if it replaced. As a result, this class has a __call__ method, which: 1) updates the default sys.excepthook parameters (with the self.update_exception_traceback_details(exctype, value, traceback) line), and then 2) proceeds with the exception handling logic (checking if the -d or --debug command line options are used, and finishing accordingly). The property-decorated update_exception_traceback_details method is supposed to set new sys.excepthook parameters. What do you think of this? Are any parts of the code unclear?

@aaronatp
Copy link
Contributor Author

aaronatp commented Dec 1, 2023

Hi @williballenthin I hope your week is going well. Are there things you think I should still improve about this code?

Copy link
Collaborator

@williballenthin williballenthin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concepts are good, suggestions inline.

i'm going to be mostly offline for a few weeks. @mr-tz would you please help with final reviews?

capa/main.py Outdated
@@ -1133,6 +1154,8 @@ def main(argv: Optional[List[str]] = None):
if ret is not None and ret != 0:
return ret

sys.excepthook = last_resort_exception_handler(args, *exception_information)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where does exception_information come from? i don't see it in scope (but im on mobile so maybe i missed it).

capa/main.py Outdated
value (str): exception type
traceback (TracebackType): the Type of a traceback object
"""
if "-d" or "--debug" in args:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use argparse parsed data, like if args.debug: ...

capa/main.py Outdated
@@ -1087,6 +1088,26 @@ def handle_common_args(args):
args.signatures = sigs_path


def last_resort_exception_handler(args, exctype, value, traceback) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an alternative design is to check the args in main and install one of two different functions, so you don't have to parse args into the handler

@aaronatp
Copy link
Contributor Author

aaronatp commented Dec 2, 2023

Hi @williballenthin thank you for this feedback, your suggestions are very helpful! I apologize - I did not ask my question clearly above. Would you or @mr-tz be able to review this class-based implementation (also described in a comment above)? If so, please let me know if there are any issues with this implementation, or if you have any further suggestions :)

class LastResortExceptionHandler:
    def __init__(self, cmd_line_args):
        self.exctype = None
        self.value = None
        self.traceback = None
        self.cmd_line_args = cmd_line_args
  
    @property
    def update_exception_details(self):
        return (self.exctype, self.value, self.traceback)
  
    @update_exception_details.setter
    def update_exception_details(self, exctype, value, traceback)
        self.exctype = exctype
        self.value = value
        self.traceback = traceback
  
    def __call__(self, exctype, value, traceback):
        self.update_exception_details(exctype, value, traceback)
        if "-d" or "--debug" in self.cmd_line_args:
            sys.__excepthook__(self.exctype, self.value, self.traceback)
        else:
            print(f"Unexpected exception raised: {self.exctype}. Please run capa in Python development mode "
            "to see the stack trace (https://docs.python.org/3/library/devmode.html#devmode). "
             "Please also report your issue on the capa GitHub page so we can improve the code! "
             "(https://github.com/mandiant/capa/issues)")
  
  
sys.excepthook = LastResortExceptionHandler(args) # Line 1131 of main.py -> args = parser.parse_args(args=argv)

Copy link
Collaborator

@mr-tz mr-tz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also see this section https://www.bitecode.dev/i/125718336/you-are-not-the-only-one-doing-this for an alternative, rich may be overkill, but pretty-traceback could be nice. Both can also give other insight on how to do this.

capa/main.py Outdated
if "-d" or "--debug" in args:
sys.__excepthook__(exctype, value, traceback)
else:
print(f"Unexpected exception raised: {exctype}. Please run capa in Python development mode "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(f"Unexpected exception raised: {exctype}. Please run capa in Python development mode "
print(f"Unexpected exception raised: {exctype}. Please run capa in debug mode (-d/--debug)"

capa/main.py Outdated
sys.__excepthook__(exctype, value, traceback)
else:
print(f"Unexpected exception raised: {exctype}. Please run capa in Python development mode "
"to see the stack trace (https://docs.python.org/3/library/devmode.html#devmode). "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"to see the stack trace (https://docs.python.org/3/library/devmode.html#devmode). "
"to see the stack trace"

@williballenthin
Copy link
Collaborator

ah i see i'm sorry @aaronatp for misunderstanding

i see now from your class-based design that you're trying to access both CLI args and the exception information from a function that only receives the exception information. using a class for this is reasonable, but as you've found, needs a good bit of code (20-30 lines of variables shuffled around).

there are two design patterns i'd recommend, both derived from closures. a closure is a function that has access to data that was in scope when the function was created. it's a way to bind some data to logic that will be invoked later.

def make_hook(args):
  def hook(ex1, ex2, ex3):
    if args.debug:
      print(ex1)
    else:
      print("there was an error, use debug mode")

  return hook

sys.excepthook = make_hook(args)

you could also use functools.partial for this:

https://docs.python.org/3/library/functools.html#functools.partial

def handler(args, ex1, ex2, ex3):
  ...

sys.excepthook = functools.partial(handler, args)

sorry i'm on mobile so im a little terse and don't have good references. if the above isn't immediately obvious, take a look at some of the blog posts/tutorials on using closures from Python. i'm also happy to explain more (as i have time).

@aaronatp
Copy link
Contributor Author

aaronatp commented Dec 3, 2023

Hi @mr-tz and @williballenthin, thank you both for your feedback! I have implemented the closure that @williballenthin suggested and have included that below. I also included a try-except statement for worst-case-scenario error handling. I have also edited the error message in line with your suggestions. Please let me know if you have any further questions or suggestions :)

I also had a look at the pretty_traceback library and it looks interesting! I noticed, however, that the codebase is not frequently updated and it has few maintainers. (The last commit was September 7th.) Do you think that this could potentially lead to more headaches down the line? For example, if newer versions of python are released that are not backwards compatible with pretty_traceback. But if you don't think this is a significant issue, please let me know - I would be happy to implement it in capa!

def last_resort_exception_handler(cmd_line_args):
    exctype, value, traceback = None
    def handle_exception_information(cmd_line_args):
        try:
            nonlocal exctype, value, traceback              
            if "-d" or "--debug" in cmd_line_args:
                return sys.__excepthoook__(exctype, value, traceback)
            else:
                print(f"Unexpected exception raised: {exctype}. Please run capa in debug mode (-d/--debug) "
                      "to see the stack trace. Please also report your issue on the capa GitHub page so we "
                      "can improve the code! (https://github.com/mandiant/capa/issues)")

        except Exception as e: # pylint: disable=broad-except
            logger.error("%s", str(e))
            return # should this return an object like some other capa try-except statements, e.g., E_CORRUPT_FILE or E_INVALID_FILE_TYPE?

    return handle_exception_information

sys.excepthook = last_resort_exception_handler(args) # Line 1131 of main.py -> args = parser.parse_args(args=argv)

@mr-tz
Copy link
Collaborator

mr-tz commented Dec 3, 2023

Awesome, can you please update the PR so we can discuss/review inline and see the CI status?

Thanks for researching the maintenance status and other details on the library. If it's not much code we should just keep it in capa so we have full control and can update it easily. I feel we do not need prettier stack traces.

Replacement for the default sys.excepthook function. Prints traceback in debug mode. Otherwise, prints helpful message.
@aaronatp
Copy link
Contributor Author

aaronatp commented Dec 4, 2023

Hi @mr-tz thanks for your comments! I have updated the code and will now work on making sure it passes the tests. Please let me know if you have any thoughts :)

Also no worries about the pretty_traceback, thank you for letting me know!

EDIT: Also, one thing to note is that sys.excepthook will not be replaced until after the command line arguments are parsed and the args variable is created in the main function. As a result, the main function will use the default sys.excepthook function up to that point. Although the custom exception handler could be technically replaced as soon as the main function runs, we would have to use a global variable, which is not great. The current solution tries to balance these considerations - while it does not replace sys.excepthook immediately, it avoids global variables.

Copy link
Collaborator

@mr-tz mr-tz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is coming together nicely, just a few remaining comments before we can merge (I think)

capa/main.py Outdated
Comment on lines 991 to 994
args (argparse.Namespace): parsed command-line arguments
exctype (Exception): exception class
value (str): exception instance
traceback (TracebackType): a traceback object
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively/additionally set the types in the function so mypy can do type checking

capa/main.py Outdated
def handle_exception_information(args):
try:
nonlocal exctype, value, traceback
if "-d" or "--debug" in args:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is args parsed at this point, so can we do args.debug?

capa/main.py Outdated

except Exception as e: # pylint: disable=broad-except
logger.error("%s", str(e))
return # should this return an object like some other capa try-except statements, e.g., E_CORRUPT_FILE or E_INVALID_FILE_TYPE?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, if this returns the value that is then used as program return code

since it's not known an error name that indicates this would be good, e.g. E_UNHANDLED?

@aaronatp aaronatp changed the title Only show stack trace in dev mode Only show stack trace in debug mode Dec 4, 2023
@aaronatp
Copy link
Contributor Author

aaronatp commented Dec 6, 2023

Hi @mr-tz, for the custom exception handler function test, I have been thinking about: 1) checking whether a traceback is printed if the user uses the debugging flag; and, 2) whether the traceback was produced by the custom handler (the default output of sys.excepthook is printing a traceback so we need to distinguish sys.excepthook's traceback from the custom handler in order to see if the custom handler is working correctly).

The first goal seems quite achievable. The second goal is giving me more trouble. Assuming a traceback is printed, we want to see whether the custom handler called the "sys.excepthook" traceback-printing function, or whether the default "sys.excepthook" was the one that called it. I thought that we could distinguish the custom handler traceback from the default traceback by decorating the custom handler with a function that prints some extra words, such as "The custom handler was called." This would enable us to confirm that a traceback was produced by the custom handler.

However, we replace sys.excepthook with a function whose variables are unbounded. In other words, the replacement function's variables are not defined or initialized. This fact has made it somewhat difficult to implement a decorator for the custom handler because decorators usually modify a function's behavior, but when we decorate an unbounded function's variables, we make no reference to its variables. From my research, one solution may be to use metaclasses, but that seems like disproportionately heavy machinery.

There may be other ways of distinguishing the custom handler's traceback from the default sys.excepthook's, but I have not found any yet, and they may be much more cobbled together. Can you help me understand how I can create a decorator that's applied to a function with unbounded variables that also returns an unbounded function? This would help me distinguish the custom handler's traceback from the default sys.excepthook's, and therefore conclude that the custom handler is functioning correctly.

@mr-tz
Copy link
Collaborator

mr-tz commented Dec 6, 2023

Can we look for certain output parts that we always expect for the respective handler? Let's keep it as simple as possible.

@aaronatp
Copy link
Contributor Author

aaronatp commented Dec 8, 2023

Hi @mr-tz, I looked at some examples of sys.excepthook in other projects and have restructured the function here based on what I saw. I think it's now simpler and better. Now, the custom handler only prints a simple informative message - however, the custom handler is only used if the program is not run in debug mode. In debug mode, the default sys.excepthook is used, which prints a traceback. This is similar to Meta's Buck repo (https://github.com/facebook/buck/blob/9c7c421e49f4d92d67321f18c6d1cd90974c77c4/python-dsl/buck_parser/buck.py#L2299).

When we simplify the custom handler, we may not need to test it. Since sys.excepthook (respectively, the custom handler) is only invoked when there is an unhandled exception, and essentially terminates a program, I have had trouble figuring out how to: 1) run the custom handler in a test; and then, 2) continue the rest of the tests. When we make custom handler essentially a print statement, we may not need to design tests for it. Indeed, repos that use custom sys.excepthook handlers often implement simple handlers, but I have not seen any repos test the handlers specifically. What do you think of this? Do you see any simple way we could test the custom handler but continue executing the program afterwards?

EDIT: Also, there is a type annotation issue with the exctype parameter of the custom exception handler. The default sys.excepthook function expects exctype to have a "type[BaseException]" type annotation. However, since the built-in type() function was not subscriptable in Python 3.8, using that type annotation would not build on Python 3.8 platforms. Something like "Union[type[BaseException], typing.Type[BaseException]]" would fail to build for the same reason.

Although it would still cause an issue for the mypy test, one option would be to not give exctype any type annotation in the custom handler's function definition. However, even though it's unlikely, I am wondering whether this could be a security issue if someone manages to write any data type to the exctype parameter of the sys.excepthook function. What do you suggest?

Copy link
Collaborator

@mr-tz mr-tz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, this is much simpler and I don't think we need tests here, please address my comments (and the CI fails) and then we can merge

capa/main.py Outdated
@@ -112,6 +113,7 @@
E_MISSING_CAPE_STATIC_ANALYSIS = 21
E_MISSING_CAPE_DYNAMIC_ANALYSIS = 22
E_EMPTY_REPORT = 23
E_UNHANDLED = 24
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
E_UNHANDLED = 24

capa/main.py Outdated
traceback: TracebackType,
):
"""
if not in debug mode, prints helpful message on unhandled exceptions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if not in debug mode, prints helpful message on unhandled exceptions
prints friendly message on unhandled exceptions to regular users, debug mode shows regular stack trace

capa/main.py Outdated
Comment on lines 991 to 993
args:
# TODO(aaronatp): Once capa drops support for Python 3.8, move this to the type annotation in the function parameters
exctype (type[BaseException]): exception class
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still accurate or can it be removed?

@aaronatp
Copy link
Contributor Author

aaronatp commented Dec 8, 2023

Hi @mr-tz, thank you for your comments! I have just fixed the first 2 of 3. For the third one, I am working to resolve a type annotation issue with the exctype parameter of the custom exception handler. The default sys.excepthook function expects exctype to have a "type[BaseException]" type annotation. However, since the built-in type() function was not subscriptable in Python 3.8, using that type annotation would not build on Python 3.8 platforms. Something like "Union[type[BaseException], typing.Type[BaseException]]" would fail to build for the same reason.

One option would be to not give exctype any type annotation in the custom handler's function definition (although this would cause the mypy test to fail). Even though it's unlikely, I am wondering whether this could be a security issue if someone manages to write any data type to the exctype parameter of the sys.excepthook function. What do you suggest?

@aaronatp
Copy link
Contributor Author

aaronatp commented Dec 8, 2023

Hi @mr-tz, after reading a bit further, I see it is possible to disable specific mypy warnings. I have just done that - please let me know if you would rather I not disable it!

@mr-tz
Copy link
Collaborator

mr-tz commented Dec 8, 2023

Ignoring here is fine. I think we also need an issue and link for the TODO in the code.

@aaronatp
Copy link
Contributor Author

aaronatp commented Dec 8, 2023

@mr-tz great thank you, just done that! Please let me know if you have any more comments!

capa/main.py Outdated
Comment on lines 993 to 997
print(
f"Unexpected exception raised: {exctype}. Please run capa in debug mode (-d/--debug) "
+ "to see the stack trace. Please also report your issue on the capa GitHub page so we "
+ "can improve the code! (https://github.com/mandiant/capa/issues)"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print(
f"Unexpected exception raised: {exctype}. Please run capa in debug mode (-d/--debug) "
+ "to see the stack trace. Please also report your issue on the capa GitHub page so we "
+ "can improve the code! (https://github.com/mandiant/capa/issues)"
)
if exctype is KeyboardInterrupt:
print("KeyboardInterrupt detected, program terminated")
else:
print(
f"Unexpected exception raised: {exctype}. Please run capa in debug mode (-d/--debug) "
+ "to see the stack trace. Please also report your issue on the capa GitHub page so we "
+ "can improve the code: https://github.com/mandiant/capa/issues"
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think of this to also handle CTRL-C (and not say unhandled exception there)

@aaronatp
Copy link
Contributor Author

aaronatp commented Dec 8, 2023

@mr-tz this change is good, thank you!

@mr-tz mr-tz merged commit 8531acd into mandiant:master Dec 8, 2023
25 checks passed
@mr-tz
Copy link
Collaborator

mr-tz commented Dec 8, 2023

Great, thank you!

@williballenthin
Copy link
Collaborator

thank you @aaronatp !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants