Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New codemod: safe-lxml-parsing #64

Merged
merged 3 commits into from
Oct 6, 2023
Merged

New codemod: safe-lxml-parsing #64

merged 3 commits into from
Oct 6, 2023

Conversation

clavedeluna
Copy link
Contributor

@clavedeluna clavedeluna commented Oct 6, 2023

Overview

A new codemod that will check calls to lxml.etree.parse and lxml.etree.fromstring

Description

  • internal docs
  • These two functions use a default unsafe parser, so we will set a safe one. If a user does assign a parser, the safe-lxml-parser-defaults codemod will kick in (if enabled)

@codecov-commenter
Copy link

codecov-commenter commented Oct 6, 2023

Codecov Report

Merging #64 (62d4046) into main (1c506b5) will increase coverage by 0.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #64      +/-   ##
==========================================
+ Coverage   95.82%   95.86%   +0.04%     
==========================================
  Files          45       46       +1     
  Lines        1676     1694      +18     
==========================================
+ Hits         1606     1624      +18     
  Misses         70       70              
Files Coverage Δ
src/core_codemods/__init__.py 100.00% <100.00%> (ø)
src/core_codemods/lxml_safe_parser_defaults.py 100.00% <ø> (ø)
src/core_codemods/lxml_safe_parsing.py 100.00% <100.00%> (ø)

@clavedeluna clavedeluna changed the title codemod safe lxml parsing New codemod: safe-lxml-parsing Oct 6, 2023
var = "hello"
"""
expexted_output = f"""from lxml.etree import {func}
import lxml
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will point out that this double import is not idea, but it does work and is valid code. Our other codemods (tempfile) also do this at this time.

@clavedeluna clavedeluna marked this pull request as ready for review October 6, 2023 12:35

def on_result_found(self, original_node, updated_node):
self.remove_unused_import(original_node)
self.add_needed_import("lxml")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is sufficient:

>>> import lxml
>>> lxml.etree.XMLParser()
AttributeError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 lxml.etree.XMLParser

AttributeError: module 'lxml' has no attribute 'etree'

I think it needs to be import lxml.etree.

Not trying to pick on your PR in particular (because it's very nice overall) but imo this is one fairly strong argument for functional integration tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woah, really good catch here. In fact it's so correct there are no examples of import lxml ... lxml.etree online! I will update the tests accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually had to change BOTH lxml codemods, great find! Glad I waited for the docs

Copy link
Member

@drdavella drdavella Oct 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really hammers in the need for functional-level tests for me. Any of us could have (and probably have) made similar mistakes, but it's really hard to know whether we're generating correct code unless it actually executes.

I think the right way to think about this is: what kind of test would have alerted you to this problem immediately? In this particular case, I think that just importing the output of the existing integration test would have caught it, so maybe that's a good starting point for functional integration tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fully convinced! I will follow your lead in prioritizing that work and happy to take it on.

@@ -18,7 +18,7 @@ jobs:
linting:
name: Pylint & Black Checks
runs-on: ubuntu-20.04
timeout-minutes: 3
timeout-minutes: 5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@clavedeluna clavedeluna requested a review from drdavella October 6, 2023 15:15
@clavedeluna clavedeluna merged commit f65f1f6 into main Oct 6, 2023
8 checks passed
@clavedeluna clavedeluna deleted the safe-lxml-parsing branch October 6, 2023 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants