Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for f-strings in SQLQueryParameterization #124

Merged
merged 8 commits into from
Nov 15, 2023
Merged

Conversation

andrecsilva
Copy link
Contributor

@andrecsilva andrecsilva commented Nov 9, 2023

Overview

Added support for query parameterization contained in f-strings.

Description

Aside from the main change, the following are of note:

  • Added a new utility: BaseType to represent literal types in python. It comes along with a method to infer an expression type from a few common cases
  • Changed how multiple expressions in a single injection are parameterized. The reason is to avoid problems with type conversions. For example, given the following:
cursor.execute("SELECT * FROM users where name ='" + name + '_and_' + lastname + "'")

will now produce:

cursor.execure("SELECT * FROM users where name =?", ('{0}_and_{1}'.format(name, lastname), ))

while before it would result in:

cursor.execure("SELECT * FROM users where name =?", (name + '_and_' + lastname, ))

The .format method was chosen for compatibility. While f-strings would look better, they require python 3.6+.

Copy link

codecov bot commented Nov 9, 2023

Codecov Report

Merging #124 (672c4cd) into main (954a20e) will increase coverage by 0.46%.
The diff coverage is 97.15%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #124      +/-   ##
==========================================
+ Coverage   95.82%   96.28%   +0.46%     
==========================================
  Files          64       64              
  Lines        2612     2695      +83     
==========================================
+ Hits         2503     2595      +92     
+ Misses        109      100       -9     
Files Coverage Δ
src/codemodder/codemods/utils.py 93.50% <100.00%> (+3.71%) ⬆️
...ansformations/remove_empty_string_concatenation.py 96.96% <96.00%> (+1.31%) ⬆️
src/core_codemods/sql_parameterization.py 96.21% <96.74%> (+4.88%) ⬆️

@andrecsilva andrecsilva marked this pull request as ready for review November 9, 2023 14:00
@@ -428,3 +504,18 @@ def leave_Call(self, original_node: cst.Call) -> None:
expr.value
):
self.calls[original_node] = query_visitor.leaves


def _extract_prefix_raw_value(self, node) -> Optional[Tuple[str, str]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kinda similar to clean_simplestring which is in another util module. At some point we should try to combine

Copy link
Member

@drdavella drdavella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, thanks for all of the hard work.

I do think there is some opportunity here to reduce duplication and improve readability, which would be nice to do before merging.

My longer term concern is that this is pretty complicated code, which makes it fairly difficult to understand as a reviewer/maintainer. It is also going to continue to grow as we support additional string processing and probably other SQL drivers.

Right now it feels like there is not a clear separation of concerns between the code that processes string literals and the core codemod itself. I think going forward it would be good to break these things into separate components where possible before this becomes too unwieldy.

src/codemodder/codemods/utils.py Outdated Show resolved Hide resolved
case cst.FormattedStringText():
# TODO formatted string case
pass
if isinstance(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nitpick but feels like this would read slightly cleaner with a match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the previous comment, match would create an extra indent with no benefit. As a personal rule, I tend to use match to capture complex patterns or enforce types in case body. Mostly to appease the static analysis.

I'm not against to just use match in those cases. Maybe we could talk about adopting an informal coding standard here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me. I'm a little unclear on where match is beneficial myself. Although I do agree in certain places it appeases the type checker and in other places it enables destructuring which is quite useful.

return new_middle[0]
for e in new_middle:
exception = False
if isinstance(e, cst.SimpleString | cst.FormattedStringText):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match?

Copy link
Contributor Author

@andrecsilva andrecsilva Nov 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match would create an extra indent here with no additional benefit.

src/core_codemods/sql_parameterization.py Outdated Show resolved Hide resolved
# be careful of f"name='literal'", it needs one but not two
return False
return False
t = _extract_prefix_raw_value(self, node)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the final condition, this method exactly duplicates the body of _is_literal_start, which suggests it should be factored out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not much benefit in doing that. To elaborate, let's ponder what would be the return of the newly factored out method.

The logical, straightforward answer would be Optional[Tuple[...]] since we extract the prefix and raw_value and check for failing conditions (i.e. the b prefix). But since this is also a may-fail method, we would need to add an extra check for this which would result in a similar duplicated code.

This type of chaining of may-fail methods are prevalent here and the overall code could be cleaned up quite a using maybe monad as a programming pattern. As much as I'd like do use it, I've been avoiding it (at least explicitly) since would cost readability quite a bit as monads are seemingly infamously hard to grasp.

@andrecsilva
Copy link
Contributor Author

LGTM overall, thanks for all of the hard work.

I do think there is some opportunity here to reduce duplication and improve readability, which would be nice to do before merging.

My longer term concern is that this is pretty complicated code, which makes it fairly difficult to understand as a reviewer/maintainer. It is also going to continue to grow as we support additional string processing and probably other SQL drivers.

Right now it feels like there is not a clear separation of concerns between the code that processes string literals and the core codemod itself. I think going forward it would be good to break these things into separate components where possible before this becomes too unwieldy.

I've added a comment explaining the major steps of the transformations. For now I'm expecting the code to change quite a bit when we add support for other string format options, so I'm avoiding factoring out some of the string processing methods.

@andrecsilva andrecsilva merged commit 7be15b4 into main Nov 15, 2023
11 of 12 checks passed
@andrecsilva andrecsilva deleted the sqlp-fstring branch November 15, 2023 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants