Support for f-strings in SQLQueryParameterization #124

andrecsilva · 2023-11-09T13:14:16Z

Overview

Added support for query parameterization contained in f-strings.

Description

Aside from the main change, the following are of note:

Added a new utility: BaseType to represent literal types in python. It comes along with a method to infer an expression type from a few common cases
Changed how multiple expressions in a single injection are parameterized. The reason is to avoid problems with type conversions. For example, given the following:

cursor.execute("SELECT * FROM users where name ='" + name + '_and_' + lastname + "'")

will now produce:

cursor.execure("SELECT * FROM users where name =?", ('{0}_and_{1}'.format(name, lastname), ))

while before it would result in:

cursor.execure("SELECT * FROM users where name =?", (name + '_and_' + lastname, ))

The .format method was chosen for compatibility. While f-strings would look better, they require python 3.6+.

codecov · 2023-11-09T13:19:13Z

Codecov Report

Merging #124 (672c4cd) into main (954a20e) will increase coverage by 0.46%.
The diff coverage is 97.15%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #124      +/-   ##
==========================================
+ Coverage   95.82%   96.28%   +0.46%     
==========================================
  Files          64       64              
  Lines        2612     2695      +83     
==========================================
+ Hits         2503     2595      +92     
+ Misses        109      100       -9

Files	Coverage Δ
src/codemodder/codemods/utils.py	`93.50% <100.00%> (+3.71%)`	⬆️
...ansformations/remove_empty_string_concatenation.py	`96.96% <96.00%> (+1.31%)`	⬆️
src/core_codemods/sql_parameterization.py	`96.21% <96.74%> (+4.88%)`	⬆️

src/codemodder/codemods/transformations/remove_empty_string_concatenation.py

clavedeluna · 2023-11-09T14:29:34Z

src/core_codemods/sql_parameterization.py

@@ -428,3 +504,18 @@ def leave_Call(self, original_node: cst.Call) -> None:
                                expr.value
                            ):
                                self.calls[original_node] = query_visitor.leaves
+
+
+def _extract_prefix_raw_value(self, node) -> Optional[Tuple[str, str]]:


kinda similar to clean_simplestring which is in another util module. At some point we should try to combine

drdavella

LGTM overall, thanks for all of the hard work.

I do think there is some opportunity here to reduce duplication and improve readability, which would be nice to do before merging.

My longer term concern is that this is pretty complicated code, which makes it fairly difficult to understand as a reviewer/maintainer. It is also going to continue to grow as we support additional string processing and probably other SQL drivers.

Right now it feels like there is not a clear separation of concerns between the code that processes string literals and the core codemod itself. I think going forward it would be good to break these things into separate components where possible before this becomes too unwieldy.

src/codemodder/codemods/utils.py

drdavella · 2023-11-13T02:05:56Z

src/core_codemods/sql_parameterization.py

-            case cst.FormattedStringText():
-                # TODO formatted string case
-                pass
+            if isinstance(


Just a nitpick but feels like this would read slightly cleaner with a match?

Same as the previous comment, match would create an extra indent with no benefit. As a personal rule, I tend to use match to capture complex patterns or enforce types in case body. Mostly to appease the static analysis.

I'm not against to just use match in those cases. Maybe we could talk about adopting an informal coding standard here.

That sounds good to me. I'm a little unclear on where match is beneficial myself. Although I do agree in certain places it appeases the type checker and in other places it enables destructuring which is quite useful.

drdavella · 2023-11-13T18:06:48Z

src/core_codemods/sql_parameterization.py

+            return new_middle[0]
+        for e in new_middle:
+            exception = False
+            if isinstance(e, cst.SimpleString | cst.FormattedStringText):


match would create an extra indent here with no additional benefit.

src/core_codemods/sql_parameterization.py

drdavella · 2023-11-13T18:11:51Z

src/core_codemods/sql_parameterization.py

-                # be careful of f"name='literal'", it needs one but not two
-                return False
-        return False
+        t = _extract_prefix_raw_value(self, node)


Other than the final condition, this method exactly duplicates the body of _is_literal_start, which suggests it should be factored out.

There is not much benefit in doing that. To elaborate, let's ponder what would be the return of the newly factored out method.

The logical, straightforward answer would be Optional[Tuple[...]] since we extract the prefix and raw_value and check for failing conditions (i.e. the b prefix). But since this is also a may-fail method, we would need to add an extra check for this which would result in a similar duplicated code.

This type of chaining of may-fail methods are prevalent here and the overall code could be cleaned up quite a using maybe monad as a programming pattern. As much as I'd like do use it, I've been avoiding it (at least explicitly) since would cost readability quite a bit as monads are seemingly infamously hard to grasp.

andrecsilva · 2023-11-14T13:33:43Z

LGTM overall, thanks for all of the hard work.

I do think there is some opportunity here to reduce duplication and improve readability, which would be nice to do before merging.

My longer term concern is that this is pretty complicated code, which makes it fairly difficult to understand as a reviewer/maintainer. It is also going to continue to grow as we support additional string processing and probably other SQL drivers.

Right now it feels like there is not a clear separation of concerns between the code that processes string literals and the core codemod itself. I think going forward it would be good to break these things into separate components where possible before this becomes too unwieldy.

I've added a comment explaining the major steps of the transformations. For now I'm expecting the code to change quite a bit when we add support for other string format options, so I'm avoiding factoring out some of the string processing methods.

andrecsilva marked this pull request as ready for review November 9, 2023 14:00

andrecsilva requested review from drdavella and clavedeluna as code owners November 9, 2023 14:00

clavedeluna reviewed Nov 9, 2023

View reviewed changes

drdavella approved these changes Nov 13, 2023

View reviewed changes

andrecsilva force-pushed the sqlp-fstring branch from 970b3f2 to 0189e06 Compare November 14, 2023 13:27

andrecsilva added 8 commits November 15, 2023 08:24

Expanded literal detection

41be954

Initial support for f-strings

ee40238

Bugfixes and refactoring

c0d5b8d

Bugfixes, refactoring and new tests

a5d15bf

Added separated tests for BaseType

4480ad1

Small typing and variables cleanup

2687508

Simplified case match

8b293e0

Some refactoring

672c4cd

andrecsilva force-pushed the sqlp-fstring branch from 0189e06 to 672c4cd Compare November 15, 2023 11:24

andrecsilva merged commit 7be15b4 into main Nov 15, 2023
11 of 12 checks passed

andrecsilva deleted the sqlp-fstring branch November 15, 2023 11:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for f-strings in SQLQueryParameterization #124

Support for f-strings in SQLQueryParameterization #124

andrecsilva commented Nov 9, 2023 •

edited

Loading

codecov bot commented Nov 9, 2023 •

edited

Loading

clavedeluna Nov 9, 2023

drdavella left a comment

drdavella Nov 13, 2023

andrecsilva Nov 14, 2023

drdavella Nov 14, 2023

drdavella Nov 13, 2023

andrecsilva Nov 14, 2023 •

edited

Loading

drdavella Nov 13, 2023

andrecsilva Nov 14, 2023

andrecsilva commented Nov 14, 2023

Support for f-strings in SQLQueryParameterization #124

Support for f-strings in SQLQueryParameterization #124

Conversation

andrecsilva commented Nov 9, 2023 • edited Loading

Overview

Description

codecov bot commented Nov 9, 2023 • edited Loading

Codecov Report

clavedeluna Nov 9, 2023

Choose a reason for hiding this comment

drdavella left a comment

Choose a reason for hiding this comment

drdavella Nov 13, 2023

Choose a reason for hiding this comment

andrecsilva Nov 14, 2023

Choose a reason for hiding this comment

drdavella Nov 14, 2023

Choose a reason for hiding this comment

drdavella Nov 13, 2023

Choose a reason for hiding this comment

andrecsilva Nov 14, 2023 • edited Loading

Choose a reason for hiding this comment

drdavella Nov 13, 2023

Choose a reason for hiding this comment

andrecsilva Nov 14, 2023

Choose a reason for hiding this comment

andrecsilva commented Nov 14, 2023

andrecsilva commented Nov 9, 2023 •

edited

Loading

codecov bot commented Nov 9, 2023 •

edited

Loading

andrecsilva Nov 14, 2023 •

edited

Loading