-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Maintain a unique set of "set" SQL statements #3756
Conversation
@@ -93,7 +89,8 @@ public class Comdb2Handle extends AbstractConnection { | |||
|
|||
HashMap<String, Cdb2BindValue> bindVars; | |||
HashMap<Integer, Cdb2BindValue> bindVarsByIndex; | |||
private List<String> sets; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering what if you just change the List to a Set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing this @riverszhang89
My first thought was to change this field type to LinkedHashSet
(which is ordered), but faced two issues:
- The
java.util.Set
interface doesn't provide asubList(...)
(or equivalent) method, which is what was being used when deciding whichset
commands to append to theSQLQuery
. Of course we can provide our own implementation of this method, but it won't help because... - A side effect of the current behavior is that
set
commands configured in oneStatement
could be propagated to subsequent statements. The code made an attempt to avoid this by keeping a 'counter' (nSetsSent) of theset
commands already sent, but that counter gets reset in some situations (e.g. if there's a re-connection), which means that there's a chance for anyStatement
to sendset
commands configured for a previous/different statement, which makes the behavior non-deterministic.
Keeping the collection of set
commands per statement and clearing it once the statement is executed provides stronger guarantees IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A set statement isn't directly sent to server. It's embedded in the next SQL statement. A counter keeps track of the last sent set command in the list so that the API knows what set statements need to be packed into the next SQL query. For instance,
SET TIMEZONE UTC
SELECT NOW() -- counter is at 0; so pack 'SET TIMEZONE UTC' into this query
SET MAXQUERYTIME 1
SET VERIFYRETRY OFF
SELECT SLEEP(10) -- counter is at 1; so pack 'SET MAXQUERYTIME 1' and 'SET VERIFYRETRY OFF' into this query
Set statements are also per-connection, in Comdb2. There isn't a so called 'multiple-statement-execution' in comdb2. You can certainly create multiple java.sql.statement objects on a single java.sql.connection, but they end up sharing each other's set statements. This is by design. This is why the counter resets to 0 if server disconnects, that the API can restore all set statements previously issued, when failing over to the next node.
On a second thought, I don't think we can use linkedhashset either (at least not without overriding its add()). Consider the following statements:
SET TIMEZONE UTC
SET TIMEZONE EST
SET TIMEZONE UTC
Say the timezone is changing back and forth (maybe because it's loaded from a config file, or maybe it's coming from an UI where the user switches timezone a couple times). Now, with a List, the final timezone is UTC; but with a linkedhashset, the 2nd 'SET TIMEZONE UTC' is getting ignored, leaving the final timezone EST.
How about just removing the string from the list before adding it? something like this:
String set_statement;
sets.remove(set_statement);
sets.add(set_statement);
This way you'd avoid inserting duplicates into the list, and at the same time maintain the order of insertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok makes sense. I'm not a fan of the delete-then-add sequence (items on the backing array get reshuffled constantly), but will do it if that's the only option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the title & description, also re-implemented the patch.
I still had to add an additional data structure to keep track of the set
commands that have been executed, instead of having a single list with a position/counter. Using the single list + counter didn't work in some cases. Following your example:
SET TIMEZONE UTC #T0
SET TIMEZONE EST #T1
SET TIMEZONE UTC #T2
- When the SET command in T0 is executed, the counter
nSetsSent
value will be set to 1 - Then, at T1 , the counter
nSetsSent
value will increase to 2 - At T2, if we delete-then-insert the SET command setting the time zone to UTC, the size of
sets
remains the same (2 items), so usingnSetsSent
to calculate the sub-list of unsent commands won't return the correct value (will return zero elements in this case, which will cause the last SET not to be executed).
I think the PR maintains the same principle:
- There's a list of SET statements we need to add to the query
- The list cannot grow forever, so it's safe to remove duplicates while maintaining order
- We don't want to re-send the same SET commands already set for the connection
- If the connection ever closes and we need to reconnect, we need a way for replaying the SET commands already sent
@hgeraldino I like your idea of keeping 2 lists of set commands. Can you please rebase and squash commits into one? |
/runtests /cbuild-compile-only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coding style check: Success ✓.
Smoke testing: Error ⚠.
Cbuild submission: Error ⚠.
Regression testing: 9/473 tests failed ⚠.
The first 10 failing tests are:
cldeadlock
logfill
ruleset_no_fp_generated
ruleset
enospc
mem_tracker
queueing_sql
sc_timepart_logicalsc_generated
analyze_exit
Signed-off-by: Hector Geraldino <[email protected]>
@hgeraldino Thanks for the pull request!!! |
Rebased/squashed |
See issue #3755 for info about this patch
The proposed change splits the
set
SQL statements into two separate sets:pendingSetStmts
that keeps track of theset
SQL statements that have been added by the API/caller but not executed just yetsentSetStmts
with the list of "set" statements that have been executed. This separate list is maintained so already executed "set" statements can be queued/executed again in the case of a re-connection.Both sets are implemented as linked lists, as elements are removed before they are added again. This is done in order to avoid having an ever growing number of set commands added to the list, which causes OOM errors in applications with lots of activity.