fix(postgres): close socket actively when timeout happens during query #11480

windmgc · 2023-08-29T13:12:24Z

Summary

Currently, we set/keep socket keepalive after every Postgres SQL query, based on the configured keepalive timeout or `lua_socket_keepalive_timeout` (default 60s).
This can go wrong in some cases. When a query hits a read timeout while receiving data from a heavily loaded database, the query ends on Kong's side, but the database may still send the result back after the timeout. That result data lingers in the socket buffer, the socket is reused for a subsequent query, and the subsequent query may then read the incorrect result of the previous query.

This PR checks the query's error string and, if a timeout happened, actively closes the socket so that subsequent queries establish new, clean connections.
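To make the failure mode concrete, here is a minimal Python model of it. It is not Kong's code: the real fix lives in Kong's Lua Postgres connector on top of pgmoon and OpenResty sockets. `FakeConnection`, `Pool`, and `run` are hypothetical names, and a deque stands in for the socket's receive buffer. Returning a connection to the pool corresponds to keeping the socket alive (`tcpsock:setkeepalive`), and discarding it corresponds to closing it (`tcpsock:close`). The sketch discards on any exception; for this scenario that is equivalent to closing on timeout.

```python
from collections import deque


class FakeConnection:
    """Hypothetical stand-in for a pooled Postgres socket.

    A deque models the socket's receive buffer: replies are read in the
    order they arrived, whether or not the client is still waiting.
    """

    def __init__(self):
        self.buffer = deque()
        self.closed = False

    def send_query(self, sql, server_delay, timeout):
        # The server always answers; only the timing varies.
        self.buffer.append(f"result of {sql!r}")
        if server_delay > timeout:
            # The client gives up first, but the late reply is still
            # left behind in the receive buffer.
            raise TimeoutError("timeout")
        # The client reads whatever reply is at the head of the buffer.
        return self.buffer.popleft()


class Pool:
    """Hypothetical keepalive pool; close_on_error toggles the fix."""

    def __init__(self, close_on_error):
        self.idle = []
        self.close_on_error = close_on_error

    def query(self, sql, server_delay=0.0, timeout=1.0):
        conn = self.idle.pop() if self.idle else FakeConnection()
        try:
            result = conn.send_query(sql, server_delay, timeout)
        except Exception:
            if self.close_on_error:
                conn.closed = True      # fix: never hand this socket out again
            else:
                self.idle.append(conn)  # old behaviour: keep it alive anyway
            raise
        self.idle.append(conn)          # success: keep alive for reuse
        return result


def run(close_on_error):
    pool = Pool(close_on_error)
    try:
        pool.query("SELECT slow", server_delay=2.0)
    except TimeoutError:
        pass
    return pool.query("SELECT 1")


print(run(close_on_error=False))  # stale reply from "SELECT slow"
print(run(close_on_error=True))   # correct reply for "SELECT 1"
```

Closing is simpler to reason about than trying to drain the buffer: once a reply has been abandoned, the client cannot know how much data is still in flight, so a fresh connection is the easiest state to trust.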

Checklist

- The Pull Request has tests
- A changelog file has been added to CHANGELOG/unreleased/kong, or the skip-changelog label has been added to the PR if unnecessary. README.md (Please ping @vm-001 if you need help)
~~- [ ] There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE~~

Full changelog

[Implement ...]

Issue reference

Fix FTI-5322

bungle · 2023-08-29T15:58:11Z

@windmgc Good. I have seen this before in other cases. I think we should close the connection on any error, not just on timeout. In general: error on the socket -> throw it away and get a new one.

ms2008 · 2023-08-30T03:02:32Z

Great catch. I agree with @bungle that we should close the connection on any error, not just on a timeout.

Alternatively, we could first perform a ping-like operation after taking a connection out of the pool to make sure that it is working. (I'm not sure if this can be easily implemented in OpenResty; I've used it in other languages.)
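The ping-on-checkout idea can be illustrated with a minimal Python sketch. It is only a model, not anything in Kong or OpenResty: `FakeConnection`, `is_healthy`, and `ValidatingPool` are hypothetical names, and comparing the probe's reply against an expected string is an assumption of the toy model. A pooled connection is probed with `SELECT 1` before being handed out and is discarded if the reply does not match.

```python
from collections import deque


class FakeConnection:
    """Hypothetical pooled connection with an in-order receive buffer."""

    def __init__(self):
        self.buffer = deque()

    def send_query(self, sql, server_delay=0.0, timeout=1.0):
        self.buffer.append(f"result of {sql!r}")
        if server_delay > timeout:
            # The late reply stays in the buffer after the client gives up.
            raise TimeoutError("timeout")
        return self.buffer.popleft()


def is_healthy(conn):
    """Ping-like probe: a clean connection answers with the probe's own result."""
    try:
        return conn.send_query("SELECT 1") == "result of 'SELECT 1'"
    except Exception:
        return False


class ValidatingPool:
    """Hypothetical pool that probes idle connections before reusing them."""

    def __init__(self):
        self.idle = []
        self.discarded = 0

    def acquire(self):
        while self.idle:
            conn = self.idle.pop()
            if is_healthy(conn):
                return conn
            self.discarded += 1  # desynchronised: drop it instead of reusing it
        return FakeConnection()

    def release(self, conn):
        self.idle.append(conn)


pool = ValidatingPool()
conn = pool.acquire()
try:
    conn.send_query("SELECT slow", server_delay=2.0)
except TimeoutError:
    pass
pool.release(conn)  # kept alive despite the timeout

conn = pool.acquire()  # the probe reads the stale reply, so a new connection is opened
print(conn.send_query("SELECT 1"), pool.discarded)
```

Two trade-offs show up even in this toy: every checkout costs an extra round trip, and the probe only catches a stale reply that has already arrived; one that lands after the probe still slips through. The merged change instead closes the socket whenever a query returns an error.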

windmgc · 2023-08-30T08:54:09Z

However, I found it quite hard to tell whether the error returned by pgmoon is due to a socket error or is just an SQL error.

Option 1 is to enumerate the common error strings returned by lua-nginx-module, but that seems hacky. Option 2 is to disconnect whenever any kind of error happens, even if it is an SQL error and the socket could actually be reused. My take is that, since our queries go through the DAO, we don't have many arbitrary erroneous SQL queries, so it might be okay to just disconnect regardless of the error type.
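The two options can be contrasted with a small Python sketch. The set of socket error strings below is an illustrative assumption, not an authoritative list of what lua-nginx-module or pgmoon return, and the wrapped message `"receive failed: timeout"` is hypothetical, chosen to show how string matching can break.

```python
from typing import Optional

# Assumption: an illustrative, non-exhaustive set of transport-level error
# strings. Real messages may differ or be wrapped by higher layers.
SOCKET_ERRORS = {"timeout", "closed", "connection reset by peer"}


def should_close_enumerated(err: Optional[str]) -> bool:
    """Option 1: close only when the error matches a known socket error."""
    return err is not None and err in SOCKET_ERRORS


def should_close_any(err: Optional[str]) -> bool:
    """Option 2: close on any error, SQL or socket."""
    return err is not None


cases = [
    None,                                    # success: keep alive
    "timeout",                               # socket error
    'ERROR: relation "foo" does not exist',  # SQL error: socket still usable
    "receive failed: timeout",               # hypothetical wrapped socket error
]
for err in cases:
    print(repr(err), should_close_enumerated(err), should_close_any(err))
```

Option 1 keeps the connection alive after the SQL error, which is its purpose, but it also misses the wrapped timeout and would hand a desynchronised socket back to the pool. Option 2 over-closes on SQL errors, which costs only a reconnect.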

bungle · 2023-09-12T08:45:58Z

> My take is that based on DAO we don't have many arbitrary SQL queries that are erroneous so it might be okay to just disconnect regardless of the error type.

@windmgc,

I think it is fine to close on any error.

VicYP · 2023-09-14T11:38:45Z

@dndx @fffonion Please take a look and see if we need to backport this fix.

team-gateway-bot · 2023-09-18T03:16:57Z

The backport to release/3.1.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

```shell
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release/3.1.x release/3.1.x
# Navigate to the new working tree
cd .worktrees/backport-release/3.1.x
# Create a new branch
git switch --create backport-11480-to-release/3.1.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 d2da4dbb372db3687f1dfae33ba422c384b61024
# Push it to GitHub
git push --set-upstream origin backport-11480-to-release/3.1.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release/3.1.x
```

Then, create a pull request where the base branch is release/3.1.x and the compare/head branch is backport-11480-to-release/3.1.x.

team-gateway-bot · 2023-09-18T03:16:59Z

The backport to release/2.8.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

```shell
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release/2.8.x release/2.8.x
# Navigate to the new working tree
cd .worktrees/backport-release/2.8.x
# Create a new branch
git switch --create backport-11480-to-release/2.8.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 d2da4dbb372db3687f1dfae33ba422c384b61024
# Push it to GitHub
git push --set-upstream origin backport-11480-to-release/2.8.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release/2.8.x
```

Then, create a pull request where the base branch is release/2.8.x and the compare/head branch is backport-11480-to-release/2.8.x.

#11480) Currently, we do set/keep socket keepalive after every Postgres SQL query, based on keepalive timeout configured or lua_socket_keepalive_timeout(default 60s). This could go wrong under some cases, when a query encounters read timeout when trying to receive data from a database with high load, the query ends on Kong's side but the query result may be sent back after timeout happens, and the result data will be lingering inside the socket buffer, and the socket itself get reused for subsequent query, then the subsequent query might get the incorrect result from the previous query. The PR checks the query result's err string, and if any error happens, it'll try to close the socket actively so that the subsequent query will establish new clean ones. Fix FTI-5322 (cherry picked from commit d2da4db)

### Summary The PR #11480 introduced a bug that calls `store_connection` without passing `self`. This fixes that. Signed-off-by: Aapo Talvensaari <[email protected]> (cherry picked from commit 201b0a9)

fix(postgres): close socket actively when timeout happens during query

f702cb8

pull-request-size bot added the size/M label Aug 29, 2023

github-actions bot assigned windmgc Aug 29, 2023

github-actions bot added the core/db label Aug 29, 2023

fix keepalive

e20963c

close connection when any type of error occurs

4e9fcba

windmgc added 2 commits August 30, 2023 17:19

add changelog

c4660f6

add test

c9bfa5a

windmgc marked this pull request as ready for review August 31, 2023 07:45

windmgc requested a review from bungle August 31, 2023 07:56

VicYP requested review from fffonion and dndx September 14, 2023 11:37

ms2008 approved these changes Sep 18, 2023

windmgc merged commit d2da4db into master Sep 18, 2023
29 checks passed

windmgc deleted the fix-sql-query-disorder branch September 18, 2023 03:16

windmgc added backport release/2.8.x labels Sep 18, 2023

This was referenced Sep 18, 2023

[Backport release/3.4.x] fix(postgres): close socket actively when timeout happens during query #11586

Merged

[Backport release/3.3.x] fix(postgres): close socket actively when timeout happens during query #11587

Merged

team-gateway-bot mentioned this pull request Sep 18, 2023

[Backport release/3.2.x] fix(postgres): close socket actively when timeout happens during query #11588

Merged

windmgc mentioned this pull request Sep 18, 2023

[Backport release/2.8.x] fix(postgres): close socket actively when timeout happens during query #11589

Merged

windmgc mentioned this pull request Sep 18, 2023

[Backport release/3.1.x] fix(postgres): close socket actively when timeout happens during query #11590

Closed

AndyZhang0707 added a commit that referenced this pull request Sep 18, 2023

Revert "fix(postgres): close socket actively when timeout happens dur…

32325f3

…ing query (#11480)" This reverts commit 1a514ef.

dndx pushed a commit that referenced this pull request Sep 19, 2023

Revert "fix(postgres): close socket actively when timeout happens dur…

396774d

…ing query (#11480)" This reverts commit 1a514ef.

kikito mentioned this pull request Oct 3, 2023

docs(COPYRIGHT): update copyright info in 3.4.1 #11688

Merged

bungle added a commit that referenced this pull request Nov 6, 2023

fix(db): pg store connection called without self

c3ce21d

### Summary The PR #11480 introduced a bug that calls `store_connection` without passing `self`. This fixes that. Signed-off-by: Aapo Talvensaari <[email protected]>

bungle mentioned this pull request Nov 6, 2023

fix(db): pg store connection called without self #11926

Merged

jschmid1 pushed a commit that referenced this pull request Nov 7, 2023

fix(db): pg store connection called without self

201b0a9

### Summary The PR #11480 introduced a bug that calls `store_connection` without passing `self`. This fixes that. Signed-off-by: Aapo Talvensaari <[email protected]>

AndyZhang0707 added a commit that referenced this pull request Dec 4, 2023

Revert "Revert "fix(postgres): close socket actively when timeout hap…

fcbe330

…pens during query (#11480)"" This reverts commit 396774d.

windmgc pushed a commit that referenced this pull request Dec 11, 2023

Revert "Revert "fix(postgres): close socket actively when timeout hap…

5b6d932

…pens during query (#11480)"" This reverts commit 396774d.

windmgc pushed a commit that referenced this pull request Jan 24, 2024

fix(db): pg store connection called without self

f94b7c4

### Summary The PR #11480 introduced a bug that calls `store_connection` without passing `self`. This fixes that. Signed-off-by: Aapo Talvensaari <[email protected]>

kikito mentioned this pull request Jan 24, 2024

fix(scripts): fix update-copyright in venv and remove unused repos #12413

Closed

windmgc pushed a commit that referenced this pull request Mar 7, 2024

fix(db): pg store connection called without self

3c004ab

### Summary The PR #11480 introduced a bug that calls `store_connection` without passing `self`. This fixes that. Signed-off-by: Aapo Talvensaari <[email protected]>

windmgc pushed a commit that referenced this pull request Mar 8, 2024

fix(db): pg store connection called without self

18801db

### Summary The PR #11480 introduced a bug that calls `store_connection` without passing `self`. This fixes that. Signed-off-by: Aapo Talvensaari <[email protected]>

AndyZhang0707 added a commit that referenced this pull request Jul 18, 2024

Revert "Revert "Revert "fix(postgres): close socket actively when tim…

8e4c60a

…eout happens during query (#11480)""" This reverts commit 5b6d932.

AndyZhang0707 added a commit that referenced this pull request Jul 26, 2024

Revert "Revert "Revert "fix(postgres): close socket actively when tim…

20b52af

…eout happens during query (#11480)""" This reverts commit 5b6d932.
