Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(postgres): close socket actively when timeout happens during query #11480

Merged
merged 5 commits into from
Sep 18, 2023

Conversation

windmgc
Copy link
Member

@windmgc windmgc commented Aug 29, 2023

Summary

Currently, we do set/keep socket keepalive after every Postgres SQL query, based on keepalive timeout configured or lua_socket_keepalive_timeout(default 60s).
This could go wrong under some cases, when a query encounters read timeout when trying to receive data from a database with high load, the query ends on Kong's side but the query result may be sent back after timeout happens, and the result data will be lingering inside the socket buffer, and the socket itself get reused for subsequent query, then the subsequent query might get the uncorrect result from the previous query.

The PR checks the query result's err string, and if timeout happens, it'll try to close the socket actively so that the subsequent query will establish new clean ones.

Checklist

  • The Pull Request has tests
  • A changelog file has been added to CHANGELOG/unreleased/kong or adding skip-changelog label on PR if unnecessary. README.md (Please ping @vm-001 if you need help)
    - [ ] There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE

Full changelog

  • [Implement ...]

Issue reference

Fix FTI-5322

@bungle
Copy link
Member

bungle commented Aug 29, 2023

@windmgc good. I have seen this before in other cases. I think we should close connection in any error, not just timeout. In general, error in socket -> throw it away and get a new one.

@ms2008
Copy link
Contributor

ms2008 commented Aug 30, 2023

Great catch. I agree with @bungle that we should close the connection in any error, not just a timeout.

Or we should first perform a ping-like operation (I'm not sure if it can be easily implemented in Openresty, I've used it in other languages) after taking out the connection to make sure that the connection is working.

@windmgc
Copy link
Member Author

windmgc commented Aug 30, 2023

However, I found it quite hard to tell whether the error returned by pgmoon is due to a socket error or is just an SQL error.

Option 1 is that I can enumerate the common error strings returned by lua-nginx module but it seems to be dirty. Option 2 is that we can disconnect if any kind of error happens even if it is an SQL error and the socket can actually be reused. My take is that based on DAO we don't have many arbitrary SQL queries that are erroneous so it might be okay to just disconnect regardless of the error type.

@windmgc windmgc marked this pull request as ready for review August 31, 2023 07:45
@windmgc windmgc requested a review from bungle August 31, 2023 07:56
@bungle
Copy link
Member

bungle commented Sep 12, 2023

My take is that based on DAO we don't have many arbitrary SQL queries that are erroneous so it might be okay to just disconnect regardless of the error type.

@windmgc,

I think it is fine to close on any error.

@VicYP VicYP requested review from fffonion and dndx September 14, 2023 11:37
@VicYP
Copy link

VicYP commented Sep 14, 2023

@dndx @fffonion Please take a look and see if we need to backport this fix.

@windmgc windmgc merged commit d2da4db into master Sep 18, 2023
29 checks passed
@windmgc windmgc deleted the fix-sql-query-disorder branch September 18, 2023 03:16
@team-gateway-bot
Copy link
Collaborator

The backport to release/3.1.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release/3.1.x release/3.1.x
# Navigate to the new working tree
cd .worktrees/backport-release/3.1.x
# Create a new branch
git switch --create backport-11480-to-release/3.1.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 d2da4dbb372db3687f1dfae33ba422c384b61024
# Push it to GitHub
git push --set-upstream origin backport-11480-to-release/3.1.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release/3.1.x

Then, create a pull request where the base branch is release/3.1.x and the compare/head branch is backport-11480-to-release/3.1.x.

@team-gateway-bot
Copy link
Collaborator

The backport to release/2.8.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release/2.8.x release/2.8.x
# Navigate to the new working tree
cd .worktrees/backport-release/2.8.x
# Create a new branch
git switch --create backport-11480-to-release/2.8.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 d2da4dbb372db3687f1dfae33ba422c384b61024
# Push it to GitHub
git push --set-upstream origin backport-11480-to-release/2.8.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release/2.8.x

Then, create a pull request where the base branch is release/2.8.x and the compare/head branch is backport-11480-to-release/2.8.x.

team-gateway-bot pushed a commit that referenced this pull request Sep 18, 2023
#11480)

Currently, we do set/keep socket keepalive after every Postgres SQL query, based on keepalive timeout configured or lua_socket_keepalive_timeout(default 60s).
This could go wrong under some cases, when a query encounters read timeout when trying to receive data from a database with high load, the query ends on Kong's side but the query result may be sent back after timeout happens, and the result data will be lingering inside the socket buffer, and the socket itself get reused for subsequent query, then the subsequent query might get the incorrect result from the previous query.

The PR checks the query result's err string, and if any error happens, it'll try to close the socket actively so that the subsequent query will establish new clean ones.

Fix FTI-5322

(cherry picked from commit d2da4db)
windmgc added a commit that referenced this pull request Sep 18, 2023
#11480)

Currently, we do set/keep socket keepalive after every Postgres SQL query, based on keepalive timeout configured or lua_socket_keepalive_timeout(default 60s).
This could go wrong under some cases, when a query encounters read timeout when trying to receive data from a database with high load, the query ends on Kong's side but the query result may be sent back after timeout happens, and the result data will be lingering inside the socket buffer, and the socket itself get reused for subsequent query, then the subsequent query might get the incorrect result from the previous query.

The PR checks the query result's err string, and if any error happens, it'll try to close the socket actively so that the subsequent query will establish new clean ones.

Fix FTI-5322

(cherry picked from commit d2da4db)
windmgc added a commit that referenced this pull request Sep 18, 2023
#11480)

Currently, we do set/keep socket keepalive after every Postgres SQL query, based on keepalive timeout configured or lua_socket_keepalive_timeout(default 60s).
This could go wrong under some cases, when a query encounters read timeout when trying to receive data from a database with high load, the query ends on Kong's side but the query result may be sent back after timeout happens, and the result data will be lingering inside the socket buffer, and the socket itself get reused for subsequent query, then the subsequent query might get the incorrect result from the previous query.

The PR checks the query result's err string, and if any error happens, it'll try to close the socket actively so that the subsequent query will establish new clean ones.

Fix FTI-5322

(cherry picked from commit d2da4db)
windmgc added a commit that referenced this pull request Sep 18, 2023
#11480)

Currently, we do set/keep socket keepalive after every Postgres SQL query, based on keepalive timeout configured or lua_socket_keepalive_timeout(default 60s).
This could go wrong under some cases, when a query encounters read timeout when trying to receive data from a database with high load, the query ends on Kong's side but the query result may be sent back after timeout happens, and the result data will be lingering inside the socket buffer, and the socket itself get reused for subsequent query, then the subsequent query might get the incorrect result from the previous query.

The PR checks the query result's err string, and if any error happens, it'll try to close the socket actively so that the subsequent query will establish new clean ones.

Fix FTI-5322

(cherry picked from commit d2da4db)
windmgc added a commit that referenced this pull request Sep 18, 2023
#11480)

Currently, we do set/keep socket keepalive after every Postgres SQL query, based on keepalive timeout configured or lua_socket_keepalive_timeout(default 60s).
This could go wrong under some cases, when a query encounters read timeout when trying to receive data from a database with high load, the query ends on Kong's side but the query result may be sent back after timeout happens, and the result data will be lingering inside the socket buffer, and the socket itself get reused for subsequent query, then the subsequent query might get the incorrect result from the previous query.

The PR checks the query result's err string, and if any error happens, it'll try to close the socket actively so that the subsequent query will establish new clean ones.

Fix FTI-5322

(cherry picked from commit d2da4db)
windmgc added a commit that referenced this pull request Sep 18, 2023
#11480)

Currently, we do set/keep socket keepalive after every Postgres SQL query, based on keepalive timeout configured or lua_socket_keepalive_timeout(default 60s).
This could go wrong under some cases, when a query encounters read timeout when trying to receive data from a database with high load, the query ends on Kong's side but the query result may be sent back after timeout happens, and the result data will be lingering inside the socket buffer, and the socket itself get reused for subsequent query, then the subsequent query might get the incorrect result from the previous query.

The PR checks the query result's err string, and if any error happens, it'll try to close the socket actively so that the subsequent query will establish new clean ones.

Fix FTI-5322

(cherry picked from commit d2da4db)
windmgc added a commit that referenced this pull request Sep 18, 2023
#11480)

Currently, we do set/keep socket keepalive after every Postgres SQL query, based on keepalive timeout configured or lua_socket_keepalive_timeout(default 60s).
This could go wrong under some cases, when a query encounters read timeout when trying to receive data from a database with high load, the query ends on Kong's side but the query result may be sent back after timeout happens, and the result data will be lingering inside the socket buffer, and the socket itself get reused for subsequent query, then the subsequent query might get the incorrect result from the previous query.

The PR checks the query result's err string, and if any error happens, it'll try to close the socket actively so that the subsequent query will establish new clean ones.

Fix FTI-5322

(cherry picked from commit d2da4db)
windmgc added a commit that referenced this pull request Sep 18, 2023
#11480)

Currently, we do set/keep socket keepalive after every Postgres SQL query, based on keepalive timeout configured or lua_socket_keepalive_timeout(default 60s).
This could go wrong under some cases, when a query encounters read timeout when trying to receive data from a database with high load, the query ends on Kong's side but the query result may be sent back after timeout happens, and the result data will be lingering inside the socket buffer, and the socket itself get reused for subsequent query, then the subsequent query might get the incorrect result from the previous query.

The PR checks the query result's err string, and if any error happens, it'll try to close the socket actively so that the subsequent query will establish new clean ones.

Fix FTI-5322

(cherry picked from commit d2da4db)
windmgc added a commit that referenced this pull request Sep 18, 2023
#11480)

Currently, we do set/keep socket keepalive after every Postgres SQL query, based on keepalive timeout configured or lua_socket_keepalive_timeout(default 60s).
This could go wrong under some cases, when a query encounters read timeout when trying to receive data from a database with high load, the query ends on Kong's side but the query result may be sent back after timeout happens, and the result data will be lingering inside the socket buffer, and the socket itself get reused for subsequent query, then the subsequent query might get the incorrect result from the previous query.

The PR checks the query result's err string, and if any error happens, it'll try to close the socket actively so that the subsequent query will establish new clean ones.

Fix FTI-5322

(cherry picked from commit d2da4db)
AndyZhang0707 added a commit that referenced this pull request Sep 18, 2023
dndx pushed a commit that referenced this pull request Sep 19, 2023
bungle added a commit that referenced this pull request Nov 6, 2023
### Summary

The PR #11480 introduced a bug that calls `store_connection`
without passing `self`. This fixes that.

Signed-off-by: Aapo Talvensaari <[email protected]>
jschmid1 pushed a commit that referenced this pull request Nov 7, 2023
### Summary

The PR #11480 introduced a bug that calls `store_connection`
without passing `self`. This fixes that.

Signed-off-by: Aapo Talvensaari <[email protected]>
team-gateway-bot pushed a commit that referenced this pull request Nov 7, 2023
### Summary

The PR #11480 introduced a bug that calls `store_connection`
without passing `self`. This fixes that.

Signed-off-by: Aapo Talvensaari <[email protected]>
(cherry picked from commit 201b0a9)
dndx pushed a commit that referenced this pull request Nov 9, 2023
### Summary

The PR #11480 introduced a bug that calls `store_connection`
without passing `self`. This fixes that.

Signed-off-by: Aapo Talvensaari <[email protected]>
(cherry picked from commit 201b0a9)
AndyZhang0707 added a commit that referenced this pull request Dec 4, 2023
windmgc pushed a commit that referenced this pull request Dec 11, 2023
windmgc pushed a commit that referenced this pull request Jan 24, 2024
### Summary

The PR #11480 introduced a bug that calls `store_connection`
without passing `self`. This fixes that.

Signed-off-by: Aapo Talvensaari <[email protected]>
windmgc pushed a commit that referenced this pull request Mar 7, 2024
### Summary

The PR #11480 introduced a bug that calls `store_connection`
without passing `self`. This fixes that.

Signed-off-by: Aapo Talvensaari <[email protected]>
windmgc pushed a commit that referenced this pull request Mar 8, 2024
### Summary

The PR #11480 introduced a bug that calls `store_connection`
without passing `self`. This fixes that.

Signed-off-by: Aapo Talvensaari <[email protected]>
AndyZhang0707 added a commit that referenced this pull request Jul 18, 2024
…eout happens during query (#11480)"""

This reverts commit 5b6d932.
AndyZhang0707 added a commit that referenced this pull request Jul 26, 2024
…eout happens during query (#11480)"""

This reverts commit 5b6d932.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants