-
Notifications
You must be signed in to change notification settings - Fork 296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Postgres connections are returned to Pool too quickly #1199
Comments
I originally thought this was haskellari/postgresql-simple#69 but after testing that doesn't seem to be the case. Also related is #981 but the conclusions there seem to indicate you should be safe if you're using a pool. |
It seems that
|
And now for some real confusion... The issue still happens after calling |
I'm wondering if this could be related to bos/pool#32. |
I'm fairly confident this issue is something to do with main :: IO ()
main = do
-- I started a postgres server with:
-- docker run --rm --name some-postgres -p 5432:5432 -e POSTGRES_PASSWORD=secret postgres
pool <- Pool.createPool (PGS.connect PGS.defaultConnectInfo { PGS.connectPassword = "secret" } ) PGS.close 1 10 1
threadId <- Concurrent.forkIO $ do
Pool.withResource pool $ \conn -> do
let numThings :: Int = 100000000
putStrLn $ "start inserting " <> show numThings <> " things"
Monad.forM_ [1 .. numThings] $ \_ -> do
PGS.execute_ conn "insert into foo values(1);"
putStrLn "waiting for insert thread to make progress"
Concurrent.threadDelay 5000000
Monad.void $ Concurrent.killThread threadId
putStrLn "killing insert thread"
Pool.withResource pool $ \conn -> do
PGS.execute_ conn "insert into foo values(1);"
putStrLn "done"
|
It looks like Persistent does the right thing here by catching any exceptions when destroying the pool but this is the only thing I could imagine being wrong that would exhibit this behavior:
Especially since Postgresql-simple doesn't seem affected. |
~~Wow... I just verified that the use of Edit: Nevermind... I left However as I noted above |
This very much seems like it could be related: Also today I tried layering on some of the extra exception handling that Persistent does to see if I could get to the error from my postgresql-simple base: I wasn't able too though... The above issue makes me wonder if adding in resourceT stuff will cause the issue to show up though. |
I work with @codygman and am also looking into this issue. I was able to create a slightly smaller reproduction of the bug, which you can find here: https://gist.github.com/tfausak/ee019365d628a36fc3480d30d38bed0c {-# language OverloadedStrings #-}
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Logger (runStdoutLoggingT)
import Database.Persist.Postgresql (rawExecute, runSqlPool, withPostgresqlPool)
main :: IO ()
main = runStdoutLoggingT . withPostgresqlPool "" 1 $ \ pool -> liftIO $ do
threadId <- forkIO
. runStdoutLoggingT
$ runSqlPool (rawExecute "select pg_sleep(1)" []) pool
putStrLn "Waiting for the thread to start ..."
threadDelay 500000
putStrLn "Killing the thread ..."
killThread threadId
-- Wait around a little bit for the error to be output.
threadDelay 500000
putStrLn "Done." I do not see the same behavior when using Also if I switch from |
It actually even works if you call async within the forkIO which I found surprising: https://gist.github.com/codygman/d38a049092301ade8e8e1bb362cac778 |
Perusing through issues in conduit (the resourcet repo is in there) I also found a comment from @snoyberg that might be helpful here:
snoyberg/conduit#441 (comment) This makes me wonder how much Persistent could do to prevent such an issue from happening though... or if consumers just always have to be aware of this issue. |
The more I think about this the more I think about masking. That led me to try: runStdoutLoggingT . withPostgresqlPool "" 1 $ \ pool -> liftIO $ do
threadId <- uninterruptibleMask_ $ forkIOWithUnmask $ \unmask ->
unmask (void . runStdoutLoggingT $ runSqlPool (rawExecute "select pg_sleep(1)" []) pool)
`finally` putStrLn "cleanup in child thread" That didn't resolve the issue but it was inspired by revisiting the
|
Working code with uninterruptibleMask: {-# language OverloadedStrings #-}
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Logger (runStdoutLoggingT)
import Database.Persist.Postgresql (rawExecute, runSqlPool, withPostgresqlPool)
import Control.Exception
main :: IO ()
main = uninterruptibleMask_ . runStdoutLoggingT . withPostgresqlPool "" 1 $ \ pool -> liftIO $ do
threadId <- forkIO
. runStdoutLoggingT
$ runSqlPool (rawExecute "select pg_sleep(1)" []) pool
putStrLn "Waiting for the thread to start ..."
threadDelay 500000
putStrLn "Killing the thread ..."
killThread threadId
-- Wait around a little bit for the error to be output.
threadDelay 500000
putStrLn "Done." This uncovers an error with the Query that returns a result it wasn't expecting:
|
A lot of related discussion here: fpco/safe-exceptions#3 |
I also want to note here that you'd likely want a timeout in the uninterruptibleMask above in any production code. |
A recent blog by @joeyh mentions this issue and might have an explanation as to why mask doesn't work alone... an MVar could be empty: http://joeyh.name/blog/entry/bracketing_and_async_exceptions_in_haskell/ Edit: Down towards the end he makes some recommendations and points out that you need to call
|
I'm too tired to know if it's possible or a good idea, but I this piece of the docs keeps catching my eye:
Would it be possible or even desirable to use STM throughout the Persistent module to prevent this issue? |
It seems another fix that I believe will work without all other fixes is using uninterruptibleMask_ on the finish :: Connection
-> IO ()
finish (Conn fp _) = do
putStrLn "hi from libqp.finish"
uninterruptibleMask_ $ finalizeForeignPtr fp I'll verify if this by itself fixes things and then post asking about it in the related postgresql-libpq issue. |
Yeah, I was wondering if we could switch to |
I think either could work, but if you defer to the underlying libraries you are at the whim of whether they do things correctly asynchronously. Maybe this is the only way it can be... IIRC that was the point being made in: https://www.fpcomplete.com/blog/2018/04/async-exception-handling-haskell/ I still feel a little lost with all of this, but I'm pretty sure something is wrong somewhere 😄 |
|
Perhaps this is just a very long way for me to figure out I'm affected by #981?
I thought that by having a |
This bug in pool makes it kind of difficult to isolate the problem: bos/pool#32 destroyResource definitely gets called more than once and the best way to ensure the tainted connection is actually gone is to only allow the Pool to have 1 stripe and 1 connection. |
I thought that by cancelling pending queries in connRollback because it's called on ReleaseException I could successfully prevent the bad postgres connection that still has a running query from re-entering the pool but it seems not. I verified with print debugging that it does actually get destroyed... or at least
That makes me wonder if either:
I can't think of anything else. I'm afraid my team might have to move away from Persistent due to this issue :sad: |
oh, huh -- | Creates a pool of connections to a SQL database.
--
-- @since 2.11.0.0
createSqlPoolWithConfig
:: forall m backend. (MonadLoggerIO m, MonadUnliftIO m, BackendCompatible SqlBackend backend)
=> (LogFunc -> IO backend) -- ^ Function to create a new connection
-> ConnectionPoolConfig
-> m (Pool backend)
createSqlPoolWithConfig mkConn config = do
logFunc <- askLoggerIO
-- Resource pool will swallow any exceptions from close. We want to log
-- them instead.
let loggedClose :: backend -> IO ()
loggedClose backend = close' backend `UE.catchAny` \e -> runLoggingT
(logError $ T.pack $ "Error closing database connection in pool: " ++ show e)
logFunc
liftIO $ createPool
(mkConn logFunc)
loggedClose
(connectionPoolConfigStripes config)
(connectionPoolConfigIdleTimeout config)
(connectionPoolConfigSize config) we do catch stuff! Right here, we do catch
:: (MonadUnliftIO m, Exception e)
=> m a -- ^ action
-> (e -> m a) -- ^ handler
-> m a
catch f g = withRunInIO $ \run -> run f `EUnsafe.catch` \e ->
if isSyncException e
then run (g e)
-- intentionally rethrowing an async exception synchronously,
-- since we want to preserve async behavior
else EUnsafe.throwIO e 🤔 The |
OK, got it running locally, and I'm reproducing it. As is usually the case with concurrent behavior, compiler and runtime flags matter! Building with no GHC options
This happens reliably - every single time we get a Building with
|
Throwing an |
Can more easily make it reliable/deterministic by having the loop occur inside a SQL transaction - just put the
I wish the Acquire interace allowed us to get our hands on the caught exception. Gonna file an issue there. |
That's also what I saw. I also tried using bos/pool#37. Then again I'm not sure if I ever removed |
@merijn Suggested the RTS could be sending a signal in irc and said that could be tested by compiling with
|
Also related:
lpsmith/postgresql-simple#202 (comment) withPg from snaplet-postgres-simple: https://github.com/mightybyte/snaplet-postgresql-simple/blob/98739b6cdd8a4f97506b80de2ce102d8501975b0/src/Snap/Snaplet/PostgresqlSimple/Internal.hs#L127 -- | Function that reserves a single connection for the duration of the given
-- action. Nested calls to withPG will only reserve one connection. For example,
-- the following code calls withPG twice in a nested way yet only results in a single
-- connection being reserved:
--
-- > myHandler = withPG $ do
-- > queryTheDatabase
-- > commonDatabaseMethod
-- >
-- > commonDatabaseMethod = withPG $ do
-- > moreDatabaseActions
-- > evenMoreDatabaseActions
--
-- This is useful in a practical setting because you may often find yourself in a situation
-- where you have common code (that requires a database connection) that you wish to call from
-- other blocks of code that may require a database connection and you still want to make sure
-- that you are only using one connection through all of your nested methods.
withPG :: (HasPostgres m)
=> m b -> m b
withPG f = do
s <- getPostgresState
case s of
(PostgresPool p) -> withResource p (\c -> setLocalPostgresState (PostgresConn c) f)
(PostgresConn _) -> f Maybe I'm missing something, but this doesn't seem a lot different from what persistent does. |
Won't that happen on the same connection though? I'd expect an error on the same connection. Keeping Monad.forM_ on the outside was an attempt hoping that Data.Pool would block until the connection was in a valid state again or a new one was created. |
My understanding is that the problem happens roughly like this:
I consistently see that a new |
Does PG.close/LibPQ.finish fail with 'query in progress' if the connection has a query in progress I wonder? |
@parsonsmatt FYI just 30 minutes ago someone posted with a patch for postgresql-simple which could tie into all of this: |
When a query (especially long running) is interrupted while using a connection from a Pool, that connection is almost immediately returned to the pool even if the query is still running on that connection.
Another async consumer picks up the connection out of the pool to do it's work and you get:
The gist of it is:
I've created a reproducable repo here that just requires stack and docker (for the postgres server):
https://github.com/codygman/persistent-postgresql-query-in-progress-repro/blob/master/src/Main.hs
The text was updated successfully, but these errors were encountered: