Deadlock on connection loss during runtime #154
Ping doesn't work during transaction
Try this
|
Disconnect is not working properly in Ecto; disconnect never seems to be called.
Repo.query("CLOSE") works. |
It seems this changed the situation: I see this exception now
but then also this one
which seems to be caused here:
That looks related to 'disconnect' that you mentioned. |
Maybe it's fixed |
Looks better, yes. Thank you! I also see this now:
%DBConnection.ConnectionError{
  message: ":closed",
  severity: :error,
  reason: :error
}
and then later a simple
It doesn't recover by itself however; but I now see that it doesn't anymore with postgresql either (or maybe never did)... that has probably something to do with my application and supervision. It behaves the same now as with the postgresql adapter 👍 |
Or maybe not, sorry. Tested again today (after fixing my App). And weirdly I now see this:
|
Yes, that's gone. What (I think) I found out, though:
I would suggest to do it like the Postgresql adapter, and not exit the process. But exiting with some other reason than |
I've looked at postgrex/type_server.ex
|
The Repo process still exits, whereas I think the Postgres behaviour is: the Repo process starts and keeps running regardless of connectivity; the Repo functions then raise an exception (after trying to connect/check out a worker).
|
Maybe :ok at the end of disconnect is needed. Also try different reasons in jamdb_oracle.erl |
That didn't change much. But I just dug into it a bit, and noticed that a major difference from the Postgres adapter is that the Postgres adapter doesn't start an extra process for a DBConnection. See So I thought the equivalent would be not to use a :jamdb_oracle genserver in Jamdb.Oracle, but to use :jamdb_oracle_conn directly. Replacing I tried that a bit quick&dirty, and that works really well! It behaves like the Ecto Postgres adapter now, i.e. the Repo process keeps running, but using it always throws the "request was dropped from queue" exception... until Oracle is accessible (again). This should also be more efficient, according to the documentation of DBConnection (https://github.com/elixir-ecto/db_connection#design)
If you like, and agree, I can try to work a bit more on this and make a proper pull request; although it's actually relatively simple:
@spec query(conn :: %Jamdb.Oracle{}, sql :: any(), params :: any()) ::
        {:ok | :cont, any(), %Jamdb.Oracle{}} | {:error | :disconnect, any(), %Jamdb.Oracle{}}
def query(conn, sql, params \\ [])
def query(%{conn: conn, timeout: timeout} = s, sql, params) do
  case :jamdb_oracle_conn.sql_query(conn, stmt(sql, params), timeout) do
    {:ok, [{:result_set, columns, _, rows}], conn} ->
      {:ok, %{num_rows: length(rows), rows: rows, columns: columns}, %{s | conn: conn}}
    {:ok, [{:fetched_rows, _, _, _} = result], conn} -> {:cont, result, %{s | conn: conn}}
    {:ok, [{:proc_result, 0, rows}], conn} -> {:ok, %{num_rows: length(rows), rows: rows}, %{s | conn: conn}}
    {:ok, [{:proc_result, _, msg}], conn} -> {:error, msg, %{s | conn: conn}}
    {:ok, [{:affected_rows, num_rows}], conn} -> {:ok, %{num_rows: num_rows, rows: nil}, %{s | conn: conn}}
    {:ok, result, conn} -> {:ok, result, %{s | conn: conn}}
    {:error, _type, reason, conn} -> {:disconnect, reason, %{s | conn: conn}}
  end
end
And then basically just threading the updated What do you think? |
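A minimal sketch of what "threading the updated connection through" could look like in a callback shaped like DBConnection's handle_execute/4. Everything here is an assumption for illustration: `ThreadingSketch`, `fake_sql_query/3` and the `open?`/`calls` fields are hypothetical stand-ins for :jamdb_oracle_conn and its record, not the adapter's actual code. Like the snippet above, it passes the raw reason on `:disconnect`; DBConnection itself may expect an exception there.

```elixir
defmodule ThreadingSketch do
  # Hypothetical stand-in for :jamdb_oracle_conn.sql_query/3. It returns the
  # result together with an updated connection value, as the real driver
  # does with its connection record.
  def fake_sql_query(%{open?: true, calls: n} = conn, _sql, _timeout),
    do: {:ok, [{:affected_rows, 1}], %{conn | calls: n + 1}}

  def fake_sql_query(%{open?: false} = conn, _sql, _timeout),
    do: {:error, :socket, :closed, conn}

  # Return values are shaped like those of DBConnection's handle_execute/4:
  # the new state always carries the connection returned by the driver.
  def handle_execute(query, _params, _opts, %{conn: conn, timeout: timeout} = s) do
    case fake_sql_query(conn, query, timeout) do
      {:ok, [{:affected_rows, num_rows}], conn} ->
        {:ok, query, %{num_rows: num_rows, rows: nil}, %{s | conn: conn}}

      {:error, _type, reason, conn} ->
        # Ask the pool to drop this connection instead of exiting the process.
        {:disconnect, reason, %{s | conn: conn}}
    end
  end
end
```

The point of the design is that no extra process holds the connection: the pool worker owns the state, and each callback hands back the state it got from the driver.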
Yes, I like it. type conn -> DBConnection.conn() -> GenServer.server() -> pid() | name() | {atom(), node()} In :jamdb_oracle_conn.sql_query the first parameter is the record #oraclient{} |
pid #PID<0.336.0> is linked to pool #PID<0.333.0> One more #PID<0.331.0> erlang:process_info output {{#PID<0.333.0>, {#PID<0.336.0>, {#PID<0.331.0>, |
No, I didn't use :jamdb_oracle (the genserver) at all now. So no There are two TODOs left on that branch:
I also had to add
to jamdb_oracle_conn - as without a socket connection |
Thank you, I'll check. I created branch dbconnection_direct
For example, if the connection was closed and the pid doesn't exist.
In some rare cases, but not sure |
Please check and test master branch and stage branch.
|
I wrote a test case, adding it to
test "DBConnection behaviour on connection errors" do
  some_query = %Jamdb.Oracle.Query{statement: ["select * from dual"]}
  # with an unreachable Oracle server,
  # the connection should start anyway,
  assert {:ok, conn} = Jamdb.Oracle.start_link(hostname: "localhost", port: 7777, username: "bla", password: "foo", database: "does_not_exist", pool_size: 1)
  # just using it should raise a DBConnection.ConnectionError
  assert_raise DBConnection.ConnectionError, fn -> DBConnection.prepare_execute!(conn, some_query, []) end
end
It fails in Feel free to copy and commit that code if you like. |
I got the identical results.
|
Identical to me, or identical between the two branches? On the current master (501f45c) I get
|
I got an error on master too :( I can't understand why connect is called multiple times if :jamdb_oracle.start_link fails. |
fixed !? |
I would say it is not supposed to fail. Postgrex fails when essential config options are missing, but not when the server is currently not answering:
iex(1)> Postgrex.start_link(database: "foo", hostname: "invalid.invalid")
{:ok, #PID<0.638.0>}
(only writes log messages; process does not die)
iex(3)> Postgrex.start_link([])
{:ok, #PID<0.644.0>}
** (EXIT from #PID<0.636.0>) shell process exited with reason: killed |
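The behaviour described here (start succeeds even when the server is unreachable, calls fail until it comes back) can be sketched with a plain GenServer. This is only an illustration under stated assumptions: `LazyConn`, its `connect_fun` argument and the `{:error, :not_connected}` reply are hypothetical and are not how Postgrex or DBConnection implement it internally.

```elixir
defmodule LazyConn do
  use GenServer

  # `connect_fun` is a hypothetical zero-arity function that returns
  # {:ok, socket} or {:error, reason}.
  def start_link(connect_fun, retry_ms \\ 50),
    do: GenServer.start_link(__MODULE__, {connect_fun, retry_ms})

  def query(pid, sql), do: GenServer.call(pid, {:query, sql})

  @impl true
  def init({connect_fun, retry_ms}) do
    # Never fail in init: the first connect attempt runs afterwards, so
    # start_link returns {:ok, pid} even when the server is unreachable.
    {:ok, %{connect: connect_fun, retry_ms: retry_ms, socket: nil}, {:continue, :connect}}
  end

  @impl true
  def handle_continue(:connect, state), do: {:noreply, try_connect(state)}

  @impl true
  def handle_info(:reconnect, state), do: {:noreply, try_connect(state)}

  @impl true
  def handle_call({:query, _sql}, _from, %{socket: nil} = state),
    do: {:reply, {:error, :not_connected}, state}

  def handle_call({:query, sql}, _from, state),
    do: {:reply, {:ok, {state.socket, sql}}, state}

  # On failure the process stays alive and schedules another attempt.
  defp try_connect(state) do
    case state.connect.() do
      {:ok, socket} ->
        %{state | socket: socket}

      {:error, _reason} ->
        Process.send_after(self(), :reconnect, state.retry_ms)
        state
    end
  end
end
```

With this shape, a supervisor never sees the process crash just because the database is down; callers get an error they can handle, and queries work again once a retry succeeds.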
Uhh, I would expect a lot of other problems following from that. I think the solution on the |
Found a little bug on the stable branch: In {:error, err, conn} -> {:disconnect, err, %{s | conn: conn}} instead of
|
I also had some weird occurrences of errors, where jamdb_oracle functions tried something on values that seemed to represent messages from other processes (like a Phoenix.Socket message). I looked into jamdb_oracle_conn.erl, and the little 'tricks' with Passwd, Cursor and Task seem suspicious. |
I did that on this branch: https://github.com/active-group/jamdb_oracle/tree/conn_no_mvars btw. |
Hi there!
If the connection to the Oracle server is lost during the runtime of an application (Ecto/Phoenix), or the server is down for a short time, the Ecto adapter seems to deadlock and does not recover when the Oracle server is available again.
I only see these log messages, once per worker in the pool (the second one actually twice per worker),
With the Ecto PostgreSQL adapter, for example, I get a DBConnection.ConnectionError exception raised from the Repo functions instead (after a few retry attempts). But once the Postgres server is back, the Repo works again.