Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transaction task never completes after connection blip #2742

Open
jlao opened this issue Jun 18, 2024 · 5 comments
Open

Transaction task never completes after connection blip #2742

jlao opened this issue Jun 18, 2024 · 5 comments

Comments

@jlao
Copy link

jlao commented Jun 18, 2024

Hello, I left a comment on #2630 but it was already closed so I'm creating a new issue. We recently saw this happen again. From my previous post which still holds:

We saw another instance where a Redis transaction is hung forever after a connection blip. The version we are using is 2.7.17.27058.

In the logs, I see this sequence of events:

07:20:22.719 - <log line right before we do the transaction>
07:20:22.990 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 4) on 10.10.70.233:6001/Interactive, Flushed/Faulted, last: GET, origin: ReadFromPipe, outstanding: 2, last-read: 4s ago, last-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:23.012 - ConnectionRestored with args: 10.10.70.233:6001
07:20:26.558 - ConfigurationChanged with endpoint: 10.10.70.233:6001
07:20:30.259 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 39) on 10.10.70.232:6001/Interactive, Idle/Faulted, last: HGET, origin: ReadFromPipe, outstanding: 1, last-read: 4s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.233:6003/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.232:6002/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.231:6003/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.261 - ConnectionFailed with args: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 7) on 10.10.70.233:6002/Interactive, Idle/Faulted, last: PING, origin: ReadFromPipe, outstanding: 1, last-read: 3s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 6 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
07:20:30.269 - ConnectionRestored with args: 10.10.70.233:6003
07:20:30.269 - ConnectionRestored with args: 10.10.70.232:6002
07:20:30.269 - ConnectionRestored with args: 10.10.70.231:6003
07:20:30.270 - ConnectionRestored with args: 10.10.70.232:6001
07:20:30.270 - ConnectionRestored with args: 10.10.70.233:6002
07:20:30.275 - StackExchange.Redis.RedisConnectionException: SocketFailure (ReadSocketError/ConnectionReset, last-recv: 39) on 10.10.70.232:6001/Interactive, Idle/Faulted, last: HGET, origin: ReadFromPipe, outstanding: 1, last-read: 4s ago, last-write: 0s ago, unanswered-write: 0s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 9 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.7.17.27058
	 ---> Pipelines.Sockets.Unofficial.ConnectionResetException: An existing connection was forcibly closed by the remote host.
	 ---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host.
	   at Pipelines.Sockets.Unofficial.Internal.Throw.Socket(Int32 errorCode) in /_/src/Pipelines.Sockets.Unofficial/Internal/Throw.cs:line 59
	   at Pipelines.Sockets.Unofficial.SocketAwaitableEventArgs.GetResult() in /_/src/Pipelines.Sockets.Unofficial/SocketAwaitableEventArgs.cs:line 74
07:20:30.292 - ConfigurationChanged with endpoint: 10.10.70.232:6001

This is what the transaction code looks like:

ITransaction transaction = Database.CreateTransaction();
transaction.AddCondition(
	Condition.StringEqual(key1, 124));

Task<bool> task1 = transaction.KeyExpireAsync(
	key1,
	TimeSpan.FromMinutes(30));

Task<bool> task2 = transaction.KeyExpireAsync(
	key2,
	TimeSpan.FromMinutes(30));

Task<bool> task3 = transaction.KeyExpireAsync(
	key3,
	TimeSpan.FromMinutes(30));

Task<bool> task4 = transaction.KeyExpireAsync(
	key4,
	TimeSpan.FromMinutes(30));

if (!await transaction.ExecuteAsync())
{
	throw new Exception();
}

bool exp1 = await task1;
bool exp2 = await task2;
bool exp3 = await task3;
bool exp4 = await task4;

_logger.Verbose("{0} | {1} | {2} | {3}", exp1, exp2, exp3, exp4);

I took a process dump and when I open it I see:

  • A task is stuck on transaction.ExecuteAsync().
    • This task's status is WaitingForActivation
  • task1, task2, task3, and task4 are all Cancelled.
  • transaction appears to be null suggesting it was garbage collected since it went out of scope.

Some notes:

  • We do NOT recreate the ConnectionMultiplexer.
  • I can see in the process dump that we still have a reference to the ConnectionMultiplexer that we created originally.

I was able to gather some new information from the latest instance of this issue. I took a process dump and I noticed 36 async logical stack traces in PhysicalConnection.ReadFromPipe. I took a look at each one and in every case PhysicalConnection.TransactionActive was false. My best guess is that the transaction task was orphaned somehow.

Any thoughts on this?

@StenPetrov
Copy link

Similar issue encountered with calls to StringGetWithExpiryAsync - some of the calls just hang after the server experiences a short spike of timeouts

@renqiuchen-kurogames
Copy link

We encountered this bug in our production environment, with driver version v2.6.66. Upon analysis, we discovered a thread safety issue in the PhysicalConnection class, specifically within the RecordConnectionFailed and EnqueueInsideWriteLock functions. If the RecordConnectionFailed function reaches line 483 and another thread executes the EnqueueInsideWriteLock function, the Message will never be processed, causing the asynchronous task to remain suspended indefinitely.

@renqiuchen-kurogames
Copy link

@mgravell @NickCraver

@specialforest
Copy link

I wonder if #2773 and #2786 both related to this?

@jlao
Copy link
Author

jlao commented Nov 9, 2024

I may have a repro with the following unit tests. If I "Run until failure" this unit test it reliably fails in less than 20 runs with some kind of failure. Sometimes it fails because of timeout (Task seems to be hung) but interestingly I also see other exceptions bubbling up such as InvalidOperationException and ArgumentOutOfRangeExceptions that may be indicative of other bugs.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Xunit;
using Xunit.Abstractions;

namespace StackExchange.Redis.Tests.Issues;

public class Issue2742 : TestBase
{
    public Issue2742(ITestOutputHelper output, SharedConnectionFixture? fixture = null)
        : base(output, fixture) { }

    [Fact]
    public async Task Execute()
    {
        try
        {
            using var conn = Create(allowAdmin: true, shared: false, asyncTimeout: 1000);
            var server = GetServer(conn);
            var db = conn.GetDatabase();

            await db.PingAsync();

            RedisKey key = Me();
            await db.KeyDeleteAsync(key);

            const int numIter = 1000;
            List<Task<long>> incTasks = new(numIter);
            List<Task<bool>> txTasks = new(numIter);
            for (int i = 0; i < numIter; i++)
            {
                var tx = db.CreateTransaction();
                _ = tx.StringGetAsync(key);
                incTasks.Add(tx.StringIncrementAsync(key));
                txTasks.Add(tx.ExecuteAsync());
            }

            conn.AllowConnect = false;

            await Task.WhenAny(incTasks);

            server.SimulateConnectionFailure(SimulatedFailureType.All);

            TimeSpan timeout = TimeSpan.FromSeconds(10);
            Task timeoutTask = Task.Delay(timeout);

            Stopwatch sw = Stopwatch.StartNew();
            long max = -1;
            int exceptions = 0;
            foreach (Task<long> t1 in incTasks)
            {
                try
                {
                    Task completedTask = await Task.WhenAny(t1, timeoutTask).ForAwait();
                    Assert.Equal(t1, completedTask);

                    long result = await t1;
                    max = Math.Max(max, result);
                }
                catch (TaskCanceledException)
                {
                    exceptions++;
                }
                catch (RedisException)
                {
                    exceptions++;
                }
            }

            Log($"inc {max}, num ex = {exceptions}");
            Assert.Equal(1000, max + exceptions);

            foreach (Task<bool> txTask in txTasks)
            {
                Assert.True(txTask.IsCompleted);
            }

            sw.Stop();

            Log($"elapsed = {sw.Elapsed}");
            Assert.True(sw.Elapsed < timeout, "took too long");
        }
        finally
        {
            ClearAmbientFailures();
        }
    }
}

An example of an ArgumentOutOfRangeException:

Message: 
System.ArgumentOutOfRangeException : Specified argument was out of the range of valid values. (Parameter 'position')

      Stack Trace: 
ThrowHelper.ThrowArgumentOutOfRangeException_PositionOutOfRange()
ReadOnlySequence`1.GetPosition(Int64 offset, SequencePosition origin)
BufferReader.SnapshotPosition() line 123
BufferReader.TryConsumeAsBuffer(Int32 count, ReadOnlySequence`1& buffer) line 147
BufferReader.ConsumeAsBuffer(Int32 count) line 130
PhysicalConnection.ReadLineTerminatedString(ResultType type, ResultFlags flags, BufferReader& reader) line 2088
PhysicalConnection.TryParseResult(ResultFlags flags, Arena`1 arena, ReadOnlySequence`1& buffer, BufferReader& reader, Boolean includeDetilInExceptions, ServerEndPoint server, Boolean allowInlineProtocol) line 2164
PhysicalConnection.TryParseResult(Boolean isResp3, Arena`1 arena, ReadOnlySequence`1& buffer, BufferReader& reader, Boolean includeDetilInExceptions, PhysicalConnection connection, Boolean allowInlineProtocol) line 2135
PhysicalConnection.ProcessBuffer(ReadOnlySequence`1& buffer) line 1954
PhysicalConnection.ReadFromPipe() line 1897
Issue2742.Execute() line 61
--- End of stack trace from previous location ---

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants