Transaction task never completes after connection blip #2742
Comments
We encountered a similar issue with calls to StringGetWithExpiryAsync: some of the calls just hang after the server experiences a short spike of timeouts.
We encountered this bug in our production environment with driver version v2.6.66. Upon analysis, we found a thread-safety issue in the PhysicalConnection class, specifically between the RecordConnectionFailed and EnqueueInsideWriteLock functions. If RecordConnectionFailed has reached line 483 when another thread executes EnqueueInsideWriteLock, the enqueued Message is never processed, and the asynchronous task waiting on it stays suspended indefinitely.
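To make the suspected interleaving concrete, here is a toy model of that kind of race. It is only a sketch under stated assumptions: `ToyConnection`, `EnqueueUnsafe`, `EnqueueSafe`, and `RecordFailure` are hypothetical names, and this is not the StackExchange.Redis implementation. It assumes the failure path drains the pending queue once, so anything enqueued afterwards has nobody left to complete it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical toy model of the race; NOT the library's actual code.
public sealed class ToyConnection
{
    private readonly object _gate = new();
    private readonly Queue<TaskCompletionSource<bool>> _pending = new();
    private bool _failed;

    // Unsafe variant: enqueues without checking whether the failure path already ran.
    public Task<bool> EnqueueUnsafe()
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        lock (_gate) { _pending.Enqueue(tcs); }
        return tcs.Task;
    }

    // Safe variant: checks the failure flag under the same lock the failure path uses.
    public Task<bool> EnqueueSafe()
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        lock (_gate)
        {
            if (_failed) tcs.TrySetException(new InvalidOperationException("connection failed"));
            else _pending.Enqueue(tcs);
        }
        return tcs.Task;
    }

    // Failure path: marks the connection dead and cancels everything queued so far.
    public void RecordFailure()
    {
        lock (_gate)
        {
            _failed = true;
            while (_pending.Count > 0) _pending.Dequeue().TrySetCanceled();
        }
    }
}

public static class RaceDemo
{
    // Replays the interleaving deterministically: failure first, enqueue second.
    public static (bool OrphanCompleted, bool SafeFaulted) Run()
    {
        var conn = new ToyConnection();
        conn.RecordFailure();
        Task<bool> orphan = conn.EnqueueUnsafe(); // nothing will ever complete this task
        Task<bool> guarded = conn.EnqueueSafe();  // fails fast instead of hanging
        return (orphan.IsCompleted, guarded.IsFaulted);
    }
}
```

In this model the unsafe task stays pending forever, which matches the hang symptom, while the guarded variant surfaces the failure to the caller. Whether the real fix looks like this depends on the library's locking design.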
I may have a repro with the following unit test. If I use "Run until failure" on this test, it reliably fails in fewer than 20 runs with some kind of failure. Sometimes it fails because of a timeout (the task seems to be hung), but interestingly I also see other exceptions bubbling up, such as InvalidOperationException and ArgumentOutOfRangeException, that may be indicative of other bugs.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Xunit;
using Xunit.Abstractions;
namespace StackExchange.Redis.Tests.Issues;
public class Issue2742 : TestBase
{
    public Issue2742(ITestOutputHelper output, SharedConnectionFixture? fixture = null)
        : base(output, fixture) { }

    [Fact]
    public async Task Execute()
    {
        try
        {
            using var conn = Create(allowAdmin: true, shared: false, asyncTimeout: 1000);
            var server = GetServer(conn);
            var db = conn.GetDatabase();
            await db.PingAsync();
            RedisKey key = Me();
            await db.KeyDeleteAsync(key);
            const int numIter = 1000;
            List<Task<long>> incTasks = new(numIter);
            List<Task<bool>> txTasks = new(numIter);
            for (int i = 0; i < numIter; i++)
            {
                var tx = db.CreateTransaction();
                _ = tx.StringGetAsync(key);
                incTasks.Add(tx.StringIncrementAsync(key));
                txTasks.Add(tx.ExecuteAsync());
            }
            conn.AllowConnect = false;
            await Task.WhenAny(incTasks);
            server.SimulateConnectionFailure(SimulatedFailureType.All);
            TimeSpan timeout = TimeSpan.FromSeconds(10);
            Task timeoutTask = Task.Delay(timeout);
            Stopwatch sw = Stopwatch.StartNew();
            long max = -1;
            int exceptions = 0;
            foreach (Task<long> t1 in incTasks)
            {
                try
                {
                    Task completedTask = await Task.WhenAny(t1, timeoutTask).ForAwait();
                    Assert.Equal(t1, completedTask);
                    long result = await t1;
                    max = Math.Max(max, result);
                }
                catch (TaskCanceledException)
                {
                    exceptions++;
                }
                catch (RedisException)
                {
                    exceptions++;
                }
            }
            Log($"inc {max}, num ex = {exceptions}");
            Assert.Equal(1000, max + exceptions);
            foreach (Task<bool> txTask in txTasks)
            {
                Assert.True(txTask.IsCompleted);
            }
            sw.Stop();
            Log($"elapsed = {sw.Elapsed}");
            Assert.True(sw.Elapsed < timeout, "took too long");
        }
        finally
        {
            ClearAmbientFailures();
        }
    }
}
An example of an ArgumentOutOfRangeException:
Hello, I left a comment on #2630 but it was already closed so I'm creating a new issue. We recently saw this happen again. From my previous post which still holds:
We saw another instance where a Redis transaction is hung forever after a connection blip. The version we are using is 2.7.17.27058.
In the logs, I see this sequence of events:
This is what the transaction code looks like:
I took a process dump and when I open it I see:
transaction.ExecuteAsync().
task1, task2, task3, and task4 are all Cancelled.
transaction appears to be null, suggesting it was garbage collected since it went out of scope.
Some notes:
I was able to gather some new information from the latest instance of this issue. I took a process dump and I noticed 36 async logical stack traces in PhysicalConnection.ReadFromPipe. I took a look at each one, and in every case PhysicalConnection.TransactionActive was false. My best guess is that the transaction task was orphaned somehow.
Any thoughts on this?
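Until the root cause is found, a caller can at least stop an orphaned task from hanging forever by bounding the wait. The helper below is a minimal sketch, not a fix for the underlying bug: `WithTimeout` is a hypothetical extension method, not part of StackExchange.Redis, and the task it wraps stays pending in the background after the timeout fires.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; not part of StackExchange.Redis.
public static class TaskTimeoutExtensions
{
    // Awaits the task, but throws TimeoutException if it has not completed within the timeout.
    public static async Task<T> WithTimeout<T>(this Task<T> task, TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource();
        Task delay = Task.Delay(timeout, cts.Token);
        Task first = await Task.WhenAny(task, delay).ConfigureAwait(false);
        if (first != task)
        {
            // The wrapped task is left pending; only the caller is released.
            throw new TimeoutException($"Operation did not complete within {timeout}.");
        }
        cts.Cancel(); // the task finished first, so stop the timer
        return await task.ConfigureAwait(false); // propagates the result, exception, or cancellation
    }
}
```

Assuming `transaction` is the ITransaction from the code above, usage might look like `bool committed = await transaction.ExecuteAsync().WithTimeout(TimeSpan.FromSeconds(5));`. A timeout here does not tell you whether the transaction was actually executed on the server, so the caller still has to treat the outcome as unknown.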