Hello, I have been trying to play around with the SpecPaxos implementation. The scenario I'm trying is simple: I run five replicas on a single machine (listening on different ports), and I have five clients sending requests in a closed loop. I understand that for SpecPaxos to deliver high throughput and low latency, the network needs to provide ordered delivery (at least most of the time). If it does not, there will be many conflicts, leading to many rollbacks that hurt performance, but the system should still keep making progress.
However, in the above scenario, I see that the replicas start to crash after a while. Once two replicas crash (in a five-node cluster), the clients block indefinitely.
Details:
I compiled with the paranoid flag on.
Here is how I start the servers:
./bench/replica -c ./conf -i 0 -m spec >rep0 2>&1 &
./bench/replica -c ./conf -i 1 -m spec >rep1 2>&1 &
./bench/replica -c ./conf -i 2 -m spec >rep2 2>&1 &
./bench/replica -c ./conf -i 3 -m spec >rep3 2>&1 &
./bench/replica -c ./conf -i 4 -m spec >rep4 2>&1 &
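In case the configuration matters: ./conf uses the usual format for this codebase, an f line followed by one replica host:port line per replica. A minimal sketch of what I mean (the ports here are placeholders, not my exact values; f = 2 so the cluster should tolerate two failures):

f 2
replica 127.0.0.1:51000
replica 127.0.0.1:51001
replica 127.0.0.1:51002
replica 127.0.0.1:51003
replica 127.0.0.1:51004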
Here is how I start the clients:
./bench/client -c ./conf -n 1000 -m spec >cli-0 2>&1 &
./bench/client -c ./conf -n 1000 -m spec >cli-1 2>&1 &
./bench/client -c ./conf -n 1000 -m spec >cli-2 2>&1 &
./bench/client -c ./conf -n 1000 -m spec >cli-3 2>&1 &
./bench/client -c ./conf -n 1000 -m spec >cli-4 2>&1 &
Here is the stack trace of a replica that is crashing:
20190907-154417-2122 17865 * MergeLogs (replica.cc:820): [2] Merging 3 logs
20190907-154417-2124 17865 PANIC MergeLogs (replica.cc:1060): Assertion `newEntry.viewstamp.view == entry.view()' failed
20190907-154417-2124 17865 ! Backtrace (message.cc:169): Backtrace:
20190907-154417-2128 17865 ! Backtrace (message.cc:220): 0: _Z6_Panicv+0x9 [0x440314]
20190907-154417-2130 17865 ! Backtrace (message.cc:220): 1: _ZN9specpaxos4spec11SpecReplica9MergeLogsEmmRKSt3mapIiNS0_5proto19DoViewChangeMessageESt4lessIiESaISt4pairIKiS4_EEERSt6vectorINS_3Log8LogEntryESaISG_EE+0x1a19 [0x40bc9f]
20190907-154417-2132 17865 ! Backtrace (message.cc:220): 2: _ZN9specpaxos4spec11SpecReplica18HandleDoViewChangeERK16TransportAddressRKNS0_5proto19DoViewChangeMessageE+0x965 [0x40e5a3]
20190907-154417-2134 17865 ! Backtrace (message.cc:220): 3: _ZN9specpaxos4spec11SpecReplica14ReceiveMessageERK16TransportAddressRKSsS6_+0x5bd [0x4077c7]
20190907-154417-2136 17865 ! Backtrace (message.cc:220): 4: _ZN12UDPTransport10OnReadableEi+0xb84 [0x4489fc]
20190907-154417-2138 17865 ! Backtrace (message.cc:220): 5: _ZN12UDPTransport14SocketCallbackEisPv+0x39 [0x448f2b]
20190907-154417-2140 17865 ! Backtrace (message.cc:220): 6: event_base_loop+0x754 [0x7f684a341f24]
20190907-154417-2142 17865 ! Backtrace (message.cc:220): 7: _ZN12UDPTransport3RunEv+0x1f [0x447bff]
20190907-154417-2144 17865 ! Backtrace (message.cc:220): 8: main+0x94f [0x40610f]
20190907-154417-2146 17865 ! Backtrace (message.cc:220): 9: __libc_start_main+0xf5 [0x7f684938af45]
20190907-154417-2148 17865 ! Backtrace (message.cc:220): 10: _start+0x29 [0x4056c9]
20190907-154417-2150 17865 ! Backtrace (message.cc:220): 11: ???+0x29 [0x29]
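For context on where the assertion sits: MergeLogs appears to combine the logs carried in the DoViewChange messages from a quorum, and the assertion insists that entries reported for the same opnum agree on the view in which they were accepted. Below is my rough paraphrase of that invariant, not the actual code from replica.cc; the struct names and the MergeEntryForOpnum helper are hypothetical, loosely modeled on the symbols in the backtrace:

// Illustrative paraphrase of the failing invariant, not the code in replica.cc.
// Assumption: each DoViewChange message carries log entries tagged with the
// view in which the sender (speculatively) accepted them; the entry chosen
// for a given opnum must come from the same view across all senders.
#include <cassert>
#include <cstdint>
#include <vector>

struct Viewstamp {
    uint64_t view;
    uint64_t opnum;
};

struct LogEntry {
    Viewstamp viewstamp;
    // ... request, state hash, etc.
};

// Merge one opnum across the logs received from a quorum of replicas.
// The panic corresponds to two quorum members reporting the same opnum
// under different views, which the merge logic assumes cannot happen.
LogEntry MergeEntryForOpnum(uint64_t opnum,
                            const std::vector<std::vector<LogEntry>> &logs) {
    const LogEntry *chosen = nullptr;
    for (const auto &log : logs) {
        for (const auto &entry : log) {
            if (entry.viewstamp.opnum != opnum) continue;
            if (chosen == nullptr) {
                chosen = &entry;
            } else {
                // Roughly the assertion that fires at replica.cc:1060.
                assert(chosen->viewstamp.view == entry.viewstamp.view);
            }
        }
    }
    return *chosen;  // caller guarantees at least one log contains opnum
}

If that reading is right, the crash means the quorum's logs disagreed on the view for some opnum, which I would have expected the view-change protocol to reconcile rather than assert on.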
I can attach the full logs if needed. Ideally, the replicas should not crash but should resolve the conflicts and keep making progress.