-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use dotted version vectors on common tests #11
base: master
Are you sure you want to change the base?
Conversation
6b0bc5c
to
dc6f535
Compare
.travis.yml
Outdated
- make dialyzer | ||
- make lint | ||
- ./rebar3 ct --suite test/plumtree_SUITE --group hyparview --case broadcast_high_client_test | ||
# - make xref |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you comment these out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah sorry, i should have mentioned it, this is to try and catch logs for this failed test, doesn't happen always
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all debug commits are prefixed with !drop
in the message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, should I be reviewing this for a merge or should I wait until the !drop
commits are removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Waiting until i've found the root cause of the failed tests is maybe better, i'll ping again when that happens, meanwhile i'll mark as the PR as a WIP
Great, that helps. Thanks. |
8215c6b
to
209de60
Compare
@cmeiklejohn this is now ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why the DVV addition is valuable/necessary?
src/plumtree_broadcast.erl
Outdated
[send_lazy(MessageId, Mod, Round, Root, Peer) || | ||
{MessageId, Mod, Round, Root} <- ordsets:to_list(Messages)]. | ||
|
||
send_lazy(MessageId, Mod, Round, Root, Peer) -> | ||
plumtree_util:log(debug, "sending lazy push ~p", | ||
[{i_have, MessageId, Mod, Round, Root, myself()}]), | ||
% plumtree_util:log(debug, "sending lazy push ~p", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove commented code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, fixup 694ce21
was pushed to that effect
src/plumtree_broadcast.erl
Outdated
State1 = add_lazy(From, Root, State), | ||
_ = send({prune, Root, myself()}, Mod, From), | ||
State1; | ||
handle_broadcast({true, MessageId}, _OldMessageId, Message, Mod, Round, Root, From, State) -> | ||
handle_broadcast(true, MessageId, Message, Mod, Round, Root, From, State); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this new case for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to overcome an issue i came across with. While performing a broadcast storm (ie. all members generating broadcasts) i would occasionally end up in a situation where one member would be rejecting graft
requests as the context provided was stale (ie. another subsequent write had subsumed it).
node A would generate a new write that gets eager pushed to node B, node B would then lazy push it to node C, node C doesn't have the key so it requests B to graft it but it is providing the context that originated from A, from the point of view of B this context is stale, this way B never replies to C with the missing key.
This extra return tuple to merge
allows node B to return the new context after the merge and that's the one that gets lazy pushed to C.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's neat. Can you document this in the code with a comment to explain this very case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -645,7 +657,7 @@ send(Msg, Mod, P) -> | |||
partisan_peer_service), | |||
PeerServiceManager = PeerService:manager(), | |||
instrument_transmission(Msg, Mod), | |||
PeerServiceManager:forward_message(P, ?SERVER, Msg). | |||
ok = PeerServiceManager:forward_message(P, ?SERVER, Msg). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this fails, will it crash? Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will, the idea is to not silently forward messages to failed peers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, won't this crash the caller? I'm wondering if this should return error instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't crash the caller but it does crash the plumtree_broadcast
worker process, returning error here doesn't really help since it is being ignored on all invocations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another alternative is reporting an error so it has some visibility, thoughts?
The DVV approach for tests is an alternative for the previous ETS based one which i'm not at all comfortable about going to production with (ie. a grow only table that keeps track of seen messages), most of the ideas/code were picked up from riak core and applied here. |
there is one pending fixup that i need to squash before this gets merged |
When performing a broadcast merge it might be the case that the application wants to return a new message id that results from the merge and have that broadcast instead of the originally received message id. Support this by handling an extra return tuple from the callback.
fixup rebase is done, commit history is now clean |
Should this be merged now? I don't remember where we left off with this and I'm trying to clear some of the outstanding PRs we have open. |
Instead of keeping a large ets table of all messages seen, keep a dvv per stored object and use that instead in the plumtree behaviour