Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use dotted version vectors on common tests #11

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

lrascao
Copy link

@lrascao lrascao commented Jul 2, 2017

Instead of keeping a large ets table of all messages seen, keep a dvv per stored object and use that instead in the plumtree behaviour

@lrascao lrascao force-pushed the dvv branch 2 times, most recently from 6b0bc5c to dc6f535 Compare July 2, 2017 17:39
.travis.yml Outdated
- make dialyzer
- make lint
- ./rebar3 ct --suite test/plumtree_SUITE --group hyparview --case broadcast_high_client_test
# - make xref
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you comment these out?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry, i should have mentioned it, this is to try and catch logs for this failed test, doesn't happen always

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all debug commits are prefixed with !drop in the message

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, should I be reviewing this for a merge or should I wait until the !drop commits are removed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting until i've found the root cause of the failed tests is maybe better, i'll ping again when that happens, meanwhile i'll mark as the PR as a WIP

@lrascao lrascao changed the title Use dotted version vector on common tests WIP: Use dotted version vector on common tests Jul 2, 2017
@cmeiklejohn
Copy link
Member

cmeiklejohn commented Jul 2, 2017

Great, that helps. Thanks.

@lrascao lrascao force-pushed the dvv branch 2 times, most recently from 8215c6b to 209de60 Compare July 4, 2017 18:00
@lrascao lrascao changed the title WIP: Use dotted version vector on common tests Use dotted version vector on common tests Jul 4, 2017
@lrascao lrascao changed the title Use dotted version vector on common tests Use dotted version vectors on common tests Jul 4, 2017
@lrascao
Copy link
Author

lrascao commented Jul 12, 2017

@cmeiklejohn this is now ready for review

Copy link
Member

@cmeiklejohn cmeiklejohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why the DVV addition is valuable/necessary?

[send_lazy(MessageId, Mod, Round, Root, Peer) ||
{MessageId, Mod, Round, Root} <- ordsets:to_list(Messages)].

send_lazy(MessageId, Mod, Round, Root, Peer) ->
plumtree_util:log(debug, "sending lazy push ~p",
[{i_have, MessageId, Mod, Round, Root, myself()}]),
% plumtree_util:log(debug, "sending lazy push ~p",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove commented code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, fixup 694ce21 was pushed to that effect

State1 = add_lazy(From, Root, State),
_ = send({prune, Root, myself()}, Mod, From),
State1;
handle_broadcast({true, MessageId}, _OldMessageId, Message, Mod, Round, Root, From, State) ->
handle_broadcast(true, MessageId, Message, Mod, Round, Root, From, State);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this new case for?

Copy link
Author

@lrascao lrascao Jul 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to overcome an issue i came across with. While performing a broadcast storm (ie. all members generating broadcasts) i would occasionally end up in a situation where one member would be rejecting graft requests as the context provided was stale (ie. another subsequent write had subsumed it).
node A would generate a new write that gets eager pushed to node B, node B would then lazy push it to node C, node C doesn't have the key so it requests B to graft it but it is providing the context that originated from A, from the point of view of B this context is stale, this way B never replies to C with the missing key.
This extra return tuple to merge allows node B to return the new context after the merge and that's the one that gets lazy pushed to C.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's neat. Can you document this in the code with a comment to explain this very case?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -645,7 +657,7 @@ send(Msg, Mod, P) ->
partisan_peer_service),
PeerServiceManager = PeerService:manager(),
instrument_transmission(Msg, Mod),
PeerServiceManager:forward_message(P, ?SERVER, Msg).
ok = PeerServiceManager:forward_message(P, ?SERVER, Msg).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fails, will it crash? Is that right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will, the idea is to not silently forward messages to failed peers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, won't this crash the caller? I'm wondering if this should return error instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't crash the caller but it does crash the plumtree_broadcast worker process, returning error here doesn't really help since it is being ignored on all invocations

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another alternative is reporting an error so it has some visibility, thoughts?

@lrascao
Copy link
Author

lrascao commented Jul 13, 2017

The DVV approach for tests is an alternative for the previous ETS based one which i'm not at all comfortable about going to production with (ie. a grow only table that keeps track of seen messages), most of the ideas/code were picked up from riak core and applied here.

@lrascao
Copy link
Author

lrascao commented Jul 14, 2017

there is one pending fixup that i need to squash before this gets merged

When performing a broadcast merge it might be the
case that the application wants to return a new message
id that results from the merge and have that broadcast instead
of the originally received message id. Support this by handling
an extra return tuple from the callback.
@lrascao
Copy link
Author

lrascao commented Jul 21, 2017

fixup rebase is done, commit history is now clean

@cmeiklejohn
Copy link
Member

Should this be merged now? I don't remember where we left off with this and I'm trying to clear some of the outstanding PRs we have open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants