Skip to content

Jiacheng-WU/naive_paxos

Repository files navigation

Team Members:

1. Jiacheng Wu:    [email protected]
2. Yuan-Mao Chueh: [email protected]

Project Description:

    * Build Requirements:
        * Configuration:
            - The configuration files are hardcoded in role/config.cpp load_config function
            - It is also easy to support reading JSON config files, which is not the important
            - We could support it before the demo!!
        * Tools and Libs
            - CMake : Ubuntu (snap install cmake), MacOS (brew install cmake)
            - Boost : Ubuntu (apt-get install libboost-all-dev) , MacOS (brew install boost)
            - Clang/GCC : should support C++ 20, LLVM/Clang 14.0 or higher is better!
        * Build Steps
            - mkdir build
            - cd build
            - cmake -DCMAKE_BUILD_TYPE=(Debug|Release) [-DCMAKE_CXX_COMPILER=clang++{>=14} -DCMAKE_CXX_COMPILER=clang{>=14}} ..
            - make
        * Output
            - paxos_server (The Paxos Server runs Paxos instance)
            - paxos_client (Example Client to issue requests)

    * Run Specifications:
        * For Server
            - We should specify the server_id for the server
            - ./paxos_server {id} ../config.json
            - Server started with specific id will use the ip and port specified in config.cpp
        * For Client
            - ./paxos_client {port = 0} ../config.json ../test.command
            - If not specified test file, then the client will use interactive mode
            - We support 4 types of command
                - lock (object_id)
                - unlock (object_id)
                - wait (number) {that is pause for number milliseconds}
                - server (server_id) {force next command to send server_id}
            - Just run since it is just an example displayed to issue requests
        * For Recovery
            - ./paxos_server {id} recovery
            - Add "recovery" after the command


    * Assumptions:
        * Simplicity Discovery:
            - The identity of the Paxos group members is hard coded at the clients in the code or through a configuration file.
        * Clients will be almost well-behaved:
            - the client could fail after acquiring a lock.
            - as long as it would restart with the same ip and port
            - then the client could unlock the previously locked item
            - The deadlock somehow can be handled by sending LOCK_AGAIN
        * Not consider the atmost-once or atleast-once semantics
            - even though we have already embedded at client_once in COMMANDS
            - but due to time limits, just ignore it here

    * Bonus Points
        * Recovery:
            - Use a separate log for the acceptor and learner
            - Use flush rather than fdatasync to sync due to it is hard to obtain fd from fstream
            - Though I knew it is better to use fdatasync to ensure flush to disk instead of os
        * Resending Message:
            - Add timeout for the proposer
            - the proposer will try to resend prepare/accept if it doesn't receive the majority message for a long time.
        * Deadlock:
            - We would like to return LOCK_AGAIN state to identify one client and try lock again.

Implementation:
    * No Leader/Master
        - No Leader Election Procedure
        - Each server could serve client requests, submit them to Paxos group, and responds to the request.
    * Three roles in a single entity
        - We have an Instance containing Proposer, Acceptor, and Learner
        - Thus, Proposer sometimes could directly read the corresponding Learner information
        - Some Optimization:
            * Proposer I don't send PREPARE/ACCEPT if the Learner I already learned the current sequence is chosen
            * Acceptor I inform other Proposer J with INFORM message if the Learner I has learned the chosen value
    * Asynchronous Message Event Mechanism
        - Use ASIO framework and callbacks to handle concurrent client requests
        - Each arrived Message would trigger a specific event to propose
        - Sometimes need a lock to maintain the critical section
            - Say if the instance would have two messages arrive at the same time, and may have conflicts to process
            - But if io_context is single_thread, it may be not necessary to hold a lock.
    * Request resubmit Mechanism
        - Each client request would be assigned to an Instance with Sequence SEQ
        - But Paxos may finally not choose the client requests as value, (Due to other higher proposal numbers)
        - Then how to tackle especially in an Asynchronous manner?
            - We keep the relation between SEQ and submitted value (client requests/commands) in unordered_map
            - After we could execute the value for the current SEQ
            - We will first retrieve the original value of this SEQ from the above relation
            - And compare them to if they are totally same
                - if no, it means the submitted value is not chosen for the current SEQ, we resubmit it
                - if yes, then we could simply drop the relation (SEQ->value) and respond to the client
    * Command Execution
        - How to Know when a command is chosen and when the command could be executed.
            - Additional INFORM message and timeout mechanism to ensure correctness and efficiency.
            - When the Learner received majority accepted message on the latest proposal number
            - Then this Learner knows this PAXOS instance achieved consensus, it just sends INFORM message
            - When other learners received the INFORM, they also knew we achieved a consensus
            - Next, they just put the value (commands) into a min heap of the server (less sequence is in the front)
            - Each server also maintains the current executed SEQ, and finds if there is a next SEQ to execute
            - One command can be executed (and therefore respond to clients) only if the previous SEQ all executed
            - Say even if SEQ 5 is achieved consensus, it still needs to wait for SEQ 1,2,3,4 to achieve consensus and execute.
            - We just use a min-heap to maintain accepted but cannot execute values and also deduplicated
            - Then once a new value is put into the heap, we just detect whether the front SEQ is executed SEQ + 1
            - If not, we just ignore it. Otherwise, we could execute the front SEQ, remove it, update executed SEQ, and detect recursively
    * Concise Structure
        - Only use 10 * sizeof(std::uint32_t) for each message, which is short and faster
            - Use enum MessageType and enum OperationType to distinguish
            - Use the same structure for both client and server message
            - But reuse some fields in different situations.
        - Only maintain necessary information on each Proposer, Acceptor, Learner
            - If some information is common, then put this information into the Instance or Server
            - Proposer/Acceptor/Learner just keep a necessary pointer to Instance (and Server)


Some Discussions:

    Suppose we have proposer P1, P2, P3
    Suppose we have acceptor A1, A2, A3

    * The Consensus is that once > half of acceptors agreed on a number, then they have a consensus.
        - More Specific, > half of the acceptors have accepted the same proposal number (not the value)
        - In fact, we should observe the PSM from the client side, once the PSM return done
        - the client side could only get the same value for the current instance after that time.
            - even from different nodes
            - somehow could block if the current instance on this node is not learned


    * Proposer P needs to identify which PREPARE proposal number the Acceptor (PROMISE) replied to !!
        Image an extreme case:
        - P first send PREPARE to A1, A2, A3 with proposal number n = 3;
        - A1, A2, A3 recv PREPARE and PROMISE not to respond to other proposals less than n = 3;
            - But these three messages arrive late until the next proposed action.
        - P finds it is a timeout to receive PROMISE, and re-propose PREPARE with n = 6;
            - But these three message just lost and A1, A2, A3 didn't receive it.
        - P then receives the message in the previous round with n = 3;
            - If we do not identify the proposal number,
              then P would think A1, A2, A3 promised not to respond other proposal with n = 6,
              even though now A1, A2, A3 never receive this PREPARE request with n = 6!!!
        - Then, it violates the Paxos algorithm, and may lead the acceptor to accept two different values!!
        Therefore, Propose P should identify which PREPARE numbered n the PROMISE message replied to.
        - If we received the out-of-dated PROMISE, that is the replied n < current proposal number
        - we should omit this PROMISE, and do nothing since the new proposal is already started
        - That's also the reason for additional_field_1 in struct Message;

    * Learner L should distinguish the different proposal numbers (but with the same value) of the ACCEPTED message
        - Note that consensus is achieved when a majority of Acceptors accept the same identifier number
        - (rather than the same value). [From Wikipedia]
        - I also have a counterexample if the learner just accepted > half of acceptors only agreed on the same value

    * It is legal to propose no-ops even if some acceptors may already previously accept some values when instance recovery
        - Since the instance will re-execute unlearned commands and the whole procedure of basic Paxos
        - Then even if it proposes no-ops in Phase 1 (actually Phase 1 even do not need proposal value)
        - It will still receive other accepted values as the value of current instances
        - Only if the majority of acceptors who received the PREPARE and replied PROMISE with no value
        - Could the new instance decide the current proposal value with NO-OPS!!
        - Thus, it is not wrong to propose NO-OPS when recovery for never learned instances.

    * It is also legal to even not propose no-ops or just use already existing sequences after recovery
        - Just similar to the above cases since just replacing the no-ops with current values submitted by clients
        - If we have a mechanism to support re-submit client requests when
            - the commands for the current instance is not the client submitted!


    * We need to persist in what state on Paxos nodes
        - The acceptor at least should preserve the three items of each instance for future recovery
        - Otherwise, the acceptor may reply to two different proposers with inconsistent states.

    * The proposer can send Accept Message to all acceptors even if it only received a majority of the promise
        - That is also the reason why multi-Paxos works
        - In fact, two majorities of groups must at least have one same value
        - Even we could imagine an extreme case, that is
            - Propose P1 first receives promise from A1, A2
            - And the P1 just sends accept to A2, A3. It still works
            - But for efficiency and reasonable,
            - it is better to update highest_prepare_proposal_number as well

    * In multi-paxos, the leader could skip the PREPARE in Phase 1, but what proposal number should it propose?
        - It must be the least proposal that should have
        - Otherwise may cause problems, Image this case
        - All proposers finish Instance 99 and continue to 100.
        - However, the leader P1 is somehow delayed and cannot contact P3,
        - then P3 tries to elect a leader and starts with proposal_number = 1
        - After P3 has finished the whole procedure and the learner even learned the value from > 1/2 acceptor
        - P1 is awake and simply executes the ACCEPT phase with its own proposal value with a higher number = 3
        - Then all other acceptors simply accept since it is a newer one which broke the Paxos
        - Therefore, we need the default number for the Leader is the less than one
        - and should be less than other possible proposal numbers in PREPARE
        - In such cases, the error will be discovered, and then after P1 is rejected,
        - It could arise a new round of Paxos or just learn a new leader!!
        - We should also mention here that if the leader finds itself timeout,
        - The leader should also do the whole Paxos (including PHASE 1) to ensure consensus!!

    * The Acceptor's PROMISE (on_prepare) and ACCEPTED (on_accept) should be atomic, that is,
        - we cannot separate the two procedures in PROMISE
            - WP writing the highest_prepare_proposal_number and
            - RA reading the highest_accepted_proposal_number/value
        - we also cannot separate the two procedures in ACCEPT
            - RP reading the highest_prepare_proposal_number and
            - WA writing the highest_accepted_proposal_number/value
        - Otherwise, here is the case:
            - P2.proposal_number in PREPARE > P1.proposal_number in ACCEPT
            - P1 sends ACCEPT  request to A1 and A1 executes RP in ACCEPTED, which passes the check
            - P2 sends PREPARE request to A1 and A1 executes WP in PROMISE
            - A1 in PROMISE execute RA and then A1 in ACCEPT execute WA
            - RP (for P1), WP (for P2), RA (for P2), WA( for P1)
            - The above case is not tolerable since P1 succeed write Value and P2 read Undecided Value
                - Say meanwhile if A2 already accepted P1, then now P1's proposal is accepted by majority
                - and P2 only send PREPARE to A1 and A3, then P2 is also pass the PHASE 1 with no specified value
                - then P2 could specify its own value different A1 and send ACCEPT to A1/A3 which also accepted
                - Thus, in this case, the majority of acceptors change their opinions
                - even after > half of the acceptors have agreed on a different value before.
                - which violates the consensus.
            - In fact, any sequential execution won't cause the above situations
                - if A1 accepted for P1 is before A1 promise to P2, then P2 will read the accepted value
                - if A1 try to accept P1 is after A1 promise to P2, then P1 will be rejected by A1

    * Learner's Behaviour, it is not necessary to the history of accepted cases
        - Learner only needs to care about the largest proposal number of accepted messages
        - Though it may somehow sacrifice a little of liveliness properties, but is easy to implement
        - In fact, the instance can be stopped if
            - there is a majority of acceptors accepted a proposal number
            - In even meantime some proposers may propose a new number
        - In our simplified case, the learner may only maintain this new number's accepted info
        - which may influence liveness somehow.
        - But we do an optimization by delaying the updating info until received new ACCEPTED
        - rather than the PREPARE!

    * We only need to inform once or even not inform
        - In fact, if the current sequence is already chosen, but the Server didn't know
        - It would choose a new proposal number to PREPARE, and then it finally will get the value
        - So it is not even necessary to inform others, but we just optimized for it
            - use an extra round to notify all the consensus values.
            - avoid the duplicated PREPARE/ACCEPT by making the acceptor reply INFORM.

    * Leader Election in Paxos is really a complex problem
        - Some implementations would like to use redirect messages to tell clients to resent to the server
        - It would cause the specific leader to handle too many requests
        - Even though we could apply optimizations to avoid PREPARE and ACCEPT message
        - However, in this case, we should tackle complex situations where
            - What if the leader failed?
            - What if two leaders are elected?
            - Is it needed to log leader election on Paxos Replicated Log?
                - If necessary, then how to decide the specific sequence for such an election message
                - We will find at that time, we still need the basic to process.
        - In fact, Raft paper is better to read and just figure out the above problems by its protocol.

About

Naive Paxos

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages