propose POA testnet #158

Closed
wants to merge 9 commits into from

jjyr
Contributor

@jjyr jjyr commented Dec 9, 2019

Updates:

  • 2019-12-19: update off-chain governance
  • 2019-12-18: update block attestation and verification

The Aggron testnet uses POW. Unfortunately, the average block time can become very slow, up to a few minutes, because mining power joins and leaves; see the peaks in the chart at https://explorer.nervos.org/aggron/charts.

To provide a stable testnet for development, I propose a POA testnet to replace the current Aggron.

  • The block time should be very stable, approximately 8 seconds.
  • The POA testnet is supposed to be a long-running testnet; its validators should be maintained and governed by the community.
  • A minority of malicious validators can't halt or censor the POA testnet.

@jjyr jjyr marked this pull request as ready for review December 19, 2019 07:23

1. For block height `n`, a validator checks `n % VALIDATOR_COUNT`.
2. If `INDEX == n % VALIDATOR_COUNT`, it is the validator's turn to attest block `n`. The validator should wait for `BLOCK_INTERVAL` seconds, then attest block `n` with difficulty set to `2`.
3. If `INDEX != n % VALIDATOR_COUNT`, it is not the validator's turn to attest block `n`. The validator should wait for `BLOCK_INTERVAL + rand(VALIDATOR_COUNT) * 0.5` seconds for another attester to produce a new block. If no new block is produced during that time, the validator should attest a new block with difficulty set to `1`.
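
The three steps can be sketched as a single decision function. This is only an illustration, not code from the RFC: the function name `attestation_plan` is hypothetical, `BLOCK_INTERVAL` is taken as the 8 seconds mentioned in the proposal, and `rand(k)` is assumed to return an integer in `[0, k)`.

```python
import random

BLOCK_INTERVAL = 8  # seconds; the target block time stated in the proposal


def attestation_plan(index, n, validator_count, rng=random):
    """Return (wait_seconds, difficulty) for validator `index` at height `n`."""
    if index == n % validator_count:
        # In turn: wait one interval, then attest with difficulty 2.
        return BLOCK_INTERVAL, 2
    # Out of turn: add random jitter so backup attesters rarely collide.
    # Assumption: rand(k) yields an integer in [0, k).
    jitter = rng.randrange(validator_count) * 0.5
    return BLOCK_INTERVAL + jitter, 1


print(attestation_plan(index=2, n=6, validator_count=4))  # in turn, since 6 % 4 == 2
print(attestation_plan(index=1, n=6, validator_count=4))  # out of turn
```

Under this reading of `rand`, the jitter can be 0, in which case an out-of-turn validator waits exactly `BLOCK_INTERVAL` seconds.
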
Member

Could the result of `BLOCK_INTERVAL + rand(VALIDATOR_COUNT) * 0.5` be `BLOCK_INTERVAL`, or very close to it? If so, a not-in-turn validator and the in-turn attester may generate a new block at nearly the same time, and the validator's block may arrive at other nodes earlier than the attester's. This could cause a lot of 1-block fork switches on the testnet.

I suggest adding a buffer to the validator's wait period, e.g. `BLOCK_INTERVAL * 2 + rand(VALIDATOR_COUNT) * 0.5`.

Contributor Author

We could add another variable, `BLOCK_TIMEOUT`, set to a value slightly greater than `BLOCK_INTERVAL`, such as 10s.

Every validator that is not in its turn waits for `BLOCK_TIMEOUT + rand(VALIDATOR_COUNT) * 0.5` seconds before producing a block; this reduces the possibility that two validators attest at the same time.

In fact, even if two blocks are produced at the same time, it won't hurt: the block attested by an in-turn validator always has higher total difficulty, and even if two blocks have the same total difficulty, the next in-turn validator can decide the main chain. Chain selection works just like in POW.
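
The chain-selection argument can be illustrated with a small sketch. It assumes a block is modelled as an `(attester, difficulty)` pair and a chain as a list of blocks; the helper names `total_difficulty` and `heavier` are hypothetical and not part of the RFC.

```python
def total_difficulty(chain):
    """Sum the difficulty of every block; a block is (attester, difficulty)."""
    return sum(difficulty for _attester, difficulty in chain)


def heavier(chain_a, chain_b):
    """Return the chain with more total difficulty, or None on a tie."""
    ta, tb = total_difficulty(chain_a), total_difficulty(chain_b)
    if ta == tb:
        return None
    return chain_a if ta > tb else chain_b


prefix = [("A", 2), ("B", 2)]

# Case 1: an in-turn block (difficulty 2) competes with an out-of-turn one (1).
in_turn = prefix + [("C", 2)]
out_of_turn = prefix + [("D", 1)]
assert heavier(in_turn, out_of_turn) is in_turn

# Case 2: two out-of-turn blocks tie until the next in-turn validator
# extends one of the forks, which then becomes the heavier chain.
fork_x = prefix + [("D", 1)]
fork_y = prefix + [("A", 1)]
assert heavier(fork_x, fork_y) is None
fork_y_extended = fork_y + [("D", 2)]
assert heavier(fork_x, fork_y_extended) is fork_y_extended
```
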

doitian
doitian previously approved these changes Jan 3, 2020
@doitian doitian requested review from janx, a team, quake and xxuejie and removed request for a team January 10, 2020 14:55
@nirenzang
Contributor

I am curious when validator "n % VALIDATOR_COUNT" did not produce a block, why can't we ask the next validator to produce two blocks in a row? That would be much simpler, and we have no risk of short forks.

@rev-chaos
Contributor

I'm not sure this mechanism can ensure convergence with high probability. Forks happen easily, so a validator that attests to different forks within ATTEST_INTERVAL cannot be considered malicious, or the consensus may halt. Malicious validators can then exploit this to break convergence.

@shuyang-sjtu

I am also curious about the case when the corresponding validator of a round fails to propose a block. It is hard to catch the intuition behind the strategy. For example, what role the randomness plays in "rand(VALIDATOR_COUNT) * 0.5" and why the difficulty is set to "2 when an in-turn attester produces a block and set to 1 when a not in-turn attester produces a block". Is it possible that validators not-in-turn collude to propose a chain of continuous blocks of weight 1? I know this is what ATTEST_INTERVAL and the eviction rule are set for. However, for the eviction rule, it is hard to formally (by a program) tell adversaries from nodes suffering delays. I am worried that there can still be an attack by continuous 1-weight blocks despite the ATTEST_INTERVAL since adversary nodes may "pipeline" and the ATTEST_INTERVAL could not be too long since VALIDATOR_COUNT > ATTEST_INTERVAL must hold.

@janx
Member

janx commented Jan 16, 2020

It seems some questions are asked due to inconsistent views on the security model of testnet PoA:

  • what assumptions can we make?
  • what kind of adversarial behaviors should be tolerated?
  • in what circumstances testnet should be reset?

@jjyr
Contributor Author

jjyr commented Jan 16, 2020

> I am also curious about the case when the corresponding validator of a round fails to propose a block. It is hard to catch the intuition behind the strategy. For example, what role the randomness plays in "rand(VALIDATOR_COUNT) * 0.5" and why the difficulty is set to "2 when an in-turn attester produces a block and set to 1 when a not in-turn attester produces a block". Is it possible that validators not-in-turn collude to propose a chain of continuous blocks of weight 1? I know this is what ATTEST_INTERVAL and the eviction rule are set for. However, for the eviction rule, it is hard to formally (by a program) tell adversaries from nodes suffering delays. I am worried that there can still be an attack by continuous 1-weight blocks despite the ATTEST_INTERVAL since adversary nodes may "pipeline" and the ATTEST_INTERVAL could not be too long since VALIDATOR_COUNT > ATTEST_INTERVAL must hold.

  1. The randomness "rand(VALIDATOR_COUNT) * 0.5" reduces the possibility that two validators produce a block at the same time when an attester skips its turn.
  2. The difficulty is set to 2 for an in-turn attestation because, due to unsynchronized clocks and network delay, a validator may not see the new block and then attest a not-in-turn block; in this case, the in-turn block has higher total difficulty, so clients are more likely to choose the in-turn block as the main chain.
  3. This protocol assumes the number of adversary nodes is less than half of VALIDATOR_COUNT, so we can set ATTEST_INTERVAL to VALIDATOR_COUNT / 2; under this assumption the eviction can't be stopped.

@jjyr
Contributor Author

jjyr commented Jan 16, 2020

> I'm not sure this mechanism can ensure convergence with high probability. Forks happen easily, so a validator that attests to different forks within ATTEST_INTERVAL cannot be considered malicious, or the consensus may halt. Malicious validators can then exploit this to break convergence.

The intention of this protocol is not to force convergence to a single chain. Instead of eliminating forks, we eliminate the validators who make them. The core of this protocol is to make sure that honest validators can eventually perform an eviction to remove malicious validators from the validator list.

@shuyang-sjtu

shuyang-sjtu commented Jan 16, 2020

> > I am also curious about the case when the corresponding validator of a round fails to propose a block. It is hard to catch the intuition behind the strategy. For example, what role the randomness plays in "rand(VALIDATOR_COUNT) * 0.5" and why the difficulty is set to "2 when an in-turn attester produces a block and set to 1 when a not in-turn attester produces a block". Is it possible that validators not-in-turn collude to propose a chain of continuous blocks of weight 1? I know this is what ATTEST_INTERVAL and the eviction rule are set for. However, for the eviction rule, it is hard to formally (by a program) tell adversaries from nodes suffering delays. I am worried that there can still be an attack by continuous 1-weight blocks despite the ATTEST_INTERVAL since adversary nodes may "pipeline" and the ATTEST_INTERVAL could not be too long since VALIDATOR_COUNT > ATTEST_INTERVAL must hold.
>
> 1. The randomness "rand(VALIDATOR_COUNT) * 0.5" reduces the possibility that two validators produce a block at the same time when an attester skips its turn.
> 2. The difficulty is set to 2 for an in-turn attestation because, due to unsynchronized clocks and network delay, a validator may not see the new block and then attest a not-in-turn block; in this case, the in-turn block has higher total difficulty, so clients are more likely to choose the in-turn block as the main chain.
> 3. This protocol assumes the number of adversary nodes is less than half of VALIDATOR_COUNT, so we can set ATTEST_INTERVAL to VALIDATOR_COUNT / 2; under this assumption the eviction can't be stopped.

For 1, I am totally convinced. For 2-3, what I meant is I am concerned that "1" weight for not-in-turn blocks might be too great (it might look more natural if it is something parameterized by ATTEST_INTERVAL / VALIDATOR_COUNT), here is an example (which tells what I meant by "pipeline") that the adversary could forever propose 1-weight blocks with half of all nodes being adversaries.

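| Adversary no. | Round1 | Round2 | Round3 | Round4 | Round5 | Round6 | Round7 |
|---------------|--------|--------|--------|--------|--------|--------|--------|
| Adv1          | 1      |        |        | 1      |        |        | 1      |
| Adv2          |        | 1      |        |        | 1      |        |        |
| Adv3          |        |        | 1      |        |        | 1      |        |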

Here we assume VALIDATOR_COUNT=6, ATTEST_INTERVAL=3 and 3 of them are malicious (sure the assumption is "<1/2 VALIDATOR_COUNT" rather than "<=1/2" so it might be somehow different). "1" (in the table) for Round_i and adv_j means the j-th adversary proposes a 1-weight block in round i. In this case, malicious nodes can cooperate to propose a chain of continuous 1-weight blocks of any large total weight they want. But I am not sure whether this will be a great problem to worry about, since this issue can be possibly solved by the eviction.

@jjyr
Contributor Author

jjyr commented Jan 16, 2020

> It seems some questions are asked due to inconsistent views on the security model of testnet PoA:
>
>   • what assumptions can we make?
>   • what kind of adversarial behaviors should be tolerated?
>   • in what circumstances testnet should be reset?

  • what assumptions can we make?
    1. The number of adversary nodes is less than half of VALIDATOR_COUNT.
  • what kind of adversarial behaviors should be tolerated?
    1. Since our purpose is to support a testnet, censorship, collusion, and mining on forks won't hurt real assets, and none of these behaviors will halt the chain either; on the contrary, these behaviors make the malicious nodes easy to detect by a program.
  • in what circumstances testnet should be reset?
    1. Malicious nodes are more than half of VALIDATOR_COUNT.
    2. Validators can't reach an agreement on eviction through off-chain governance.

@jjyr
Contributor Author

jjyr commented Jan 16, 2020

@shuyang-sjtu

https://github.com/nervosnetwork/rfcs/pull/158/files#diff-6eca90ec8afbdba69e3e0b53a6dcae4dR33

After producing block 1, Adv1 must wait for ATTEST_INTERVAL before producing its next block, which is number 5.

So Adv2 and Adv3 can't produce blocks 2, 3, and 4 continuously; at least one of blocks 2, 3, 4 must be produced by an honest validator, and honest validators should use this chance to evict adversaries.
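
A small simulation can make this concrete. It is a sketch under stated assumptions, not the RFC's rule text: a validator that attested block `last` may attest block `n` only when `n - last > ATTEST_INTERVAL` (the strict form), adversaries greedily fill every height starting from block 1, and the function name `adversary_run_from_start` is hypothetical. The non-strict variant (`>=`) is included only for comparison.

```python
def adversary_run_from_start(adversary_count, attest_interval, strict=True, horizon=100):
    """Count how many consecutive blocks, starting at height 1, colluding
    adversaries can attest before some height has no eligible adversary."""
    last = [None] * adversary_count  # last height attested by each adversary
    run = 0
    for n in range(1, horizon + 1):
        for i in range(adversary_count):
            if last[i] is None:
                eligible = True
            elif strict:
                eligible = n - last[i] > attest_interval
            else:
                eligible = n - last[i] >= attest_interval
            if eligible:
                last[i] = n
                run += 1
                break
        else:
            # No adversary may attest height n; an honest block is needed here.
            return run
    return run  # the adversaries never got stuck within the horizon


# VALIDATOR_COUNT = 6, ATTEST_INTERVAL = 3, three adversaries.
assert adversary_run_from_start(3, 3, strict=True) == 3     # stuck at block 4
assert adversary_run_from_start(3, 3, strict=False) == 100  # can pipeline up to the horizon
```

With the strict rule the adversaries attest blocks 1, 2 and 3 but none of them is eligible for block 4, which matches the argument above.
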

@shuyang-sjtu

shuyang-sjtu commented Jan 16, 2020

> @shuyang-sjtu
>
> https://github.com/nervosnetwork/rfcs/pull/158/files#diff-6eca90ec8afbdba69e3e0b53a6dcae4dR33
>
> After producing block 1, Adv1 must wait for ATTEST_INTERVAL before producing its next block, which is number 5.
>
> So Adv2 and Adv3 can't produce blocks 2, 3, and 4 continuously; at least one of blocks 2, 3, 4 must be produced by an honest validator;

Oh sorry, I did not make it clear. I just now (for simplicity) assumed a second block can be proposed when "current_index - previous_index >= ATTEST_INTERVAL" (but in fact, it should be ">"), so the adversary cannot propose block 4 in the real protocol. This is what I tried to mean by "sure the assumption is "<1/2 VALIDATOR_COUNT" rather than "<=1/2" so it might be somehow different". Sorry for not conveying it clearly. Anyway, I think this is not a great issue in practice.

> honest validators should use this chance to evict adversaries.

Sure! this issue can be solved by eviction rule if detected in time.

@jjyr
Contributor Author

jjyr commented Jan 16, 2020

> I am curious when validator "n % VALIDATOR_COUNT" did not produce a block, why can't we ask the next validator to produce two blocks in a row? That would be much simpler, and we have no risk of short forks.

If we allow a validator to produce n blocks in a row, how do we distinguish whether n blocks were really skipped or the validator is malicious? For example:

In a 4-validator case, A and B are malicious, and C and D are honest. After B produces a block, A can pretend that C and D did not produce blocks within the timeout. How would C and D (or other nodes) handle this situation?

@nirenzang
Contributor

> > I am curious when validator "n % VALIDATOR_COUNT" did not produce a block, why can't we ask the next validator to produce two blocks in a row? That would be much simpler, and we have no risk of short forks.
>
> If we allow a validator to produce n blocks in a row, how do we distinguish whether n blocks were really skipped or the validator is malicious? For example:
>
> In a 4-validator case, A and B are malicious, and C and D are honest. After B produces a block, A can pretend that C and D did not produce blocks within the timeout. How would C and D (or other nodes) handle this situation?

C and D will keep producing blocks at their designated time slots, and their blocks will have heavier weights, so they will consider the chain constructed by themselves the authentic chain. Am I missing something? I am not familiar with POA, so I have very likely missed something.

@nirenzang
Contributor

nirenzang commented Jan 21, 2020

I see. As long as we

  1. log all info on the testnet consensus (block proposer, block receiving time, vote on eliminating block proposer...),
  2. raise an alarm when something goes wrong (chain growth rate significantly lower than expected, number of orphaned blocks significantly higher than expected, number of blocks proposed by some proposers significantly larger than for the others),

we can always solve the problem off-line when an alarm is raised via manual examination.

I have no more questions.
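
The three alarm conditions listed above could be checked by a script along these lines. This is a rough sketch only: the function name `check_alarms`, the input shapes, and every threshold (`slow_factor`, `max_orphan_ratio`, `max_share_factor`) are hypothetical values chosen for illustration, not figures from the RFC or the discussion.

```python
from collections import Counter


def check_alarms(block_times, proposers, orphan_count, validator_count,
                 expected_interval=8.0, slow_factor=2.0,
                 max_orphan_ratio=0.1, max_share_factor=2.0):
    """Return a list of alarm names for a window of main-chain blocks.

    block_times: receive time (seconds) of each block, in chain order.
    proposers:   proposer id of each block, same order and length.
    """
    alarms = []
    n = len(proposers)
    if n < 2:
        return alarms  # not enough data to judge

    # 1. Chain growth rate significantly lower than expected.
    avg_interval = (block_times[-1] - block_times[0]) / (n - 1)
    if avg_interval > expected_interval * slow_factor:
        alarms.append("slow chain growth")

    # 2. Number of orphaned blocks significantly higher than expected.
    if orphan_count / n > max_orphan_ratio:
        alarms.append("too many orphaned blocks")

    # 3. Some proposer produced far more than its fair share of blocks.
    fair_share = n / validator_count
    if max(Counter(proposers).values()) > fair_share * max_share_factor:
        alarms.append("proposer imbalance")

    return alarms


print(check_alarms([0, 8, 16, 24], ["A", "B", "C", "D"], 0, validator_count=4))
print(check_alarms([0, 30, 60, 90], ["A", "A", "A", "A"], 2, validator_count=4))
```
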

@shuyang-sjtu

This protocol is extremely suitable for testing smart contracts on the testnet. As a matter of fact, it is very unlikely that a large portion of the real testnet's validators would be adversaries, since they have no incentive to undermine a system used for development. This protocol can rule out all the attacks I came up with, even when almost half of all validator nodes are malicious, as long as an alarm is raised in time.


`ATTEST_INTERVAL` can be set to `VALIDATOR_COUNT / 2`; the honest validators can then eventually evict malicious validators unless half of the validators are corrupted.

One thing to note is that CKB uses a 2-phase commitment: a transaction must be proposed before it is committed in a block. This means the honest validators need to produce at least two blocks to finally commit the eviction transaction, and these two blocks must be within the proposal window, so we choose a large enough value in the POA testnet: `TX_PROPOSAL_WINDOW` is set to `ProposalWindow(2, MAX_VALIDATOR_COUNT)`.
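
The effect of the window can be sketched as a simple range check. This assumes that `ProposalWindow(closest, farthest)` means a transaction proposed at height `p` may be committed at any height from `p + closest` through `p + farthest`; the function name `can_commit` and the value chosen for `MAX_VALIDATOR_COUNT` are hypothetical.

```python
MAX_VALIDATOR_COUNT = 16  # hypothetical value, for illustration only


def can_commit(proposed_at, commit_at, closest=2, farthest=MAX_VALIDATOR_COUNT):
    """True if a transaction proposed at `proposed_at` may be committed at
    `commit_at` under the assumed ProposalWindow(closest, farthest) rule."""
    return closest <= commit_at - proposed_at <= farthest


# An eviction transaction proposed by an honest validator in block 10:
assert not can_commit(10, 11)   # too early, the window opens 2 blocks later
assert can_commit(10, 12)       # earliest possible commit
assert can_commit(10, 26)       # still inside the window
assert not can_commit(10, 27)   # window expired, must be proposed again
```

A wide window gives a second honest block more heights at which it can still commit the eviction transaction, even if adversaries attest some of the blocks in between.
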
Member

Nice catch

@doitian
Member

doitian commented Jul 6, 2020

Closed. See #194

@doitian doitian closed this Jul 6, 2020
7 participants