propose POA testnet #158

jjyr · 2019-12-09T05:21:11Z

Updates:

2019-12-19: update off-chain governance
2019-12-18: update block attestation and verification

The Aggron testnet uses POW. Unfortunately, the average block time can be very slow, up to a few minutes, because mining power joins and leaves; see the peaks in the chart at https://explorer.nervos.org/aggron/charts.

To provide a stable testnet for development, I propose a POA testnet to replace the current Aggron.

The block time should be very stable, approximately 8 seconds.
The POA testnet is supposed to be a long-running testnet; its validators should be maintained and governed by the community.
A minority of malicious validators can't halt or censor the POA testnet.


janx · 2019-12-30T02:25:24Z

rfcs/0000-poa-testnet/0000-poa-testnet.md

+
+1. For block height `n`, a validator checks `n % VALIDATOR_COUNT`.
+2. If `INDEX == n % VALIDATOR_COUNT`, it is the validator's turn to attest block `n`. The validator should wait for `BLOCK_INTERVAL` seconds, then attest block `n` with difficulty set to `2`.
+3. If `INDEX != n % VALIDATOR_COUNT`, it is not the validator's turn to attest block `n`. The validator should wait for `BLOCK_INTERVAL + rand(VALIDATOR_COUNT) * 0.5` seconds for another attester to produce a new block. If no new block is produced during that time, the validator should attest a new block with difficulty set to `1`.


Could the result of BLOCK_INTERVAL + rand(VALIDATOR_COUNT) * 0.5 be BLOCK_INTERVAL, or very close to it? If so, an out-of-turn validator and the in-turn attester may generate a new block at nearly the same time, and the out-of-turn validator's block may arrive at other nodes earlier than the attester's. This could cause a lot of 1-block fork switches on the testnet.

I suggest adding a buffer to the validator's wait period, e.g. BLOCK_INTERVAL * 2 + rand(VALIDATOR_COUNT) * 0.5.

jjyr · 2019-12-30

We could add another variable, BLOCK_TIMEOUT, set to a value slightly greater than BLOCK_INTERVAL, such as 10s.

Every validator that is not in its turn waits for BLOCK_TIMEOUT + rand(VALIDATOR_COUNT) * 0.5 seconds before producing a block; this reduces the probability that two validators attest at the same time.

In fact, even if two blocks are produced at the same time, it won't hurt: the block attested by an in-turn validator always has higher total difficulty, and even if two blocks have the same total difficulty, the next in-turn validator decides the main chain. Chain selection works just like in POW.
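The wait rule discussed above can be sketched as follows. This is only an illustration, not the RFC's implementation: the function names are hypothetical, `VALIDATOR_COUNT = 5` is an arbitrary example value, and `rand(VALIDATOR_COUNT)` is assumed to return an integer in `[0, VALIDATOR_COUNT)`.

```python
import random

# Illustrative values: 8 s is the target block time from the proposal and
# 10 s is the example given for BLOCK_TIMEOUT. VALIDATOR_COUNT is made up.
BLOCK_INTERVAL = 8.0
BLOCK_TIMEOUT = 10.0
VALIDATOR_COUNT = 5

IN_TURN_DIFFICULTY = 2
OUT_OF_TURN_DIFFICULTY = 1


def is_in_turn(index: int, height: int) -> bool:
    """A validator is in turn when its index equals height % VALIDATOR_COUNT."""
    return index == height % VALIDATOR_COUNT


def attest_plan(index: int, height: int, rng: random.Random) -> tuple[float, int]:
    """Return (seconds to wait, difficulty) for validator `index` at `height`."""
    if is_in_turn(index, height):
        return BLOCK_INTERVAL, IN_TURN_DIFFICULTY
    # Assumption: rand(VALIDATOR_COUNT) yields an integer in [0, VALIDATOR_COUNT).
    jitter = rng.randrange(VALIDATOR_COUNT) * 0.5
    return BLOCK_TIMEOUT + jitter, OUT_OF_TURN_DIFFICULTY
```

With these values an out-of-turn validator never waits less than `BLOCK_TIMEOUT`, so its block cannot be scheduled at the same instant as the in-turn block unless clocks or the network are off by at least `BLOCK_TIMEOUT - BLOCK_INTERVAL` seconds.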


nirenzang · 2020-01-14T08:27:49Z

I am curious: when validator "n % VALIDATOR_COUNT" does not produce a block, why can't we ask the next validator to produce two blocks in a row? That would be much simpler, and we would have no risk of short forks.

rev-chaos · 2020-01-14T09:24:00Z

I'm not sure this mechanism can ensure convergence with high probability. Forks happen easily, so a validator that attests to different forks within ATTEST_INTERVAL cannot be considered malicious, or the consensus may halt. Malicious validators can then exploit this to break convergence.

shuyang-sjtu · 2020-01-14T19:14:08Z

I am also curious about the case when the corresponding validator of a round fails to propose a block. It is hard to catch the intuition behind the strategy. For example, what role the randomness plays in "rand(VALIDATOR_COUNT) * 0.5" and why the difficulty is set to "2 when an in-turn attester produces a block and set to 1 when a not in-turn attester produces a block". Is it possible that validators not-in-turn collude to propose a chain of continuous blocks of weight 1? I know this is what ATTEST_INTERVAL and the eviction rule are set for. However, for the eviction rule, it is hard to formally (by a program) tell adversaries from nodes suffering delays. I am worried that there can still be an attack by continuous 1-weight blocks despite the ATTEST_INTERVAL since adversary nodes may "pipeline" and the ATTEST_INTERVAL could not be too long since VALIDATOR_COUNT > ATTEST_INTERVAL must hold.

janx · 2020-01-16T05:53:51Z

It seems some questions are being asked because of inconsistent views on the security model of the testnet POA:

- what assumptions can we make?
- what kind of adversarial behaviors should be tolerated?
- in what circumstances should the testnet be reset?

jjyr · 2020-01-16T10:28:08Z

> I am also curious about the case when the corresponding validator of a round fails to propose a block. It is hard to catch the intuition behind the strategy. For example, what role the randomness plays in "rand(VALIDATOR_COUNT) * 0.5" and why the difficulty is set to "2 when an in-turn attester produces a block and set to 1 when a not in-turn attester produces a block". Is it possible that validators not-in-turn collude to propose a chain of continuous blocks of weight 1? I know this is what ATTEST_INTERVAL and the eviction rule are set for. However, for the eviction rule, it is hard to formally (by a program) tell adversaries from nodes suffering delays. I am worried that there can still be an attack by continuous 1-weight blocks despite the ATTEST_INTERVAL since adversary nodes may "pipeline" and the ATTEST_INTERVAL could not be too long since VALIDATOR_COUNT > ATTEST_INTERVAL must hold.

1. The randomness "rand(VALIDATOR_COUNT) * 0.5" reduces the probability that two validators produce a block at the same time when an attester skips its turn.
2. The difficulty is set to 2 for an in-turn attestation because, due to desynchronized clocks and network delay, a validator may not see the new block and may then attest an out-of-turn block; in this case the in-turn block has the higher total difficulty, so clients are more likely to choose the in-turn block as the main chain.
3. This protocol assumes the number of adversary nodes is less than half of VALIDATOR_COUNT, so we can set ATTEST_INTERVAL to VALIDATOR_COUNT / 2; under this assumption the eviction cannot be stopped.
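To make the difficulty argument concrete, here is a minimal sketch of chain selection by total difficulty. It assumes a fork can be represented simply as a list of per-block difficulties (2 for in-turn, 1 for out-of-turn); the function names are hypothetical and this is not the CKB client's actual fork-choice code.

```python
def total_difficulty(chain: list[int]) -> int:
    """Sum of per-block difficulties (2 = in-turn, 1 = out-of-turn)."""
    return sum(chain)


def choose_main_chain(forks: list[list[int]]) -> list[int]:
    """Pick the fork with the highest total difficulty.

    max() returns the first maximal fork, so an exact tie is not really
    resolved here; in the thread, the next in-turn block breaks the tie.
    """
    return max(forks, key=total_difficulty)


# Two competing blocks at the same height on top of a shared prefix.
common = [2, 2, 2]
in_turn_fork = common + [2]      # block attested by the in-turn validator
out_of_turn_fork = common + [1]  # block attested after a timeout
```

Here `choose_main_chain([out_of_turn_fork, in_turn_fork])` selects `in_turn_fork`, because its total difficulty is 8 versus 7.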

jjyr · 2020-01-16T10:50:39Z

> I'm not sure this mechanism can ensure convergence with high probability. Forks happen easily, so a validator that attests to different forks within ATTEST_INTERVAL cannot be considered malicious, or the consensus may halt. Malicious validators can then exploit this to break convergence.

The intention of this protocol is not to force convergence to a single chain. Instead of eliminating forks, we eliminate the validators who create them; the core of this protocol is to make sure that honest validators can eventually perform an eviction to remove malicious validators from the validator list.

shuyang-sjtu · 2020-01-16T11:39:37Z

> I am also curious about the case when the corresponding validator of a round fails to propose a block. It is hard to catch the intuition behind the strategy. For example, what role the randomness plays in "rand(VALIDATOR_COUNT) * 0.5" and why the difficulty is set to "2 when an in-turn attester produces a block and set to 1 when a not in-turn attester produces a block". Is it possible that validators not-in-turn collude to propose a chain of continuous blocks of weight 1? I know this is what ATTEST_INTERVAL and the eviction rule are set for. However, for the eviction rule, it is hard to formally (by a program) tell adversaries from nodes suffering delays. I am worried that there can still be an attack by continuous 1-weight blocks despite the ATTEST_INTERVAL since adversary nodes may "pipeline" and the ATTEST_INTERVAL could not be too long since VALIDATOR_COUNT > ATTEST_INTERVAL must hold.

> 1. The randomness "rand(VALIDATOR_COUNT) * 0.5" reduces the probability that two validators produce a block at the same time when an attester skips its turn.
> 2. The difficulty is set to 2 for an in-turn attestation because, due to desynchronized clocks and network delay, a validator may not see the new block and may then attest an out-of-turn block; in this case the in-turn block has the higher total difficulty, so clients are more likely to choose the in-turn block as the main chain.
> 3. This protocol assumes the number of adversary nodes is less than half of VALIDATOR_COUNT, so we can set ATTEST_INTERVAL to VALIDATOR_COUNT / 2; under this assumption the eviction cannot be stopped.

For 1, I am totally convinced. For 2-3, what I meant is that I am concerned the weight "1" for not-in-turn blocks might be too great (it might look more natural if it were parameterized by ATTEST_INTERVAL / VALIDATOR_COUNT). Here is an example (which shows what I meant by "pipeline") in which the adversary can propose 1-weight blocks forever with half of all nodes being adversaries.

| adversary no. | Round1 | Round2 | Round3 | Round4 | Round5 | Round6 | Round7 |
|---|---|---|---|---|---|---|---|
| Adv1 | 1 | | | 1 | | | 1 |
| Adv2 | | 1 | | | 1 | | |
| Adv3 | | | 1 | | | 1 | |

Here we assume VALIDATOR_COUNT=6, ATTEST_INTERVAL=3, and 3 of the validators are malicious (admittedly the assumption is "<1/2 VALIDATOR_COUNT" rather than "<=1/2", so it might be somewhat different). A "1" in the table for Round_i and Adv_j means the j-th adversary proposes a 1-weight block in round i. In this case, malicious nodes can cooperate to propose a chain of consecutive 1-weight blocks of any total weight they want. But I am not sure whether this is a serious problem, since it can possibly be solved by eviction.

jjyr · 2020-01-16T11:41:23Z

> It seems some questions are being asked because of inconsistent views on the security model of the testnet POA:
>
> - what assumptions can we make?
> - what kind of adversarial behaviors should be tolerated?
> - in what circumstances should the testnet be reset?

> what assumptions can we make?

1. The number of adversary nodes is less than half of VALIDATOR_COUNT.

> what kind of adversarial behaviors should be tolerated?

1. Since our purpose is to support a testnet, censorship, collusion, and mining on forks won't hurt real assets, and none of these behaviors will halt the chain either; on the contrary, these behaviors make the malicious nodes easy for a program to detect.

> in what circumstances should the testnet be reset?

1. Malicious nodes make up more than half of VALIDATOR_COUNT.
2. Validators can't reach an agreement on eviction through off-chain governance.

jjyr · 2020-01-16T11:49:38Z

@shuyang-sjtu

https://github.com/nervosnetwork/rfcs/pull/158/files#diff-6eca90ec8afbdba69e3e0b53a6dcae4dR33

After producing block 1, Adv1 must wait for ATTEST_INTERVAL before producing its next block, which is number 5.

So Adv2 and Adv3 can't produce blocks 2, 3, and 4 consecutively; at least one of blocks 2, 3, and 4 must be produced by an honest validator, and honest validators should use this chance to evict adversaries.

shuyang-sjtu · 2020-01-16T11:53:51Z

> @shuyang-sjtu
>
> https://github.com/nervosnetwork/rfcs/pull/158/files#diff-6eca90ec8afbdba69e3e0b53a6dcae4dR33
>
> After producing block 1, Adv1 must wait for ATTEST_INTERVAL before producing its next block, which is number 5.
>
> So Adv2 and Adv3 can't produce blocks 2, 3, and 4 consecutively; at least one of blocks 2, 3, and 4 must be produced by an honest validator;

Oh sorry, I did not make it clear. For simplicity, I assumed a second block can be proposed when "current_index - previous_index >= ATTEST_INTERVAL" (but in fact it should be ">"), so the adversary cannot propose block 4 in the real protocol. This is what I meant by "sure the assumption is "<1/2 VALIDATOR_COUNT" rather than "<=1/2" so it might be somehow different". Sorry for not conveying it clearly. Anyway, I think this is not a serious issue in practice.

> honest validators should use this chance to evict adversaries.

Sure! This issue can be solved by the eviction rule if detected in time.
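The difference between the ">" and ">=" readings of the ATTEST_INTERVAL rule can be checked with a small simulation. This is a sketch under stated assumptions: a validator may attest block `n` only if `n - previous > ATTEST_INTERVAL` (strict) or `n - previous >= ATTEST_INTERVAL` (non-strict), colluding adversaries try to fill every height starting from block 1, and the function name and `horizon` cap are hypothetical.

```python
def longest_adversarial_run(num_adversaries: int, attest_interval: int,
                            strict: bool = True, horizon: int = 1000) -> int:
    """Count how many consecutive blocks colluding adversaries can attest
    from height 1 before none of them is eligible (capped at `horizon`)."""
    # last[i] is the height of adversary i's latest block, None if it has none.
    last: list = [None] * num_adversaries
    run = 0
    for height in range(1, horizon + 1):
        def eligible(prev) -> bool:
            if prev is None:
                return True
            gap = height - prev
            return gap > attest_interval if strict else gap >= attest_interval

        candidates = [i for i, prev in enumerate(last) if eligible(prev)]
        if not candidates:
            break  # an honest validator has to produce this block
        # Use the adversary that has waited longest; never-used ones go first.
        chosen = min(candidates, key=lambda i: -1 if last[i] is None else last[i])
        last[chosen] = height
        run += 1
    return run
```

With VALIDATOR_COUNT=6 and ATTEST_INTERVAL=3, three adversaries get a run of 3 under the strict rule (block 4 must come from an honest validator), while under the ">=" reading the run only stops at the `horizon` cap, which matches the table in the earlier comment.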

jjyr · 2020-01-16T12:04:32Z

> I am curious: when validator "n % VALIDATOR_COUNT" does not produce a block, why can't we ask the next validator to produce two blocks in a row? That would be much simpler, and we would have no risk of short forks.

If we allow a validator to produce n blocks, how do we distinguish whether n blocks were really skipped or the validator is malicious? For example:

In a 4-validator case, A and B are malicious, and C and D are honest. After B produces a block, A can pretend that C and D did not produce blocks within the timeout. How should C and D (or other nodes) handle this situation?

nirenzang · 2020-01-19T09:00:05Z

> > I am curious: when validator "n % VALIDATOR_COUNT" does not produce a block, why can't we ask the next validator to produce two blocks in a row? That would be much simpler, and we would have no risk of short forks.
>
> If we allow a validator to produce n blocks, how do we distinguish whether n blocks were really skipped or the validator is malicious? For example:
>
> In a 4-validator case, A and B are malicious, and C and D are honest. After B produces a block, A can pretend that C and D did not produce blocks within the timeout. How should C and D (or other nodes) handle this situation?

C and D will keep producing blocks in their designated time slots, and their blocks will have heavier weights, so they will consider the chain they constructed to be the authentic chain. Am I missing something? I am not familiar with POA, so very likely I am missing something.

nirenzang · 2020-01-21T06:16:07Z

I see. As long as we

- log all info on the testnet consensus (block proposer, block receiving time, votes on eliminating a block proposer, ...),
- raise an alarm when something goes wrong (chain growth rate significantly lower than expected, number of orphaned blocks significantly higher than expected, number of blocks proposed by some proposers significantly larger than by the others),

we can always solve the problem offline, via manual examination, when an alarm is raised.
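A rough sketch of what such an alarm check could look like is given below. Everything in it is an assumption made for illustration: the `BlockRecord` fields, the `check_alarms` name, and all thresholds are hypothetical and would need tuning against real testnet data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class BlockRecord:
    proposer: str
    timestamp: float       # receiving time in seconds
    orphaned: bool = False


def check_alarms(blocks: list[BlockRecord], expected_interval: float = 8.0,
                 max_slowdown: float = 2.0, max_orphan_ratio: float = 0.2,
                 max_proposer_skew: float = 2.0) -> list[str]:
    """Return the alarms raised by a window of logged blocks (oldest first)."""
    alarms = []
    main = [b for b in blocks if not b.orphaned]

    # Chain growth: average main-chain block time far above the target.
    if len(main) >= 2:
        avg = (main[-1].timestamp - main[0].timestamp) / (len(main) - 1)
        if avg > expected_interval * max_slowdown:
            alarms.append("slow chain growth")

    # Orphans: share of logged blocks that did not end up on the main chain.
    if blocks and sum(b.orphaned for b in blocks) / len(blocks) > max_orphan_ratio:
        alarms.append("too many orphaned blocks")

    # Proposer skew: one proposer far above the average share. Only proposers
    # seen in the window are counted, so silent validators are not detected.
    counts = Counter(b.proposer for b in main)
    if counts:
        fair_share = len(main) / len(counts)
        if max(counts.values()) > fair_share * max_proposer_skew:
            alarms.append("proposer imbalance")
    return alarms
```

The check only flags a window for manual examination; it does not try to decide who is malicious, which fits the off-chain governance approach discussed in this thread.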

I have no more questions.

shuyang-sjtu · 2020-01-22T06:43:17Z

This protocol is extremely suitable for testing smart contracts on the testnet. As a matter of fact, it is very unlikely that a large portion of validators in the real testnet will be adversarial, since they have no incentive to undermine a system meant for development. This protocol can rule out all the attacks I came up with, even with almost half of the validator nodes malicious, as long as an alarm is raised in time.

doitian · 2020-02-05T02:32:53Z

rfcs/0000-poa-testnet/0000-poa-testnet.md

+
+`ATTEST_INTERVAL` can be set to `VALIDATOR_COUNT / 2`; the honest validators can then eventually evict malicious validators unless half of the validators are corrupted.
+
+One thing to note is that CKB uses a two-phase commitment: a transaction must be proposed before it can be committed in a block. This means the honest validators need to produce at least two blocks to finally commit the eviction transaction, and these two blocks must be within the proposal window, so we choose a large enough value in the POA testnet: `TX_PROPOSAL_WINDOW` is set to `ProposalWindow(2, MAX_VALIDATOR_COUNT)`.
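The effect of the window size can be illustrated with a small sketch. It assumes the usual reading of CKB's proposal window, namely that a transaction proposed at height `h_p` may be committed at height `h_c` only when `closest <= h_c - h_p <= farthest`; the `ProposalWindow` class below is a stand-in written for this example, and `MAX_VALIDATOR_COUNT = 10` is a made-up value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposalWindow:
    closest: int   # minimum gap between proposal and commit heights
    farthest: int  # maximum gap between proposal and commit heights

    def can_commit(self, proposed_at: int, commit_at: int) -> bool:
        """True if a tx proposed at `proposed_at` may be committed at `commit_at`."""
        gap = commit_at - proposed_at
        return self.closest <= gap <= self.farthest


MAX_VALIDATOR_COUNT = 10  # hypothetical value, for illustration only
TX_PROPOSAL_WINDOW = ProposalWindow(2, MAX_VALIDATOR_COUNT)
```

Under these assumptions, an honest validator that proposes the eviction transaction at height 100 can have it committed in any honest block from height 102 through 110, so the second honest block may arrive up to `MAX_VALIDATOR_COUNT` blocks after the first.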


doitian · 2020-07-06T09:27:02Z

Closed. See #194

propose POA testnet

a5b62f1

stwith reviewed Dec 9, 2019

Update rfcs/0000-poa-testnet/0000-poa-testnet.md

2eef9d8

Co-Authored-By: Jason Chai

janx requested changes Dec 17, 2019

jjyr and others added 4 commits December 18, 2019 12:51

Apply suggestions from code review

851857d

Co-Authored-By: Jan Xie

Update POA network attestation

d6961c9

off-chain governance on POA testnet

077ff46

Update POA testnet some minor changes

7c18d78

jjyr marked this pull request as ready for review December 19, 2019 07:23

janx requested changes Dec 30, 2019

jjyr force-pushed the poa-testnet branch from 82adb45 to 0af1d61 December 30, 2019 15:24

jjyr and others added 2 commits December 30, 2019 23:25

POATestnet: Apply suggestions from code review

0af1d61

Co-Authored-By: Jan Xie

POWTestnet: introduce variable BLOCK_TIMEOUT

0f9c321

doitian previously approved these changes Jan 3, 2020

doitian requested review from janx, a team, quake and xxuejie and removed request for a team January 10, 2020 14:55

POATestnet: adjust TX_PROPOSAL_WINDOW in POA testnet

efba938

jjyr dismissed doitian’s stale review via efba938 January 23, 2020 03:10

jjyr requested a review from a team as a code owner January 23, 2020 03:10

doitian reviewed Feb 5, 2020

doitian approved these changes Feb 5, 2020

doitian marked this pull request as draft April 27, 2020 01:33

doitian mentioned this pull request Jul 6, 2020

Add an RFC to describe the Testnet Aggron PoW algorithm #194 (Open)

doitian closed this Jul 6, 2020
