RFC: Overhauling the reputation system #2271

drcpu-github · 2022-09-13T19:59:37Z

drcpu-github
Sep 13, 2022
Collaborator

Problem statement

While the reputation system currently works reasonably well, over the past two years of the Witnet mainnet, it has become clear that there are two main issues:

1. If a node earns reputation, it often loses it quite fast. This leads to an extremely volatile Total Reputation Set (hereafter abbreviated as TRS), which defeats the point of a reputation system. There obviously should be a decent turnover rate of nodes to prevent reputation oligarchies from forming and to give new nodes a fair chance of joining the TRS, but right now the turnover does not seem to be balanced.
2. It seems relatively easy for a node to reach the upper regions of the TRS. I have seen cases where a node on which collateral was staked reached the most-reputed position in less than two days. Again, this defeats the point of the reputation system, as it means you can simply start a large number of nodes, some of which could reach the most-reputed positions quite quickly.

The consequence of these problems is that reputation does not correlate very well with actual trustworthiness, and revealing an out-of-consensus result is not very punishing if you are likely to leave the TRS in a short timespan anyway. With this post, I want to gauge the sentiment of everyone on how to tackle these two problems.

Analysis of the first problem

At the moment the turnover rate of (highly) reputed nodes is quite high. An analysis of time in the TRS reveals that the average node stays in it for only 0.9 days. If we filter out nodes which gained less than 500 reputation during a stay in the TRS, the average residency time increases to about 2.8 days. Over 90% of these nodes stay in the TRS for less than a week.

The reason for the high turnover rate is most likely the small reputation_expire_alpha_diff consensus constant. This constant is set at 20000 which means that reputation earned by participating in a data request expires 20000 witnessing acts in the future.

During the month of July, the average number of witnessing acts per block was around ten. This means that if you earn reputation by revealing an in-consensus value, it will expire in about a day. Assuming we will see more data requests and more witnessing acts in the future, reputation will expire even faster. Obviously it is possible to earn new reputation during this period of time, but given how fast the average node leaves the TRS, a quick full reputation expiry seems to be the rule rather than the exception.
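
The back-of-the-envelope estimate above can be reproduced with a few lines of Python. This is only a sketch: the helper name is made up for illustration, and both the 45-second epoch length and the one-block-per-epoch simplification are assumptions (they are consistent with the epoch-to-day conversions used elsewhere in this post, but they are not stated here).

```python
# Rough estimate of how long freshly earned reputation survives.
# ASSUMPTION: one block per epoch and 45-second epochs (not stated in the post).
EPOCH_SECONDS = 45
SECONDS_PER_DAY = 24 * 60 * 60


def expiry_in_days(alpha_diff: int, acts_per_block: float) -> float:
    """Days until reputation expires, given a constant rate of witnessing acts."""
    blocks_until_expiry = alpha_diff / acts_per_block
    return blocks_until_expiry * EPOCH_SECONDS / SECONDS_PER_DAY


if __name__ == "__main__":
    # Current constant (20000) at 10, 20 and 40 witnessing acts per block.
    for acts in (10, 20, 40):
        print(f"{acts} acts/block -> {expiry_in_days(20000, acts):.2f} days")
```

With ten acts per block this comes out at a little over one day, and every doubling of the witnessing rate halves it.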

If the average node is prone to lose its reputation in less than a day, what stops said node from trying to manipulate data requests? One could of course argue that it needs to post a collateral which it will lose if it is caught revealing an out-of-consensus answer. However, the reputation system is meant to serve as a second incentive and offer assurances to data requesters that reputed nodes have a vested interest in not manipulating data requests. Right now that is hardly the case.

Analysis of the second problem

To analyze problem two, I want to highlight a node which I recently saw reaching the top-reputed spot quite fast. Note that I'm not attributing any malice to said node operator; up to now it has not revealed a single out-of-consensus value. I'm just using it as an example of what is wrong with the reputation system.

This node was funded in epoch 1333593. For the sake of this argument, it is safe to assume this was also the moment the node was started. It became the most-reputed node in the TRS with 3259 reputation at epoch 1336224, which is about 2600 epochs or less than a day and a half later. It stayed at that position for about 500 epochs or a quarter of a day. The reason it reached that position so fast is that it received two disproportionately large chunks of reputation at epochs 1336157 (1875 reputation) and 1336224 (1341 reputation). In both cases, this can be attributed to large chunks of reputation expiring from many nodes combined with a couple of nodes being slashed for revealing an out-of-consensus value.

The main problem is that this is not an isolated case. It happens quite often that a node gains a massive amount of reputation because expired or slashed reputation is distributed to a small number of nodes, and this effect is constantly perpetuated and amplified over time. For example, the maximum reputation two nodes gained simultaneously is 20896 at epoch 1224707. Obviously this also catapulted those nodes to the top-reputed spots.

Potential solutions

I think both problems have a relatively simple (technical) solution.

The first problem could be tackled by increasing the reputation_expire_alpha_diff consensus constant. I think this constant needs to be at least doubled, and probably quadrupled, to offer nodes a fair chance to stay in the TRS for a longer time provided they are honest. As an added benefit, I would expect this to smooth out the overall reputation curve into a more linear function instead of an exponential one.

The solution to problem two seems relatively obvious to me. Instead of distributing expired and slashed reputation in a single (subsequent) epoch, we need to smooth out the distribution curve over multiple epochs. For example, if we distribute these large chunks uniformly over the next ten data requests being solved, it would most likely solve the problem that a very small subset of nodes earns it all simply because it is quite unlikely for a node to solve multiple data requests in a short amount of time.
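
To make the idea concrete, here is a minimal sketch of spreading a chunk of expired or slashed reputation uniformly over the next ten solved data requests. The class name, its methods, and the fixed ten-slot window are illustrative assumptions; none of this mirrors existing witnet-rust code.

```python
from collections import deque


class SmoothedReputationPool:
    """Hypothetical pool that pays out each chunk over the next `window` data requests."""

    def __init__(self, window: int = 10) -> None:
        self.window = window
        # slots[0] is what the next solved data request will distribute.
        self.slots = deque([0] * window)

    def add_chunk(self, amount: int) -> None:
        """Split `amount` as evenly as possible over all upcoming slots."""
        base, extra = divmod(amount, self.window)
        for i in range(self.window):
            # The first `extra` slots absorb the remainder, so no point is lost.
            self.slots[i] += base + (1 if i < extra else 0)

    def pop_next(self) -> int:
        """Reputation available to the data request being solved right now."""
        payout = self.slots.popleft()
        self.slots.append(0)
        return payout


if __name__ == "__main__":
    pool = SmoothedReputationPool()
    pool.add_chunk(1875)  # the chunk from epoch 1336157 in the example above
    print([pool.pop_next() for _ in range(10)])
```

Instead of one data request handing out all 1875 points, each of the next ten hands out 187 or 188, and the total is preserved.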

datgous · 2022-09-14T08:23:23Z

datgous
Sep 14, 2022

Re: first problem (from the Telegram group):

would reputation_expire_alpha_diff = k * moving_average(#_witnessing_acts) be fairer than a fixed number of data requests? 🤔

That's an interesting take. Given there is a fixed amount of reputation in the system (and no new reputation is created), I do suspect this would result in periods of time where solving a data request does not give you reputation (because no reputation expires).

5 replies

datgous Sep 14, 2022

That's an interesting take. Given there is a fixed amount of reputation in the system (and no new reputation is created), I do suspect this would result in periods of time where solving a data request does not give you reputation (because no reputation expires).

I see. So there is a need for a certain minimum amount (...) + m of expired reputation.

But let me rephrase: shouldn't reputation expiry be linked proportionally to the number of witnessing acts, rather than to a fixed amount? The idea being that the more witnessing events there are, the more chances there are to make your reputation count (i.e. have priority over others to solve data requests), and hence it would be fairer to have that reputation wane quicker than if there are few or no witnessing acts. Does that make sense?

aesedepece Sep 14, 2022
Maintainer

Take into account that expiration of reputation is expressed in alpha diff instead of epochs in the first place to make it totally independent from the rate at which witnessing events happen.

That is, if the network is idle, your reputation will stay the same, but that's not an advantage because you are not getting new opportunities to grow your reputation either.

In other words, I think the system already behaves as you are describing!

drcpu-github Sep 14, 2022
Collaborator Author

Yes, spot on and that is essentially what happens with a constant reputation_expire_alpha_diff. If there are more witnessing events per epoch, your reputation will expire faster (e.g., after x/2 epochs instead of x epochs), but you will also have had more chances to participate in new data requests and earn new reputation.
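
The behaviour described here can be illustrated with a toy model of the alpha clock, in which alpha simply counts witnessing acts and a packet earned at alpha 0 expires once alpha has advanced by reputation_expire_alpha_diff. The function name and the per-epoch act counts are hypothetical; this is a simplification for illustration, not the actual implementation.

```python
from typing import Iterable, Optional


def epochs_until_expiry(acts_per_epoch: Iterable[int], alpha_diff: int) -> Optional[int]:
    """Epochs needed for alpha to advance by `alpha_diff`, or None if it never does."""
    alpha = 0
    for epoch, acts in enumerate(acts_per_epoch, start=1):
        alpha += acts  # alpha only moves when witnessing acts happen
        if alpha >= alpha_diff:
            return epoch
    return None


if __name__ == "__main__":
    print(epochs_until_expiry([10] * 5000, 20000))  # steady network
    print(epochs_until_expiry([20] * 5000, 20000))  # twice as busy
    print(epochs_until_expiry([0] * 5000, 20000))   # idle network
```

Doubling the witnessing rate halves the number of epochs until expiry (x/2 instead of x), and on an idle network the packet never expires within the simulated window.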

My main gripe is that with the current setting, reputation expires very fast and nodes leave the TRS quite fast (which influences their chances of mining blocks) leading to what seems to be a pretty unstable TRS. That seems to be at odds with the idea behind a reputation system.

I do wonder whether a solution to problem two would result in the reputation system being more stable and render an update to reputation_expire_alpha_diff unnecessary.

aesedepece Sep 14, 2022
Maintainer

Some protocol developers always saw the volatility of reputation as a positive force that introduced additional unpredictability to the system and hindered attempts to accumulate reputation and perpetuate yourself into the ARS / TRS. I remember we used to explain that reputation was simply not what you would expect from its name.

Personally I never had a strong position on that take or the contrary, as I lacked the hard data to support one argument or another.

However, at this point, it is evident that reputation flows erratically, doing a poor service to its original goal. Hence I'm very positive about what is being proposed here.

datgous Sep 14, 2022

In other words, I think the system already behaves as you are describing!

Ah! My bad, I misunderstood the mechanism. Thanks for the clarification.

drcpu-github · 2022-09-14T11:15:13Z

drcpu-github
Sep 14, 2022
Collaborator Author

However, at this point, it is evident that reputation flows erratically, doing a poor service to its original goal. Hence I'm very positive about what is being proposed here.

I think this is the key conclusion here. While the short timespan in the TRS is somewhat worrisome to me, I don't consider some volatility to be bad. It's also noteworthy that most nodes rejoin the TRS multiple times. I looked at those numbers and it seems the average node rejoined the TRS five times in the past two years. Taking into account that a lot of nodes were probably stopped over the analysis timeframe, I'd say this number is even higher for active nodes.

0 replies

tmpolaczyk · 2022-09-20T10:10:25Z

tmpolaczyk
Sep 20, 2022

Regarding Problem 1, as @aesedepece mentioned, reputation_expire_alpha_diff needs to be small in order to ensure that data request eligibility proofs are not predictable: for the protocol to be secure, it must be impossible to predict which nodes will participate in the resolution of a data request. Setting a higher reputation_expire_alpha_diff will increase the probability that the nodes resolving a future data request are among the nodes that have the most reputation today. That increases the feasibility of bribery attacks (the attacker can announce that they are willing to lie for a price), as well as DDoS attacks (find the IPs of the most reputed nodes and prevent them from participating in the data request). Both of these attacks are already possible, but since reputation changes quickly they are a bit harder to carry out.

If the average node is prone to lose its reputation in less than a day, what stops said node from trying to manipulate data requests?

Well, it is not that simple. Why do nodes lose their reputation in less than a day?

1. No eligibility for any request: the node did not manage to create any valid VRF proof, and therefore they were unable to resolve any data requests, and their reputation expired.
2. Low quality answers to data requests: values out of consensus make a node lose half of their reputation, and committing errors makes them not gain any reputation.

In the first case, they can only try to manipulate one data request, the one they resolved to enter the TRS. And in the second case, they are already technically manipulating data requests, so it is true that they won't lose anything by manipulating them a bit more. There is a reputation incentive to behave honestly, because more reputation makes the node eligible for more data requests in the future. The same applies to the case of new nodes that don't have any reputation yet: they are incentivized to behave honestly in order to enter the TRS.

So I wouldn't consider this a problem unless it turns out that the majority of data requests get solved by nodes outside of the TRS, because then it is true that the reputation system is not working properly, and there is no incentive to behave honestly. I would expect something like an 80/20 split, where 80% of the nodes that solve data requests are from the TRS, and 20% are new nodes with 0 reputation. In that case, even if we assume that the 20% is malicious, data requests will keep being resolved properly as long as there is an honest TRS. I am not sure what the actual percentages are, though; if the newcomer percentage is very high then this could be a problem.

Regarding Problem 2, it is true that it happens sometimes. If you think about it, the incentive makes sense, because why is there so much expired reputation? Either the current members of the TRS are no longer participating, or some of them have been penalized. In both cases we want new identities to enter the TRS, and the way to incentivize that is to provide a high reputation bounty. The problem is that the whole bounty can be divided amongst very few identities that got lucky and participated in the only data request in the first block. So, the idea of spreading out this expired reputation over a few blocks may work. Another option would be setting an upper limit to how much reputation can be gained per node, per data request, or per block.
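
As a rough illustration of the upper-limit option, the sketch below splits a pool equally among the revealers of one data request but never pays more than a cap per node; whatever is not paid out stays in the pool for later data requests. The function name and the cap of 50 are purely hypothetical choices for the example.

```python
from typing import Dict, List, Tuple


def distribute_with_cap(pool: int, revealers: List[str], cap: int) -> Tuple[Dict[str, int], int]:
    """Return (reward per revealer, reputation left in the pool)."""
    if not revealers:
        return {}, pool
    # Equal split, limited by the per-node cap.
    share = min(pool // len(revealers), cap)
    rewards = {node: share for node in revealers}
    leftover = pool - share * len(revealers)
    return rewards, leftover


if __name__ == "__main__":
    rewards, leftover = distribute_with_cap(3000, ["node_a", "node_b"], cap=50)
    print(rewards, leftover)
```

Here two lucky revealers receive 50 points each instead of 1500, and the remaining 2900 points are carried forward rather than being handed out at once.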

There is also a related issue: sometimes the reputation reward can be 0 points, when there is not enough expired reputation. It would be really nice if the solution to this problem also solved that issue.

8 replies

aesedepece Sep 21, 2022
Maintainer

Reveals by reputed nodes: 21.226.456
Reveals by non-reputed nodes: 2.536.788 (11.95%)

Percentage troll here. Strictly speaking, non-reputed reveals over total reveals is actually 10.68%.
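
For reference, both percentages follow directly from the two reveal counts quoted above; the difference is only in the denominator. The variable names below are just for illustration.

```python
# Reveal counts quoted above (written with underscores instead of dots).
reputed_reveals = 21_226_456
non_reputed_reveals = 2_536_788

# Non-reputed reveals relative to reputed reveals only.
ratio_vs_reputed = 100 * non_reputed_reveals / reputed_reveals
# Non-reputed reveals as a share of all reveals.
share_of_total = 100 * non_reputed_reveals / (reputed_reveals + non_reputed_reveals)

print(f"non-reputed / reputed: {ratio_vs_reputed:.2f}%")
print(f"non-reputed / total:   {share_of_total:.2f}%")
```

The first line prints 11.95% and the second 10.68%.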

aesedepece Sep 21, 2022
Maintainer

More nit-picking here: what is the reputation packet size at 60%? It shows 34 but that's not possible in a histogram like this 🤔

drcpu-github Sep 21, 2022
Collaborator Author

Percentage troll here. Strictly speaking, non-reputed reveals over total reveals is actually 10.68%.

You're right, I fixed it.

More nit-picking here: what is the reputation packet size at 60%? It shows 34 but that's not possible in a histogram like this 🤔

The 60% boundary was a repeat mistake in my percentage list to calculate percentiles, the second number should've been 70% (updated the post).

tmpolaczyk Sep 22, 2022

Thank you for the analysis.

I agree that there needs to be a degree of unpredictability for which nodes will solve a specific data request, however, I'd argue that this is achieved by the eligibility curve (which significantly reduces the odds a high-reputation node will solve a data request) and the fact that the randomness of the VRF already makes it very unlikely you can predict which node will solve a specific data request.

Good point, also the greater the size of the TRS then the harder it is to predict.

If I read about a reputation system, I assume that it yields a set of actors which are known to behave well and trustworthy.

I understand that you would expect a fixed set of nodes that are known to be trustworthy, but that defies the benefits of having thousands of nodes, and enables the attacks mentioned before.

but I do believe that a residency of less than a day on average does not really yield the proper incentive to act honestly.

I still don't see this as a problem, because simply being part of the TRS already gives you benefits such as an increased probability of mining data requests and blocks. So by not acting honestly, a node would lose those benefits.

I'd say that earning more than 50 reputation for a single reveal (which happens in more than 30% of the cases) is probably not justifiable.

It's hard for me to say if 50 reputation is too much or too little, but I agree that there should be some sort of reputation range, so that if the average gain is 10 reputation, then gaining 1 reputation or 100 reputation should not be possible. I am not sure how to implement that, though. The only indicator of how much one reputation point is worth is the penalization function, which removes half of the reputation for each lie. So having 2^n reputation allows you to lie n times before leaving the TRS, although even without leaving the TRS, less reputation means less eligibility.
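
The 2^n rule of thumb can be checked with a small sketch. It assumes that each lie halves reputation with integer (floor) division and that a node leaves the TRS once its reputation hits 0; the exact rounding used by the real penalization function may differ, and the function name is made up for this example.

```python
def lies_survived(reputation: int) -> int:
    """Number of lies a node can tell while still keeping at least 1 reputation."""
    lies = 0
    # ASSUMPTION: each lie halves reputation, rounding down.
    while reputation // 2 >= 1:
        reputation //= 2
        lies += 1
    return lies


if __name__ == "__main__":
    for rep in (1, 8, 50, 3259):
        print(rep, "->", lies_survived(rep))
```

Under these assumptions a node with 8 = 2^3 reputation survives three lies, and a node with 3259 reputation survives eleven.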

If large packets of reputation get distributed at one point in time, they will also expire at the same point in time and this can result in an amplified effect.

That's true, and the only way for this reputation to split up later is if the revealers get penalized in different blocks. So I guess that in the absence of lies the reputation will slowly converge into one single packet? That doesn't seem to be the case right now on mainnet, but maybe the reputation distribution algorithm will need to change a bit.

aesedepece Sep 23, 2022
Maintainer

If large packets of reputation get distributed at one point in time, they will also expire at the same point in time and this can result in an amplified effect.

Very interesting. The only force preventing that from happening is the occasional lies themselves. Otherwise, the trend would be to have many epochs without reputation distribution, and most of the reputation being distributed at once, only rarely 🤔

drcpu-github · 2022-10-18T18:26:32Z

drcpu-github
Oct 18, 2022
Collaborator Author

Cross-posting this idea here so it does not get lost in a somewhat unrelated WIP discussion:

Distributing reputation over multiple epochs will directly influence the efficacy of reputation-stealing attacks. It will significantly decrease the profitability since the amount you'll steal will, for example, be spread uniformly over the next 10 epochs rather than all in one epoch.

Furthermore, we could choose to distribute slashed reputation starting at the next epoch, rather than the current one. This makes targeted attacks impossible since you cannot know if you'll be selected to solve a data request in the next epoch.

Those two updates yield an effect that is similar to burning reputation and creating it again in later epochs.
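
A minimal sketch of the two updates combined, with slashed reputation released uniformly over ten epochs starting one epoch after the slashing. The function name and its `delay` and `spread` parameters are hypothetical and only meant to illustrate the schedule.

```python
from typing import Dict


def schedule_slashed(amount: int, current_epoch: int, delay: int = 1, spread: int = 10) -> Dict[int, int]:
    """Map each future epoch to the slice of slashed reputation released in it."""
    base, extra = divmod(amount, spread)
    first_epoch = current_epoch + delay  # nothing is released in the current epoch
    return {
        first_epoch + i: base + (1 if i < extra else 0)
        for i in range(spread)
    }


if __name__ == "__main__":
    schedule = schedule_slashed(1341, current_epoch=1336224)
    for epoch, points in schedule.items():
        print(epoch, points)
```

Nothing is paid out in the epoch where the slashing happens, so an attacker who triggers it cannot collect in that same epoch, and the amount available in any single later epoch is roughly a tenth of the original chunk.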

0 replies

aesedepece · 2023-03-23T21:54:39Z

aesedepece
Mar 23, 2023
Maintainer

May this be next in the WIP pipeline? I'll personally try to revisit and rethink this soon 🤔

0 replies
