RFC: Overhauling the reputation system #2271
Replies: 5 comments 13 replies
-
Re: first problem (from the Telegram group):
-
I think this is the key conclusion here. While the short timespan in the TRS is somewhat worrisome to me, I don't consider some volatility to be bad. It's also noteworthy that most nodes rejoin the TRS multiple times. I looked at those numbers and it seems the average node rejoined the TRS five times in the past two years. Taking into account that a lot of nodes were probably stopped over the analysis timeframe, I'd say this number is even higher for active nodes.
-
Regarding Problem 1, as @aesedepece mentioned, the reputation_expire_alpha_diff needs to be small in order to ensure that data request eligibility proofs are not predictable. For the protocol to be secure, it must be impossible to predict which nodes will participate in the resolution of a data request. Setting a higher reputation_expire_alpha_diff would increase the probability that the nodes resolving a future data request are among the nodes with the most reputation today, and that makes bribery attacks (the attacker can announce that they are willing to lie for a price) as well as DDoS attacks (find the IPs of the most reputed nodes and prevent them from participating in the data request) more feasible. Both of these attacks are already possible, but since reputation changes quickly they are a bit harder to abuse.
Well, it is not that simple. Why do nodes lose their reputation in less than a day?
In the first case, they can only try to manipulate one data request, the one they resolved to enter the TRS. In the second case, they are already technically manipulating data requests, so it is true that they won't lose anything by manipulating them a bit more. There is a reputation incentive to behave honestly, because more reputation makes the node eligible for more data requests in the future. The same applies to new nodes that don't have any reputation yet: they are incentivized to behave honestly in order to enter the TRS. So I wouldn't consider this a problem unless it turns out that the majority of data requests get solved by nodes outside of the TRS, because then the reputation system is indeed not working properly and there is no incentive to behave honestly. I would expect something like an 80/20 split, where 80% of the nodes that solve data requests are from the TRS and 20% are new nodes with 0 reputation. In that case, even if we assume that the 20% is malicious, data requests will keep being resolved properly as long as there is an honest TRS. I'm not sure what the actual percentages are, though, but if the newcomer percentage is very high then this could be a problem.
Regarding Problem 2, it is true that it happens sometimes. If you think about it, the incentive makes sense: why is there so much expired reputation? Either the current members of the TRS are no longer participating, or some of them have been penalized. In both cases we want new identities to enter the TRS, and the way to incentivize that is to provide a high reputation bounty. The problem is that the whole bounty can be divided amongst very few identities that got lucky and participated in the only data request in the first block. So the idea of spreading this expired reputation out over a few blocks may work. Another option would be setting an upper limit on how much reputation can be gained per node, per data request, or per block.
Also, there is a related issue: sometimes the reputation reward can be 0 points, when there is not enough expired reputation. So it would be really nice if the solution to this problem also solved that issue.
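A minimal sketch of the "upper limit per node, per block" option, combined with carrying undistributed reputation over to later blocks so that a block with little or no newly expired reputation can still pay out a reward. The class and its parameters (CappedReputationPool, per_node_cap) are hypothetical and only illustrate the idea; they are not taken from the Witnet node implementation, and the cap value is an arbitrary example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CappedReputationPool:
    """Hypothetical pool of expired and slashed reputation.

    Each honest witness receives at most `per_node_cap` points per block;
    anything not handed out stays in `buffer` for later blocks instead of
    going entirely to the few nodes that happened to be selected now.
    """

    per_node_cap: int
    buffer: int = 0

    def add(self, expired_or_slashed: int) -> None:
        self.buffer += expired_or_slashed

    def distribute(self, honest_witnesses: list[str]) -> dict[str, int]:
        nodes = list(dict.fromkeys(honest_witnesses))  # drop duplicate identities
        if not nodes:
            return {}
        # Even share of the buffer, capped per node (integer points).
        share = min(self.buffer // len(nodes), self.per_node_cap)
        self.buffer -= share * len(nodes)
        return {node: share for node in nodes}


pool = CappedReputationPool(per_node_cap=100)
pool.add(3000)                                     # a large chunk expires at once
print(pool.distribute(["a", "b", "c", "d", "e"]))  # each gets 100, 2500 stays
pool.add(0)                                        # a block with nothing newly expired
print(pool.distribute(["f", "g"]))                 # still pays 100 each
print(pool.buffer)                                 # 2300
```

One consequence of a fixed cap is that a large buffer drains slowly when few data requests are solved, so in practice the cap would probably have to scale with the buffer or with the witnessing activity.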
-
Cross-posting this idea here so it does not get lost in a somewhat unrelated WIP discussion: distributing reputation over multiple epochs will directly influence the efficacy of reputation-stealing attacks. It will significantly decrease their profitability, since the amount you steal will, for example, be spread uniformly over the next 10 epochs rather than paid out all in one epoch. Furthermore, we could choose to distribute slashed reputation starting at the next epoch rather than the current one. This makes targeted attacks impossible, since you cannot know whether you'll be selected to solve a data request in the next epoch. Together, these two updates yield an effect similar to burning reputation and creating it again in later epochs.
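The delayed, uniform release described above can be sketched as a per-epoch schedule. DelayedReputationSchedule and its spread parameter are hypothetical names; the sketch assumes slashed reputation is an integer amount that should be conserved exactly, and that nothing is released in the epoch where the slashing happens.

```python
from collections import defaultdict


class DelayedReputationSchedule:
    """Hypothetical schedule: reputation slashed in epoch `e` is released in
    equal integer parts over epochs e+1 .. e+spread, never in epoch `e`."""

    def __init__(self, spread: int = 10) -> None:
        self.spread = spread
        self.pending = defaultdict(int)  # epoch -> points to release in that epoch

    def slash(self, epoch: int, amount: int) -> None:
        base, remainder = divmod(amount, self.spread)
        for i in range(self.spread):
            # The first `remainder` epochs get one extra point so the total
            # released equals `amount` exactly.
            self.pending[epoch + 1 + i] += base + (1 if i < remainder else 0)

    def release(self, epoch: int) -> int:
        """Reputation that becomes available to honest witnesses in `epoch`."""
        return self.pending.pop(epoch, 0)


schedule = DelayedReputationSchedule(spread=10)
schedule.slash(epoch=100, amount=1875)
print(schedule.release(100))                           # 0: nothing in the slashing epoch
print([schedule.release(e) for e in range(101, 111)])  # five times 188, then five times 187
```

With spread=10, the 1875 points from the example in the problem statement would be released as 188 points in each of the first five following epochs and 187 in each of the next five, so no single epoch's witnesses can capture the whole amount.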
-
Could this be next in the WIP pipeline? I'll personally try to revisit and rethink this soon 🤔
-
Problem statement
While the reputation system currently works reasonably well, over the past two years of Witnet mainnet it has become clear that there are two main issues:
1. If a node earns reputation, it often loses it quite fast. This leads to an extremely volatile Total Reputation Set (hereafter abbreviated as TRS), which defeats the point of a reputation system. There obviously should be a decent turnover rate of nodes to prevent reputation oligarchies from forming and to give new nodes a fair chance of joining the TRS, but right now the turnover does not seem to be balanced.
2. It seems relatively easy for a node to reach the upper regions of the TRS. I have seen cases where a node on which collateral was staked reached the most-reputed position in less than two days. Again, this defeats the point of the reputation system, as it means you can simply start a large number of nodes, some of which could reach the most-reputed positions quite quickly.
The consequence of these problems is that reputation does not correlate very well with actual trustworthiness, and revealing an out-of-consensus result is not very punishing if you are likely to leave the TRS within a short timespan anyway. With this post, I want to gauge everyone's sentiment on how to tackle these two problems.
Analysis of the first problem
At the moment the turnover rate of (highly) reputed nodes is quite high. An analysis of time in the TRS reveals that the average node stays in it for only 0.9 days. If we filter out nodes which gained less than 500 reputation during a stay in the TRS, the average residency time increases to about 2.8 days. Over 90% of these nodes stay in the TRS for less than a week.
The reason for the high turnover rate is most likely the small reputation_expire_alpha_diff consensus constant. This constant is set at 20000, which means that reputation earned by participating in a data request expires 20000 witnessing acts in the future. During the month of July, the average number of witnessing acts per block was around ten, so 20000 witnessing acts correspond to about 2000 blocks. This means that if you earn reputation by revealing an in-consensus value, it will expire in about a day. Assuming we will see more data requests and more witnessing acts in the future, reputation will expire even faster. Obviously it is possible to earn new reputation during this period, but given how fast the average node leaves the TRS, doing so seems to be the exception rather than the rule.
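The "about a day" figure can be reproduced with a few lines of arithmetic. This sketch rests on two assumptions not stated in the post: one block per epoch, and 45-second epochs (the latter is consistent with the epoch-to-day conversions in the analysis of the second problem, e.g. about 2600 epochs being less than a day and a half). It also shows how the expiry window shrinks as witnessing acts per block grow, and what doubling or quadrupling the constant would change.

```python
# Assumptions (not stated in the post): one block per epoch, 45-second epochs.
EPOCH_SECONDS = 45
REPUTATION_EXPIRE_ALPHA_DIFF = 20_000  # current value quoted above, in witnessing acts


def expiry_hours(alpha_diff: int, acts_per_block: float) -> float:
    """Hours until reputation earned now expires, assuming a constant number
    of witnessing acts per block."""
    blocks = alpha_diff / acts_per_block
    return blocks * EPOCH_SECONDS / 3600


for acts in (10, 20, 40):
    row = [expiry_hours(REPUTATION_EXPIRE_ALPHA_DIFF * k, acts) for k in (1, 2, 4)]
    print(f"{acts:>2} acts/block: x1 {row[0]:5.1f} h, x2 {row[1]:5.1f} h, x4 {row[2]:5.1f} h")
# At 10 acts/block: 25.0 h today, 50.0 h doubled, 100.0 h quadrupled.
```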
If the average node is prone to lose its reputation in less than a day, what stops said node from trying to manipulate data requests? One could of course argue that it needs to post collateral, which it will lose if it is caught revealing an out-of-consensus answer. However, the reputation system is meant to serve as a second incentive and to assure data requesters that reputed nodes have a vested interest in not manipulating data requests. Right now that is hardly the case.
Analysis of the second problem
To analyze problem two, I want to highlight a node which I recently saw reach the top-reputed spot quite fast. Note that I'm not attributing any malice to this node's operator; up to now it has not revealed a single out-of-consensus value. I'm just using it as an example of what is wrong with the reputation system.
This node was funded in epoch 1333593. For the sake of this argument, it is safe to assume this was also the moment the node was started. It became the most-reputed node in the TRS with 3259 reputation at epoch 1336224, which is about 2600 epochs, or less than a day and a half, later. It stayed at that position for about 500 epochs, or a quarter of a day. The reason it reached that position so fast is that it received two disproportionately large chunks of reputation at epochs 1336157 (1875 reputation) and 1336224 (1341 reputation), together 3216 of its 3259 reputation. In both cases, this can be attributed to large chunks of reputation expiring from many nodes, combined with a couple of nodes being slashed for revealing an out-of-consensus value.
The main problem is that this is not an isolated case. It happens quite often that a node gains a massive amount of reputation because expired or slashed reputation is distributed to a small number of nodes, and this effect is constantly perpetuated and amplified over time. For example, the maximum reputation gained simultaneously by two nodes was 20896, at epoch 1224707. Obviously this also catapulted those nodes to the top-reputed spots.
Potential solutions
I think both problems have a relatively simple (technical) solution.
The first problem could be tackled by increasing the reputation_expire_alpha_diff consensus constant. I think this constant needs to be at least doubled, and probably quadrupled, to offer nodes a fair chance to stay in the TRS for a longer time provided they are honest. As an added benefit, I would expect this to smooth out the overall reputation curve into a more linear function instead of an exponential one.
The solution to problem two seems relatively obvious to me. Instead of distributing expired and slashed reputation in a single (subsequent) epoch, we need to smooth out the distribution over multiple epochs. For example, if we distribute these large chunks uniformly over the next ten data requests being solved, it would most likely solve the problem of a very small subset of nodes earning it all, simply because it is quite unlikely for a node to solve multiple data requests in a short amount of time.
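To make the effect of smoothing concrete, the sketch below compares the two policies on a hand-written sequence of solved data requests, each given as the list of its honest witnesses. It is a simplified model under stated assumptions: today the whole chunk goes to the witnesses of the first solved data request (the "only data request in the first block" case), while under the proposal it is split evenly over the next ten. The function names one_shot and smoothed are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from fractions import Fraction


def one_shot(chunk: int, data_requests: list[list[str]]) -> Counter:
    """Whole chunk is shared by the honest witnesses of the first solved request."""
    first = data_requests[0]
    rewards: Counter = Counter()
    for node in first:
        rewards[node] += Fraction(chunk, len(first))
    return rewards


def smoothed(chunk: int, data_requests: list[list[str]], spread: int = 10) -> Counter:
    """Chunk is split uniformly over the next `spread` solved requests.

    If fewer than `spread` requests are given, the remaining slices are simply
    not paid out here; a real implementation would keep them pending.
    """
    slice_ = Fraction(chunk, spread)
    rewards: Counter = Counter()
    for witnesses in data_requests[:spread]:
        for node in witnesses:
            rewards[node] += slice_ / len(witnesses)
    return rewards


# Ten solved data requests with two honest witnesses each; "a" and "b" happen
# to be the witnesses of the first one.
requests = [["a", "b"]] + [[f"n{i}", f"m{i}"] for i in range(9)]
print(max(one_shot(2000, requests).values()))  # 1000
print(max(smoothed(2000, requests).values()))  # 100
```

A node only collects more than one slice if it is a witness in several of those ten data requests, which is exactly the case the proposal argues is unlikely.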