-
-
Notifications
You must be signed in to change notification settings - Fork 146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of Splay Tree in crdt.Text data structure to prevent skewness #941
Comments
Regarding the above, I would like to share what @justiceHui previously thought and experimented with. How to prevent the structure of the splay tree from becoming a skew treeTo maintain the structure of the splay tree and prevent it from becoming a skew tree, it would be a good idea to use a method of randomly selecting about 10 vertices and splaying them every time an element is accessed about 100 times. Implementing this function is not difficult if you utilize existing implementations. He shared the results of testing a scenario in which five operations {insert_after, insert_before, order_of_node, erase, erase_range} are randomly performed and a scenario in which only insert_after is performed using code written in C++. The source code is from here, and the data used in the experiment is from here. As a result of randomly selecting and splaying one vertex every 500 operations, the number of rotations (sum.rot) performed in 260,000 operations have increased by 7%, and the maximum number of rotations performed in individual operations (mx.rot) have decreased by 66%. |
I tried the above code with a random seed, but the performance worsened on average. I experimented with incorporating the idea from this paper into @justiceHui's approach by adding a probabilistic splay of the parent vertex. In my opinion, the paper's deterministic method was not very effective. I used Scheme 2 from the paper because it yielded the best results and was easy to implement. I compared @justiceHui's method, the paper's method, and a hybrid of the two. When I tried the above code with the random seed, the performance worsened on average. So, I ran the experiments multiple times with different seeds and averaged the outcomes. The source code is available here, and the data used in the experiment is the same as mentioned above. The results showed that randomly selecting and splaying one vertex every 600 operations, combined with splaying the parent vertex with a 1/2 probability, led to a 6% increase in the total number of rotations (sum.rot) over 260,000 operations. However, the maximum number of rotations performed in individual operations (mx.rot) decreased by 60%. |
I discovered a bug in my code and, after fixing it and re-running the experiments, found that method3 did not significantly outperform method1. In fact, sometimes it performed worse. |
Through testing, we have confirmed that adding one splay every 500 splays is a good approach for 260,000 operations. However, since we have not experimented with fewer or more operations, it would be beneficial to conduct a few more experiments before starting the implementation. For example, we can test various functions, such as adding one additional splay for every If one function proves to be significantly better, we can try adjusting the parameters dynamically based on the cumulative number of splay operations. For instance, if we decide to add one splay operation every |
I conducted tests following your suggestions by generating data of various sizes using slicing or concatenating edit-trace data. The experiment results indicated that for sizes less than I suspect that having only one experimental dataset and not being able to create it properly might have resulted in inaccurate conclusions. To obtain more accurate results, do you know of any ways to generate larger datasets or any existing datasets that can be used for this purpose? |
Experiment with a New Splay MethodIn this experiment, we tested a new approach by splaying the leaf node that is furthest from the root, instead of a random node. The rationale behind this approach is that splaying the deepest node would naturally reduce skewness more effectively compared to selecting a random node. Although finding the deepest node takes longer than selecting a random one, the time complexity remains amortized Following the format of previous experiments, we conducted the The experimental results showed significant performance improvements for max rotation and max height, with nodes less than However, for node sizes greater than Questions
|
If you don't have any other ideas, I'd like to implement and bench it, is it okay? |
Thank you for sharing your experiment. About
|
What would you like to be added:
Currently, Splay trees are used in the
crdt.Text
data structure. While Splay trees are efficient when performing operations sequentially in a contiguous range, they suffer from the drawback that sequentially inserted elements are linearly arranged on the tree, leading to it becoming a skew tree.In a tree with N vertices, performing M operations guarantees performance in O((N+M) log N) time. However, if the tree becomes skewed, each operation may take O(N) time initially before returning to faster performance.
This issue may lead to performance degradation in the
crdt.Text
data structure due to the skewness of the Splay tree.It is essential to explore methods to prevent skewness that could affect the performance and efficiency of the Text data structure.
Why is this needed:
To maintain and enhance the performance and efficiency of the
crdt.Text
data structure in the face of potential skewness issues in the Splay trees.The text was updated successfully, but these errors were encountered: