-
Notifications
You must be signed in to change notification settings - Fork 556
Recordinality implementation #101
base: master
Are you sure you want to change the base?
Recordinality implementation #101
Conversation
positiveblue
commented
Jan 22, 2016
- I have implemented Recordinality, a stream algorithm. Recordinality class extends from ICardinality and Serialize. You should check the serialize process and the merge (return an exeption because Recordinality doesn't support it).
- I have implemented some tests for the Recordinality class.
- ReadMe modified, now with Recordinality paper information.
The main algorithm is working. Recordinality extends from ICardinality and Serialize but we need to performance some TODOs such as serialize, merge etc... There are a new test class for Recordinality with three tests.
The algorithm and implementation look broadly correct, but there's some formatting and minor implementation details that would be nice to be addressed. eg. If merge isn't supported, it is fine to just throw an unsupported operation exception. One note though: I'm not opposed to including even if only for completionist or academic reasons (there seems to be a variety of suggested areas of further research), however, it seems appropriate document the class accordingly. |
I don't mind handling such mundane things directly, so if you'd rather, I can just pull and then clean it up myself. Just lmk |
Hello Ian, Thanks for taking a look at my code. I have written a basic version of Recordinality. The main reason has been that this pull request is to learn how the library work, which is the process to contribute, etc... As you have said, it's possible to run the algorithm without hash the item before and it could be useful for strings for example. However, there are more Recordinality variations that could be interesting.For instance, Adaptive Recordinality (AR) has the feature that let them increase the memory usage if it's necessary. If you don't know which is te maximum cardinality you can run AR with only one position and let them increase if it's necesary. I'm going to address the format and the merge 'errors' and after that will resubmit the pull request. After this, I will iterate more times adding some features (if it's right). Thanks. |
Merge function throw an UnsupportedOperationException. Added an Ignore decorator to one test.
Well, I have reformated the code and added the UnsoprttedOperationException() to the merge function. If there are some problems with this implementation let me know. As I told, I will implement the adaptive version in a near future. Thanks. |
We use intellij (long live it's glory!), so we mostly just have a formatting style we import/export to share. Deviating from it is fine, but it usually covers most minor issues in an unobjectionable manor. I'll make sure a recent version is available somewhere -- at that point you can just import settings from the jar and it should properly create an "AddThis" codestyle without interfering with other things. |
As a quick kinda hint, generally the start of a file goes "copywriter/license headers", "package/imports", "class javadoc". I always have intellij fold (aka kinda hide) those first two components, so it works out fine in practice too. |
(That said, today is kind of tightly scheduled -- think I'm already late for things -- so you might not see results from my promises until tomorrow) |
Looks like all tests are passing. Can you take a look at an example file for formatting and headers? https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/ConcurrentStreamSummary.java The copyright should be at the top of the doc (before package)... thanks! |
I will recheck the headers this evening (PTZ). |
Thanks, also suggest moving this to the same 'research' package mentioned for the HyperBitBit since it doesn't seem likely that this would be a preferable option to HLLP in production uses. Let me know if you feel differently. |
I agree with your suggestion. Anyway, let me reimplement Recordinality and open some discussion on the mailing list because I think it has some "killer features" that could be useful. |