Draft bugfix: panic null value in dataset #91
base: main
Conversation
Proposal for additional data processing step
I propose adding an additional data processing step before constructing the tree in the SIGO workflow. Currently, the workflow follows this pattern:
- Input -> Tree Construction -> Tree
- Tree -> Aggregation -> Output
I suggest the following modification:
- Input -> Data Validation -> Validated Table
- Validated Table -> Tree Construction -> Tree
- Tree -> Aggregation -> Output
This adjustment offers several advantages:
- Isolation of Error Handling: By separating the data validation step, we ensure that the rest of the processing is not cluttered with error handling logic. This promotes cleaner and more focused code for each stage of the workflow.
- Early Pre-processing: Introducing a pre-processing step allows us to address data integrity issues upfront, improving the overall quality of the data fed into subsequent stages of the workflow.
I believe this change will enhance the maintainability and robustness of the SIGO system. Looking forward to your feedback and discussion on this proposal.
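To make the proposal concrete, here is a minimal sketch of what the validation stage could look like. The `Record`, `ValidatedTable`, and `Validate` names are illustrative assumptions, not sigo's actual API; the sketch assumes quasi-identifiers are expected to be non-null float64 values, which is the case behind the panic this PR addresses.

```go
package sigo

import "fmt"

// Record is an illustrative stand-in for one row of the input table.
type Record map[string]interface{}

// ValidatedTable wraps records that are guaranteed to carry non-null
// float64 quasi-identifiers, so tree construction and aggregation
// need no error handling of their own.
type ValidatedTable struct {
	records []Record
	qi      []string
}

// Validate is the proposed new stage: it fails fast on bad input
// instead of letting tree construction panic on a null value.
func Validate(records []Record, qi []string) (ValidatedTable, error) {
	for i, rec := range records {
		for _, key := range qi {
			v, present := rec[key]
			if !present || v == nil {
				return ValidatedTable{}, fmt.Errorf("record %d: null value for quasi-identifier %q", i, key)
			}
			if _, ok := v.(float64); !ok {
				return ValidatedTable{}, fmt.Errorf("record %d: quasi-identifier %q is %T, want float64", i, key, v)
			}
		}
	}
	return ValidatedTable{records: records, qi: qi}, nil
}
```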
pkg/sigo/kdtree.go
Outdated
less := func(i, j int) bool {
	valueI, err := n.cluster[i].QuasiIdentifer()
	if err != nil {
		// Store the error in the global variable
please use English
Yes, adding a Data Validation step is a better way to handle errors. I will revert to the first commit and keep only the venom test.
I propose adding a new interface dataValidator in case we use types other than float64 in the future. By default we would use a float64DataValidator, applied right after the source is created to validate the input data; then the other steps of the workflow can stay focused on their own logic.
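A minimal sketch of how that interface could look, using the dataValidator and float64DataValidator names from this comment; the Validate method signature and error messages are illustrative assumptions, not a final design:

```go
package sigo

import "fmt"

// dataValidator checks that a raw quasi-identifier value has the
// type the rest of the workflow expects.
type dataValidator interface {
	Validate(value interface{}) error
}

// float64DataValidator is the proposed default: it accepts only
// non-null float64 values, rejecting the inputs that currently
// make tree construction panic.
type float64DataValidator struct{}

func (float64DataValidator) Validate(value interface{}) error {
	if value == nil {
		return fmt.Errorf("null value in dataset")
	}
	if _, ok := value.(float64); !ok {
		return fmt.Errorf("expected float64, got %T", value)
	}
	return nil
}
```

Running every value through the validator once, right after the source is created, would let later stages such as the kd-tree comparator drop their own error handling entirely.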