Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft bugfix: panic null value in dataset #91

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

Chao-Ma5566
Copy link

No description provided.

@Chao-Ma5566 Chao-Ma5566 linked an issue Apr 22, 2024 that may be closed by this pull request
@Chao-Ma5566 Chao-Ma5566 changed the title Draft bugfix: panic null value in dataset bugfix: panic null value in dataset Apr 24, 2024
Copy link
Collaborator

@youen youen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal for additional data processing step

I propose adding an additional data processing step before constructing the tree in the SIGO workflow. Currently, the workflow follows this pattern:

  1. Input -> Tree Construction -> Tree
  2. Tree -> Aggregation -> Output

I suggest the following modification:

  1. Input -> Data Validation -> Validated Table
  2. Validated Table -> Tree Construction -> Tree
  3. Tree -> Aggregation -> Output

This adjustment offers several advantages:

  1. Isolation of Error Handling: By separating the data validation step, we ensure that the rest of the processing is not cluttered with error handling logic. This promotes cleaner and more focused code for each stage of the workflow.

  2. Early Pre-processing: Introducing a pre-processing step allows us to address data integrity issues upfront, improving the overall quality of the data fed into subsequent stages of the workflow.

I believe this change will enhance the maintainability and robustness of the SIGO system. Looking forward to your feedback and discussion on this proposal.

less := func(i, j int) bool {
valueI, err := n.cluster[i].QuasiIdentifer()
if err != nil {
// Stocker l'erreur dans la variable globale
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use English

Copy link
Author

@Chao-Ma5566 Chao-Ma5566 Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, adding a step Data Validation is a better way to handling error. I will revert to the first commit just keep the venom test.

I propose adding an new interface dataValidator in case we use other type than float64 in the future. After by default we use float64DataValidator after source created to valide input data then we can focus in other step of the workflow.

@Chao-Ma5566 Chao-Ma5566 changed the title bugfix: panic null value in dataset Draft bugfix: panic null value in dataset Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: null value in dataset
2 participants