v0.2.0 #141

cristianberneanu · 2021-09-01T13:26:32Z

cristianberneanu
Sep 1, 2021
Maintainer

Setup packages are NOT signed.

Version 0.2.0

Added UI feedback on data export.
Added support for column generalization.
Removed anonymized count value in low count rows.
Added relative noise column in combined view.
Added side navigation in Notebooks.
Added a default path on data export.

This discussion was created from the release v0.2.0.

fjab · 2021-09-01T15:44:30Z

fjab
Sep 1, 2021
Collaborator

I really, really like this 👍

Couple comments:

In generalisation, it's unclear when it's enough to fill one field and when it isn't.

Filling "substring start" but not "substring length" leads to no generalisation, whereas filling "substring length" but not "substring start" works, with substring start = 1 as the default. Suggestion: default values could be shown in light grey if the box is not filled out.

Bug with substrings: if (substring length + substring start) > (length of string), I get an "Anonymization Error".
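
To make the expected behaviour concrete, here is a minimal sketch of a substring generalisation that tolerates missing fields and out-of-range values. It is not the project's implementation: `generalize_substring` is a hypothetical helper, and the choice to keep the rest of the string when only the start is given is an assumption made for illustration.

```python
from typing import Optional


def generalize_substring(
    value: str, start: Optional[int] = None, length: Optional[int] = None
) -> str:
    """Hypothetical helper: keep `length` characters from 1-based `start`."""
    # A missing start defaults to 1, as reported for the current UI.
    begin = 1 if start is None else max(start, 1)
    if length is None:
        # Assumption: a missing length keeps everything from `start` onward.
        return value[begin - 1:]
    # Slicing clamps at the end of the string, so start + length exceeding
    # len(value) gives a shorter result instead of raising an error.
    return value[begin - 1:begin - 1 + max(length, 0)]
```

With this shape, `generalize_substring("abc", start=2, length=10)` returns the shortened value rather than failing.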

For the Anonymization Summary, it occurs to me that the distribution of distortions would be useful as well. Right now I see the average and the maximum distortion, but I can't really tell if there are just a few rows with really high distortion or how this comes together. Sorting by "Count noise" only helps if there are relatively few rows.
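
As a rough illustration of what a distribution view could report, the sketch below summarises absolute count noise with quartiles next to the minimum and maximum. The function name and the input shape (a plain list of per-row noise values) are assumptions, not part of the application.

```python
from statistics import quantiles
from typing import Dict, Sequence


def distortion_summary(count_noise: Sequence[float]) -> Dict[str, float]:
    """Hypothetical summary of per-row noise; needs at least two values."""
    # Work on magnitudes so positive and negative noise are treated alike.
    abs_noise = sorted(abs(n) for n in count_noise)
    # Quartiles show whether high distortion is widespread or an outlier.
    q1, median, q3 = quantiles(abs_noise, n=4, method="inclusive")
    return {
        "min": abs_noise[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": abs_noise[-1],
    }
```

A large gap between `q3` and `max` would indicate that only a few rows carry the really high distortion.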

The above also shows: We need unified terminology here... what is it now, distortion or noise?

0 replies

cristianberneanu · 2021-09-01T15:56:04Z

cristianberneanu
Sep 1, 2021
Maintainer Author

> The above also shows: We need unified terminology here... what is it now, distortion or noise?

In my mind, they are slightly different: noise makes me think of the standard deviation (SD) of the generator, and distortion makes me think of the final output value.

2 replies

fjab Sep 2, 2021
Collaborator

That may be the right way of thinking about it, but looking at our interface, that is absolutely not clear:

"Maximum distortion" = highest "Count Noise"

edongashi Sep 2, 2021
Maintainer

This appears to be inconsistent. We'll get back to it once we do the stats properly.

cristianberneanu · 2021-09-01T16:00:20Z

cristianberneanu
Sep 1, 2021
Maintainer Author

> Sorting by "Count noise" only helps if there are relatively few rows.

Sorting is not very relevant because we only show a subset of the output values (and this will remain true even after we make the summary accurate).

To improve sorting utility, we can either:

Select top/bottom X values for each column when sampling the output.
Offload sorting to the anonymization service to always get the actual top values for a column.
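
The first option could look roughly like the sketch below, which keeps the X largest and X smallest rows for one column. The row shape (a dict per row) and the name `top_bottom_rows` are assumptions for illustration only.

```python
import heapq
from typing import Dict, List, Sequence


def top_bottom_rows(
    rows: Sequence[Dict[str, float]], column: str, x: int
) -> List[Dict[str, float]]:
    """Hypothetical helper: the x largest and x smallest rows by `column`."""
    top = heapq.nlargest(x, rows, key=lambda r: r[column])
    bottom = heapq.nsmallest(x, rows, key=lambda r: r[column])
    # A row can be in both lists when there are fewer than 2 * x rows,
    # so keep only the first occurrence of each row object.
    seen = set()
    result = []
    for row in top + bottom:
        if id(row) not in seen:
            seen.add(id(row))
            result.append(row)
    return result
```

This would have to run once per sortable column, so the sample grows with the number of columns.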

1 reply

fjab Sep 2, 2021
Collaborator

Good point.

I still maintain that we should just take a randomised sample (of 1000 rows or whatever) instead of the head. Then we don't have to do any other magic – top / bottom X rows may be the right approach for this specific problem, but likely not always.
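
For reference, a uniform random sample can be taken in a single pass without knowing the row count in advance, for example with reservoir sampling (Algorithm R). This is a sketch under the assumption that rows arrive as an iterable; `sample_rows` is a hypothetical name, not an existing function in the project.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_rows(
    rows: Iterable[T], k: int = 1000, seed: Optional[int] = None
) -> List[T]:
    """Hypothetical helper: uniform sample of up to k rows in one pass."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir
```

If there are fewer than `k` rows, all of them are returned unchanged.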

cristianberneanu · 2021-09-02T13:21:42Z

cristianberneanu
Sep 2, 2021
Maintainer Author

What do you think about changing "Substring start / Substring length" to something shorter, like "Text start / Length"?

1 reply

fjab Sep 2, 2021
Collaborator

I have no problem with that. Although I also don't feel strongly either way; it never seemed too long to me.
