This repository has been archived by the owner on Sep 21, 2022. It is now read-only.

Script to convert proto graphs to Beam #230

Open
Shoeboxam opened this issue May 14, 2020 · 6 comments

@Shoeboxam
Member

This could just be written in Python, although I would be interested in learning how the Beam communications layer works first.

@mikephelan self-assigned this May 27, 2020
@mikephelan
Contributor

Is it correct to understand that the input to Apache Beam is the group of templates found in whitenoise-core/validator-rust/prototypes/components?

@Shoeboxam
Member Author

Yes! Each JSON file needs to have an equivalent Beam representation.

To be differentially private, there must be no two neighboring datasets (any two datasets that differ by one row) where the runtime succeeds on one and fails on the other.

If a runtime does not implement a component, then it will fail on every dataset, which is great, because it means runtimes (like Beam) do not need to implement every component.

You can also ignore components that have no concrete implementation, like min, max, dp_xxx, and to_xxx.
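
As a rough sketch (not a prescribed design), one way to guarantee that unimplemented components fail uniformly, regardless of the dataset's contents, is to dispatch through a registry and raise whenever a component variant is missing. The registry name, decorator, and argument names below are hypothetical:

```python
# Hypothetical registry from component variant names (as they appear in the
# proto graphs) to Beam implementations.
COMPONENT_IMPLEMENTATIONS = {}

def register(variant):
    """Decorator that records a Beam implementation for a component variant."""
    def decorator(func):
        COMPONENT_IMPLEMENTATIONS[variant] = func
        return func
    return decorator

def evaluate_component(variant, options, arguments, privacy_definition):
    """Run one component; fail unconditionally if it is not implemented.

    Because the failure does not depend on the data, it cannot distinguish
    between neighboring datasets, so the guarantee above still holds.
    """
    if variant not in COMPONENT_IMPLEMENTATIONS:
        raise NotImplementedError(f"Beam runtime does not implement {variant}")
    return COMPONENT_IMPLEMENTATIONS[variant](options, arguments, privacy_definition)
```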

@Shoeboxam
Member Author

Shoeboxam commented May 27, 2020

The overall signature is: given a computation graph, a privacy definition, and a release, return a release.

As for the overall function (let's call it distribute_release), I can help with the portion that traverses the graph and calls into the lower-level functions (next comment).
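
A minimal sketch of that traversal, reusing the hypothetical evaluate_component/registry idea from the previous comment; the graph representation (node_id -> {variant, options, arguments}) is assumed here, not the actual proto schema:

```python
def distribute_release(computation_graph, privacy_definition, release):
    """Hypothetical driver: evaluate each node once all of its arguments are ready.

    computation_graph: dict of node_id -> {"variant", "options", "arguments"},
        where "arguments" maps argument names to upstream node ids.
    release: dict of node_id -> previously released values; copied, extended,
        and returned as the new release.
    """
    evaluated = dict(release)          # node_id -> PCollection or constant
    remaining = dict(computation_graph)

    while remaining:
        progressed = False
        for node_id, component in list(remaining.items()):
            argument_ids = component["arguments"].values()
            if all(arg_id in evaluated for arg_id in argument_ids):
                arguments = {name: evaluated[arg_id]
                             for name, arg_id in component["arguments"].items()}
                evaluated[node_id] = evaluate_component(
                    component["variant"], component["options"],
                    arguments, privacy_definition)
                del remaining[node_id]
                progressed = True
        if not progressed:
            raise ValueError("graph has a cycle or references a missing node")

    return evaluated
```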

@Shoeboxam
Member Author

Shoeboxam commented May 27, 2020

From my initial glance at Beam, you might expect each Beam component you implement to take in a PCollection for each argument and elementary Python types for each option.

Each component implementation will need arguments similar to the Rust runtime's:

  • an options dict, equivalent to &self in Rust
  • an arguments dict of PCollections
  • a privacy definition

The privacy definition can generally be ignored for now. The only relevant information inside it is the set of user preferences to force constant time, constant memory, and so on, which we haven't matured enough to support yet.
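
For example, a Beam-side mean (ignoring privacy for the moment) might follow that shape; the @register decorator and argument names come from the hypothetical sketch above, while Mean.Globally is a real Apache Beam transform:

```python
import apache_beam as beam
from apache_beam.transforms import combiners

@register("Mean")
def mean(options, arguments, privacy_definition):
    """Mean over the 'data' argument.

    options: plain Python values, the analogue of &self in the Rust runtime.
    arguments: dict of PCollections; 'data' holds the numeric column.
    privacy_definition: unused for now, per the note above.
    """
    data = arguments["data"]
    return data | "Mean" >> combiners.Mean.Globally()
```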

Let me know if you see issues with this proposed code structure!

@mikephelan
Contributor

Using the analysis notebook, I'd like to pose an example here.
In the sixth code block, under the first comment line:

attempt 4 - succeeds!

there is an example using dp_mean and dp_variance.
In this case, would we be sending Apache Beam a DAG containing two nodes, one for dp_mean and one for dp_variance?

@Shoeboxam
Member Author

More likely, you would translate

materialize -> cast -> clamp -> impute -> resize -> (mean, variance)

into a Beam pipeline. I'm happy to add custom data-loading components if Beam doesn't make the same assumptions that materialize does.
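
A hand-written illustration of what that translated pipeline could look like (the helper functions, the beam.Create stand-in for materialize, and the clamp/impute bounds are all made up for the example; only the overall shape is the point):

```python
import apache_beam as beam
from apache_beam.transforms import combiners

# Hypothetical elementwise helpers standing in for cast, clamp, and impute.
def cast_float(value):
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def clamp(value, lower, upper):
    return None if value is None else min(max(value, lower), upper)

def impute(value, default):
    return default if value is None else value

with beam.Pipeline() as pipeline:
    # materialize: stand-in source; the real component would load a CSV column
    data = pipeline | "Materialize" >> beam.Create(["23", "45", "oops", "88"])

    prepared = (
        data
        | "Cast" >> beam.Map(cast_float)
        | "Clamp" >> beam.Map(clamp, lower=0.0, upper=100.0)
        | "Impute" >> beam.Map(impute, default=50.0)
        # resize would slot in here; it needs a dataset-level transform, not a Map
    )

    # the two leaves of the example graph
    mean = prepared | "Mean" >> combiners.Mean.Globally()
    # variance has no built-in combiner; it would need a custom CombineFn
    # accumulating (count, sum, sum of squares)
```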
