Clamp #1

silviacasac · 2021-07-08T15:25:11Z

First pull request for the Clamp transformation, which corresponds to this GitHub permalink.

It has undergone a first correction by Prof. Vadhan on June 22 and a first correction by Mike on July 1 and 7. All these corrections have been implemented. The pseudocode has not been changed to include Andy's comment on July 1 on adding the domains and metrics as declarative statements in the preconditions. (Edit: the updated version now includes said preconditions)

Files:

clamp.tex -- the LaTeX document with the proof.
clamp.pdf -- the PDF generated from the TeX document.
clamp_rust.png -- the screenshot of the corresponding Rust code.

Update: July 12, 2021

Resolved comments by Phil and Mike above.
Added the preconditions (and taken them out from the pseudocode) as discussed last week. We also decided to separate the preconditions into "user-specified types" and the 4 "input metric, output metric, input metric, output metric" that all transformations will have. We also decided to add a postcondition to deal with the fallible case.
Changed floats to non-null floats after Mike's comment on total ordering not working for possibly null floats.
Changed the statement of Theorem 1 to make it as general as possible (i.e., "under input_metric" instead of "under symmetric distance" so that the Theorem statement can be re-used in the proofs. Changed "appropriately quantified" to specifying the types of din, dout; which is what it meant.
Changed some of the "rustiness" expressions that were left in the proof after the preconditions explained "in words" change (e.g., delete VectorDomain(IntervalDomain(L, U)).
Deleted the section "Conditions as specified in the pseudocode" because it is repetitive with the new preconditions. There is also no equation form for the stability relation and I refer directly to the pseudocode lines.
Added table of contents

Further questions on the new version:

Type sign L and U as it is now (def MakeClamp(L: T, U: T) or add that L and U have type T in the preconditions?
Move the if L>U: raise Exception to the preconditions? a) But this would not be a compile time error. b) Mike said that the Rust construction for IntervalDomain is enforced to check that L<=U, so do I need to check this condition in the pseudocode and raise an exception if IntervalDomain(L, U) is in the preconditions?
We were discussing to maybe define symmetric distance for vectors (as opposed to multisets) so that we can drop having to write MultiSet(v) every time in the proofs.

Update: July 14, 2021

Domains and metrics are still listed as preconditions (pending a double-check with @andrewvyrros), but they are likely to be put back into the code, as discussed with @SalilVadhan on July 13. Main issue: the declaration of certain domains requires variables that are only defined in the pseudocode.
Added another part to the theorem statement (which should, from now on, appear in all of the proofs): "2. Domain-metric compatibility". As discussed on July 13, it is necessary to ensure that the input domain is part of the list of domains included in the definition of said metric. This usually follows directly from our list of definitions, but some later proofs might need to include some checking in the code.
With Mike we decided not to do inf_cast on d_in given that d_in and d_out have the same type (namely u32).

Update: September 6, 2021

All the revisions suggested in this pull request have been implemented.
The major change constitutes the simplification of the stability guarantee proof by means of defining a row transform and proving their general stability lemma, which is now included in the "List of definitions used in the proofs". For now, we are only using the row transform as a proof strategy and is not in the code, and hence there is no discussion about pure functions in this documents. We wanted to separate the lemmas that relate to row transform as a proof strategy as opposed to the actual transformation that would be called in the code.

raprasad

Looks good!

pdurbin

Good stuff. The math is beyond be but I left a few comments.

clamp/clamp.tex

SalilVadhan · 2021-07-19T13:24:36Z

clamp/clamp.tex

+\begin{itemize}
+    \item \textbf{User-specified types:}
+    \begin{itemize}
+        \item Type \texttt{T} must have trait \texttt{TotalOrd}.\footnote{For now, the OpenDP library only implements \texttt{PartialOrd}, but \texttt{TotalOrd} will soon be implemented.}


add a cross-reference to the pseudocode_defs file so that the reviewer knows where to find the def of the trait TotalOrd.

We have been meaning to add cross-references between the tex files, but it didn't quite work (tex-wise) -- we have to look again into it. But in any case, are you then saying that we should cross-reference every single term that appears defined in pseudocode_defs and proof_defs? So what I mean is that it is not just TotalOrd, but also many other terms in the document

SalilVadhan · 2021-07-19T20:46:10Z

clamp/clamp.tex

+The current OpenDP library contains the \texttt{make\_clamp\_vec} function implementing the clamping function. This is defined in lines 25-38 of the file \texttt{manipulation.rs} in the Git repository\footnote{As of June 16, 2021. Since then, the code has been updated to include a more general clampable domain, which is not yet finished.} (\url{https://github.com/opendp/opendp/blob/58feb788ec78ce739caaf3cad8471c79fd5e7132/rust/opendp/src/trans/manipulation.rs#L25-L38}).
+
+\begin{figure}[ht]
+    \includegraphics[width=16cm]{clamp_rust.png}


is this an old version of the code? still has PartialOrd in it.

Also it looks like it doesn't enforce that M is SymmetricDistance.

No, TotalOrd is still not implemented in the OpenDP code. Mike said it is in the to do list, but it has not yet been changed, which is why PartialOrd still appears in the pseudocode. Still, I do the proof with TotalOrd because Mike said it will for sure be updated in the near future.

Got it, thanks

I haven't made the change to Ord a priority because it hasn't been clear to me if this is what we want for the library. I wrote a bit more about this in an issue here: opendp/opendp#203

I'd be happy to make the changes once it's clear that we want to adopt noisy-float newtypes to strengthen this trait bound.

(Discussed on the architecture meeting on August 3rd)

Until the code has been fixed, the proof document should say explicitly that the proof is being given assuming Ord even though the Rust code only requires PartialOrd, so the Rust code is not quite correct yet. In general, any gaps like this should be highlighted clearly at the beginning of the document under a section titled something like "Remaining Gaps between Proof and Code".

It is a good idea to have such a section -- but TotalOrd is fixed now, so I deleted it

clamp/clamp.tex

SalilVadhan · 2021-07-19T20:55:01Z

clamp/clamp.tex

+            return max(min(x, U), L)
+        return list(map(clamp, data)) |\label{line:map}|


where can I find definitions of max, min, list, map? Are these built-in Rust functions?

Max and min are defined in the pseudocode definitions file (see Definitions 4.17 and 4.18). Map and list are built-in functions, but I can add them to the pseudocode definitions file. In the case of clamp, the Rust code does not directly use a list, but talking to Mike we settled on that the best Python-Rust blend term for this line would be to use a list, instead of the Rust .collect() function that appears in the code.

I do think it would be good to add map and state in the pseudocode defs file that it is representing the Rust function collect. Actually doesn't map by definition return a list? So what is the list(.) function doing?

nvm, I see my questions are answered below- but the fact that Grace and I had the same question reinforces that it would be good to add defs of map and list into the pseudocode_defns file

Yes, they are added now

clamp/clamp.tex

SalilVadhan · 2021-07-19T21:33:23Z

clamp/clamp.tex

+\textbf{(Domain-metric compatibility).}  For \texttt{MakeClamp}, both cases correspond to showing that \texttt{VectorDomain(T)} is compatible with symmetric distance. This follows directly from the definition of symmetric distance, as stated in \href{https://www.overleaf.com/project/60d214e390b337703d200982}{``List of definitions used in the proofs"}. The \textit{appropriate output domain property} shown above ensures that \texttt{output\_domain} is \texttt{VectorDomain(T)}, and hence also compatible with the \texttt{output\_metric} symmetric distance.
+
+\smallskip
+\textbf{(Stability guarantee).} Throughout the stability guarantee proof, we can assume that $\function(v)$ and $\function(w)$ are in the correct output domain, by the \textit{appropriate output domain property} shown above. 


I think the following will give a simpler proof. Whenever we have a function from Vector Domains to Vector Domains of the form function(data)=list(map(f,data)), then h_{function(v)}(z) = sum_{y in f^{-1}(z)) h_v(y). And then apply triangle inequality and that the sets f^{-1}(z) form a partition of the domain of f. This will avoid a need to do a case analysis, and we can use the same proof for all of transformations of this form. In fact, this can be a general lemma in the proof_defns file that we cite, or a proof for "make_rowtransform" (which I believe that Mike may have written, but which will require a precondition on the function f).

the requirement is that f should be a "pure" function, i.e. a function in the standard mathematical sense, not a piece of code that has side effects or can access some global state.

I am typing up proof of stability relations for row by row transformations. Would it make more sense for this to be in the list of definitions in proofs or a separate transformation proof for make_row_by_row?

Decided on August 4: clamp will not include this simplified row by row proof until the Rust code for clamp actually calls the make_row_by_row transformation inside of it. But see e.g., is_equal for a use case of the make_row_by_row transformation.

clamp/clamp.tex

SalilVadhan · 2021-07-19T21:39:04Z

clamp/clamp.tex

+The current OpenDP library contains the \texttt{make\_clamp\_vec} function implementing the clamping function. This is defined in lines 25-38 of the file \texttt{manipulation.rs} in the Git repository\footnote{As of June 16, 2021. Since then, the code has been updated to include a more general clampable domain, which is not yet finished.} (\url{https://github.com/opendp/opendp/blob/58feb788ec78ce739caaf3cad8471c79fd5e7132/rust/opendp/src/trans/manipulation.rs#L25-L38}).
+
+\begin{figure}[ht]
+    \includegraphics[width=16cm]{clamp_rust.png}


Also it looks like it doesn't enforce that M is SymmetricDistance.

SalilVadhan · 2021-07-20T21:13:29Z

clamp/clamp.tex

+The current OpenDP library contains the \texttt{make\_clamp\_vec} function implementing the clamping function. This is defined in lines 25-38 of the file \texttt{manipulation.rs} in the Git repository\footnote{As of June 16, 2021. Since then, the code has been updated to include a more general clampable domain, which is not yet finished.} (\url{https://github.com/opendp/opendp/blob/58feb788ec78ce739caaf3cad8471c79fd5e7132/rust/opendp/src/trans/manipulation.rs#L25-L38}).
+
+\begin{figure}[ht]
+    \includegraphics[width=16cm]{clamp_rust.png}


Got it, thanks

clamp/clamp.tex

gracetian6 · 2021-07-23T20:57:52Z

clamp/clamp.tex

+\textbf{(Appropriate output domain).} In the case of \texttt{MakeClamp}, this corresponds to showing that for every vector $v$ of elements of type \texttt{T}, $\function(v)$ is a vector of elements of type \texttt{T} which are contained in the interval \texttt{[L, U]}. For that, we need to show two things: first, that \texttt{function(v)} has type \texttt{Vec(T)}.
+Second, that they belong to the interval \texttt{[L, U]}.
+
+Firstly, that $\function(v)$ has type \texttt{Vec(T)} follows from the assumption that element $v$ is in \texttt{input\_domain} and from the type signature of \texttt{function} in line~\ref{line:fn} of the pseudocode (Section~\ref{sec:pseudocode}), which takes in an element of type \texttt{Vec(T)} and returns an element of type \texttt{Vec(T)}. If the Rust code compiles correctly, then the type correctness follows from the definition of the type signature enforced by Rust. Otherwise, the code raises an exception for incorrect input type. 
+
+Secondly, we need to show that the vector entries belong to the interval \texttt{[L, U]}. This follows from the definition of \texttt{clamp} in line \ref{line:clamp}. According to line \ref{line:clamp} in the pseudocode, there are 3 possible cases to consider:


very well organized; it really helps that you broke down each case to consider with firstly, secondly, etc.

gracetian6 · 2021-07-23T21:04:57Z

clamp/clamp.tex

+        def clamp(x: T) -> T: |\label{line:clamp}|
+            return max(min(x, U), L)
+        return list(map(clamp, data)) |\label{line:map}|


Why do we use list()? By definition of map, it should apply to clamp function elementwise to the entire input vector. https://en.wikipedia.org/wiki/Map_(higher-order_function)#:~:text=In%20many%20programming%20languages%2C%20map,when%20considered%20in%20functional%20form.

Nvm, Silvia explained that the reaosn is because

Mike: “2. returning a map has a slightly different meaning. list(map(...)) or [clamp(L, U, v) for v in arg] is more accurate. In rust, the .collect exhausts the iterator into a vector. So the analogue here is list or the list comprehension (not a generator comprehension). It would be cool if we chained iterators, but that makes other things more complicated (edited)”

Right so we want to return a vector (for appropriate output domain) hence we use the Python list, not a map

typos

Shoeboxam · 2021-08-02T06:48:05Z

clamp/clamp.tex

+            return max(min(x, U), L)
+        return list(map(clamp, data)) |\label{line:map}|
+
+    return Transformation(input_domain, output_domain, function, input_metric, output_metric, stability_relation = Relation)


Why the special naming "Relation" instead of "stability_relation"? The code makes sense. Are you guys planning on switching type hints to the type hint syntax? I prefer the typing syntax for hints as it looks less like a function invocation. It's a pretty minor pick, basically Vec(T) -> Vec[T]. Also, are we keeping Vec from rust or using python's list in the pseudocode? The iterator collection uses list, but the type is a Vec. I would change one or the other to be consistent.

Regarding the special naming of "Relation": I mainly did this to save space in Section 2. Proof when proving the stability relation, so that only "Relation" appears in the text instaed of "stability_relation". And also so that the theorem statement for the stability/privacy guarantee looks the same for measurements, where we would have privacy_relation = Relation. But if you think this is not enough to change it I can add stability_relation back, no problem.

About the type hint syntax, are you saying that the correct function type signing for clamp is def function(data: Vec(T)) -> Vec[T]? How so? So initially we had Vec(T) -> Vec(T) but you said that this was not correct, and so we introduced the rule "Parentheses, (), are used to create an instance of a domain. Square brackets, [], are used to describe the type of a domain. For example, AllDomain(i8) is the domain of all values of type i8. However, the type of AllDomain(i8) -- the domain itself, not the elements of the domain -- is AllDomain[i8]; note the square brackets. This typing style is inspired by the notation used in Python; see \url{https://docs.python.org/3/library/typing.html}." (this is now included in the pseudocode definitions).

Regarding Vec vs list: so I thought we wanted the pseudocode to look as Python as possible, which is why I wrote it in list format. But then it seemed not correct to change the input and output domains, because the VectorDomain formulation is important and appears everywhere, and also for showing correct output domain. When you say you would change either or the other are you proposing to either: 1. Change line 148 to return Vec(map(clamp, data)) and keep the domains as they are, or 2. To keep line 148 as it is and change the domains to something different? Approach 2 seems worse I think, because that is inventing new domains that don't exist in the Rust library. And could we not just say that list has type Vec in the list of definitions to resolve this? But perhaps approach 1 is better then, although less Python

I think the part throwing me is that camel-case code in python usually represents classes. I have a soft preference towards relation or stability_relation.

No, the arrow in my statement was meant to communicate "replace Vec(T) with Vec[T]". The current syntax in the pseudocode is def function(data: Vec(T)) -> Vec(T):. The revised syntax would be def function(data: Vec[T]) -> Vec[T]:. The type hints should use the type syntax, because you're not making instances of a Vec within the signature.

You're right, there's no win here. I think we leave Vec/list inconsistent unless someone else has strong feelings.

That makes sense, I had not considered the implications of having it capitzalied. I will change all of them to relation then.

Ah okay, I thought the TeX had already been changed to Vec(T), but it is fixed now.

Okay!

Shoeboxam · 2021-08-02T07:11:19Z

clamp/clamp.tex

+    input_metric = SymmetricDistance()
+    output_metric = SymmetricDistance()
+
+    def Relation(d_in: u32, d_out: u32) -> bool: |\label{line:rel}|


Will switching to IntDistance cause headaches elsewhere? I'm thinking no, but sometimes making things more flexible complicates things. IntDistance is just a type alias, the user can't change it, but having it behind an alias makes it easy to change across the library. The naming could be adjusted, as always. It might make sense to add an explicit list of types IntDistance can be- basically all primitive int types.

Oh right you had changed the associated type of symmetric distance to IntDistance, right? But then set IntDistance = u32 -- so you say to add an explicit list of types IntDistance can be, but isn't it only used for u32 in symmetric distance no? And are you saying that I should change the above to def Relation(d_in: IntDistance, d_out: IntDistance)? Or perhaps not yet because it does not appear in the clamp Rust code?

Okay I think you are saying to write def Relation(d_in: IntDistance, d_out: IntDistance) instead, but then should IntDistance = u32 be specified anywhere in the clamp pseudocode or do we add the explicit list of types for IntDistance in the definitions for the pseudocode document?

It would be nice to pull this u32 constant out in favor of an alias, but it does seem to cause headaches--

The name is much longer

We use all sorts of traits/methods on IntDistance that aren't qualified in the trait bounds because it's a type alias, not a generic. I wouldn't want to have to pull these details down into proofs everywhere.

I vote we don't bother making changes at this point. If we did, I would advocate:

a better alias name

the alias definition only makes a definitive statement that it is defined to be u32

include a weaker statement that any type from a list of int types should be isomorphic, but not give explicit guarantees.

Maybe a better approach is to just add the statement in (3) to the definition of SymmetricDistance, and leave the proofs as-is.

Okay, for now I will leave u32 then.

silviacasac · 2021-08-05T21:27:29Z

Remaining things to double-check (all minor):

Vec vs list
IntDistance vs u32
Vec(T) vs Vec[T]

Will require future edits if make_row_by_row is added to the clamp Rust code in the library.

Edit: the remaining things have been double-checked.

SalilVadhan · 2021-08-06T14:16:39Z

clamp/clamp.tex

+\section{Algorithm Implementation}
+
+\subsection{Code in Rust}
+The current OpenDP library contains the \texttt{make\_clamp\_vec} function implementing the clamping function. This is defined in lines 25-38 of the file \texttt{manipulation.rs} in the Git repository\footnote{As of June 16, 2021.}  (\url{https://github.com/opendp/opendp/blob/58feb788ec78ce739caaf3cad8471c79fd5e7132/rust/opendp/src/trans/manipulation.rs#L25-L38}).


comment on the Rust code (should probably be a comment on manipulation.rs): 1-stability and Lipschitz constant of 1 are the same thing (so "as well as" should be "i.e.")

SalilVadhan · 2021-08-06T14:21:15Z

clamp/clamp.tex

+\begin{figure}[ht]
+    \includegraphics[width=16cm]{clamp3.png}
+    \centering
+    \label{fig:code}
+\end{figure}
+
+\begin{figure}[ht]
+    \includegraphics[width=16cm]{clamp4.png}
+    \centering
+    \label{fig:code}
+\end{figure}


why include this code if the proof is only going to be for make_clamp_vec?

I took it out

SalilVadhan · 2021-08-06T14:33:50Z

clamp/clamp.tex

+
+        \item \textup{(Domain-metric compatibility).} The domain \texttt{input\_domain} matches one of the possible domains listed in the definition of \texttt{input\_metric}, and likewise \texttt{output\_domain} matches one of the possible domains listed in the definition of \texttt{output\_metric}.
+
+        \item \textup{(Stability guarantee).} For every pair of elements $v, w$ in \texttt{input\_domain} and for every pair $(\din, \dout)$, where $\din$ is of the associated type for \texttt{input\_metric} and $\dout$ is the associated type for \texttt{output\_metric}, if $v,w$ are $\din$-close under \texttt{input\_metric} and $\Relation(\din, \dout) = \True$, then $\function(v), \function(w)$ are $\dout$-close under \texttt{output\_metric}.


minor nit "dout is the associated type" -> "dout is of the associated type" (actually "dout has the associated type" reads more nicely)

SalilVadhan · 2021-08-06T14:39:12Z

clamp/clamp.tex

+\subsection{Symmetric Distance}
+\begin{theorem}
+    For every setting of the input parameters \texttt{(L, U)} to \texttt{MakeClamp} such that the given preconditions
+    hold, the transformation returned by \texttt{MakeClamp} has the following properties:


shouldn't we say "MakeClamp raises a exception (at compile time or run time) or returns a valid transformation."? (And then we can/should put the def of what constitutes a valid transformation in either the pseudocode_defns or proof_defns file (not sure which).)

SalilVadhan · 2021-08-06T14:40:57Z

clamp/clamp.tex

+\end{enumerate}
+In all three cases, the returned value of type \T is contained in the interval \texttt{[L, U]}. Hence, the vector $\function(v)$ returned in line~\ref{line:map} of the pseudocode is an element of \texttt{output\_domain}.
+
+Lastly, the necessary condition that \texttt{L} $\leq$ \texttt{U} is checked when declaring \texttt{output\_domain = VectorDomain(IntervalDomain(L, U))} in line~\ref{line:output} of the pseudocode. This check already exists via the construction of \texttt{IntervalDomain}, which returns an error if \texttt{L} $>$ \texttt{U}. Both \texttt{L} and \texttt{U} have type \texttt{T} by the type signature of \texttt{MakeClamp}. Both the definition of \texttt{IntervalDomain} and that of the \texttt{clamp} function (line~\ref{line:clamp} in the pseudocode, which uses the \texttt{min} and \texttt{max} functions) require that \texttt{T} implements \texttt{TotalOrd}, which holds by the preconditions.


I think the fact that IntervalDomain raises an exception when L>U should be stated at the beginning of the proof, before item 1. Otherwise much of what follows doesn't make sense (e.g. case analysis above for clamp).

SalilVadhan · 2021-08-06T14:43:21Z

clamp/clamp.tex

+as we wanted to show.
+\end{proof}
+
+\silvia{Flag: will change to the simplified general proof scheme for row-by-row transformations that Prof. Vadhan suggested in the Git repo once the \texttt{make\_row\_by\_row} transformation is included as part of the clamp transformation code.}


I think it's worth doing that here in this proof even prior to make_row_by_row. It is much simpler and shorter than what's above.

silviacasac added 6 commits July 8, 2021 16:39

Add files via upload

4e5e293

Delete Privacy Proofs for OpenDP_ Clamping.zip

5c40d00

Create thing

d5fe4c1

Add files via upload

391aab5

Delete thing

9d768f7

Add files via upload

b8994df

silviacasac requested review from pdurbin, ecowan, Shoeboxam, andrewvyrros, gracetian6 and cwagaman July 8, 2021 15:25

silviacasac self-assigned this Jul 8, 2021

raprasad approved these changes Jul 8, 2021

View reviewed changes

pdurbin reviewed Jul 8, 2021

View reviewed changes

clamp/clamp.tex Outdated Show resolved Hide resolved

clamp/clamp.tex Outdated Show resolved Hide resolved

clamp/clamp.tex Outdated Show resolved Hide resolved

clamp/clamp.tex Outdated Show resolved Hide resolved

silviacasac added 2 commits July 8, 2021 18:17

Add files via upload

6283081

Add files via upload

c48d68a

silviacasac requested a review from SalilVadhan July 14, 2021 09:22

Add files via upload

bc6a67a