-
Notifications
You must be signed in to change notification settings - Fork 1
Clamp #1
base: main
Are you sure you want to change the base?
Clamp #1
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff. The math is beyond be but I left a few comments.
clamp/clamp.tex
Outdated
\begin{itemize} | ||
\item \textbf{User-specified types:} | ||
\begin{itemize} | ||
\item Type \texttt{T} must have trait \texttt{TotalOrd}.\footnote{For now, the OpenDP library only implements \texttt{PartialOrd}, but \texttt{TotalOrd} will soon be implemented.} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a cross-reference to the pseudocode_defs file so that the reviewer knows where to find the def of the trait TotalOrd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have been meaning to add cross-references between the tex files, but it didn't quite work (tex-wise) -- we have to look again into it. But in any case, are you then saying that we should cross-reference every single term that appears defined in pseudocode_defs and proof_defs? So what I mean is that it is not just TotalOrd, but also many other terms in the document
The current OpenDP library contains the \texttt{make\_clamp\_vec} function implementing the clamping function. This is defined in lines 25-38 of the file \texttt{manipulation.rs} in the Git repository\footnote{As of June 16, 2021. Since then, the code has been updated to include a more general clampable domain, which is not yet finished.} (\url{https://github.com/opendp/opendp/blob/58feb788ec78ce739caaf3cad8471c79fd5e7132/rust/opendp/src/trans/manipulation.rs#L25-L38}). | ||
|
||
\begin{figure}[ht] | ||
\includegraphics[width=16cm]{clamp_rust.png} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this an old version of the code? still has PartialOrd in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it looks like it doesn't enforce that M is SymmetricDistance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, TotalOrd is still not implemented in the OpenDP code. Mike said it is in the to do list, but it has not yet been changed, which is why PartialOrd still appears in the pseudocode. Still, I do the proof with TotalOrd because Mike said it will for sure be updated in the near future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't made the change to Ord a priority because it hasn't been clear to me if this is what we want for the library. I wrote a bit more about this in an issue here: opendp/opendp#203
I'd be happy to make the changes once it's clear that we want to adopt noisy-float newtypes to strengthen this trait bound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Discussed on the architecture meeting on August 3rd)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until the code has been fixed, the proof document should say explicitly that the proof is being given assuming Ord even though the Rust code only requires PartialOrd, so the Rust code is not quite correct yet. In general, any gaps like this should be highlighted clearly at the beginning of the document under a section titled something like "Remaining Gaps between Proof and Code".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a good idea to have such a section -- but TotalOrd is fixed now, so I deleted it
return max(min(x, U), L) | ||
return list(map(clamp, data)) |\label{line:map}| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where can I find definitions of max, min, list, map? Are these built-in Rust functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Max and min are defined in the pseudocode definitions file (see Definitions 4.17 and 4.18). Map and list are built-in functions, but I can add them to the pseudocode definitions file. In the case of clamp, the Rust code does not directly use a list, but talking to Mike we settled on that the best Python-Rust blend term for this line would be to use a list, instead of the Rust .collect() function that appears in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think it would be good to add map and state in the pseudocode defs file that it is representing the Rust function collect. Actually doesn't map by definition return a list? So what is the list(.) function doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, I see my questions are answered below- but the fact that Grace and I had the same question reinforces that it would be good to add defs of map and list into the pseudocode_defns file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are added now
\textbf{(Domain-metric compatibility).} For \texttt{MakeClamp}, both cases correspond to showing that \texttt{VectorDomain(T)} is compatible with symmetric distance. This follows directly from the definition of symmetric distance, as stated in \href{https://www.overleaf.com/project/60d214e390b337703d200982}{``List of definitions used in the proofs"}. The \textit{appropriate output domain property} shown above ensures that \texttt{output\_domain} is \texttt{VectorDomain(T)}, and hence also compatible with the \texttt{output\_metric} symmetric distance. | ||
|
||
\smallskip | ||
\textbf{(Stability guarantee).} Throughout the stability guarantee proof, we can assume that $\function(v)$ and $\function(w)$ are in the correct output domain, by the \textit{appropriate output domain property} shown above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following will give a simpler proof. Whenever we have a function from Vector Domains to Vector Domains of the form function(data)=list(map(f,data)), then h_{function(v)}(z) = sum_{y in f^{-1}(z)) h_v(y). And then apply triangle inequality and that the sets f^{-1}(z) form a partition of the domain of f. This will avoid a need to do a case analysis, and we can use the same proof for all of transformations of this form. In fact, this can be a general lemma in the proof_defns file that we cite, or a proof for "make_rowtransform" (which I believe that Mike may have written, but which will require a precondition on the function f).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the requirement is that f should be a "pure" function, i.e. a function in the standard mathematical sense, not a piece of code that has side effects or can access some global state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am typing up proof of stability relations for row by row transformations. Would it make more sense for this to be in the list of definitions in proofs or a separate transformation proof for make_row_by_row
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided on August 4: clamp will not include this simplified row by row proof until the Rust code for clamp actually calls the make_row_by_row
transformation inside of it. But see e.g., is_equal
for a use case of the make_row_by_row
transformation.
The current OpenDP library contains the \texttt{make\_clamp\_vec} function implementing the clamping function. This is defined in lines 25-38 of the file \texttt{manipulation.rs} in the Git repository\footnote{As of June 16, 2021. Since then, the code has been updated to include a more general clampable domain, which is not yet finished.} (\url{https://github.com/opendp/opendp/blob/58feb788ec78ce739caaf3cad8471c79fd5e7132/rust/opendp/src/trans/manipulation.rs#L25-L38}). | ||
|
||
\begin{figure}[ht] | ||
\includegraphics[width=16cm]{clamp_rust.png} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also it looks like it doesn't enforce that M is SymmetricDistance.
The current OpenDP library contains the \texttt{make\_clamp\_vec} function implementing the clamping function. This is defined in lines 25-38 of the file \texttt{manipulation.rs} in the Git repository\footnote{As of June 16, 2021. Since then, the code has been updated to include a more general clampable domain, which is not yet finished.} (\url{https://github.com/opendp/opendp/blob/58feb788ec78ce739caaf3cad8471c79fd5e7132/rust/opendp/src/trans/manipulation.rs#L25-L38}). | ||
|
||
\begin{figure}[ht] | ||
\includegraphics[width=16cm]{clamp_rust.png} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks
clamp/clamp.tex
Outdated
\textbf{(Appropriate output domain).} In the case of \texttt{MakeClamp}, this corresponds to showing that for every vector $v$ of elements of type \texttt{T}, $\function(v)$ is a vector of elements of type \texttt{T} which are contained in the interval \texttt{[L, U]}. For that, we need to show two things: first, that \texttt{function(v)} has type \texttt{Vec(T)}. | ||
Second, that they belong to the interval \texttt{[L, U]}. | ||
|
||
Firstly, that $\function(v)$ has type \texttt{Vec(T)} follows from the assumption that element $v$ is in \texttt{input\_domain} and from the type signature of \texttt{function} in line~\ref{line:fn} of the pseudocode (Section~\ref{sec:pseudocode}), which takes in an element of type \texttt{Vec(T)} and returns an element of type \texttt{Vec(T)}. If the Rust code compiles correctly, then the type correctness follows from the definition of the type signature enforced by Rust. Otherwise, the code raises an exception for incorrect input type. | ||
|
||
Secondly, we need to show that the vector entries belong to the interval \texttt{[L, U]}. This follows from the definition of \texttt{clamp} in line \ref{line:clamp}. According to line \ref{line:clamp} in the pseudocode, there are 3 possible cases to consider: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very well organized; it really helps that you broke down each case to consider with firstly, secondly, etc.
def clamp(x: T) -> T: |\label{line:clamp}| | ||
return max(min(x, U), L) | ||
return list(map(clamp, data)) |\label{line:map}| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we use list()? By definition of map, it should apply to clamp function elementwise to the entire input vector. https://en.wikipedia.org/wiki/Map_(higher-order_function)#:~:text=In%20many%20programming%20languages%2C%20map,when%20considered%20in%20functional%20form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nvm, Silvia explained that the reaosn is because
Mike: “2. returning a map has a slightly different meaning. list(map(...)) or [clamp(L, U, v) for v in arg] is more accurate. In rust, the .collect exhausts the iterator into a vector. So the analogue here is list or the list comprehension (not a generator comprehension). It would be cool if we chained iterators, but that makes other things more complicated (edited)”
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right so we want to return a vector (for appropriate output domain) hence we use the Python list
, not a map
clamp/clamp.tex
Outdated
return max(min(x, U), L) | ||
return list(map(clamp, data)) |\label{line:map}| | ||
|
||
return Transformation(input_domain, output_domain, function, input_metric, output_metric, stability_relation = Relation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the special naming "Relation" instead of "stability_relation"? The code makes sense. Are you guys planning on switching type hints to the type hint syntax? I prefer the typing syntax for hints as it looks less like a function invocation. It's a pretty minor pick, basically Vec(T)
-> Vec[T]
. Also, are we keeping Vec
from rust or using python's list
in the pseudocode? The iterator collection uses list, but the type is a Vec
. I would change one or the other to be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the special naming of "Relation": I mainly did this to save space in Section 2. Proof when proving the stability relation, so that only "Relation" appears in the text instaed of "stability_relation". And also so that the theorem statement for the stability/privacy guarantee looks the same for measurements, where we would have privacy_relation = Relation
. But if you think this is not enough to change it I can add stability_relation
back, no problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the type hint syntax, are you saying that the correct function type signing for clamp is def function(data: Vec(T)) -> Vec[T]
? How so? So initially we had Vec(T) -> Vec(T)
but you said that this was not correct, and so we introduced the rule "Parentheses, ()
, are used to create an instance of a domain. Square brackets, []
, are used to describe the type of a domain. For example, AllDomain(i8)
is the domain of all values of type i8.
However, the type of AllDomain(i8)
-- the domain itself, not the elements of the domain -- is AllDomain[i8];
note the square brackets. This typing style is inspired by the notation used in Python; see \url{https://docs.python.org/3/library/typing.html}." (this is now included in the pseudocode definitions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding Vec vs list: so I thought we wanted the pseudocode to look as Python as possible, which is why I wrote it in list format. But then it seemed not correct to change the input and output domains, because the VectorDomain
formulation is important and appears everywhere, and also for showing correct output domain. When you say you would change either or the other are you proposing to either: 1. Change line 148 to return Vec(map(clamp, data))
and keep the domains as they are, or 2. To keep line 148 as it is and change the domains to something different? Approach 2 seems worse I think, because that is inventing new domains that don't exist in the Rust library. And could we not just say that list
has type Vec
in the list of definitions to resolve this? But perhaps approach 1 is better then, although less Python
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
I think the part throwing me is that camel-case code in python usually represents classes. I have a soft preference towards
relation
orstability_relation
. -
No, the arrow in my statement was meant to communicate "replace Vec(T) with Vec[T]". The current syntax in the pseudocode is
def function(data: Vec(T)) -> Vec(T):
. The revised syntax would bedef function(data: Vec[T]) -> Vec[T]:
. The type hints should use the type syntax, because you're not making instances of a Vec within the signature. -
You're right, there's no win here. I think we leave Vec/list inconsistent unless someone else has strong feelings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- That makes sense, I had not considered the implications of having it capitzalied. I will change all of them to
relation
then. - Ah okay, I thought the TeX had already been changed to
Vec(T)
, but it is fixed now. - Okay!
clamp/clamp.tex
Outdated
input_metric = SymmetricDistance() | ||
output_metric = SymmetricDistance() | ||
|
||
def Relation(d_in: u32, d_out: u32) -> bool: |\label{line:rel}| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will switching to IntDistance cause headaches elsewhere? I'm thinking no, but sometimes making things more flexible complicates things. IntDistance is just a type alias, the user can't change it, but having it behind an alias makes it easy to change across the library. The naming could be adjusted, as always. It might make sense to add an explicit list of types IntDistance can be- basically all primitive int types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right you had changed the associated type of symmetric distance to IntDistance
, right? But then set IntDistance = u32
-- so you say to add an explicit list of types IntDistance
can be, but isn't it only used for u32
in symmetric distance no? And are you saying that I should change the above to def Relation(d_in: IntDistance, d_out: IntDistance)
? Or perhaps not yet because it does not appear in the clamp Rust code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I think you are saying to write def Relation(d_in: IntDistance, d_out: IntDistance)
instead, but then should IntDistance = u32
be specified anywhere in the clamp pseudocode or do we add the explicit list of types for IntDistance in the definitions for the pseudocode document?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to pull this u32 constant out in favor of an alias, but it does seem to cause headaches--
- The name is much longer
- We use all sorts of traits/methods on IntDistance that aren't qualified in the trait bounds because it's a type alias, not a generic. I wouldn't want to have to pull these details down into proofs everywhere.
I vote we don't bother making changes at this point. If we did, I would advocate:
- a better alias name
- the alias definition only makes a definitive statement that it is defined to be u32
- include a weaker statement that any type from a list of int types should be isomorphic, but not give explicit guarantees.
Maybe a better approach is to just add the statement in (3) to the definition of SymmetricDistance, and leave the proofs as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, for now I will leave u32
then.
Remaining things to double-check (all minor):
Will require future edits if Edit: the remaining things have been double-checked. |
clamp/clamp.tex
Outdated
\section{Algorithm Implementation} | ||
|
||
\subsection{Code in Rust} | ||
The current OpenDP library contains the \texttt{make\_clamp\_vec} function implementing the clamping function. This is defined in lines 25-38 of the file \texttt{manipulation.rs} in the Git repository\footnote{As of June 16, 2021.} (\url{https://github.com/opendp/opendp/blob/58feb788ec78ce739caaf3cad8471c79fd5e7132/rust/opendp/src/trans/manipulation.rs#L25-L38}). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on the Rust code (should probably be a comment on manipulation.rs): 1-stability and Lipschitz constant of 1 are the same thing (so "as well as" should be "i.e.")
clamp/clamp.tex
Outdated
\begin{figure}[ht] | ||
\includegraphics[width=16cm]{clamp3.png} | ||
\centering | ||
\label{fig:code} | ||
\end{figure} | ||
|
||
\begin{figure}[ht] | ||
\includegraphics[width=16cm]{clamp4.png} | ||
\centering | ||
\label{fig:code} | ||
\end{figure} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why include this code if the proof is only going to be for make_clamp_vec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took it out
clamp/clamp.tex
Outdated
|
||
\item \textup{(Domain-metric compatibility).} The domain \texttt{input\_domain} matches one of the possible domains listed in the definition of \texttt{input\_metric}, and likewise \texttt{output\_domain} matches one of the possible domains listed in the definition of \texttt{output\_metric}. | ||
|
||
\item \textup{(Stability guarantee).} For every pair of elements $v, w$ in \texttt{input\_domain} and for every pair $(\din, \dout)$, where $\din$ is of the associated type for \texttt{input\_metric} and $\dout$ is the associated type for \texttt{output\_metric}, if $v,w$ are $\din$-close under \texttt{input\_metric} and $\Relation(\din, \dout) = \True$, then $\function(v), \function(w)$ are $\dout$-close under \texttt{output\_metric}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit "dout is the associated type" -> "dout is of the associated type" (actually "dout has the associated type" reads more nicely)
clamp/clamp.tex
Outdated
\subsection{Symmetric Distance} | ||
\begin{theorem} | ||
For every setting of the input parameters \texttt{(L, U)} to \texttt{MakeClamp} such that the given preconditions | ||
hold, the transformation returned by \texttt{MakeClamp} has the following properties: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we say "MakeClamp raises a exception (at compile time or run time) or returns a valid transformation."? (And then we can/should put the def of what constitutes a valid transformation in either the pseudocode_defns or proof_defns file (not sure which).)
clamp/clamp.tex
Outdated
\end{enumerate} | ||
In all three cases, the returned value of type \T is contained in the interval \texttt{[L, U]}. Hence, the vector $\function(v)$ returned in line~\ref{line:map} of the pseudocode is an element of \texttt{output\_domain}. | ||
|
||
Lastly, the necessary condition that \texttt{L} $\leq$ \texttt{U} is checked when declaring \texttt{output\_domain = VectorDomain(IntervalDomain(L, U))} in line~\ref{line:output} of the pseudocode. This check already exists via the construction of \texttt{IntervalDomain}, which returns an error if \texttt{L} $>$ \texttt{U}. Both \texttt{L} and \texttt{U} have type \texttt{T} by the type signature of \texttt{MakeClamp}. Both the definition of \texttt{IntervalDomain} and that of the \texttt{clamp} function (line~\ref{line:clamp} in the pseudocode, which uses the \texttt{min} and \texttt{max} functions) require that \texttt{T} implements \texttt{TotalOrd}, which holds by the preconditions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the fact that IntervalDomain raises an exception when L>U should be stated at the beginning of the proof, before item 1. Otherwise much of what follows doesn't make sense (e.g. case analysis above for clamp).
clamp/clamp.tex
Outdated
as we wanted to show. | ||
\end{proof} | ||
|
||
\silvia{Flag: will change to the simplified general proof scheme for row-by-row transformations that Prof. Vadhan suggested in the Git repo once the \texttt{make\_row\_by\_row} transformation is included as part of the clamp transformation code.} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth doing that here in this proof even prior to make_row_by_row. It is much simpler and shorter than what's above.
First pull request for the Clamp transformation, which corresponds to this GitHub permalink.
It has undergone a first correction by Prof. Vadhan on June 22 and a first correction by Mike on July 1 and 7. All these corrections have been implemented. The pseudocode has not been changed to include Andy's comment on July 1 on adding the domains and metrics as declarative statements in the preconditions. (Edit: the updated version now includes said preconditions)
Files:
Update: July 12, 2021
VectorDomain(IntervalDomain(L, U))
.Further questions on the new version:
def MakeClamp(L: T, U: T)
or add that L and U have type T in the preconditions?if L>U: raise Exception
to the preconditions? a) But this would not be a compile time error. b) Mike said that the Rust construction for IntervalDomain is enforced to check thatL<=U
, so do I need to check this condition in the pseudocode and raise an exception ifIntervalDomain(L, U)
is in the preconditions?Update: July 14, 2021
inf_cast
ond_in
given thatd_in
andd_out
have the same type (namelyu32
).Update: September 6, 2021