-
Notifications
You must be signed in to change notification settings - Fork 1
Impute Constant #8
base: main
Are you sure you want to change the base?
Conversation
impute_constant/impute_constant.tex
Outdated
\item \textbf{(Appropriate output domain).} | ||
In the case of \texttt{make\_impute\_constant}, this corresponds to showing that for every vector $v$ of elements of type \texttt{DA}, \texttt{function}$(v)$ is a vector of elements of type \texttt{DA::NonNull}. | ||
|
||
The $\function(v)$ has type \texttt{Vec(DA)} follows from the assumption that element $v$ is in \texttt{input\_domain} and from the type signature of \texttt{function} in line~\ref{line:fn} of the pseudocode (Section~\ref{sec:pseudocode}), which takes in an element of type \texttt{Vec(DA)} and returns an element of type \texttt{Vec(DA::NonNull)}. If the Rust code compiles correctly, then the type correctness follows from the definition of the type signature enforced by Rust. Otherwise, the code raises an exception for incorrect input type. | ||
|
||
Lastly, we ensure that return type must be \texttt{NonNull} \grace{Attribute? Property? Carrier} \texttt{DA::NonNull} because we check to make sure the \texttt{constant} being imputed in the function must be null. \grace{Not sure if this needs to be explicit in the pseudo code with the error check, or if I can just cite some trait property.} This check already exists because in the input of the \texttt{make\_impute\_constant}, \texttt{constant} is required to have type of \texttt{DA::NonNull}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want feedback on appropriate output domain - not sure how to make it more rigorous, and if I need to consider more than just the type signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on this message I think there should first be a definition of AllDomain along the lines of "the set of all non-null values that T may take on." You want to show that the output domain on this constructor is VectorDomain<AllDomain<T>>
, or "any Vec<T>
without nulls". You can first lean on the type system to get you most of the way there- the type system constrains the output set to "any Vec<T>
". You can then show that the set of potential outputs is equivalent to the set described by VectorDomain<AllDomain<T>>
by showing that vectors with nulls are not members of the output set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unmarking this as resolved, because you resolved it before I finished writing my response!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Ord changes make this comment is partially outdated. AllDomain will not be adjusted.
Changes as of Thu Jul 29:
PRIORITIZE (see my comments on the code below):
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below I have marked down the questions and parts I'd like reviewers to prioritize.
impute_constant/impute_constant.tex
Outdated
|
||
The $\function(v)$ has type \texttt{Vec(DA)} follows from the assumption that element $v$ is in \texttt{input\_domain} and from the type signature of \texttt{function} in line~\ref{line:fn} of the pseudocode (Section~\ref{sec:pseudocode}), which takes in an element of type \texttt{Vec(DA)} and returns an element of type \texttt{Vec(DA::NonNull)}. If the Rust code compiles correctly, then the type correctness follows from the definition of the type signature enforced by Rust. Otherwise, the code raises an exception for incorrect input type. | ||
|
||
Lemma \ref{lemma-NonNull} tells us the type signature is not sufficient check because (1) the output type of \texttt{VectorDomain(DA::NonNull)} is more general than the output domain of \texttt{VectorDomain(DA::NonNull)}. \grace{?? The output type matches the output domain I wrote, so something's wrong.} (2) \texttt{constant} can be \texttt{null} even if it is of type \texttt{DA::NonNull}. For (2), if the constant is null, then \texttt{impute\_constant} function (line \ref{line:atomfn}) could potentially return \texttt{null} values since it replaces a \texttt{null} input with \texttt{constant}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- for appropriate output domain:
What is the reason the type signature is not sufficient? Is it because the output type is more general than the output domain? Or is it theoretically because constant can be null value even if it has the type DA::NonNull?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, Vec(DA)
or "a vector of atomic domains" doesn't really make sense in this context. I think you meant Vec(DA::Carrier)
in these blocks instead of Vec(DA)
.
And yes, in this case the output type can represent a value (float nan) that is not a member of the output domain. The purpose of the impute function is to ensure that these outputs are never null at runtime, so it replaces null with a constant. Since constant
has type DA::NonNull, and DA::NonNull may itself be f64, we must check that constant
is non-null. This check should make it impossible for analysts to "impute" by nonsensically replacing nulls with nulls, thus violating the output domain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Mike. This helps clarify things a lot!
impute_constant/impute_constant.tex
Outdated
\begin{lemma}[\texttt{DA::NonNull} contains \texttt{null}] \label{lemma-NonNull} | ||
\texttt{var} of type \texttt{DA::NonNull} can be of type \texttt{null}. | ||
\end{lemma} | ||
\begin{proof} | ||
Let the domain of atom variable \texttt{DA} be \texttt{InherentNullDomain<AllDomain<f64>>}. Recall that \texttt{InherentNullDomain} exists for types that can represent null inherently in the carrier type. Then the type $$\texttt{DA::NonNull == InherentNullDomain<AllDomain<f64>>::NonNull == f64}.$$ The latter holds because in the \texttt{InherentNullDomain} implementation in the rust code \url{https://github.com/opendp/opendp/blob/main/rust/opendp/src/trans/impute.rs#L48-L56}, the \texttt{type NonNull = Self::Carrier}. The \texttt{Carrier} of \texttt{VectorDomain<AllDomain<T>>} has type \texttt{T}, so in this case the \texttt{::Carrier} is type \texttt{f64}. | ||
|
||
Therefore \texttt{var} is also of type \texttt{f64}. \texttt{f64} can contain \texttt{null} values, so we are done. | ||
\end{proof} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- is it correct that a variable with type DA::NonNull can contain null values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DA::NonNull may be capable of representing a null value. This is the case when DA is InherentNullDomain<AllDomain<f64>>
, meaning DA::NonNull is f64. If an instance of DA::NonNull is null, it is either imputed or a construction-time error is raised by 2.
If it helps with clarity, we could rename |
In the context of the overall library, this constructor slightly changes the meaning of See also opendp/opendp#203. If we adopted noisy-float, then we wouldn't be changing the AllDomain description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes as of Aug 5 (ready to be reviewed)
- stability guarantee pf references row transform
- proof of pure function (satisfies row transform criteria)
- row transform incorporated into pseudocode
PRIORITIZE comments specifically added by me below!
|
||
The $\function(v)$ has type \texttt{Vec[DA::Imputed]} follows from the assumption that element $v$ is in \texttt{input\_domain} and from the type signature of \texttt{function} in line~\ref{line:fn} of the pseudocode (Section~\ref{sec:pseudocode}). The type signature takes in an element of type \texttt{Vec[DA::Carrier]} and returns an element of type \texttt{Vec[DA::Imputed]}. If the Rust code compiles correctly, then the type correctness follows from the definition of the type signature enforced by Rust. Otherwise, the code will halt at compile time. | ||
|
||
The type signature is not a sufficient check, since by Lemma \ref{lemma:Imputed}, the function's output type can represent a value (e.g. \texttt{float nan}) that is not a member in the \texttt{output\_domain}. \grace{How do I clarify that the output domain \\ \texttt{VectorDomain(AllDomain(DA::Imputed))} should actually contain NO NULLS? The code seems too loose because in Lemma 1.1 we just showed that \texttt{DA::Imputed} can contain null values.} To ensure that function returns the appropriate output domain at run time, we must ensure that at run time the \texttt{function}$(v)$ is not null. To do so, we must check whether the \texttt{constant} is not null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would like feedback on how to clarify that the output domain VectorDomain(AllDomain(DA::Imputed))
should actually contain NO NULLS but the output type Vec[DA::Imputed]
can contain a null value?
- Seems contradictory since Lemma 1.1 we just showed that
DA::Imputed
can contain null values
New changes as of Sept 4, 2021:
|
make_impute_constant
is a function that "imputes" or inserts aconstant
input variable whenever the vector entry is null.Changes as of July 26:
PRIORITIZE: Appropriate Output Domain Proof. Not really sure if checking type signature is enough? Unsure whether the error check in the code needs to be included in pseudo code (and is needed to justify output domain pf).
Please look at impute_constant.pdf as you review the .tex!