Skip to content
This repository has been archived by the owner on Jul 30, 2024. It is now read-only.

Impute Constant #8

Open
wants to merge 24 commits into
base: main
Choose a base branch
from
Open

Impute Constant #8

wants to merge 24 commits into from

Conversation

gracetian6
Copy link
Contributor

make_impute_constant is a function that "imputes" or inserts a constant input variable whenever the vector entry is null.

Changes as of July 26:

  • pseudocode with preconditions and post conditions
  • proofs with appropriate output domain, domain metric, stability guarantee

PRIORITIZE: Appropriate Output Domain Proof. Not really sure if checking type signature is enough? Unsure whether the error check in the code needs to be included in pseudo code (and is needed to justify output domain pf).

Please look at impute_constant.pdf as you review the .tex!

Comment on lines 142 to 147
\item \textbf{(Appropriate output domain).}
In the case of \texttt{make\_impute\_constant}, this corresponds to showing that for every vector $v$ of elements of type \texttt{DA}, \texttt{function}$(v)$ is a vector of elements of type \texttt{DA::NonNull}.

The $\function(v)$ has type \texttt{Vec(DA)} follows from the assumption that element $v$ is in \texttt{input\_domain} and from the type signature of \texttt{function} in line~\ref{line:fn} of the pseudocode (Section~\ref{sec:pseudocode}), which takes in an element of type \texttt{Vec(DA)} and returns an element of type \texttt{Vec(DA::NonNull)}. If the Rust code compiles correctly, then the type correctness follows from the definition of the type signature enforced by Rust. Otherwise, the code raises an exception for incorrect input type.

Lastly, we ensure that return type must be \texttt{NonNull} \grace{Attribute? Property? Carrier} \texttt{DA::NonNull} because we check to make sure the \texttt{constant} being imputed in the function must be null. \grace{Not sure if this needs to be explicit in the pseudo code with the error check, or if I can just cite some trait property.} This check already exists because in the input of the \texttt{make\_impute\_constant}, \texttt{constant} is required to have type of \texttt{DA::NonNull}.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Want feedback on appropriate output domain - not sure how to make it more rigorous, and if I need to consider more than just the type signature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on this message I think there should first be a definition of AllDomain along the lines of "the set of all non-null values that T may take on." You want to show that the output domain on this constructor is VectorDomain<AllDomain<T>>, or "any Vec<T> without nulls". You can first lean on the type system to get you most of the way there- the type system constrains the output set to "any Vec<T>". You can then show that the set of potential outputs is equivalent to the set described by VectorDomain<AllDomain<T>> by showing that vectors with nulls are not members of the output set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unmarking this as resolved, because you resolved it before I finished writing my response!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Ord changes make this comment is partially outdated. AllDomain will not be adjusted.

impute_constant/impute_constant.tex Outdated Show resolved Hide resolved
@gracetian6
Copy link
Contributor Author

gracetian6 commented Jul 29, 2021

Changes as of Thu Jul 29:

  • incorporated check for nullity in the pseudo code (per Mike's comments)
  • added lemma that shows that DA::NonNull can contain null values
  • updated proof for appropriate output domain to take into account the check for nullity (of the lemma above)

PRIORITIZE (see my comments on the code below):

  1. In the pseudocode: Is the type signature correct (input and output type)? Are the input and output domain correct?
  2. In the appropriate output domain proof: What is the reason the type signature is not sufficient? Is it because the output type is more general than the output domain? Or is it theoretically because constant can be null value even if it has the type DA::NonNull?
  3. First lemma in proof: so, is it correct that a variable with type DA::NonNull can contain null values?

Copy link
Contributor Author

@gracetian6 gracetian6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below I have marked down the questions and parts I'd like reviewers to prioritize.

impute_constant/impute_constant.tex Outdated Show resolved Hide resolved
impute_constant/impute_constant.tex Outdated Show resolved Hide resolved

The $\function(v)$ has type \texttt{Vec(DA)} follows from the assumption that element $v$ is in \texttt{input\_domain} and from the type signature of \texttt{function} in line~\ref{line:fn} of the pseudocode (Section~\ref{sec:pseudocode}), which takes in an element of type \texttt{Vec(DA)} and returns an element of type \texttt{Vec(DA::NonNull)}. If the Rust code compiles correctly, then the type correctness follows from the definition of the type signature enforced by Rust. Otherwise, the code raises an exception for incorrect input type.

Lemma \ref{lemma-NonNull} tells us the type signature is not sufficient check because (1) the output type of \texttt{VectorDomain(DA::NonNull)} is more general than the output domain of \texttt{VectorDomain(DA::NonNull)}. \grace{?? The output type matches the output domain I wrote, so something's wrong.} (2) \texttt{constant} can be \texttt{null} even if it is of type \texttt{DA::NonNull}. For (2), if the constant is null, then \texttt{impute\_constant} function (line \ref{line:atomfn}) could potentially return \texttt{null} values since it replaces a \texttt{null} input with \texttt{constant}.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. for appropriate output domain:

What is the reason the type signature is not sufficient? Is it because the output type is more general than the output domain? Or is it theoretically because constant can be null value even if it has the type DA::NonNull?

Copy link
Member

@Shoeboxam Shoeboxam Jul 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, Vec(DA) or "a vector of atomic domains" doesn't really make sense in this context. I think you meant Vec(DA::Carrier) in these blocks instead of Vec(DA).

And yes, in this case the output type can represent a value (float nan) that is not a member of the output domain. The purpose of the impute function is to ensure that these outputs are never null at runtime, so it replaces null with a constant. Since constant has type DA::NonNull, and DA::NonNull may itself be f64, we must check that constant is non-null. This check should make it impossible for analysts to "impute" by nonsensically replacing nulls with nulls, thus violating the output domain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Mike. This helps clarify things a lot!

Comment on lines 139 to 146
\begin{lemma}[\texttt{DA::NonNull} contains \texttt{null}] \label{lemma-NonNull}
\texttt{var} of type \texttt{DA::NonNull} can be of type \texttt{null}.
\end{lemma}
\begin{proof}
Let the domain of atom variable \texttt{DA} be \texttt{InherentNullDomain<AllDomain<f64>>}. Recall that \texttt{InherentNullDomain} exists for types that can represent null inherently in the carrier type. Then the type $$\texttt{DA::NonNull == InherentNullDomain<AllDomain<f64>>::NonNull == f64}.$$ The latter holds because in the \texttt{InherentNullDomain} implementation in the rust code \url{https://github.com/opendp/opendp/blob/main/rust/opendp/src/trans/impute.rs#L48-L56}, the \texttt{type NonNull = Self::Carrier}. The \texttt{Carrier} of \texttt{VectorDomain<AllDomain<T>>} has type \texttt{T}, so in this case the \texttt{::Carrier} is type \texttt{f64}.

Therefore \texttt{var} is also of type \texttt{f64}. \texttt{f64} can contain \texttt{null} values, so we are done.
\end{proof}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. is it correct that a variable with type DA::NonNull can contain null values?

Copy link
Member

@Shoeboxam Shoeboxam Jul 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DA::NonNull may be capable of representing a null value. This is the case when DA is InherentNullDomain<AllDomain<f64>>, meaning DA::NonNull is f64. If an instance of DA::NonNull is null, it is either imputed or a construction-time error is raised by 2.

@Shoeboxam
Copy link
Member

Shoeboxam commented Jul 29, 2021

If it helps with clarity, we could rename DA::NonNull to DA::Imputed or something else. I'm indifferent towards renaming as it's not clear to me personally if that's an improvement.

@Shoeboxam
Copy link
Member

Shoeboxam commented Jul 29, 2021

In the context of the overall library, this constructor slightly changes the meaning of AllDomain<T>. AllDomain<T> used to be "the set of values T may take on." It is now "the set of non-null values T may take on." This change to AllDomain simplifies the domains of all transformations/measurements after imputation, as it keeps descriptors for nullity isolated to two specific domains early in the data processing pipeline. On the other hand, this distinction is not currently captured in the member check for AllDomain.

See also opendp/opendp#203. If we adopted noisy-float, then we wouldn't be changing the AllDomain description.

Copy link
Contributor Author

@gracetian6 gracetian6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes as of Aug 5 (ready to be reviewed)

  • stability guarantee pf references row transform
  • proof of pure function (satisfies row transform criteria)
  • row transform incorporated into pseudocode

PRIORITIZE comments specifically added by me below!

impute_constant/impute_constant.tex Show resolved Hide resolved
impute_constant/impute_constant.tex Outdated Show resolved Hide resolved
impute_constant/impute_constant.tex Show resolved Hide resolved
impute_constant/impute_constant.tex Outdated Show resolved Hide resolved
impute_constant/impute_constant.tex Show resolved Hide resolved
impute_constant/impute_constant.tex Outdated Show resolved Hide resolved

The $\function(v)$ has type \texttt{Vec[DA::Imputed]} follows from the assumption that element $v$ is in \texttt{input\_domain} and from the type signature of \texttt{function} in line~\ref{line:fn} of the pseudocode (Section~\ref{sec:pseudocode}). The type signature takes in an element of type \texttt{Vec[DA::Carrier]} and returns an element of type \texttt{Vec[DA::Imputed]}. If the Rust code compiles correctly, then the type correctness follows from the definition of the type signature enforced by Rust. Otherwise, the code will halt at compile time.

The type signature is not a sufficient check, since by Lemma \ref{lemma:Imputed}, the function's output type can represent a value (e.g. \texttt{float nan}) that is not a member in the \texttt{output\_domain}. \grace{How do I clarify that the output domain \\ \texttt{VectorDomain(AllDomain(DA::Imputed))} should actually contain NO NULLS? The code seems too loose because in Lemma 1.1 we just showed that \texttt{DA::Imputed} can contain null values.} To ensure that function returns the appropriate output domain at run time, we must ensure that at run time the \texttt{function}$(v)$ is not null. To do so, we must check whether the \texttt{constant} is not null.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like feedback on how to clarify that the output domain VectorDomain(AllDomain(DA::Imputed)) should actually contain NO NULLS but the output type Vec[DA::Imputed] can contain a null value?

  • Seems contradictory since Lemma 1.1 we just showed that DA::Imputed can contain null values

@gracetian6
Copy link
Contributor Author

New changes as of Sept 4, 2021:

  • addressed comments about pure function (updated to proofs def doc)
  • stylistic changes to pseudo code, pre conditions
  • want feedback on appropriate output domain proof

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants