
feat: add Data.Fin.Coding and Data.Fin.Enum #1007

Draft · wants to merge 7 commits into main from fin-coding

Conversation

fgdorais (Collaborator)

WIP replacement for Mathlib's FinEnum class.

@fgdorais fgdorais added the WIP work in progress label Oct 25, 2024
@fgdorais fgdorais force-pushed the fin-coding branch 4 times, most recently from c69d46c to 3c36b0e Compare October 25, 2024 01:22
leanprover-community-bot (Collaborator) commented Oct 25, 2024

Mathlib CI status (docs):

leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Oct 25, 2024
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Oct 26, 2024
lyphyser commented Oct 29, 2024

This still seems asymptotically inefficient and thus not acceptable for actual computation.

For instance:

  • (decodePi f x) has runtime linear in n. Instead it should be sublinear: keep a data structure of partial products and just compute (x / a) % b, where a is the product of the sizes up to the input index i and b is the size of the Fin (f i) at index i (both a and b need to be retrieved in sublinear time from a lazy data structure; see the sketch after this list).
  • (decodeSigma f) is also linear in n. Instead it should be sublinear: binary-search an array (or search a binary tree) of partial sums to find the first member of the pair as the index, then subtract that partial sum to get the second member.
  • I haven't checked the others, but in general any decode function that recurses is probably suboptimal.
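
For concreteness, here is a minimal Lean sketch of both ideas. All names (partialProds, decodePiAt, searchSums, decodeSigmaAt) are hypothetical and not part of this PR, and the Fin-indexed types and proofs of the real API are elided:

-- Hypothetical sketch, not the PR's API; sizes are plain `Nat`s and
-- bounds/termination proofs are elided.

-- prods[i] = f 0 * f 1 * ... * f (i - 1), built once in O(n).
def partialProds (f : Nat → Nat) (n : Nat) : Array Nat := Id.run do
  let mut acc := #[]
  let mut p := 1
  for i in [0:n] do
    acc := acc.push p
    p := p * f i
  return acc

-- Decode coordinate i of a product code: (x / a) % b with a = prods[i]
-- and b = f i; O(1) given the table.
def decodePiAt (f : Nat → Nat) (prods : Array Nat) (x i : Nat) : Nat :=
  (x / prods[i]!) % f i

-- For sigma codes: sums[i] = f 0 + ... + f (i - 1), so sums[0] = 0 and
-- sums has n + 1 entries. Binary-search the largest i with sums[i] ≤ x.
partial def searchSums (sums : Array Nat) (x lo hi : Nat) : Nat :=
  -- invariant: sums[lo] ≤ x < sums[hi]
  if hi ≤ lo + 1 then lo
  else
    let mid := (lo + hi) / 2
    if sums[mid]! ≤ x then searchSums sums x mid hi
    else searchSums sums x lo mid

-- Decoded pair for a valid code x (i.e. x < total): (i, x - sums[i]).
def decodeSigmaAt (sums : Array Nat) (x : Nat) : Nat × Nat :=
  let i := searchSums sums x 0 (sums.size - 1)
  (i, x - sums[i]!)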

Note that both data structures also need to be computed lazily, so that decoding 0, or a small value in general, doesn't require computing the full set of partial products or partial sums (which would again be linear rather than sublinear time, and in fact might exceed available memory). This might require some sort of binary tree, or a chain of thunks of arrays with sizes that are increasing powers of 2, or modifying core Lean to provide some sort of ThunkArray primitive.

This is important since computable functions should always be asymptotically optimal, and also because I think the "decide" tactic would end up decoding all possible values, so if those functions are linear, then decide will be quadratic, which is very bad.

digama0 (Member) commented Oct 30, 2024

I don't think it's that simple. Most functions don't have the ability to do precomputation; writing let x := precomputed; fun y => f x y doesn't do what you think it does. So you have to think about the cost of doing the calculation from scratch. Even if they could do precomputation, it's not necessarily a win: keeping an array of partial products could be hugely expensive if the number of possible values is large.

This is important since computable functions should always be asymptotically optimal, and also because I think the "decide" tactic would end up decoding all possible values, so if those functions are linear, then decide will be quadratic, which is very bad.

Not exactly. decide uses kernel computation, and kernel computation is pervasively cached. So calling a function which is linear time, a linear number of times, might also be linear time depending on how similar the computations are.

fgdorais (Collaborator, Author)

@lyphyser I totally agree about potential optimizations. This is partly why this PR is marked WIP. Some of your thoughts are also things I am considering for the final PR.

Your optimizations focus on time only. Part of the reason for this PR is that these functions should, in principle, work far beyond the memory capacity of a single machine and perhaps even physical reality! So memory is actually more of a concern than time for the "generic" implementation. This might sound highbrow, but there are incredibly large arrays of data out there that could easily overload any kind of local storage scheme like the one you propose.

Anyway, as far as time optimization is concerned, I think there could be a more general class that allows gaps in decoding, where decode sometimes returns none. That could be used to delay range compression until the end. That would certainly help in some places, but I'm not sure there is a great enough overall gain in doing so.

lyphyser commented Oct 30, 2024

Most functions don't have the ability to do precomputation; writing let x := precomputed; fun y => f x y doesn't do what you think it does

What do you mean? What does it do? This only computes "precomputed" at most once in most (if not all) programming languages, and I would expect it to work that way in Lean as well under compilation (and also under reduction, why not?)

Or do you mean that it could compute "precomputed" less than once, or only partially? (That would be good.) Perhaps "precomputation" is not the right word for this, and "memoization" or "lazy computation" is a better fit.

So memory is actually more of a concern than time for the "generic" implementation

Both are a concern, which is why I suggested that the precomputed data structures need to be lazily implemented, so that getting the item at index i only requires creating a data structure of size O(i).

Note that I think this means they can't literally be simple arrays. But a lazy list of thunks of arrays, where the i-th array has 2^i items that are themselves thunks (each referencing the previous one) and where the concatenation of the arrays forms the whole sequence, should work. A better solution would be to implement a ThunkArray in core that can grow the underlying array lazily as needed.
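
As a simplified illustration of the thunk-memoization idea (a plain lazy stream rather than the power-of-2 block layout; all names are hypothetical):

-- A Thunk is forced at most once at runtime, so reading index i from a
-- single shared stream materializes only the first i + 1 entries, in O(i).
inductive LazyStream (α : Type u) where
  | cons (head : α) (tail : Thunk (LazyStream α))

-- Infinite stream of partial products of f, starting at index i with
-- accumulated product p.
partial def prodsFrom (f : Nat → Nat) (i p : Nat) : LazyStream Nat :=
  .cons p (Thunk.mk fun _ => prodsFrom f (i + 1) (p * f i))

partial def LazyStream.get : LazyStream α → Nat → α
  | .cons h _, 0 => h
  | .cons _ t, n + 1 => t.get.get n

The memoization only pays off if a single stream value is shared across queries; the power-of-2 block layout would further reduce a random access to O(log i) thunk hops.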

There is the downside that this doesn't support the case where the data produced would exceed available memory but still be computable in the available time. I'm not sure how to support that well, and it seems less important than making decoding of all (or almost all) values linear instead of quadratic. One could band-aid over this by querying total available memory from the OS and not creating the data structure if it would be too large, although of course this will just make the program hang for longer than the user is willing to wait if it was relying on sublinear running time.

Not exactly. decide uses kernel computation, and kernel computation is pervasively cached

But the user might also want to use native_decide (here I'm talking about deciding statements of the form "for all x in a finitely enumerable type, P(x)", which requires decoding all values if no enumerator is provided) or otherwise use it in compiled code, so it needs to be asymptotically optimal without relying on kernel caching. And if it is asymptotically optimal under compilation, then I think it will be under reduction as well, as long as no proofs that aren't constant-sized are present (right?).

Anyway, as far as time optimization is concerned, I think there could be a more general class that allows gaps in decoding, where decode sometimes returns none

Mathlib currently has this for infinite types, in the form of Encodable allowing gaps and Denumerable extending Encodable and not allowing gaps.

I think it would be good to support this.
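
For the finite case, a gap-allowing class could look roughly like this (a hypothetical sketch, not this PR's design):

-- Hypothetical: a finite analogue of Mathlib's Encodable, where the
-- code space Fin card may be larger than the type and decode returns
-- none on the gaps.
class FinEncodable (α : Type u) where
  card : Nat
  encode : α → Fin card
  decode : Fin card → Option α
  decode_encode : ∀ a, decode (encode a) = some a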

It would also be good to support an iterable class where elements are computed one by one in order, which would be the most efficient way to decide "forall x in finenum, P(x)" statements: this would be like Rust's IntoIterator/Iterator, but also including a proof that every value of the type is returned at least once, extended by a stronger interface that guarantees each value is returned exactly once.
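
A hypothetical shape for that interface, simplified to an eager list (a real version would stream elements Rust-style via a next : σ → Option (α × σ) state machine):

-- Hypothetical: enumeration with a completeness proof.
class EnumList (α : Type u) where
  elems : List α
  complete : ∀ a : α, a ∈ elems

-- A stronger variant would additionally require elems.Nodup, i.e. that
-- every element is produced exactly once.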

But the equivalence-based class in the current pull request should be asymptotically optimal anyway, even if those interfaces are present, since some use cases might need it and not be able to use the other classes.

BTW, it would also be good to have more general classes that support efficient encoding/decoding and partial enumeration of infinite countable types in addition to finite types, ideally also supporting encoding/decoding of an arbitrary countable subset in the case of non-countable types (this would be useful if one wants to computably choose any N distinct elements). Some classes expressing that Zero.zero encodes to 0, One.one encodes to 1, and Inhabited.default encodes to 0 would also be useful (a sketch of such a mixin follows).
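
For the last point, a mixin on top of Mathlib's existing Encodable might look like this (hypothetical):

-- Hypothetical Prop-mixin: the zero element encodes to 0.
class ZeroEncodesZero (α : Type u) [Zero α] [Encodable α] : Prop where
  encode_zero : Encodable.encode (0 : α) = 0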

fgdorais (Collaborator, Author) commented Oct 30, 2024

BTW, it would also be good to have more general classes that support efficient encoding/decoding and partial enumeration of infinite countable types in addition to finite types, ideally also supporting encoding/decoding of an arbitrary countable subset in the case of non-countable types (this would be useful if one wants to computably choose any N distinct elements). Some classes expressing that Zero.zero encodes to 0, One.one encodes to 1, and Inhabited.default encodes to 0 would also be useful.

Indeed, that is part of the plan for Nat.Enum and perhaps *.Enum, eventually.

digama0 (Member) commented Oct 30, 2024

Most functions don't have the ability to do precomputation; writing let x := precomputed; fun y => f x y doesn't do what you think it does

What do you mean? What does it do? This only computes "precomputed" at most once in most (if not all) programming languages, and I would expect it to work that way in Lean as well under compilation (and also under reduction, why not?)

Like I said, it doesn't do what you think. It will be eta-expanded to fun y => (let x := precomputed; fun y => f x y) y and simplified to fun y => f precomputed y, which means it will not be precomputed at all. If it's a closed expression, it will be precomputed and stored in a global variable, but most of the cases you want to precompute here are not closed expressions; they depend on some earlier variables. Lean doesn't do staged evaluation unless you coax it really well; it is very eager to eta-expand because it prefers not to create closures.
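
Concretely, here is the staging one might write versus what the compiler effectively produces (expensiveTable is a hypothetical stand-in for the precomputation):

-- Some costly precomputation that depends on n:
def expensiveTable (n : Nat) : Array Nat :=
  (Array.range n).map fun i => i * i

-- What one writes, hoping expensiveTable n is computed once per n:
def decoder (n : Nat) : Nat → Nat :=
  let table := expensiveTable n
  fun y => table.getD y 0

-- After eta-expansion the compiler effectively produces this, where
-- expensiveTable n is recomputed on every application:
def decoder' (n : Nat) (y : Nat) : Nat :=
  (expensiveTable n).getD y 0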

lyphyser commented Oct 31, 2024

[let x := precomputed; fun y => f x y] will be eta-expanded to fun y => (let x := precomputed; fun y => f x y) y and simplified to fun y => f precomputed y, which means it will not be precomputed at all

This seems like a very serious bug in the compiler, since a transformation like this can change the asymptotic running time of a program for the worse, and hence potentially make the program unusable in practice, which a compiler clearly should never do. More precisely, code should never be moved inside a function, closure, or loop unless it's a constant that compiles to either a machine-word-sized immediate or a reference to read-only global data. This is also what any programmer would expect, and it's very surprising for a compiler to behave otherwise.

Has it already been filed as an issue?

fgdorais (Collaborator, Author)

@lyphyser This is not a compiler bug. These are rules of the type theory itself! These are all pure functions, so the very idea of precomputation doesn't make sense.

To do what you want in Lean, you can either carry a state by wrapping in a monad or pass the precomputed value as an argument everywhere. You can make sure the precomputed value is calculated just once by passing a default value as a Thunk. It's not impossible to do exactly what you want, it's just not as straightforward as you think.
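
For example, a hedged sketch of the Thunk-passing pattern (decodeMany and prodsThunk are hypothetical; partialProds is the table-builder sketched earlier):

-- The caller builds one Thunk and shares it; Thunk.get forces the
-- computation at most once per thunk value.
def prodsThunk (f : Nat → Nat) (n : Nat) : Thunk (Array Nat) :=
  Thunk.mk fun _ => partialProds f n

def decodeMany (tbl : Thunk (Array Nat)) (f : Nat → Nat) (i : Nat)
    (xs : List Nat) : List Nat :=
  let prods := tbl.get  -- forced here; later calls with the same thunk reuse it
  xs.map fun x => (x / prods[i]!) % f i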

digama0 (Member) commented Oct 31, 2024

I believe that this kind of thing is regarded as a compiler bug, or at least a thing worth improving, but the compiler is on hold and has been for years now, so I try not to plan around getting punctual fixes to such issues. One day in the land of the future we will get a new compiler with what amounts to a completely different operational semantics, possibly coupled with new mechanisms for fine tuning the desired behavior, but in the here and now we have a compiler and programs run relative to the operational semantics you can get out of it.

I also don't think an issue is the right place to deal with this, because it's a core aspect of the design. The people working on the compiler are very aware of the tradeoffs here, it's hard not to notice this because doing eta expansion in the compiler is not trivial. This was a conscious decision and perhaps they will make a different conscious decision in the new compiler but it will have to be weighed against all the reasons why they chose this design in the first place.

lyphyser
I think this can and should be fixed without having to redesign or rewrite the compiler; a completely naive and unoptimized compiler does not have this bug, so it's only a matter of disabling or changing the offending transformations. I filed it as lean4#5908.

digama0 (Member) commented Nov 1, 2024

so it's only a matter of disabling or changing the offending transformations.

You are very much underestimating the amount of work this is. The compiler does this eta expansion in order to preserve a certain invariant about all applications being full applications and all function bodies being in eta-long form, it can't just not do it without breaking those invariants, so every other part of the compiler would have to be audited to ensure it doesn't rely on the broken invariant and patches would have to be inserted in lots of places.

fgdorais (Collaborator, Author) commented Nov 1, 2024

@lyphyser @digama0 Maybe this discussion should continue at lean4#5908? While of its own independent interest, the main topic has little to do with this PR.

@leanprover-community-mathlib4-bot leanprover-community-mathlib4-bot added the merge-conflict This PR has merge conflicts with the `main` branch which must be resolved by the author. label Nov 26, 2024
leanprover-community-mathlib4-bot added a commit to leanprover-community/mathlib4 that referenced this pull request Dec 2, 2024
@leanprover-community-mathlib4-bot leanprover-community-mathlib4-bot added merge-conflict This PR has merge conflicts with the `main` branch which must be resolved by the author. and removed merge-conflict This PR has merge conflicts with the `main` branch which must be resolved by the author. labels Dec 2, 2024
quangvdao (Contributor)

Just noticed this PR now and it seems very helpful to me; is there anything I can help with to push this to main?

/-- Find the first `i : Fin n` satisfying `P`, if any. -/
protected def find? (P : Fin n → Prop) [DecidablePred P] : Option (Fin n) :=
  foldr n (fun i v => if P i then some i else v) none

/-- Custom recursor for `Fin (n+1)`. -/
Contributor

Isn't this exactly the same as Fin.inductionOn?

fgdorais (Collaborator, Author)

Yes, but it's also not needed here so it will eventually disappear.

quangvdao (Contributor)

Just noticed this PR now and it seems very helpful to me; is there anything I can help with to push this to main?

One potential problem with this PR is that a bunch of stuff in Mathlib will need to be refactored to incorporate the new API. Is there a separate Mathlib PR for this?

fgdorais (Collaborator, Author) commented Dec 5, 2024

This is still WIP. I won't get back to it until January. The consensus is that someone needs to move Equiv to Batteries before this.

Labels: builds-mathlib · merge-conflict · WIP (work in progress)
6 participants