Please publish end-to-end application example #2
Unfortunately, there is no built-in support. If I understood you correctly, you want to convert a `LoTRLinear` module back into a plain `Linear`:

```python
def to_linear(self: LoTRLinear) -> Linear:
    self.linear.weights += torch.einsum('ij,jk,kl->il', self.lotr.rhs, self.lotr.mid, self.lotr.lhs)
    return self.linear
```

This way, you can restore the original model architecture and save a checkpoint that can easily be loaded later.
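To make the merge above concrete, here is a minimal NumPy sketch (the shapes and names `rhs`, `mid`, `lhs` are placeholders for the LoTR factors, not the library's actual attributes):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hypothetical layer width and LoTR rank

# Low-rank factors of the weight correction, as in the einsum above.
rhs = rng.standard_normal((d, r))
mid = rng.standard_normal((r, r))
lhs = rng.standard_normal((r, d))

weights = rng.standard_normal((d, d))

# Merging the correction into the dense weights:
merged = weights + np.einsum('ij,jk,kl->il', rhs, mid, lhs)

# The einsum is just the triple matrix product rhs @ mid @ lhs.
assert np.allclose(merged, weights + rhs @ mid @ lhs)
```

After the merge, the layer is an ordinary dense `Linear` again, so the checkpoint needs no LoTR-specific code to load.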
I'm not sure. To simplify this discussion, let's speak in terms of linear algebra. Suppose I have a linear system Ax = b with a dense matrix A, and assume it can only be solved with an approximate method, such as a Krylov subspace method like BiCGStab. BiCGStab needs to multiply the system matrix by a vector many times, but the method is generic: it does not care how or where the matrix-vector product is computed; it only asks me for the result of the multiplication. The multiply operation is a black box from the solver's point of view. So I would compress the matrix into some form that is not too dense and is cheap to multiply, and apply it directly whenever the solver requests a product.

This is what I practically expect from LoTR. In my understanding, the whole purpose of LoTR is to compress the weights and never go back to the original weights again. So we should never convert back to the dense representation at all.
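The black-box idea above can be sketched in a few lines of NumPy (names and shapes are hypothetical): multiplying by `w0 + rhs @ mid @ lhs` never requires forming the dense corrected matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hypothetical dimension and rank

w0 = rng.standard_normal((d, d))   # frozen base weights
rhs = rng.standard_normal((d, r))  # LoTR-style low-rank factors
mid = rng.standard_normal((r, r))
lhs = rng.standard_normal((r, d))

def matvec(x: np.ndarray) -> np.ndarray:
    """Black-box product (w0 + rhs @ mid @ lhs) @ x.

    Applies the correction factor by factor, so the dense corrected
    matrix is never materialized. This is all an iterative solver
    like BiCGStab needs from us.
    """
    return w0 @ x + rhs @ (mid @ (lhs @ x))

x = rng.standard_normal(d)
dense = w0 + rhs @ mid @ lhs
assert np.allclose(matvec(x), dense @ x)
```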
If I understand you correctly, you think that LoTR is used to compress the whole weight matrix. However, I think there is a confusion: LoTR is used to represent a correction to the frozen pretrained weights, not the weights themselves.
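This distinction matters for the parameter count: the LoTR factors store only the low-rank correction, so they are tiny relative to the full matrix, but they cannot reconstruct the layer without the frozen base weights. A quick back-of-the-envelope check (the width and rank below are hypothetical):

```python
d, r = 4096, 8  # hypothetical layer width and LoTR rank

# Full dense weight matrix vs. the correction factors alone
# (rhs: d x r, mid: r x r, lhs: r x d).
full_params = d * d
correction_params = d * r + r * r + r * d

assert correction_params < full_params  # the correction is far smaller
```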
Dear all,
It would be great to see an end-to-end practical example of LoTR. By "practical" I mean that one takes, for example, an existing LLM weights file, compresses it into a smaller weights file with LoTR, and then uses the new weights file for inference. For the first part I imagine something like this:
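(The original snippet is missing from this thread; the following is only a generic low-rank compression sketch in NumPy, via truncated SVD, to make the request concrete. It is not the LoTR API, and every name in it is hypothetical.)

```python
import numpy as np

def compress(weights: dict, rank: int) -> dict:
    """Truncate each dense matrix to the given rank and keep the factors."""
    factors = {}
    for name, w in weights.items():
        u, s, vt = np.linalg.svd(w, full_matrices=False)
        # Store (d x r) and (r x d) factors instead of the d x d matrix.
        factors[name] = (u[:, :rank] * s[:rank], vt[:rank])
    return factors

def matvec(factors: dict, name: str, x: np.ndarray) -> np.ndarray:
    """Inference-time product using only the compressed factors."""
    a, b = factors[name]
    return a @ (b @ x)

rng = np.random.default_rng(0)
weights = {'layer0': rng.standard_normal((64, 64))}
factors = compress(weights, rank=8)

# The compressed representation is much smaller than the original...
orig = weights['layer0'].size
small = sum(f.size for f in factors['layer0'])
assert small < orig

# ...and can be used for inference directly, at some approximation error.
x = rng.standard_normal(64)
y = matvec(factors, 'layer0', x)
assert y.shape == (64,)
```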
Does this make sense?