-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new solver that supports L1 penalty to Logistic Regression #70
Comments
I took a brief look at this. SAGA/SAG/SG solvers take advantage of the fact that the loss function (e.g. least-squares or cross entropy loss) is a sum of more simple convex functions. The main idea of SAGA/SAG/SG is that at each iteration not the whole gradient is computed but only one term of the whole sum is being updated. So computation is significantly less expensive for large n (vs gradient descent) but achieves similar rate of convergence. Thus we can not use the general API that is nicely made in this lib:
Unfortunately it is not enough to know how f or df are computed. We need to know how f and df are being presented as a sum and how each term of the sum can be computed and differentiated. If we look at sklearn library, SAG and SAGA solvers are not implemented in a general way but for exactly three types of loss functions: Thus choosing the SAGA solver we would need to change the implementation of It is quite a few changes to the library so just wanted to get a green light from you before going ahead. I would probably go as I briefly described - changing the To save time I suggest reading the SAG paper - it is very comprehensive. (https://arxiv.org/pdf/1309.2388.pdf) |
Actually now I am thinking that if we add |
cool. which basic building blocks do you suggest to start from? And which existing implementations we can take as reference? |
I would start with the vanila SAG/SAGA algorithm.
there is one trick to add to the above description - rescaling the weights for numerical stability However the SAGA algorithm is one of the not so numerous family allowing for L1 regularisation and L1 regularisation works well in sparse problems. The above algorithm may be significantly accelerated for sparse problems (and it is explained in the original SAGA paper). The idea is to not make updates to weights at every step but only update them just-in-time when current randomly drawn x is dense in the not updated gradients. The accelerated version would work for dense matrices too. The reference implementation from sklearn is full and addresses the sparsity problem. So I have a question. I see DenseMatrix object in the lib but no SparseMatrix object. Is there anything special that was planned for sparse matrices. If not I would probably copy the implementation from sklearn. Sklearn has a SequentialDataSet object which holds the matrix rows with only non-zero values and the indices of these non-zero values in those rows |
Ideally I wanted to stick with CSR standard as defined in |
Any objections to this crate then - https://docs.rs/sprs/latest/sprs/ ? It is CSR. The algo would work this way then. The optimizer would accept normal Matrices and Vectors from this lib. Then under the hood transform into CSR matrices, vectors before doing any computations. There could be a branch - if the Matrix is specified as DenseMatrix then no CSR transformation and a different algorithm - but probably it is an unnecessary complication as sklearn addresses every input homogenously. Certainly can be done as an improvement rather than first shot |
The usual questions to answer about a library to add as dependency are:
Having a conversion dense->sparse is ok, there are maybe some memory-performance considerations to be made but for now should be ok. cc: @morenol @montanalow |
Ok, great. Working on it |
Just to give you a heads up. I am still on it. So far not much code done. But I have solidified my understanding of SGD->SAG->SAGA. Tried to write a simplified version here. I've looked into whether SAGA is obsolete as we're in 2022 and SAGA was done in 2014. And found 3 promising developments based on SAGA:
But this is rather for future development I found three reference implementations:
3-d implementation is best structured and clean IMO but it lacks l1 regularisation. 2d one is most full but it uses strange notations and algorithm deviates in some places (e.g. here). So I'll use 1) trying to structure more like in 3) and add missing parts from 2). |
We need to implement a new solver that supports L1 penalty for Logistic Regression. A good candidate is saga solver. Another alternative would be to use a coordinate descent algorithm that is widely used in LIBLINEAR
The text was updated successfully, but these errors were encountered: