-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Wilkinson formula interface and scikit-learn style estimators #103
Conversation
@jGaboardi just curious, was there any off-line discussion about this before closing? IIRC, this was a pretty good effort towards adding the formula capabilities that we have talked about for a while. |
I'm not a fan of the Wilkinson formula. For any but the simplest models it creates all kinds of issues, e.g., how to introduce WX variables without having to explicitly list them, etc. For example, compare to the in my opinion overly complicated interface in spatialreg in R to accommodate all the situations related to spatial Durbin, endogenous variables, instruments, etc. Btw, instruments are never part of an actual formula but added on by means of these (again, in my opinion) very awkward | | options. While not ideal, I very much like the most recent setup (1.4) which allows for specification of a range of models such as SLX, spatial Durbin, SLX error and even GNS, using GMM_Error and slx_lags and add_wy arguments. The WX and Wy variables never need to be listed explicitly but are computed (and variable names created) under the hood. The fewer variable names need to be passed, the fewer opportunities for typos. My golden rule in life :-) I think our efforts are better spent at enhancing functionality in supporting different models and estimation methods. I was never a fan of this in the first place, not sure why it even became a GOSC project ... |
@TaylorOshan I did not close this... Perhaps I screwed something up with the |
@TaylorOshan I very much apologize for the screw up. I have never seen this happen with |
I say no. Neither Pedro nor I like this approach. As long as I am bdfl for spreg, I nix it. The sklearn interface is built for prediction, spreg is mostly about inference, so it actually turns out to be very awkward. |
Just to make clear, I would like it to be closed. We are done with that. |
The many additional combinations of specifications introduced by v1.4 make this outdated. As @lanselin pointed out, these formulas can't address them all easily, especially when combined with regimes and/or seemingly unrelated regressions. I believe the suggestion made by @martinfleis in #127 is a more interesting approach to simplify the function calls given the changes introduced by v1.4. Regardless, adding other functionalities we have already mapped, such as additional tests and the computation of direct/indirect effects, seems to me to be more of a priority now than enhancing the arguments' structure. |
Understood @lanselin. Thanks for the additional feedback @pedrovma. My intention was not to advocate for this in spreg. There were several conversations during the GSOC period about formulas more generally and in the context of other modules and I was curious if this would be worth preserving as a proof-of-concept in case it is useful elsewhere in the future. Thanks to @jGaboardi for pointing out that the original branch is in tact and takes care of that. |
This pull request supercedes #102. It adds the following features:
spreg/formula.py
)spreg
calledspreg.sklearn
that includes scikit-learn interfaces toError
,Lag
,DurbinError
, andDurbinLag
models, as well as Anselin LM tests in the scikit-learn metrics style. This submodule has its own formula interface independent ofspreg.from_formula
that properly dispatches to thesklearn
models.spreg/notebooks/formula_example.ipynb
andspreg/notebooks/sklearn_example.ipynb
)spreg/tests
for formulas,spreg/sklearn/tests
for thePursuant to this comment on #101, everything in this PR is fully backwards compatible. These are add-ons to the package which sit on top of existing code and provide alternative ways for users to interface with the core functionality of
spreg
. To ensure the changes are not breaking, thespreg.sklearn
submodule must be directly loaded by users (e.g. viaimport spreg.sklearn
) as it will not be automatically imported otherwise.Currently, the formula interface function dispatches to all of the spatial regression models. While it does not support regime regression or seemingly unrelated regression, it provides a solid base of code to which these features can be added.
More information about the design process for these features can be found in my GSoC 2022 Progress Journal, and of course I'm more than happy to answer any questions and make any changes!