enh: specify models using patsy formulas #32

Open
knaaptime opened this issue Feb 14, 2020 · 9 comments
Labels: enhancement (New feature or request)

Comments

@knaaptime
Member

I think it would be a major boon to usability to allow users to specify models using patsy formulas with geodataframes.

It looks like integrating it would be pretty straightforward, and I'm happy to get it started if it would be a welcome addition, so wanted to raise for discussion:

  1. are folks onboard with this idea?
  2. if so, do we want two APIs (like statsmodels) or a single one that could accept either signature?
@lanselin
Member

this is something we talked about a couple of years ago, but i don't think patsy was quite mature enough then. ljwolf may have worked on this. i'm all for it, also to make sure the new panel stuff we are working on conforms. probably need a hangout to agree on API.

@knaaptime
Member Author

awesome. I'll start some experiments as a proof of concept, then maybe we can circle back on some API choices

@pedrovma
Member

This is awesome, @knaaptime. As Luc said, we talked about this at some point, and also about changing the way we show the results.

I tend to always favor a single API that takes both structures. This is how I am currently working for panels: the code can take data in either long or wide formats. It’s just a matter of a try/except statement to figure out what to do. So my initial feeling is that a “don’t ask, just do” approach is better. But it would be great to further discuss this over a hangout!
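A minimal sketch of the "don't ask, just do" idea, assuming a hypothetical helper `resolve_inputs` (not part of spreg) that accepts either plain arrays or column names plus a table and uses a try/except to decide which one it was given:

```python
import numpy as np


def resolve_inputs(y, x, data=None):
    """Return (y, x) as 2-D float arrays from either call style.

    Array style: resolve_inputs(y_array, x_array)
    Named style: resolve_inputs("price", ["rooms", "age"], data=table)
    """
    try:
        # Named style: look the columns up in `data`.
        # With data=None (or unhashable array keys) this raises TypeError.
        y_arr = np.asarray(data[y], dtype=float)
        x_arr = np.column_stack([np.asarray(data[name], dtype=float) for name in x])
    except TypeError:
        # Array style: fall back to treating y and x as the data themselves.
        y_arr = np.asarray(y, dtype=float)
        x_arr = np.asarray(x, dtype=float)
    # Normalise shapes to (n, 1) and (n, k).
    return y_arr.reshape(-1, 1), x_arr.reshape(len(y_arr), -1)
```

Only `TypeError` is caught here, so a misspelled column name still surfaces as a `KeyError` instead of being silently reinterpreted as array input.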

@knaaptime
Member Author

cool, that's what I'll start first. After some poking around last night, I'm pretty sure we're in exactly the situation described in the patsy docs
[image: excerpt from the patsy docs]

which would basically mean just adding a data argument to the existing functions that patsy would use to generate the design matrices internally (otherwise, we just fall back to the existing API with data=None). Everything else is already set up nicely (e.g. we could use the patsy metadata to pass names to the name_xlist argument)
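A rough sketch of what that internal step could look like, assuming a hypothetical helper `build_design` (the name and the intercept handling are assumptions, not existing spreg code). It uses `patsy.dmatrices` to build the design matrices and reads the variable names from the patsy metadata:

```python
import numpy as np
import patsy


def build_design(formula, data):
    """Turn a formula plus a table into (y, x, name_y, name_x)."""
    y_dm, x_dm = patsy.dmatrices(formula, data)
    name_y = y_dm.design_info.column_names[0]
    name_x = list(x_dm.design_info.column_names)
    y = np.asarray(y_dm)
    x = np.asarray(x_dm)
    # Assumption: the downstream estimator adds its own constant,
    # so patsy's "Intercept" column is dropped here.
    keep = [i for i, name in enumerate(name_x) if name != "Intercept"]
    x = x[:, keep]
    name_x = [name_x[i] for i in keep]
    return y, x, name_y, name_x


# Toy data; a geodataframe would be passed the same way.
data = {
    "price": np.array([1.0, 2.0, 3.0, 4.0]),
    "rooms": np.array([1.0, 2.0, 2.0, 3.0]),
    "age": np.array([10.0, 5.0, 8.0, 1.0]),
}
y, x, name_y, name_x = build_design("price ~ rooms + age", data)
```

The returned arrays and names could then be handed to the existing array-based code path, which would stay untouched when `data=None`.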

maybe we could chat a bit about this at the next dev meeting?

@knaaptime added the enhancement label Feb 14, 2020
@ljwolf
Member

ljwolf commented Feb 15, 2020

Yeah, I recall writing a prototype back in 2016, and we couldn't agree on whether it should be a separate constructor (like... ML_Lag.from_formula()) or a separate module with different inits (like spreg.formula.ML_Lag), and how to deal with the instrumental variable formulas.

I like .from_formula(), and I think having an instrumental variables equation like nlm in R makes sense, more than extending the patsy grammar.
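For illustration, a minimal sketch of the separate-constructor option, using a stand-in class `ToyOLS` (hypothetical, not a spreg estimator) so the pattern is visible without touching the real API. The instrumental-variables equation is not covered here:

```python
import numpy as np
import patsy


class ToyOLS:
    """Stand-in estimator used only to show the constructor pattern."""

    def __init__(self, y, x, name_y=None, name_x=None):
        # Array-based signature stays exactly as it was.
        self.y = np.asarray(y, dtype=float)
        self.x = np.asarray(x, dtype=float)
        self.name_y = name_y
        self.name_x = name_x
        self.betas, *_ = np.linalg.lstsq(self.x, self.y, rcond=None)

    @classmethod
    def from_formula(cls, formula, data, **kwargs):
        """Alternate constructor: build the arrays with patsy, then defer to __init__."""
        y_dm, x_dm = patsy.dmatrices(formula, data)
        return cls(
            np.asarray(y_dm),
            np.asarray(x_dm),
            name_y=y_dm.design_info.column_names[0],
            name_x=list(x_dm.design_info.column_names),
            **kwargs,
        )


data = {"y": np.array([1.0, 3.0, 5.0, 7.0]), "x": np.array([0.0, 1.0, 2.0, 3.0])}
model = ToyOLS.from_formula("y ~ x", data)
```

The appeal of this layout is that the formula path is purely additive: existing calls to `__init__` keep working, and any extra keyword arguments pass straight through.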

@darribas
Member

Just to make the plug here too, I've added a comment over at mgwr#77 that affects this discussion too.

@lanselin
Member

not only is there a potential issue with spatial lags, there are also regime variables. how would those fit into the patsy syntax?
same with spatially lagged explanatory variables (SLX, spatial Durbin), ideally computed on the fly (but not in the current implementation). and where would the weights be specified?
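One possible answer to the weights question, sketched under strong assumptions: the weights stay outside the formula as a separate argument, and the lagged explanatory variables are computed on the fly after the design matrix is built. The helper `add_slx`, the `W_` name prefix, and the dense array standing in for a weights object are all hypothetical. Regimes are not addressed.

```python
import numpy as np


def add_slx(x, w, name_x):
    """Append spatially lagged copies (w @ x) of every column of x."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    wx = w @ x  # spatial lag of each explanatory variable
    names = list(name_x) + ["W_" + name for name in name_x]
    return np.hstack([x, wx]), names


# Three observations on a line, row-standardised contiguity weights.
w = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
x = np.array([[1.0], [2.0], [3.0]])
x_slx, names = add_slx(x, w, ["income"])
```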

@sjsrey
Member

sjsrey commented Feb 17, 2020

> Just to make the plug here too, I've added a comment over at mgwr#77 that affects this discussion too.

To @darribas's point, if the patsy approach were used across multiple packages (spreg, spvcm, spint, tobler, ...), then maybe we should think about putting it in libpysal?

@darribas
Member

> To @darribas's point, if the patsy approach were used across multiple packages (spreg, spvcm, spint, tobler, ...), then maybe we should think about putting it in libpysal?

I was thinking that, or even in pysal? It's not something "core" as weights would be, but something built "atop" the federation. Just to throw an idea out.
