enh: specify models using patsy formulas #32

knaaptime · 2020-02-14T16:22:48Z

I think it would be a major boon to usability to allow users to specify models using patsy formulas with geodataframes.

It looks like integrating it would be pretty straightforward, and I'm happy to get it started if it would be a welcome addition, so wanted to raise for discussion:

are folks on board with this idea?
if so, do we want two APIs (like statsmodels) or a single one that could accept either signature?

lanselin · 2020-02-14T17:09:00Z

this is something we talked about a couple of years ago, but i don't think patsy was quite mature enough then. ljwolf may have worked on this. i'm all for it, also to make sure the new panel stuff we are working on conforms. probably need a hangout to agree on API.

knaaptime · 2020-02-14T17:23:19Z

awesome. I'll start some experiments as a proof of concept, then maybe we can circle back on some API choices

pedrovma · 2020-02-14T17:44:17Z

This is awesome, @knaaptime. As Luc said, we talked about this at some point, and also about changing the way we show the results.

I tend to always favor a single API that takes both structures. This is how I am currently working for panels: the code can take data in either long or wide formats. It’s just a matter of a try/except statement to figure out what to do. So my initial feeling is that a “don’t ask, just do” approach is better. But it would be great to further discuss this over a hangout!
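A minimal sketch of that "don't ask, just do" dispatch, assuming a hypothetical helper `resolve_inputs` (not part of spreg) and a deliberately naive formula split standing in for patsy:

```python
def resolve_inputs(y, x=None, data=None):
    """Return (y_values, x_columns, x_names) from either call signature.

    Hypothetical helper for illustration only; the formula handling is a
    naive stand-in for patsy and only understands "lhs ~ a + b".
    """
    try:
        # Formula signature: y is a string such as "price ~ rooms + age".
        lhs, rhs = y.split("~")
    except AttributeError:
        # Array signature: y is not a string, so use y and x as given.
        names = ["var_%d" % (i + 1) for i in range(len(x))]
        return list(y), [list(col) for col in x], names
    if data is None:
        raise ValueError("a formula requires the data argument")
    names = [term.strip() for term in rhs.split("+")]
    return list(data[lhs.strip()]), [list(data[n]) for n in names], names
```

Both `resolve_inputs("price ~ rooms + age", data=table)` and `resolve_inputs(y_values, x=x_columns)` go through the same entry point, so the caller never has to say which signature is in use.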

knaaptime · 2020-02-14T17:54:45Z

cool, that's what I'll start first. After some poking around last night, I'm pretty sure we're in exactly the situation described in the patsy docs

which would basically mean just adding a data argument to the existing functions that patsy would use to generate the design matrices internally (otherwise, we just fall back to the existing API with data=None). Everything else is already set up nicely (e.g. we could use the patsy metadata to pass names to the name_xlist argument)
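Roughly, the idea could look like the sketch below. The function `ols` is a hypothetical stand-in rather than the spreg signature, and dropping patsy's intercept with `- 1` assumes the estimator adds its own constant; only `patsy.dmatrices` and `design_info.column_names` are actual patsy API.

```python
import numpy as np
import patsy


def ols(y, x=None, data=None, name_y=None, name_x=None):
    """Hypothetical estimator: `y` is an array, or a formula when `data` is given."""
    if data is not None:
        # Formula path: patsy builds the design matrices from `data`.
        # "- 1" drops patsy's intercept (assumes we add our own constant below).
        y_dm, x_dm = patsy.dmatrices(y + " - 1", data)
        # The patsy metadata supplies the variable names.
        name_y = y_dm.design_info.column_names[0]
        name_x = x_dm.design_info.column_names
        y, x = np.asarray(y_dm), np.asarray(x_dm)
    # Array path (data=None) starts here, unchanged by the formula support.
    if name_x is None:
        name_x = ["var_%d" % (i + 1) for i in range(x.shape[1])]
    x_const = np.hstack([np.ones((x.shape[0], 1)), x])
    betas, *_ = np.linalg.lstsq(x_const, y, rcond=None)
    return betas, name_y, ["CONSTANT"] + list(name_x)
```

Calling `ols("y ~ x1 + x2", data=df)` takes the formula path, while `ols(y_array, x_array)` behaves as before.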

maybe we could chat a bit about this at the next dev meeting?

ljwolf · 2020-02-15T13:01:09Z

Yeah, I recall writing a prototype back in 2016, and we couldn't agree on whether it should be a separate constructor (like... ML_Lag.from_formula()) or a separate module with different inits (like spreg.formula.ML_Lag), and how to deal with the instrumental variable formulas.

I like .from_formula(), and I think having an instrumental variables equation like nlm in R makes sense, more than extending the patsy grammar.
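To make the `.from_formula()` option concrete, here is a rough sketch. `IVModel` and its argument names (`yend`, `q`, `endogenous`, `instruments`) are hypothetical, and the endogenous variables and instruments are passed as separate one-sided patsy formulas so the grammar itself is left alone:

```python
import numpy as np
import patsy


class IVModel:
    """Hypothetical estimator whose __init__ keeps the array-based signature."""

    def __init__(self, y, x, yend=None, q=None, name_x=None, name_q=None):
        self.y, self.x, self.yend, self.q = y, x, yend, q
        self.name_x, self.name_q = name_x, name_q

    @classmethod
    def from_formula(cls, formula, data, endogenous=None, instruments=None):
        """Build arrays with patsy, then defer to the array constructor."""
        # "- 1" drops patsy's intercept, assuming the estimator adds its own.
        y, x = patsy.dmatrices(formula + " - 1", data)
        yend = q = name_q = None
        if endogenous is not None and instruments is not None:
            # Plain one-sided formulas; the patsy grammar is not extended.
            yend = np.asarray(patsy.dmatrix(endogenous + " - 1", data))
            q_dm = patsy.dmatrix(instruments + " - 1", data)
            q, name_q = np.asarray(q_dm), q_dm.design_info.column_names
        return cls(np.asarray(y), np.asarray(x), yend, q,
                   name_x=x.design_info.column_names, name_q=name_q)
```

With this layout the existing `__init__` is untouched, and a call such as `IVModel.from_formula("y ~ x1", data, endogenous="x2", instruments="z1 + z2")` only adds a thin translation layer on top of it.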

darribas · 2020-02-17T12:12:38Z

Just to make the plug here too, I've added a comment over at mgwr#77 that affects this discussion too.

lanselin · 2020-02-17T14:48:26Z

not only is there a potential issue with spatial lags, there are also regime variables. how would those fit into the patsy syntax?
same with spatially lagged explanatory variables (SLX, spatial Durbin), ideally computed on the fly (but not in the current implementation). and where would the weights be specified?

sjsrey · 2020-02-17T15:34:40Z

> Just to make the plug here too, I've added a comment over at mgwr#77 that affects this discussion too.

To @darribas's point, if the patsy approach would be used throughout multiple packages (spreg, spvcm, spint, tobler, ....) then maybe we should think about putting it in libpysal?

darribas · 2020-02-17T17:25:34Z

> To @darribas's point, if the patsy approach would be used throughout multiple packages (spreg, spvcm, spint, tobler, ....) then maybe we should think about putting it in libpysal?

I was thinking that, or even in pysal? It's not something "core" as weights would be, but something built "atop" the federation. Just to throw an idea out.

knaaptime assigned knaaptime and ljwolf Feb 14, 2020

knaaptime mentioned this issue Feb 14, 2020

enh: support patsy model formulas pysal/mgwr#77

Open

knaaptime added the enhancement New feature or request label Feb 14, 2020

sjsrey mentioned this issue Mar 9, 2020

ENH: start of patsy branch for discussion pysal/libpysal#260

Open
