
sklearn-lmer - Scikit-learn estimator wrappers for pymer4-wrapped lme4 mixed effects models

sklearn-lmer is a simple package that wraps the convenience of pymer4's lme4 wrapping in a mostly sklearn-compatible regressor class.

Refer to the documentation for examples and the API.

Linear mixed effects regressions

Linear mixed effects regressions are great, but if you're here, you probably already agree. You can find more information about them elsewhere; the lme4 links aren't a bad place to start.

Installation

Mixing R and Python used to be a bit more fraught, but rpy2 and conda seem to be working together better these days. To install, first create a conda environment with the dependencies:

$ conda create -n sklmer -c conda-forge numpy scipy rpy2 r-lme4 r-lmertest r-lsmeans tzlocal

Then activate the environment and pip install sklearn-lmer:

$ conda activate sklmer
$ pip install sklearn-lmer

Usage

It can be imported as:

>>> from sklmer import LmerRegressor

The "mostly" part of that compatibility is that __init__ has two required parameters: a formula and the names of the columns holding the independent variables and grouping variables (I've called this parameter X_cols even though it covers more than just X). When I use this, I have my data in a dataframe and just pass dataframe.columns as X_cols, like so:

>>> import os
>>> import pandas as pd
>>> from pymer4.utils import get_resource_path
>>> df = pd.read_csv(os.path.join(get_resource_path(), 'sample_data.csv'))
>>> lreg = LmerRegressor('DV ~ IV2 + (IV2|Group)', X_cols=df.columns)
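If lme4's formula syntax is unfamiliar, the formula above can be read as the model below. This is the standard lme4 reading of DV ~ IV2 + (IV2|Group), written out as a sketch; the symbols (beta, b, Sigma, sigma) are notation introduced here for illustration, not names used by the package.

```latex
\begin{aligned}
\mathrm{DV}_{ij} &= (\beta_0 + b_{0j}) + (\beta_1 + b_{1j})\,\mathrm{IV2}_{ij} + \varepsilon_{ij}, \\
(b_{0j},\, b_{1j})^{\top} &\sim \mathcal{N}(\mathbf{0},\, \Sigma), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0,\, \sigma^2)
\end{aligned}
```

Here i indexes observations and j indexes the levels of Group. The fixed effects are the intercept and the IV2 slope; (IV2|Group) adds a correlated random intercept and random IV2 slope for each group.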

If you want the best compatibility with sklearn, it probably makes sense to split the dataframe into X, y, and group variables, though since you've defined a formula, it's OK if the y and group columns are also in X:

>>> X = df.values
>>> y = df.DV.values
>>> groups = df.Group.values
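To make the split concrete, here is a minimal sketch on a made-up dataframe (the values are invented for illustration; only the column names mirror the formula used above). It assumes the regressor uses X_cols to label the columns of X, as the parameter description suggests, so the order of X's columns has to match the order of the names passed as X_cols.

```python
import pandas as pd

# Toy stand-in for the sample data: the column names mirror the README's
# formula, but the values are made up for illustration.
df = pd.DataFrame({
    "DV": [1.0, 2.0, 3.0, 4.0],
    "IV2": [0.5, 1.5, 2.5, 3.5],
    "Group": [1, 1, 2, 2],
})

X = df.values              # 2-D array holding every column, DV and Group included
y = df.DV.values           # target values
groups = df.Group.values   # group labels, used by group-aware CV splitters

# Column i of X corresponds to df.columns[i]; passing X_cols=df.columns
# keeps the names and the array columns in the same order.
col_index = {name: i for i, name in enumerate(df.columns)}
assert (X[:, col_index["DV"]] == y).all()
assert (X[:, col_index["Group"]] == groups).all()
print(X.shape, list(df.columns))
```

If you reorder or drop columns before calling df.values, pass the matching column list as X_cols rather than the original dataframe's columns.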

Once you've done that, it seems to work fine with other sklearn tools, like cross_val_score:

>>> from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
>>> logo = LeaveOneGroupOut()
>>> cross_val_score(lreg, X=X, y=y, cv=logo.split(X, groups=groups), scoring='neg_mean_squared_error')
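The leave-one-group-out pattern itself can be tried without an R installation. The sketch below uses synthetic data and sklearn's LinearRegression as a stand-in estimator (an assumption made purely so the example runs anywhere; it is not a mixed model). It shows that you get one score per group.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Synthetic data: 4 groups of 10 observations each (values are made up).
rng = np.random.default_rng(0)
n_groups, n_per_group = 4, 10
groups = np.repeat(np.arange(n_groups), n_per_group)
X = rng.normal(size=(groups.size, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=groups.size)

logo = LeaveOneGroupOut()
scores = cross_val_score(
    LinearRegression(),               # stand-in; an LmerRegressor would go here
    X=X,
    y=y,
    cv=logo.split(X, groups=groups),  # each fold holds out one whole group
    scoring="neg_mean_squared_error",
)
print(len(scores))  # one score per held-out group
```

cross_val_score also accepts cv=logo together with groups=groups, which is equivalent to passing the pre-computed splits.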