Could keras-pandas work with Dask? #107

Open
solalatus opened this issue Mar 16, 2019 · 2 comments

Comments

@solalatus

solalatus commented Mar 16, 2019

I am very fond of the "humane" approach keras-pandas takes to getting tabular data into DL models. As far as I can see, Pandas is still a stepchild in TF (there is no obvious reference to it in the 2.0 docs yet). So I believe there is no good solution in sight from the TF side for large, bigger-than-memory datasets processed with Pandas-like tools.
I was wondering whether keras-pandas could work with a Dask DataFrame instead of a Pandas one, so that it could scale to bigger datasets.
This might make a good use case here: dask/dask-ml#268 (though it is less a Dask-ML question than a general Dask DataFrame one).

What are your thoughts?

Thanks for the input and for this great lib, I am spreading the word about it! 👍

@bjherger
Owner

Hey, great idea! Dask support shouldn't be too difficult. Changing the internal data type from pandas to dask would make it easy to scale to larger datasets, while still supporting both pandas and dask inputs.

I'd be happy to support a PR if @solalatus or someone else has the time to put together code that replaces pandas with dask internally and supports both pandas and dask.
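
For illustration only, here is a minimal sketch of one way "support for both pandas and dask" could look at the input boundary. It is not keras-pandas code: `iter_pandas_partitions` is a hypothetical helper name, and the sketch assumes that a Dask DataFrame can be recognized by its `npartitions` attribute and `get_partition` method, each partition being materialized with `compute()`.

```python
from typing import Any, Iterator


def iter_pandas_partitions(frame: Any) -> Iterator[Any]:
    """Yield in-memory pandas DataFrames from a pandas or a Dask DataFrame.

    Hypothetical helper for illustration; not part of keras-pandas.
    """
    # Assumption: Dask DataFrames expose `npartitions` and `get_partition`,
    # while plain pandas DataFrames do not.
    if hasattr(frame, "npartitions") and hasattr(frame, "get_partition"):
        for i in range(frame.npartitions):
            # compute() materializes only this one partition in memory.
            yield frame.get_partition(i).compute()
    else:
        # Plain pandas input: treat the whole frame as a single partition.
        yield frame
```

Existing pandas-only transformation code could then run once per yielded partition. Anything that needs global statistics (scaling parameters, vocabularies) would still need a separate fitting pass over all partitions or over a sample.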

@Sammi-Smith

@bjherger Is Dask support still a work in progress? Any suggestions for what to do in the meantime to be able to use the features of keras-pandas with Pandas DFs that are considerably too big to fit into memory?
