
Using the R Metaflow bindings to make NLP model tuning a little less painful

License

Two license files found: LICENSE (license type not detected) and LICENSE.md (MIT).

mstei4176/NLPRMetaflow

 
 


NLPRMetaflow

This repository accompanies my blog post “Using Metaflow to Make Model Tuning Less Painful.”

I have a machine learning model that takes a while to train: data pre-processing and model fitting can take 15–20 minutes. That’s not so bad, but I also want to tune my model to make sure I’m using the best hyperparameters. With 16 different combinations of hyperparameters and 5-fold cross-validation, that’s 80 model fits, and my 20 minutes can become a day or more.
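The arithmetic behind “a day or more” can be sketched in R. This is a back-of-the-envelope estimate that assumes every cross-validation fold repeats the full 15–20 minute pre-processing and fit, and that in the parallel case each of the 16 combinations runs its 5 folds one after another. The hyperparameter names and values in the grid are hypothetical placeholders, not the ones used in this repository.

```r
# Hypothetical 4 x 4 grid: only "16 combinations, 5 folds" comes from the text.
grid <- expand.grid(
  max_tokens = c(5000, 10000, 20000, 40000),
  learning_rate = c(1e-4, 3e-4, 1e-3, 3e-3)
)
n_folds <- 5
n_fits <- nrow(grid) * n_folds  # 16 combinations x 5 folds = 80 fits

# Assumption: each fold repeats the full 15-20 minute pre-processing + fit.
minutes_per_fit <- c(low = 15, high = 20)

# Every fit run one after another.
hours_sequential <- n_fits * minutes_per_fit / 60

# All 16 combinations at once, each doing its 5 folds in turn
# (ignores instance start-up and scheduling overhead).
hours_parallel <- n_folds * minutes_per_fit / 60

print(round(hours_sequential, 1))
print(round(hours_parallel, 2))
```

Under those assumptions this gives 20 to about 26.7 hours when run sequentially, against 1.25 to about 1.7 hours when every combination gets its own environment.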

Metaflow is an open-source tool from the folks at Netflix that can make this process less painful. It lets me choose which parts of my model training flow I want to execute in the cloud. To speed things up, I’m going to ask Metaflow to spin up enough compute resources so that every hyperparameter combination can be evaluated in parallel, each in its own environment.
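The fan-out described above maps onto Metaflow’s foreach construct. The following is a minimal sketch, assuming the R metaflow package’s `metaflow()`, `step()` and `run()` functions: `start` stores a grid and a vector of combination ids, `step(..., foreach = "combo_ids")` launches one `fit` task per id, and a `join` step keeps the best cross-validation score. The grid, the step names, and `score_combination()` are hypothetical stand-ins for the repository’s actual NLP model and 5-fold cross-validation, not its real code.

```r
# Sketch of a foreach fan-out; names and the scoring function are hypothetical.

start <- function(self) {
  # Hypothetical 4 x 4 grid = 16 hyperparameter combinations.
  self$grid <- expand.grid(
    max_tokens = c(5000, 10000, 20000, 40000),
    learning_rate = c(1e-4, 3e-4, 1e-3, 3e-3)
  )
  # Metaflow starts one "fit" task per element of this vector.
  self$combo_ids <- seq_len(nrow(self$grid))
}

# Placeholder for 5-fold cross-validation of the real model (toy score only).
score_combination <- function(params) {
  -abs(log10(params$learning_rate) + 3.5)
}

fit <- function(self) {
  # Inside a foreach branch, self$input holds this branch's combination id.
  self$params <- self$grid[self$input, , drop = FALSE]
  self$cv_score <- score_combination(self$params)
}

join <- function(self, inputs) {
  # inputs is a list with one entry per fit branch.
  scores <- vapply(inputs, function(inp) inp$cv_score, numeric(1))
  self$best_params <- inputs[[which.max(scores)]]$params
  self$best_score <- max(scores)
}

# Building the flow inside a function keeps the file loadable without metaflow.
build_tuning_flow <- function() {
  flow <- metaflow::metaflow("TuningFlow")
  flow <- metaflow::step(flow, step = "start", r_function = start,
                         next_step = "fit", foreach = "combo_ids")
  flow <- metaflow::step(flow, step = "fit", r_function = fit,
                         next_step = "join")
  flow <- metaflow::step(flow, step = "join", r_function = join,
                         join = TRUE, next_step = "end")
  metaflow::step(flow, step = "end")
}
```

To launch it, call `metaflow::run(build_tuning_flow())` in a session where metaflow is installed and configured. Because the step functions only read and write fields on `self`, they can also be exercised on plain R environments without starting a run.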

The best part is that my flow is pure R code.

Requirements

This repository uses Urban Dictionary data available on Kaggle. The CSV should be copied into a data/ directory.

To run the flow on AWS Batch, the required AWS resources must exist and Metaflow must be configured to use them. The ecr_repository value should point to an image built from the Dockerfile in this repository. Alternatively, the flow can be run locally by removing the "batch" decorators from all steps.
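Instead of deleting the decorators by hand, one option is to attach them conditionally. This toggle is our own illustration, not how the repository does it. It is a minimal sketch assuming that `metaflow::decorator()` forwards its named arguments to Metaflow’s batch decorator (which accepts `cpu`, `memory` and `image`); the resource sizes, the ECR URI, and the helpers `step_decorators()` and `step_args()` are hypothetical.

```r
# Hypothetical ECR image URI: substitute the image built from the Dockerfile.
ecr_repository <- "123456789012.dkr.ecr.us-east-1.amazonaws.com/nlprmetaflow:latest"

# Hypothetical helper: a batch decorator when use_batch is TRUE, otherwise
# nothing, which matches running the step locally.
step_decorators <- function(use_batch) {
  if (!use_batch) return(list())
  list(metaflow::decorator("batch", cpu = 4, memory = 16000,
                           image = ecr_repository))
}

# Hypothetical helper: builds arguments for metaflow::step(), putting any
# decorators right after the flow so they land in step()'s `...`.
step_args <- function(flow, use_batch, ...) {
  c(list(flow), step_decorators(use_batch), list(...))
}
```

For example, `do.call(metaflow::step, step_args(flow, use_batch = TRUE, step = "fit", r_function = fit, next_step = "join"))` would add a step that runs on AWS Batch, and `use_batch = FALSE` gives the same step running locally; here `flow` and `fit` stand for your own flow object and step function.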


Languages

  • R 93.3%
  • Dockerfile 6.7%