Releases: KxSystems/automl
Patch release for drafting of updated Docker image
Update to reflect ML Toolkit refactor (#17)
- update to new ml format
- update ml functions for new refactor
- feb 3rd automl code review
- march 3rd code review
- update to infreplace
- response to comments on automl
- update test to run on windows
- changes after comments part 2
- second review of comments
- reply to latest comments
- review
- predict -> transform
- resolved comments
Co-authored-by: andrewmorrison1 <[email protected]>
0.4.0
- Refactor coding/commenting style to be up to date with coding standards
- AutoML now requires ML Toolkit >=3.0. This change was necessary because the function signatures have changed significantly.
Refactor, API update and functional additions
- Complete backend change to AutoML.
- The framework now uses the directed acyclic graph (DAG) and pipelining structure provided by the ML Toolkit's latest release to define the code base. This dramatically improves code cleanliness and makes modifying and expanding the code base significantly easier.
- Addition of a command line interface for AutoML, allowing the configuration for a session to be updated, or a complete run to be executed followed by exit.
- Model fitting and prediction now use the .automl.fit function, which returns a dictionary containing the predict function. This replaces the .automl.new functionality, which required users to retrieve fit models from disk on each invocation.
- To retrieve models from disk, the .automl.getModel function is provided. It returns a dictionary containing the predict function as one of its keys.
- Model retrieval finds the prevailing model as of a supplied date/time, so the latest model can be found by passing in the current date/time in the appropriate format. Named models can also be retrieved by this method.
- Models can now be named rather than just dated and timed.
- A function .automl.deleteModels is provided to delete individual models, or all models whose string representation matches a regex.
- Support added for Theano models
- The stdout printing of AutoML can now be logged to the outputs folder associated with a run or redirected to a user defined location.
- Some warnings/errors are now configurable. For example, data with > 10000 targets previously disabled the fitting of neural networks; this check can now be ignored or the number of targets modified.
- Three warning levels are provided: 0 = ignore everything, 1 = report the action being taken and continue anyway, 2 = raise an appropriate error.
- All configuration that a user may be required to change is now defined using JSON. This includes:
  - Models which are to be applied
  - Hyperparameter sets for applied models
  - Scoring functions supported and the expected ordering of these
  - Any updates to default parameters which a user wants to persist for the entire process, apply when running from the command line, or use as a custom configuration.
There are a number of other changes; the above is only a brief overview. More in-depth explanations of functionality will be provided in the documentation.
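As a rough illustration of the directed acyclic graph and pipelining structure mentioned in this release, the sketch below runs a small pipeline in dependency order. It is written in Python rather than q and does not reproduce the ML Toolkit's graph API; `run_pipeline` and the node names are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for the toolkit's graph machinery.

```python
from graphlib import TopologicalSorter


def run_pipeline(nodes, edges, inputs):
    """Run a DAG of functions in dependency order.

    nodes:  name -> function taking a dict of upstream results
    edges:  name -> set of node names it depends on
    inputs: name -> precomputed value for source nodes
    """
    results = dict(inputs)
    # static_order() yields each node only after all of its dependencies,
    # and raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(edges).static_order():
        if name in results:
            continue  # source node supplied by the caller
        upstream = {dep: results[dep] for dep in edges.get(name, ())}
        results[name] = nodes[name](upstream)
    return results


# Hypothetical three-step workflow: clean -> features -> summary.
nodes = {
    "clean": lambda up: [x for x in up["raw"] if x is not None],
    "features": lambda up: [(x, x * x) for x in up["clean"]],
    "summary": lambda up: {
        "rows": len(up["clean"]),
        "columns": len(up["features"][0]),
    },
}
edges = {
    "clean": {"raw"},
    "features": {"clean"},
    "summary": {"clean", "features"},
}
out = run_pipeline(nodes, edges, {"raw": [1, None, 3]})
```

Because each step only declares what it depends on, adding or replacing a step means editing one node and its edges, which is the maintainability benefit the release note points to.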
Initial release of v0.2-beta
What's New:
Natural Language Processing:
- Feature engineering techniques to transform text data into appropriate numerical representations for the application of the machine learning algorithms provided. Techniques include:
  - Named entity recognition
  - Sentiment analysis
  - word2vec embedding
  - Stop word/part of speech/numerical decomposition
Grid Search:
- The grid search methodology can now be changed from exhaustive grid search to random or pseudo-random (Sobol) searching
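To make the difference between the search methods concrete, here is a minimal Python sketch (not AutoML's q implementation) contrasting exhaustive and random search over a hyperparameter space. The function names and the example space are hypothetical. A Sobol variant is not shown; it would replace the random sampler with a low-discrepancy sequence generator, for example SciPy's `scipy.stats.qmc.Sobol`.

```python
import itertools
import random


def exhaustive_grid(space):
    """Every combination of the listed values (the exhaustive search)."""
    keys = sorted(space)
    combos = itertools.product(*(space[k] for k in keys))
    return [dict(zip(keys, combo)) for combo in combos]


def random_grid(space, n, seed=0):
    """n independently sampled combinations (the random search)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    keys = sorted(space)
    return [{k: rng.choice(space[k]) for k in keys} for _ in range(n)]


# Hypothetical hyperparameter space for a tree-based model.
space = {"max_depth": [2, 4, 8], "n_estimators": [50, 100]}
all_points = exhaustive_grid(space)   # 3 * 2 = 6 candidates
some_points = random_grid(space, n=4)  # 4 sampled candidates
```

Exhaustive search grows multiplicatively with every added hyperparameter, whereas the sampled variants keep the number of candidates fixed at `n`.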
Report Generation:
- Default report generation now produces LaTeX reports rather than reportlab reports. These reports are generated using PyLaTeX and rely on the user having the appropriate compilers installed; if LaTeX generation fails, report generation falls back to reportlab.
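The fallback can be pictured as a simple backend choice. The Python sketch below is an assumption-laden illustration, not AutoML's actual logic: `choose_report_backend` is a hypothetical name, and checking for `pdflatex` on the PATH is an assumed way of detecting "the appropriate compilers".

```python
import shutil


def choose_report_backend(which=shutil.which, compilers=("pdflatex",)):
    """Return "latex" if an assumed LaTeX compiler is found, else "reportlab".

    `which` is injectable so the choice can be exercised without LaTeX
    being installed; by default it searches the PATH.
    """
    for compiler in compilers:
        if which(compiler) is not None:
            return "latex"
    return "reportlab"


backend = choose_report_backend()
```

A real implementation would likely also catch compilation errors at report time and fall back then, since a compiler being present does not guarantee that a build succeeds.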
PyTorch:
- Support to allow users to test their own PyTorch models against the models provided by default
Initial Release
Kx AutoML builds upon our existing machine learning libraries (particularly the ML Toolkit and FRESH libraries) to provide a full end-to-end ML workflow for users.
This automates the entire task of applying machine learning solutions to real-world problems, from the raw dataset to the deployment of an optimized model.
Features include
- Data preprocessing and encoding
- Feature generation (including time-series features via FRESH)
- Feature significance testing
- Selection of models applicable to the data available and task at hand
- Training and validation of models
- Selection of optimal models, based on statistical scoring metrics
- Hyperparameter tuning based on grid search methods
- Model and final report generation
- Options to modify and extend workflows, including bring-your-own algorithms and scoring functions