v0.14
mhamilton723 released this on 18 Jul at 02:17
New Features
- The Cognitive Services on Spark: A simple and scalable integration between the Microsoft Cognitive Services and SparkML
  - Bing Image Search
  - Computer Vision: OCR, Recognize Text, Recognize Domain Specific Content, Analyze Image, Generate Thumbnails
  - Text Analytics: Language Detector, Entity Detector, Key Phrase Extractor, Sentiment Detector, Named Entity Recognition
  - Face: Detect, Find Similar, Identify, Group, Verify
- Added distributed model interpretability with LIME on Spark
- 100x lower latencies (<1ms) with Spark Serving
- Expanded Spark Serving to cover the full HTTP protocol
- Added the `SuperpixelTransformer` for segmenting images
- Added a Fluent API, `mlTransform` and `mlFit`, for composing pipelines more elegantly
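The fluent style of composing pipelines can be illustrated with a small, library-free sketch. Everything in it (`Frame`, `AddOne`, `MeanCenter`, `ml_transform`, `ml_fit`) is a hypothetical stand-in written for this illustration; it is not MMLSpark's implementation and does not reflect its actual signatures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """Toy stand-in for a DataFrame: a single column of numbers."""
    values: List[float]

    def ml_transform(self, *transformers):
        # Apply each transformer in order and return the final frame.
        out = self
        for t in transformers:
            out = t.transform(out)
        return out

    def ml_fit(self, estimator):
        # Fit an estimator on this frame and return the fitted model.
        return estimator.fit(self)


class AddOne:
    """Hypothetical transformer: adds 1 to every value."""
    def transform(self, frame):
        return Frame([v + 1 for v in frame.values])


class MeanCenterModel:
    """Hypothetical fitted model: subtracts a learned mean."""
    def __init__(self, mean):
        self.mean = mean

    def transform(self, frame):
        return Frame([v - self.mean for v in frame.values])


class MeanCenter:
    """Hypothetical estimator: learns the mean of the column."""
    def fit(self, frame):
        return MeanCenterModel(sum(frame.values) / len(frame.values))


df = Frame([1.0, 2.0, 3.0])
model = df.ml_fit(MeanCenter())
result = df.ml_transform(AddOne(), model)
```

The point of the pattern is that fitting and transforming read as methods on the data, so a chain of stages can be written in one expression instead of assembling a pipeline object first.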
New Examples
- Chain together cognitive services to understand the feelings of your favorite celebrities with `CognitiveServices - Celebrity Quote Analysis.ipynb`
- Explore how you can use Bing Image Search and Distributed Model Interpretability to get an Object Detection system without labeling any data in `ModelInterpretation - Snow Leopard Detection.ipynb`
- See how to deploy any Spark computation as a web service on any Spark platform with the `SparkServing - Deploying a Classifier.ipynb` notebook
Updates and Improvements
LightGBM
- More APIs for loading LightGBM Native Models
- LightGBM training checkpointing and continuation
- Added Tweedie variance power to LightGBM
- Added early stopping to LightGBM
- Added feature importances to LightGBM
- Added a PMML exporter for LightGBM on Spark
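As a rough illustration of what early stopping does in general, the sketch below picks the best round from a sequence of validation losses and stops once no improvement has been seen for a fixed number of rounds. The function name `best_iteration` and the `patience` parameter are assumptions made for this example; they are not LightGBM or MMLSpark parameter names, and this is not how either library implements the feature.

```python
from typing import Iterable, Tuple


def best_iteration(val_losses: Iterable[float], patience: int) -> Tuple[int, float]:
    """Return (index, loss) of the best round, stopping once `patience`
    consecutive rounds fail to improve on the best validation loss."""
    best_idx, best_loss = -1, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_idx, best_loss = i, loss
        elif i - best_idx >= patience:
            break  # no improvement for `patience` rounds
    return best_idx, best_loss


# Made-up validation losses, one per boosting round.
losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.60]
stopped_early = best_iteration(losses, patience=2)
ran_to_end = best_iteration(losses, patience=5)
```

With a short patience the loop halts before it sees the later improvement in the last round, which is the usual trade-off: fewer wasted rounds against the risk of stopping on a temporary plateau.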
HTTP on Spark
- Added the `VectorizableParam` for creating column parameterizable inputs
- Added a `handler` parameter to HTTP services
- HTTP on Spark now propagates nulls robustly
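To make "propagates nulls" concrete, here is a minimal, library-free sketch of the idea: rows whose input is missing skip the service call and carry a null through to the output column instead of failing. The names `call_service` and `fake_handler` and the dictionary-based rows are hypothetical; the real HTTP on Spark transformers operate on Spark DataFrames and may behave differently in detail.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def call_service(rows: List[Row],
                 handler: Callable[[str], str],
                 input_col: str,
                 output_col: str) -> List[Row]:
    """Apply `handler` to each row's input, passing nulls straight through."""
    out = []
    for row in rows:
        value = row.get(input_col)
        # Propagate nulls: never invoke the handler on a missing input.
        result = None if value is None else handler(value)
        out.append({**row, output_col: result})
    return out


def fake_handler(text: str) -> str:
    # Hypothetical stand-in for sending a request and reading the response.
    return text.upper()


rows = [{"text": "hello"}, {"text": None}, {"text": "spark"}]
replies = call_service(rows, fake_handler, "text", "reply")
```

Keeping the null row in the output, rather than dropping it, preserves row counts and alignment with the rest of the dataset.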
Version Bumps
- Updated to Spark 2.3.1
- Updated LightGBM to 2.1.250
Misc
- Added Vagrantfile for easy Windows developer setup
- Improved Image Reader fault tolerance
- Reorganized Examples into Topics
- Generalized Image Featurizer and other Image based code to handle Binary Files as well as Spark Images
- Added `ModelDownloader` R wrapper
- Added `getBestModel` and `getBestModelInfo` to `TuneHyperparameters`
- Expanded Binary File Reading APIs
- Added `Explode` and `Lambda` transformers
- Added `SparkBindings` trait for automating Spark binding creation
- Added retries and timeouts to `ModelDownloader`
- Added `ResizeImageTransformer` to remove `ImageFeaturizer` dependence on OpenCV
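The general shape of "retries and timeouts" for a download can be sketched as follows. This is a generic pattern under stated assumptions, not the `ModelDownloader` code: `with_retries`, `flaky_download`, the per-attempt timeout list, and the exponential backoff are all hypothetical choices made for the example.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[float], T],
                 timeouts: Sequence[float],
                 sleep: Callable[[float], None] = time.sleep,
                 backoff: float = 0.5) -> T:
    """Call action(timeout) once per entry in `timeouts` until one succeeds."""
    last_error = None
    for attempt, timeout in enumerate(timeouts):
        try:
            return action(timeout)
        except Exception as err:  # a real downloader would catch narrower errors
            last_error = err
            if attempt < len(timeouts) - 1:
                # Wait longer after each failure before trying again.
                sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"all {len(timeouts)} attempts failed") from last_error


calls = []


def flaky_download(timeout: float) -> str:
    # Simulated download that times out twice, then succeeds.
    calls.append(timeout)
    if len(calls) < 3:
        raise TimeoutError("simulated timeout")
    return "model-bytes"


# Pass a no-op sleep so the example runs instantly.
payload = with_retries(flaky_download, timeouts=[1.0, 2.0, 4.0], sleep=lambda s: None)
```

Giving each attempt its own, growing timeout lets a slow but healthy connection eventually succeed while still bounding the total time spent on a dead one.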
Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of MMLSpark (in alphabetical order):
- Abhiram Eswaran, Anand Raman, Ari Green, Arvind Krishnaa Jagannathan, Ben Brodsky, Casey Hong, Courtney Cochrane, Henrik Frystyk Nielsen, Ilya Matiach, Janhavi Suresh Mahajan, Jaya Susan Mathew, Karthik Rajendran, Mario Inchiosa, Minsoo Thigpen, Soundar Srinivasan, Sudarshan Raghunathan, @terrytangyuan