v0.14

@mhamilton723 released this 18 Jul 02:17
· 1287 commits to master since this release

New Features

  • Cognitive Services on Spark: a simple and scalable integration between the Microsoft Cognitive Services and SparkML
    • Bing Image Search
    • Computer Vision: OCR, Recognize Text, Recognize Domain Specific Content,
      Analyze Image, Generate Thumbnails
    • Text Analytics: Language Detector, Entity Detector, Key Phrase Extractor,
      Sentiment Detector, Named Entity Recognition
    • Face: Detect, Find Similar, Identify, Group, Verify
  • Added distributed model interpretability with LIME on Spark
  • 100x lower latencies (<1ms) with Spark Serving
  • Expanded Spark Serving to cover the full HTTP protocol
  • Added the SuperpixelTransformer for segmenting images
  • Added a Fluent API, mlTransform and mlFit, for composing pipelines more elegantly
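
To illustrate the idea behind a fluent `mlTransform` / `mlFit` style, here is a minimal sketch in plain Python. It does not use Spark or MMLSpark: `Table`, `AddOne`, `MeanCenterer`, and `Shift` are hypothetical stand-ins invented for this example, and the method signatures are assumptions rather than the library's actual API. The point is only that stages can be applied directly from the data object instead of assembling a pipeline first.

```python
class Table:
    """Toy stand-in for a DataFrame: a list of row dicts (hypothetical)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def mlTransform(self, *stages):
        # Apply each transformer in order and return the final Table.
        out = self
        for stage in stages:
            out = stage.transform(out)
        return out

    def mlFit(self, estimator):
        # Fit an estimator on this Table and return the fitted model.
        return estimator.fit(self)


class Shift:
    """Hypothetical transformer: adds a constant offset to one column."""

    def __init__(self, col, offset):
        self.col, self.offset = col, offset

    def transform(self, table):
        return Table({**r, self.col: r[self.col] + self.offset} for r in table.rows)


class AddOne(Shift):
    """Hypothetical transformer: adds 1 to one column."""

    def __init__(self, col):
        super().__init__(col, 1.0)


class MeanCenterer:
    """Hypothetical estimator: learns a column mean, returns a Shift model."""

    def __init__(self, col):
        self.col = col

    def fit(self, table):
        mean = sum(r[self.col] for r in table.rows) / len(table.rows)
        return Shift(self.col, -mean)


data = Table([{"x": 1.0}, {"x": 3.0}])
model = data.mlFit(MeanCenterer("x"))            # learns mean = 2.0
result = data.mlTransform(AddOne("x"), model)    # add 1, then subtract the mean
```

Chaining from the data object keeps short experiments readable; an explicit pipeline object is still the better fit when the stages need to be saved or reused as a unit.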

New Examples

  • Chain together cognitive services to understand the feelings of your favorite celebrities with CognitiveServices - Celebrity Quote Analysis.ipynb
  • Explore how you can use Bing Image Search and Distributed Model Interpretability to build an object detection system without labeling any data in ModelInterpretation - Snow Leopard Detection.ipynb
  • See how to deploy any Spark computation as a web service on any Spark platform with the SparkServing - Deploying a Classifier.ipynb notebook

Updates and Improvements

LightGBM

  • More APIs for loading LightGBM Native Models
  • LightGBM training checkpointing and continuation
  • Added tweedie variance power to LightGBM
  • Added early stopping to LightGBM
  • Added feature importances to LightGBM
  • Added a PMML exporter for LightGBM on Spark
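
For readers unfamiliar with early stopping, the general technique can be sketched as follows. This is a generic illustration in plain Python, not the LightGBM on Spark implementation; the function name `best_iteration` and the `patience` parameter are assumptions made for this example. Training stops once the validation loss has failed to improve for `patience` consecutive iterations, and the best iteration seen so far is kept.

```python
def best_iteration(validation_losses, patience):
    """Return the index of the best iteration under patience-based early stopping."""
    best_loss = float("inf")
    best_iter = -1
    for i, loss in enumerate(validation_losses):
        if loss < best_loss:
            # New best score: remember it and reset the patience window.
            best_loss, best_iter = loss, i
        elif i - best_iter >= patience:
            # No improvement for `patience` iterations: stop training here.
            break
    return best_iter


losses = [0.9, 0.7, 0.6, 0.65, 0.64, 0.63, 0.5]
stopped_at = best_iteration(losses, patience=3)
```

In this run, training stops before the final loss of 0.5 is ever observed, which shows the trade-off: a small patience saves training time but can miss later improvements.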

HTTP on Spark

  • Added the VectorizableParam for creating column parameterizable inputs
  • Added a handler parameter to HTTP services
  • HTTP on Spark now propagates nulls robustly
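
Null propagation here can be pictured as: a missing input yields a missing output, and the remote service is never called for that row. The sketch below shows that pattern in plain Python. It is not the HTTP on Spark code; `map_propagating_nulls` and `fake_service` are hypothetical names used only for illustration.

```python
def map_propagating_nulls(inputs, call_service):
    """Apply call_service to each input, passing None through untouched."""
    results = []
    for value in inputs:
        if value is None:
            # Missing input: skip the request and emit a missing output.
            results.append(None)
        else:
            results.append(call_service(value))
    return results


def fake_service(text):
    # Stand-in for an HTTP call; returns a small response-like dict.
    return {"length": len(text)}


outputs = map_propagating_nulls(["hello", None, "hi"], fake_service)
```

Keeping one output per input row, including the null rows, preserves row alignment with the original data.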

Version Bumps

  • Updated to Spark 2.3.1
  • Updated LightGBM to 2.1.250

Misc

  • Added a Vagrantfile for easy Windows developer setup
  • Improved Image Reader fault tolerance
  • Reorganized Examples into Topics
  • Generalized Image Featurizer and other image-based code to handle binary files as well as Spark images
  • Added ModelDownloader R wrapper
  • Added getBestModel and getBestModelInfo to TuneHyperparameters
  • Expanded Binary File Reading APIs
  • Added Explode and Lambda transformers
  • Added SparkBindings trait for automating Spark binding creation
  • Added retries and timeouts to ModelDownloader
  • Added ResizeImageTransformer to remove ImageFeaturizer dependence on OpenCV
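
As a rough picture of what retries with a timeout look like for a download, here is a minimal sketch in plain Python. It is not the ModelDownloader implementation; `with_retries`, its parameters, and `flaky_download` are hypothetical and chosen only for this example. The timeout is treated as an overall time budget checked between attempts, so it does not interrupt a call that is already running.

```python
import time


def with_retries(action, max_attempts=3, timeout_s=5.0, backoff_s=0.0):
    """Call action() until it succeeds, attempts run out, or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    last_error = None
    for _ in range(max_attempts):
        if time.monotonic() >= deadline:
            break  # overall time budget exhausted
        try:
            return action()
        except Exception as err:  # a real client would catch narrower errors
            last_error = err
            time.sleep(backoff_s)
    raise RuntimeError("download failed after retries or timeout") from last_error


attempts = []


def flaky_download():
    # Simulated download that fails twice before succeeding.
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError("transient failure")
    return "model-bytes"


result = with_retries(flaky_download, max_attempts=5)
```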

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of MMLSpark (in alphabetical order):

  • Abhiram Eswaran, Anand Raman, Ari Green, Arvind Krishnaa Jagannathan, Ben Brodsky, Casey Hong, Courtney Cochrane, Henrik Frystyk Nielsen, Ilya Matiach, Janhavi Suresh Mahajan, Jaya Susan Mathew, Karthik Rajendran, Mario Inchiosa, Minsoo Thigpen, Soundar Srinivasan, Sudarshan Raghunathan, @terrytangyuan