A Survey of Machine Learning Techniques Using Spark.ml 1.5.0

This session provides an overview of several core algorithms and common machine learning operations using the preferred Pipelines architecture of the later releases of Spark ml/mllib. It focuses on the *Scala* APIs.

About This Session

Recent releases of the Spark machine learning libraries have shifted focus from the individual-algorithm approach of the spark.mllib package to the data-driven pipelines approach of spark.ml. We will look at how to structure the ML process of data loading, modeling, prediction, and results analysis and distribution using the latest spark.ml APIs.
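As a rough illustration of the pipelines approach, the sketch below chains two feature transformers and a classifier into a single spark.ml Pipeline, fits it on a tiny training set, and prints predictions for unseen rows. It is not taken from the session material: the object name PipelineSketch, the helper buildPipeline, the toy data, the column names, and the parameter values are all assumptions chosen for illustration, and it assumes Spark 1.5.x running in local mode.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SQLContext

// Hypothetical example object; names and data are illustrative only.
object PipelineSketch {

  // Assemble three stages: split text into words, hash words into a
  // feature vector, then train a logistic regression on those features.
  def buildPipeline(): Pipeline = {
    val tokenizer = new Tokenizer()
      .setInputCol("text")
      .setOutputCol("words")
    val hashingTF = new HashingTF()
      .setNumFeatures(1000) // assumed size, not tuned
      .setInputCol("words")
      .setOutputCol("features")
    val lr = new LogisticRegression()
      .setMaxIter(10)       // assumed values, not tuned
      .setRegParam(0.01)
    new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PipelineSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Data loading: a toy labeled data set (1.0 = mentions spark).
    val training = sqlContext.createDataFrame(Seq(
      (0L, "a b c d e spark", 1.0),
      (1L, "b d", 0.0),
      (2L, "spark f g h", 1.0),
      (3L, "hadoop mapreduce", 0.0)
    )).toDF("id", "text", "label")

    // Unlabeled rows to score.
    val test = sqlContext.createDataFrame(Seq(
      (4L, "spark i j k"),
      (5L, "l m n")
    )).toDF("id", "text")

    // Modeling: fitting the pipeline runs every stage in order.
    val model = buildPipeline().fit(training)

    // Predictions: the fitted model applies the same stages to new data.
    model.transform(test).select("id", "text", "prediction").show()

    sc.stop()
  }
}
```

Keeping the stage assembly in its own function separates the pipeline definition from data loading, so the same definition could be refit on a different DataFrame with matching column names.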

Note: this year's session will focus only on the Scala APIs.

We will touch on one or more of the algorithms in the following areas:

Dimensionality Reduction / Feature extraction
Clustering
Classification and Regression

Depending on time available we may also touch on the following topics:

Statistical tools
Data generation and randomization
Evaluators

Time: 10:45 AM Sunday Room: SC-127

The Speaker(s)

Stephen Boesch

Scala/Spark/Machine Learning Developer, Intuit

I am a developer focusing on scalable data pipelines and machine learning apps on Spark and Hadoop.
