Machine Learning Pipelines and Models need to be managed through a robust system.

DURABLE Data pipeline need to be
-- Discoverable
-- Understandable
-- Repeatable
-- Accurate
-- Bottoms Up
-- Lineage Aware

DURABLE Models are just not a combination of linear coefficients.
— Models need to be stored, searchable, well-documented and continuously reviewed.
— Usually a DSL on Models help access list of features, score calibrations, link functions, feature transformations

Continuous Experimentation
Feature Selection Automation:
— Data Mntc (versioning and dockerized containers) : DOMINO
— Versioned data and results (encapsulated in a Docker Container)
Automation encourages Experimentation:
— easy baseline
— try different variations
— regular retraining

Model Monitoring
– Periodically check if model-performance is dropping due to changes in underlying domain data

Metric Calculations

A/B Testing , match offline metrics with online metrics.

(A) ALATION speeds up Data Preparation by offering a Data Catalog management system.

b1

(B) TENSORFLOW help distribute parameters across servers and maintain data parallelism as models and , data, parameters reside in separate servers.

b2

It offers a Serving mechanism for continuously training pipelines.

(C) Domino Labs offer a comprehensive suite of Continuous Experimentation, Model Management and Metrics Calculation Platform
b3

 

b4

 

 

 

 

 

 

 

(D) Netflix has developed a system for managing and running Machine Learning Pipeline using Mesos ~ http://techblog.netflix.com/2016/05/meson_31.html

A heterogeneous workload of Spark, Python, R & Scala tasks run thousands of computations concurrently on an elastic Mesos cluster of hundreds of nodes.

~ https://www.youtube.com/watch?v=UyjUf1xT6Qg
~ http://techblog.netflix.com/2016/02/distributed-time-travel-for-feature.html

 

 

 

Advertisements