0%

机器学习相关项目

  • github.com/jpmml/sklearn2pmml
  • github.com/combust/mleap

jpmml-spark

项目地址: https://github.com/jpmml/jpmml-sparkml

项目特点:

相关博客:

Converting Apache Spark ML pipeline models to PMML documents

MLEAP

项目地址: https://github.com/combust/mleap

项目特点:

MLeap is a common serialization format and execution engine for machine learning pipelines.
It supports Spark, Scikit-learn and Tensorflow for training pipelines and exporting them to an MLeap Bundle.
Serialized pipelines (bundles) can be deserialized back into Spark for batch-mode scoring or the MLeap runtime to power realtime API services.

标榜: transform 耗时比传统的spark少1000倍以上。

相关博客:

docker启动mleap

  • Linear Regression: 0.0062 milliseconds with mleap vs 106 milliseconds with Spark LocalRelation
  • Random Forest: 0.0068 milliseconds with mleap vs 101 milliseconds with Spark LocalRelation

spark_streaming

由于 spark_streaming 是基于时间窗口进行的微批处理,

在实时环境使用模型

在本地运行 spark-web

spark-jetty-server

spark-as-service-using-embedded-server