- github.com/jpmml/sklearn2pmml
- github.com/combust/mleap
jpmml-sparkml
Project URL: https://github.com/jpmml/jpmml-sparkml
Features:
Related blog posts:
Converting Apache Spark ML pipeline models to PMML documents
MLEAP
Project URL: https://github.com/combust/mleap
Features:
MLeap is a common serialization format and execution engine for machine learning pipelines.
It supports Spark, Scikit-learn and Tensorflow for training pipelines and exporting them to an MLeap Bundle.
Serialized pipelines (bundles) can be deserialized back into Spark for batch-mode scoring or the MLeap runtime to power realtime API services.
Claimed advantage: transform is more than 1,000 times faster than in conventional Spark.
Related blog posts:
- Linear Regression: 0.0062 milliseconds with mleap vs 106 milliseconds with Spark LocalRelation
- Random Forest: 0.0068 milliseconds with mleap vs 101 milliseconds with Spark LocalRelation
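As a quick sanity check on the "more than 1,000 times" claim, the ratios implied by the two benchmark figures above can be computed directly. This is only arithmetic on the quoted numbers, not a re-run of the benchmark; the names `timings_ms` and `speedup` are made up for this sketch.

```python
# Per-transform latencies in milliseconds, copied from the figures quoted above:
# (MLeap, Spark LocalRelation)
timings_ms = {
    "Linear Regression": (0.0062, 106.0),
    "Random Forest": (0.0068, 101.0),
}


def speedup(mleap_ms: float, spark_ms: float) -> float:
    """Return how many times faster MLeap is than Spark for one transform."""
    return spark_ms / mleap_ms


speedups = {name: speedup(*pair) for name, pair in timings_ms.items()}

for name, ratio in speedups.items():
    print(f"{name}: about {ratio:,.0f}x faster")
```

Both ratios come out well above 1,000 (roughly 17,000x and 15,000x), so the quoted figures are at least consistent with the claim.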
spark_streaming
Because spark_streaming performs micro-batch processing over time windows,