Real-time online data forecasting using Spark Streaming and machine learning

How can I design an architecture that classifies real-time transactional data as fraudulent or not?

I developed, trained, and tested a random forest classifier on historical data using Scala and Spark MLlib, and saved the model.

Real-time transaction data arrives on one Apache Kafka topic, is processed with Spark Streaming, and the predictions produced by the saved ML model are written to another topic.
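A minimal sketch of that pipeline using Spark Structured Streaming's Kafka source and sink. The topic names, broker address, model path, and the CSV transaction schema (`id,amount,merchantRisk`) are all assumptions for illustration:

```scala
// Hypothetical sketch: score transactions from Kafka with a saved Spark ML model.
// Topic names, feature columns, and the model path are assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.ml.PipelineModel

object FraudScoringStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("fraud-scoring").getOrCreate()
    import spark.implicits._

    // Load the pipeline (feature assembler + random forest) saved after training.
    val model = PipelineModel.load("hdfs:///models/fraud-rf")

    // Read raw transactions from the input topic.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "transactions")
      .load()

    // Parse the CSV payload into typed columns (assumed schema).
    val parsed = raw.selectExpr("CAST(value AS STRING) AS csv")
      .select(split($"csv", ",").as("f"))
      .select(
        $"f".getItem(0).as("id"),
        $"f".getItem(1).cast("double").as("amount"),
        $"f".getItem(2).cast("double").as("merchantRisk"))

    // Apply the model and write "id,prediction" back to the output topic.
    val scored = model.transform(parsed)
      .select(concat_ws(",", $"id", $"prediction").as("value"))

    scored.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("topic", "transaction-scores")
      .option("checkpointLocation", "/tmp/checkpoints/fraud-scoring")
      .start()
      .awaitTermination()
  }
}
```

The key point is that the model is loaded once at startup and `transform` is applied to the streaming DataFrame, so scoring happens per micro-batch without reloading the model.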

My concern: how do I apply the saved ML model to the current transaction data consumed from the Kafka topic and get the prediction back?

What is the best practice for scoring live transactions in real time using an already trained and tested ML model?

Any design suggestions are welcome.



1 answer


You can save the model after training and then use it in a real-time API for prediction. See, for example, https://databricks.gitbooks.io/databricks-spark-reference-applications/content/twitter_classifier/predict.html. Another option is to use H2O Sparkling Water and export the model as a POJO: https://github.com/h2oai/sparkling-water/tree/master/examples#step-by-step-through-weather-data-example
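The first suggestion can be sketched as a scoring object that loads the saved model once and exposes a single-record prediction function for an API layer to call. The model path, feature columns, and object/method names are assumptions, not part of the linked examples:

```scala
// Hypothetical sketch: load a saved Spark ML pipeline once and reuse it
// to score individual transactions behind a real-time API.
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.PipelineModel

object FraudScorer {
  // Created once at startup and reused for every request.
  private lazy val spark =
    SparkSession.builder.appName("fraud-scorer").master("local[*]").getOrCreate()

  // Assumed path to the pipeline saved after training.
  private lazy val model = PipelineModel.load("hdfs:///models/fraud-rf")

  import spark.implicits._

  /** Score a single transaction; returns the predicted class label (e.g. 1.0 for fraud). */
  def score(amount: Double, merchantRisk: Double): Double = {
    val df = Seq((amount, merchantRisk)).toDF("amount", "merchantRisk")
    model.transform(df).select("prediction").head().getDouble(0)
  }
}
```

Loading the model lazily keeps startup cheap, but note that spinning up a local SparkSession per service is heavyweight; the POJO route via Sparkling Water avoids a Spark dependency in the serving layer entirely.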









