Real-time online data forecasting using Spark Streaming and machine learning

How can I design an architecture that classifies real-time transactional data as fraudulent or not?

I developed, trained, and tested a random forest classifier on historical data using Scala and Spark MLlib, and saved the model.

Real-time transaction data arrives on one Apache Kafka topic, is processed with Spark Streaming, and the predictions produced by the saved ML model are written to another topic.
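A minimal sketch of that pipeline using Spark Structured Streaming's Kafka source and sink. The topic names, broker address, model path, and the CSV transaction schema (`id,amount,merchantRisk`) are all assumptions for illustration:

```scala
// Hypothetical sketch: score transactions from Kafka with a saved Spark ML model.
// Topic names, feature columns, and the model path are assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.ml.PipelineModel

object FraudScoringStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("fraud-scoring").getOrCreate()
    import spark.implicits._

    // Load the pipeline (feature assembler + random forest) saved after training.
    val model = PipelineModel.load("hdfs:///models/fraud-rf")

    // Read raw transactions from the input topic.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "transactions")
      .load()

    // Parse the CSV payload into typed columns (assumed schema).
    val parsed = raw.selectExpr("CAST(value AS STRING) AS csv")
      .select(split($"csv", ",").as("f"))
      .select(
        $"f".getItem(0).as("id"),
        $"f".getItem(1).cast("double").as("amount"),
        $"f".getItem(2).cast("double").as("merchantRisk"))

    // Apply the model and write "id,prediction" back to the output topic.
    val scored = model.transform(parsed)
      .select(concat_ws(",", $"id", $"prediction").as("value"))

    scored.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("topic", "transaction-scores")
      .option("checkpointLocation", "/tmp/checkpoints/fraud-scoring")
      .start()
      .awaitTermination()
  }
}
```

The key point is that the model is loaded once at startup and `transform` is applied to the streaming DataFrame, so scoring happens per micro-batch without reloading the model.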

My concern: how do I apply the saved ML model to the current transaction data consumed from the Kafka topic and get the prediction back?

What is the best practice for scoring live transactions in real time using an already trained and tested ML model?

Any design suggestions are welcome.



1 answer


You can save the model after training and then use it in a real-time API for prediction. See, for example, https://databricks.gitbooks.io/databricks-spark-reference-applications/content/twitter_classifier/predict.html. Another option is to use H2O Sparkling Water and export the model as a POJO: https://github.com/h2oai/sparkling-water/tree/master/examples#step-by-step-through-weather-data-example
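The first suggestion can be sketched as a scoring object that loads the saved model once and exposes a single-record prediction function for an API layer to call. The model path, feature columns, and object/method names are assumptions, not part of the linked examples:

```scala
// Hypothetical sketch: load a saved Spark ML pipeline once and reuse it
// to score individual transactions behind a real-time API.
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.PipelineModel

object FraudScorer {
  // Created once at startup and reused for every request.
  private lazy val spark =
    SparkSession.builder.appName("fraud-scorer").master("local[*]").getOrCreate()

  // Assumed path to the pipeline saved after training.
  private lazy val model = PipelineModel.load("hdfs:///models/fraud-rf")

  import spark.implicits._

  /** Score a single transaction; returns the predicted class label (e.g. 1.0 for fraud). */
  def score(amount: Double, merchantRisk: Double): Double = {
    val df = Seq((amount, merchantRisk)).toDF("amount", "merchantRisk")
    model.transform(df).select("prediction").head().getDouble(0)
  }
}
```

Loading the model lazily keeps startup cheap, but note that spinning up a local SparkSession per service is heavyweight; the POJO route via Sparkling Water avoids a Spark dependency in the serving layer entirely.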









