Flink or Spark when streaming is not important?

I recently compared Spark and Flink for a brand-new project. In this project, streaming is not that important; batch analysis of roughly 90 TB of data matters most. Later, I will apply ML and data mining to the analysis.

When I search, I find many articles, presentations, and videos claiming that Flink is the next-generation analytics solution. I don't see many articles defending Spark. On the other hand, Spark is (or was?) very popular and widely used in very large production systems.

My question is: for my use case, where streaming is not important, should I adopt Flink or start with Spark 2?

By the way, I read this thread. It doesn't give me a good answer.

Update, April 2018: We ended up choosing Spark. There are obviously other considerations besides performance. Cloudera, Hortonworks, and HDInsight give corporate architects and security reviewers a good level of confidence in, and evidence of, security, stability, scale, roadmap, and so on.

1 answer


For your requirements, Apache Spark is the better choice. Both Spark and Flink are advanced data processing technologies, but in terms of features, stability, ecosystem, community, integration with other systems, and adaptability, Spark is far ahead of Flink.

The main difference between Spark and Flink: Spark is a batch processing system with a streaming abstraction layered on top, whereas Flink is a stream processing system for unbounded datasets with a batch abstraction layered on top for processing bounded datasets.
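To make that distinction concrete, here is a toy sketch in plain Python. It does not use the Spark or Flink APIs; the function names (`micro_batches`, `running_sum`, `batch_sum_via_stream`) are made up for illustration. It only shows the two viewpoints: a batch-first engine treats a stream as a sequence of small bounded batches, while a stream-first engine treats a bounded dataset as a stream that happens to end.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(stream: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Batch-first view: chop a (possibly endless) stream into small bounded batches."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def running_sum(stream: Iterable[int]) -> Iterator[int]:
    """Stream-first view: handle one record at a time, emitting an updated result each time."""
    total = 0
    for record in stream:
        total += record
        yield total


def batch_sum_via_stream(dataset: List[int]) -> int:
    """A bounded dataset is a stream that ends; the last emitted value is the batch result."""
    result = 0
    for result in running_sum(dataset):
        pass
    return result


data = [1, 2, 3, 4, 5]
print([sum(b) for b in micro_batches(data, batch_size=2)])  # [3, 7, 5]
print(batch_sum_via_stream(data))                           # 15
```

For a workload that is almost entirely batch, as in the question, both viewpoints compute the same answer; the difference mostly shows up in latency and in how naturally continuous input is handled.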



Spark is best suited for ETL, machine learning, streaming, data warehousing, and graph processing on large datasets. Flink is best suited to stream processing of large, unbounded datasets.

[Apache Flink] [Apache Spark]
