How do I move a stream of events into cold storage?
I have a stream of events (we could also call them "messages" or even just "data") coming from an event broker with time-based retention. The event broker could be Kafka, Amazon Kinesis or Microsoft Event Hubs, although let's say it's Kafka.
My goal is to take this stream of events and put it into cold storage, that is, store the data for future analysis with Hadoop / Spark. This means I would like to take this "chatty" stream of small events and convert it into "chunky" (large) files in HDFS. In a cloud environment, I would rather use S3 or Azure Storage instead of HDFS.
I would also like my solution to be cost-effective, for example by using serialization formats like Avro / ORC to reduce disk-space costs. I would also like a guarantee that every event is persisted to cold storage (bonus points for once-and-only-once delivery).
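To make the "chatty stream into chunky files" requirement concrete, here is a minimal sketch of the usual idea, under several assumptions: the list of `(offset, event)` pairs is a hypothetical stand-in for one Kafka partition (no real Kafka client is used), JSON Lines stands in for Avro / ORC so the example needs only the standard library, and `ChunkWriter` / `archive` are hypothetical names. Events are buffered, written out as one larger file, and the "committed" offset only advances after that file exists. A crash therefore replays uncommitted events (at-least-once), and naming each file after its offset range means a replay overwrites the same file instead of duplicating it, which approximates once-and-only-once output.

```python
import json
import os
import tempfile
from typing import Iterable, List, Optional, Tuple

# Hypothetical stand-in for a broker partition record: (offset, event).
Record = Tuple[int, dict]


class ChunkWriter:
    """Buffers small events and flushes them as one larger file (hypothetical helper)."""

    def __init__(self, out_dir: str, max_events: int) -> None:
        self.out_dir = out_dir
        self.max_events = max_events
        self.buffer: List[Record] = []
        self.committed_offset: Optional[int] = None  # last offset safely on disk

    def add(self, record: Record) -> None:
        self.buffer.append(record)
        if len(self.buffer) >= self.max_events:
            self.flush()

    def flush(self) -> Optional[str]:
        if not self.buffer:
            return None
        first, last = self.buffer[0][0], self.buffer[-1][0]
        # Deterministic name: replaying the same offsets rewrites the same file.
        path = os.path.join(self.out_dir, f"events-{first:012d}-{last:012d}.jsonl")
        tmp_path = path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            for _, event in self.buffer:
                f.write(json.dumps(event) + "\n")
        os.replace(tmp_path, path)  # atomic rename within one filesystem
        self.committed_offset = last  # "commit" only after the file is complete
        self.buffer.clear()
        return path


def archive(records: Iterable[Record], out_dir: str, max_events: int) -> ChunkWriter:
    writer = ChunkWriter(out_dir, max_events)
    for record in records:
        writer.add(record)
    writer.flush()  # write out the final partial chunk
    return writer


if __name__ == "__main__":
    events = [(i, {"id": i, "msg": f"event {i}"}) for i in range(10)]
    with tempfile.TemporaryDirectory() as d:
        w = archive(events, d, max_events=4)
        print(sorted(os.listdir(d)))
        print("committed offset:", w.committed_offset)
```

With real cold storage the same pattern would apply to S3 / Azure Storage objects and to broker offset commits, but those APIs are not shown here.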
My main questions are:
- How do people solve this problem?
- Are there components that already handle this scenario?
- Do I need to develop a solution myself?
- At the very least, are there recommended patterns?