Component design for both producer and consumer in Kafka
I use Kafka as the main component of my data pipeline, which processes thousands of requests every second. I use it as a real-time processing tool for the small transformations I need to apply to the data.
My problem is that one of my consumers (say, a Samza job) consumes multiple topics from Kafka and handles them: basically, it builds a summary of the consumed topics. I also want to push this summary back to Kafka as a separate topic, but that creates a loop between Kafka and my component. This is what worries me: is this the intended architecture in Kafka?
Should I instead do all the processing in the consumer and store only the digested (summary) information in Kafka? The amount of processing I'm going to do is pretty heavy, though, so I want to use a separate component for it. I think my question generalizes to all kinds of data pipelines.
How do I properly use the consumer and producer components in a data pipeline?
As long as the Samza job writes to different topics than the ones it consumes, no, there will be no problem. Samza jobs that read from and write to Kafka are the norm and exactly what the architecture was designed for. It is also possible to have Samza jobs that bring data in from another system, jobs that write data from Kafka out to another system, or even jobs that don't use Kafka at all.
A job that reads from and writes to the same topic, however, is where you do get a loop, and that should be avoided: it can quickly fill your Kafka brokers' disks.
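To make this concrete, here is a minimal sketch of the safe pattern: consume from a set of input topics, write the digest to a *separate* output topic, and refuse any configuration where the output topic is also an input. The topic names and the toy summarisation logic are illustrative assumptions, not taken from the question; in a real job the records would come from a Kafka consumer and the summary would be sent with a Kafka producer.

```python
from collections import Counter

# Illustrative topic names -- assumptions, not from the question.
INPUT_TOPICS = {"events.raw", "events.audit"}
OUTPUT_TOPIC = "events.summary"

def check_no_loop(input_topics, output_topic):
    """Refuse a configuration where the job writes to a topic it also
    reads: that feedback loop is what can fill the brokers' disks."""
    if output_topic in input_topics:
        raise ValueError(f"loop: job both reads and writes {output_topic!r}")

def summarize(records):
    """Toy 'small transformation': count records per input topic.
    Each record is a (topic, value) pair."""
    return dict(Counter(topic for topic, _value in records))

check_no_loop(INPUT_TOPICS, OUTPUT_TOPIC)  # passes: output is distinct

# In the real pipeline, this batch would be polled from a consumer
# subscribed to INPUT_TOPICS, and the result produced to OUTPUT_TOPIC.
batch = [("events.raw", b"a"), ("events.raw", b"b"), ("events.audit", b"c")]
print(summarize(batch))  # {'events.raw': 2, 'events.audit': 1}
```

The only rule the guard encodes is the one from the answer above: reading and writing Kafka is normal, but the output topic must never be one of the inputs.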