How to model service dependencies in a microservice architecture?
We are trying to build our system to be as decoupled as possible. Ideally, we would like each microservice to do one thing and do it well. Services shouldn't have to know about dependencies. They just have to get a job from a queue, complete the job, and somehow emit a job-completed event (I'll come back to that).
Our system has "Snapshots" (images) as its basic atomic unit. An "Event" is a grouping of snapshots with a maximum length of 5 minutes.
Once we receive snapshots in our system and figure out which event they belong to, we put them into a RabbitMQ queue so that some image analysis can be done. We then have "snapshot-analyzer" microservices pulling from this queue and performing the image analysis. These microservices write directly to the database, adding more metadata to the image objects. They are also stateless and easy to scale horizontally.
The problem is that there are tasks that need to be done AFTER the snapshot-analyzer finishes. If we find certain attributes in a snapshot, we want to work on its event with the "event-analyzer". We don't want to work on an event more than once (so if multiple snapshots have these attributes, it doesn't matter: we still want to work on the event just once). This is quite difficult to design, especially in a distributed environment where we have several of these snapshot analyzers running concurrently. What we currently do is this: if we find these attributes in a snapshot (which means we want to work on the event containing that snapshot), we write that fact to the event. If this is the first time it has been written to the event, we insert the event into our second queue, the one for event processing. This ensures that the event is queued at most once.
The problems with the above approach are as follows:
- The dependency between the snapshot analyzer and the event analyzer lives inside the snapshot analyzer. Ideally, I would like the snapshot analyzer to be unaware of the event analyzer. It should just do its job and not bother with queuing anything. I'm not sure where this dependency is supposed to be coded.
- Race conditions when enqueuing an event while multiple snapshots for that event are being processed at the same time. The event must be enqueued once, and only once. We are "abusing" MongoDB's atomic update, which reports whether it was successful when $set is called.
Does anyone have any thoughts or examples of how such dependencies are declared? Do I need a dispatch service that is responsible for queuing the right things and pulling from a given queue, or something like that?
Ultimately, your problem is the need for global synchronization in a distributed processing system. This is a very old problem, and most people solve it exactly the way you are solving it: by using the built-in capabilities of their database to handle the synchronization. There are many other approaches out there, but if you're already using a piece of infrastructure that does this well (and most databases do), then go ahead and use it.
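To make the database-backed approach concrete, here is a minimal sketch of the "first writer enqueues" pattern described in the question. Everything in it is an assumption for illustration: `EventStore` is a hypothetical in-memory stand-in for the events collection (a lock plays the role of the database's atomic update), and `on_snapshot_analyzed` is a made-up hook name, not part of any library.

```python
import threading
from queue import Queue


class EventStore:
    """Hypothetical in-memory stand-in for the events collection."""

    def __init__(self):
        self._lock = threading.Lock()
        self._flagged = set()

    def claim(self, event_id):
        """Atomically flag the event; return True only for the first caller.

        Assumption: with MongoDB this would roughly map to a conditional update,
        e.g. update_one({"_id": event_id, "flagged": {"$ne": True}},
                        {"$set": {"flagged": True}})
        followed by a check that modified_count == 1.
        """
        with self._lock:
            if event_id in self._flagged:
                return False
            self._flagged.add(event_id)
            return True


def on_snapshot_analyzed(store, event_queue, event_id, has_attributes):
    """Called by a snapshot analyzer after it finishes one snapshot."""
    # Only the worker that wins the atomic claim enqueues the event.
    if has_attributes and store.claim(event_id):
        event_queue.put(event_id)


def simulate(num_workers=8):
    """Run several analyzers against the same event; return the queue size."""
    store = EventStore()
    event_queue = Queue()
    threads = [
        threading.Thread(
            target=on_snapshot_analyzed,
            args=(store, event_queue, "event-1", True),
        )
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return event_queue.qsize()
```

However many analyzers race on the same event, only one of them sees `claim` return True, so the event lands on the second queue exactly once.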
As for the other issue (decoupling the snapshot analyzer from the event analyzer), you either need to make the snapshot analyzer aware of the requirement to analyze each event only once (as you do now), or make the event analyzer aware of that requirement. If the snapshot analyzer just blindly queues messages for the event analyzer, and the event analyzer is the one that works with the database to avoid double processing, you nicely encapsulate this requirement, with the caveat of adding extra messages to the queue. This has the bonus of giving you a choke point: you can accumulate these things in memory at a single gate point and don't have to make external calls to the database.
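A rough sketch of that second option, assuming a single consumer acts as the gate point. The names `publish_finding` and `drain_and_analyze` are hypothetical, and the standard-library `Queue` stands in for the real message queue.

```python
from queue import Empty, Queue


def publish_finding(event_queue, event_id):
    """Snapshot analyzer side: publish blindly, with no duplicate check."""
    event_queue.put(event_id)


def drain_and_analyze(event_queue, seen, analyze):
    """Gate point: consume pending messages and analyze each event once.

    `seen` is a plain in-memory set owned by this single consumer, so no
    external database call is needed to detect duplicates.
    """
    processed = []
    while True:
        try:
            event_id = event_queue.get_nowait()
        except Empty:
            break
        if event_id in seen:
            continue  # duplicate message for an already handled event
        seen.add(event_id)
        analyze(event_id)
        processed.append(event_id)
    return processed
```

The snapshot analyzer no longer knows anything about the "once per event" rule. The trade-off is the extra duplicate messages on the queue, and the in-memory set only works while there is exactly one gate process. With several event analyzers, or across restarts, the check would have to go back to shared storage.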