Specialized or generic threads


Hi, I am working on a system where objects go through several steps:

First: mostly database queries

Second: mostly disk I/O and XML parsing

Third: mostly web service communication

Fourth: mostly XML serialization and deserialization

Fifth: some extra work

The system needs to handle thousands of objects per hour, so I will be using many threads, but my question is, what's the best approach?

  • Specialized threads for each step: for example, 5 threads per step. Threads in the first step receive objects, work on them, and update their status, so that a specialized thread in the second step picks them up and works on them in turn.

  • All generalized threads: each thread takes an object at the first step and carries it through to the end of step 5.
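
To make the two options concrete, here is a minimal sketch in Python. It is not from the original post: the step functions, `run_generalist`, and `run_specialist` are hypothetical names, and each step just appends its label instead of doing real database, disk, or web service work.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def make_step(name):
    """Build a hypothetical placeholder step that only records that it ran."""
    def step(obj):
        return obj + [name]
    return step


# Stand-ins for the five steps described in the question.
STEPS = [make_step(n) for n in ("db", "parse", "webservice", "xml", "extra")]


def run_generalist(items, workers=4):
    """Generalized threads: each worker carries one object through all steps."""
    def process(obj):
        for step in STEPS:
            obj = step(obj)
        return obj

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, items))


def run_specialist(items, workers_per_step=2):
    """Specialized threads: each step has its own workers, linked by queues."""
    queues = [queue.Queue() for _ in range(len(STEPS) + 1)]
    stop = object()  # sentinel telling a worker to exit

    def worker(step, inbox, outbox):
        while True:
            obj = inbox.get()
            if obj is stop:
                return
            outbox.put(step(obj))

    # Start every stage first so the stages run concurrently.
    stages = []
    for i, step in enumerate(STEPS):
        threads = [
            threading.Thread(target=worker, args=(step, queues[i], queues[i + 1]))
            for _ in range(workers_per_step)
        ]
        for t in threads:
            t.start()
        stages.append(threads)

    for obj in items:
        queues[0].put(obj)

    # Shut the stages down in order: a stage gets its sentinels only after
    # the previous stage has finished, so no object arrives behind a sentinel.
    for i, threads in enumerate(stages):
        for _ in threads:
            queues[i].put(stop)
        for t in threads:
            t.join()

    results = []
    while not queues[-1].empty():
        results.append(queues[-1].get())
    return results  # completion order, not input order
```

Both functions produce the same processed objects. The specialist version needs the extra queue and shutdown plumbing, and it returns results in completion order rather than input order.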

+2




3 answers


Coincidentally, we had a similar discussion a while ago. We came up with these questions, which should be considered before making a decision:

  • How long does the average step take? → If each step takes minimal time, it is generally better to have one thread do all the steps; otherwise the context switching becomes an overhead.
  • Is each step highly specialized? → If so, it is better to keep the steps in separate threads. People might argue that separating the execution code is enough, but I don't always see it that way. For example, a particular step may need a dedicated thread or a higher priority.
  • The cost of context switches? → No explanation needed.
  • Threading model and resources → For example, your system has run out of threads and a higher-priority request arrives. Will that request have to wait behind lower-priority work?


There are a few more points; I will add them in the comments as I remember them.

+1




Some things to consider



  • Failure mode: What happens if a step fails? Do you need to redo or undo the completed work? Will there be failure modes where threads die and are recreated, or will threads live forever? If the work needs to be redone, specialized threads make more sense, since on failure the objects go back into the queue and become available again.

  • Coordination: If the steps are strictly sequential, specialized threads will usually be kept waiting longer than in the generalized design, which can hurt your throughput. So, if all stages are strictly sequential, it is easier to use generalized threads and reduce the coordination effort.
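
The failure-mode point can be sketched for a single specialized stage as follows. This is only an illustration under assumptions: `MAX_ATTEMPTS`, `stage_worker`, `run_stage`, and the deliberately flaky `flaky_double` step are all hypothetical, and a real system would need its own rules for what counts as retryable.

```python
import queue
import threading

MAX_ATTEMPTS = 3  # assumed retry budget per object


def stage_worker(step, inbox, outbox, failed):
    """Worker for one stage: a failed object goes back on the stage's queue."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: shut down
            inbox.task_done()
            return
        obj, attempts = item
        try:
            outbox.put((step(obj), 0))
        except Exception:
            if attempts + 1 < MAX_ATTEMPTS:
                # Requeue so that any worker of this stage can retry it.
                inbox.put((obj, attempts + 1))
            else:
                failed.put(obj)
        finally:
            inbox.task_done()


def run_stage(step, objects, workers=3):
    inbox, outbox, failed = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=stage_worker, args=(step, inbox, outbox, failed))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for obj in objects:
        inbox.put((obj, 0))
    inbox.join()  # waits for the original items and every retry
    for _ in threads:
        inbox.put(None)
    for t in threads:
        t.join()
    done = sorted(outbox.get()[0] for _ in range(outbox.qsize()))
    lost = sorted(failed.get() for _ in range(failed.qsize()))
    return done, lost


_seen = set()
_seen_lock = threading.Lock()


def flaky_double(n):
    """Hypothetical step: fails once per value, and always for negatives."""
    if n < 0:
        raise ValueError("permanent failure")
    with _seen_lock:
        first_time = n not in _seen
        _seen.add(n)
    if first_time:
        raise RuntimeError("transient failure")
    return n * 2
```

Calling `run_stage(flaky_double, [1, 2, 3, -1])` retries the transient failures and reports `-1` as lost once the retry budget is used up. With generalized threads, the equivalent retry logic would have to remember which step each object had reached.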

+2




I can imagine that you may want to limit the number of concurrent DB and WS calls, so you could benefit from having different levels of concurrency at different stages of your pipeline. So I might well consider using specialists. That would increase the overall complexity of the solution, though, so I would start by building and performance-testing the generalist approach. If you get the throughput you want, keep it simple and leave it alone.
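
One way to try the generalist approach first while still capping DB and WS calls is to guard those two steps with semaphores. The sketch below assumes made-up limits (2 concurrent DB calls, 3 concurrent WS calls) and uses `time.sleep` as a placeholder for the real work; `Gauge`, `process`, and `run` are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

DB_LIMIT = threading.BoundedSemaphore(2)  # assumed cap on concurrent DB calls
WS_LIMIT = threading.BoundedSemaphore(3)  # assumed cap on concurrent WS calls


class Gauge:
    """Records the peak number of threads inside a section at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def __exit__(self, *exc):
        with self._lock:
            self.current -= 1


db_gauge = Gauge()
ws_gauge = Gauge()


def process(obj):
    """One generalist worker takes an object through every step."""
    with DB_LIMIT, db_gauge:
        time.sleep(0.01)  # placeholder for the database queries
    # Steps 2, 4 and 5 (parsing, XML, extra work) would run here, unthrottled.
    with WS_LIMIT, ws_gauge:
        time.sleep(0.01)  # placeholder for the web service call
    return obj


def run(objects, workers=8):
    """Process all objects and return (results, objects per second)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process, objects))
    elapsed = time.perf_counter() - start
    return results, len(results) / elapsed
```

The measured rate from `run` can be compared against the target of thousands of objects per hour before deciding whether the extra complexity of specialized stages is worth it.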

+1

