Aggregator / middle-tier architecture for expensive queries

I am working on a program that will have multiple threads requiring information from a web service that can handle requests such as "Give me [Var1, Var2, Var3] for [Object1, Object2, ... Object20]". The resulting answer will in this case be a 20-node XML document (one node for each object), and each node will have three sub-nodes (one for each var).
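Whatever architecture you choose, the middle tier will have to split that batched XML back into per-object values before it can answer individual callers. Below is a minimal Python sketch using the standard library's xml.etree.ElementTree. The real schema isn't shown, so the element and attribute names (response, object, id) and the sample values are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical response: the real service's element and attribute names are unknown.
SAMPLE = """<response>
  <object id="Object1"><Var1>1.5</Var1><Var2>abc</Var2><Var3>42</Var3></object>
  <object id="Object2"><Var1>2.5</Var1><Var2>def</Var2><Var3>7</Var3></object>
</response>"""


def parse_response(xml_text: str) -> Dict[str, Dict[str, str]]:
    """Turn the batched XML into {object_id: {var_name: value}}."""
    root = ET.fromstring(xml_text)
    return {
        # one entry per <object> node, keyed by its (assumed) id attribute
        node.get("id", ""): {child.tag: (child.text or "") for child in node}
        for node in root.findall("object")
    }
```

With a dictionary like this, answering "Var2 for Object1" is a plain lookup, which is all the splitting step needs.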

My challenge is that every request made to this web service costs the organization money, and the cost is the same whether the request is for 1 var for 1 object or 20 vars for 20 objects.

So, in this case, I'm looking for an architecture that will:

  • Create requests in each thread as needed
  • Route all requests through a middle-tier aggregator
  • Make one web service request once X requests have been aggregated (or a time limit has been reached)
  • Receive the response from the web service in the middle tier
  • Pass the information from the middle tier back to the waiting requesters

My current thought is to use a library like NetMQ, with my middle tier as a server and each thread polling it, but I'm getting stuck on the actual implementation. Before going too far down the rabbit hole, I'm hoping there is already a design pattern / library out there that does this much more efficiently than I could.

Please understand that I am a noob, so ANY help / guidance would be greatly appreciated!

Thanks!!!



2 answers


Overview

From an architectural point of view, you have already sketched a good approach to the problem:

  • Insert a proxy server between the requesting applications and the remote web service
  • In the proxy server, put incoming requests in a request queue until at least one of the following occurs:
    • The request queue reaches a specified length
    • The oldest request in the request queue reaches a certain age
  • Merge all requests in the request queue into one request, removing duplicate objects and attributes
  • Send that request to the remote web service
  • Move the requests to a response queue (waiting)
  • Wait until one of the following occurs:
    • the oldest request in the response queue reaches a certain age (timeout)
    • the response arrives
  • Take the response (if any) and match it to the corresponding requests in the response queue
  • Reply to all requests in the response queue that have an answer
  • Send a timeout error for all requests that exceeded the timeout limit
  • Remove all requests that have been replied to (answered or timed out) from the response queue
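Here is a minimal Python sketch of the proxy loop above, with Python chosen only for illustration. The names BatchingAggregator, max_batch and max_wait are hypothetical. The fetch callable stands in for the paid web service, and its signature (lists of objects and vars in, {object: {var: value}} out) is an assumption. Instead of an explicit response queue, each caller gets a concurrent.futures.Future, and the caller produces the timeout error itself by waiting with Future.result(timeout=...).

```python
import threading
import time
from concurrent.futures import Future
from typing import Callable, Dict, List, Tuple

# Assumed shape of the paid call: (objects, vars) -> {object: {var: value}}.
FetchFn = Callable[[List[str], List[str]], Dict[str, Dict[str, str]]]
Pending = Tuple[float, List[str], List[str], Future]


class BatchingAggregator:
    """Collects requests from many threads and sends them upstream as ONE call
    once max_batch requests are queued or the oldest is max_wait seconds old."""

    def __init__(self, fetch: FetchFn, max_batch: int = 20, max_wait: float = 0.5):
        self._fetch = fetch
        self._max_batch = max_batch
        self._max_wait = max_wait
        self._pending: List[Pending] = []
        self._cond = threading.Condition()
        self._closed = False
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def request(self, objects: List[str], variables: List[str]) -> Future:
        """Thread-safe; the Future resolves to {object: {var: value}} for this caller only."""
        fut: Future = Future()
        with self._cond:
            self._pending.append((time.monotonic(), objects, variables, fut))
            self._cond.notify()
        return fut

    def close(self) -> None:
        """Flush whatever is still queued, then stop the worker thread."""
        with self._cond:
            self._closed = True
            self._cond.notify()
        self._worker.join()

    def _run(self) -> None:
        while True:
            with self._cond:
                while True:
                    if not self._pending:
                        if self._closed:
                            return
                        self._cond.wait()
                        continue
                    age = time.monotonic() - self._pending[0][0]
                    if self._closed or len(self._pending) >= self._max_batch or age >= self._max_wait:
                        break
                    self._cond.wait(self._max_wait - age)  # sleep until the oldest request expires
                batch = self._pending[: self._max_batch]
                del self._pending[: self._max_batch]
            self._dispatch(batch)  # call upstream outside the lock so callers never block on it

    def _dispatch(self, batch: List[Pending]) -> None:
        # Merge: union of all objects and vars, duplicates removed, first-seen order kept.
        objects = list(dict.fromkeys(o for _, objs, _, _ in batch for o in objs))
        variables = list(dict.fromkeys(v for _, _, vs, _ in batch for v in vs))
        try:
            result = self._fetch(objects, variables)  # the single paid request
        except Exception as exc:
            for *_, fut in batch:
                fut.set_exception(exc)
            return
        # Split: hand each caller only the slice it asked for.
        for _, objs, vs, fut in batch:
            try:
                fut.set_result({o: {v: result[o][v] for v in vs} for o in objs})
            except KeyError as exc:  # the service left out something this caller needs
                fut.set_exception(exc)
```

A worker thread would call agg.request(["Object1"], ["Var1"]).result(timeout=10). This sketch sends one batch at a time, so the next batch waits until the current upstream call returns. If the service is slow, you could hand _dispatch off to a thread pool instead.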

Technology

You probably won't find a finished product or framework that exactly matches your requirements, but there are several architectural patterns you can use to build your solution.

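C#: Rx and LINQ

If you want to use C#, you can use Reactive Extensions (Rx) to get the timing and batching right. For example, Rx's Buffer operator has an overload that takes both a time span and a count and emits a batch as soon as either limit is reached.

You can then use LINQ to collect the objects and attributes from the queued requests to build the combined request, and to select the requests in the response queue that match a specific part of the response or have timed out.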
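The LINQ part is mostly set operations over the queued requests. Below is a rough Python equivalent using comprehensions, where nested for clauses play the role of SelectMany, dict.fromkeys the role of Distinct, and if clauses the role of Where. The Pending record, its deadline field and the {object: {var: value}} response shape are assumptions for illustration, not anything the web service defines.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Response = Dict[str, Dict[str, str]]  # assumed shape: {object: {var: value}}


@dataclass
class Pending:
    request_id: int
    objects: List[str]
    variables: List[str]
    deadline: float  # time after which this caller should get a timeout error


def merge_request(pending: List[Pending]) -> Tuple[List[str], List[str]]:
    """Union of objects and vars over the queue (LINQ: SelectMany + Distinct)."""
    objects = list(dict.fromkeys(o for p in pending for o in p.objects))
    variables = list(dict.fromkeys(v for p in pending for v in p.variables))
    return objects, variables


def match_response(pending: List[Pending], response: Response, now: float):
    """Partition the response queue into answered, timed-out and still-waiting
    requests (LINQ: Where + Select)."""
    def covered(p: Pending) -> bool:
        return all(v in response.get(o, {}) for o in p.objects for v in p.variables)

    answered = {
        p.request_id: {o: {v: response[o][v] for v in p.variables} for o in p.objects}
        for p in pending if covered(p)
    }
    open_ = [p for p in pending if p.request_id not in answered]
    timed_out = [p.request_id for p in open_ if now >= p.deadline]
    waiting = [p for p in open_ if now < p.deadline]
    return answered, timed_out, waiting
```

Keeping these as pure functions means you can unit-test the matching logic without any threads, timers or network calls.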

Scala / Java: Akka



You can model the solution as an actor system with several actors:

  • An actor acting as the gateway for requests
  • An actor holding the request queue
  • An actor that sends the request to the remote web service and receives the response
  • An actor holding the response queue
  • An actor that sends replies or timeouts

An actor system makes it easy to deal with concurrency and separates concerns in a testable way.

When using Scala, you can use its "monadic" collection API (filter, map, flatMap) to do basically the same thing as the LINQ approach in C#.

The actor approach really shines when you want to test individual components. It is very easy to test each actor separately, without having to mock the whole workflow.
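To illustrate the testability point, here is a toy Python version of the request-queue actor. This is not Akka. The Actor base class, the "Tick" timer message and the Probe test double are all hypothetical. The test feeds the actor messages and checks what it forwards, with no web service and no other real actors involved, similar in spirit to using an Akka TestProbe.

```python
import queue
import threading
from typing import Any, List

_STOP = object()  # private sentinel that ends an actor's loop


class Actor:
    """Toy actor (not Akka): a private mailbox drained by one thread, one message at a time."""

    def __init__(self) -> None:
        self.mailbox: "queue.Queue[Any]" = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self) -> "Actor":
        self._thread.start()
        return self

    def tell(self, msg: Any) -> None:
        self.mailbox.put(msg)  # fire-and-forget, like Akka's tell / !

    def stop(self) -> None:
        self.mailbox.put(_STOP)
        self._thread.join()

    def _loop(self) -> None:
        while (msg := self.mailbox.get()) is not _STOP:
            self.receive(msg)

    def receive(self, msg: Any) -> None:
        raise NotImplementedError


class RequestQueueActor(Actor):
    """Holds the request queue and forwards ("Batch", [...]) downstream when it is full
    or when a "Tick" arrives (a separate timer actor would send those periodically)."""

    def __init__(self, batch_size: int, downstream: Actor) -> None:
        super().__init__()
        self.batch_size = batch_size
        self.downstream = downstream
        self.pending: List[Any] = []

    def receive(self, msg: Any) -> None:
        if msg == "Tick":
            if self.pending:
                self._flush()
            return
        self.pending.append(msg)
        if len(self.pending) >= self.batch_size:
            self._flush()

    def _flush(self) -> None:
        self.downstream.tell(("Batch", self.pending))
        self.pending = []


class Probe(Actor):
    """Test double: never started, so whatever it is told just piles up in its mailbox."""
```

Because the actor only talks to its downstream through tell, a test can swap in the probe and inspect its mailbox. For example, with batch_size=2, sending "req1", "req2", "req3" and then "Tick" should produce two batches.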

Erlang / Elixir: Actor System

This is similar to the Akka approach, only with a different (functional!) language. Erlang / Elixir have strong support for distributed actor systems, so if you need an ultra-stable or highly scalable solution, they are worth looking into.

NetMQ / ZeroMQ

This is probably too low-level and would leave you building a lot of infrastructure yourself. If you use an actor system, you could still try NetMQ / ZeroMQ as its transport layer.



Your idea of using a queue looks good to me.

This is one possible solution to your problem, and I'm sure there are many other solutions that might do what you need.

  • Have a Publish Queue (PQ) and a Consumer Queue (CQ)
  • Clients subscribe to CQ and the middle tier (MT) subscribes to PQ
  • Clients post requests to PQ
  • MT listens on PQ, aggregates requests and hands the batch to a thread that calls the farm
  • Once the results are back, this thread splits them into req / res pairs
  • It then posts the req / res pairs to CQ
  • Each client picks out its own messages and processes them
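The flow above hinges on correlation data: each request carries the client's id and a request id, so that clients can recognize their own pairs on the shared CQ. Below is a minimal in-memory Python sketch of that idea. BroadcastQueue, middle_tier_step, pick_mine and the fetch signature ({object: value} for a list of objects) are hypothetical stand-ins for whatever queueing system and farm call you actually use, and the time/count aggregation itself is left out.

```python
import queue
from typing import Any, Callable, Dict, List, Tuple

# (client_id, request_id, objects): the correlation data each client attaches to its request.
Request = Tuple[str, int, List[str]]


class BroadcastQueue:
    """In-memory stand-in for a pub/sub queue: every subscriber gets a copy of every message."""

    def __init__(self) -> None:
        self._subscribers: List["queue.Queue[Any]"] = []

    def subscribe(self) -> "queue.Queue[Any]":
        inbox: "queue.Queue[Any]" = queue.Queue()
        self._subscribers.append(inbox)
        return inbox

    def publish(self, msg: Any) -> None:
        for inbox in self._subscribers:
            inbox.put(msg)


def middle_tier_step(batch: List[Request], fetch: Callable[[List[str]], Dict[str, Any]],
                     cq: BroadcastQueue) -> None:
    """One aggregated call to the farm, then one req/res pair per original request on CQ."""
    objects = list(dict.fromkeys(o for _, _, objs in batch for o in objs))
    result = fetch(objects)  # hypothetical farm call: {object: value}
    for client_id, request_id, objs in batch:
        cq.publish((client_id, request_id, {o: result[o] for o in objs}))


def pick_mine(inbox: "queue.Queue[Any]", client_id: str) -> Dict[int, Dict[str, Any]]:
    """Client side: drain the broadcast inbox and keep only this client's req/res pairs."""
    mine: Dict[int, Dict[str, Any]] = {}
    while True:
        try:
            cid, request_id, response = inbox.get_nowait()
        except queue.Empty:
            return mine
        if cid == client_id:
            mine[request_id] = response
```

Every client receives (and discards) every other client's pairs here. That overhead is what per-topic consumer queues avoid.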

Longer version:

Have your "middle tier" listen to the queue (to which clients post messages) and aggregate requests until N requests have arrived or X amount of time has passed.



Once you are ready, offload the aggregated query to a thread that calls your farm and gets the results. The main difficulty is likely to be communicating the results back to the clients.

To do this, you probably need a different queue that all of your clients subscribe to. Once your batch of results (say 20 responses in XML) is back from the farm, the thread that called the farm splits the XML results into the corresponding request / response pairs and publishes them to this queue. Each client then needs to pick the correct request / response pairs from the queue and process them.

This should not be a web service in the traditional sense: the latency can be prohibitively long and you do not want to hold a connection open that long, so I suggest a queue.

You can also split your consumer queue by topic. This means you publish each req / res pair only to the consumer that asked for it instead of broadcasting it, so the client doesn't have to "pick the correct" req / res pairs because the topic names take care of that. Almost all queueing systems support this.
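Here is a minimal in-memory Python sketch of per-topic consumer queues, using one topic per client id. TopicQueues and the topic naming are hypothetical. A real queueing system would provide the routing, and whether it keeps messages published before anyone subscribes depends on the system and its settings.

```python
import queue
import threading
from collections import defaultdict
from typing import Any, DefaultDict


class TopicQueues:
    """In-memory stand-in for per-topic queues: one queue per topic name, here one per client."""

    def __init__(self) -> None:
        self._topics: DefaultDict[str, "queue.Queue[Any]"] = defaultdict(queue.Queue)
        self._lock = threading.Lock()  # guard creation of new topic queues across threads

    def subscribe(self, topic: str) -> "queue.Queue[Any]":
        with self._lock:
            return self._topics[topic]

    def publish(self, topic: str, msg: Any) -> None:
        with self._lock:
            inbox = self._topics[topic]
        inbox.put(msg)  # only the subscriber to this topic ever sees it
```

The middle tier publishes each req / res pair under the requesting client's topic, so a client simply blocks on its own inbox with get(timeout=...) and never has to filter out other clients' messages. In this sketch, a message published before the client subscribes is kept in that topic's queue until the client subscribes.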
