Should I queue my events after receiving them from the Azure Event Hub?
I am currently developing an Azure hosted application that uses the Azure Event Hub. Basically I am sending messages (or should I say events) to the Event Hub from the web API and I have two listeners:
- Stream Analytics task for real-time analysis
- a standard worker role that calculates some things based on received events and then stores them in an Azure SQL Database (this is a lambda architecture).
I am currently using the EventProcessorHost library to retrieve my events from the Event Hub inside my worker role.
I'm trying to find some guidelines for using Event Hubs (they are a little harder to use than Service Bus queues, since they stream messages), and I've found people saying I shouldn't be doing much post-event processing straight off the Event Hub:
"Keep in mind that you want to keep whatever you are doing here relatively fast - i.e. don't try to do many processes here - that's what consumer groups are for."
The author of this article added a queue between the Event Hub and the worker role (it is not clear from the comments whether this is really necessary).
So the question is: should I do all my processing directly after receiving from the Event Hub (i.e. inside ProcessEventsAsync), or should I put a queue between the Event Hub and the processing code?
Any advice on how to properly consume events from the Event Hub would be appreciated, the documentation is currently a bit ... missing.
This falls into the category of questions whose answer will be much more obvious once the source of EventProcessorHost becomes available, which I was told will happen.
The short answer is that you don't need to use a queue; however, I would keep the time that ProcessEventsAsync takes to return its Task relatively short.
While this advice sounds the same as in the first article, the key difference is that it is the time to return the Task, not the time to complete it. My guess is that ProcessEventsAsync is called on a thread that EventProcessorHost also uses for other purposes. In that case you need to return quickly so that other work can proceed; that work might be calling ProcessEventsAsync for another partition (but we can't know without debugging, and I haven't seen fit to do so; I will read the code when it's available).
I am doing my processing in a separate thread per partition, passing the entire IEnumerable from ProcessEventsAsync to it. This is in contrast to pulling all the elements out of the IEnumerable and enqueuing them for the processing thread. The other thread completes the Task returned by ProcessEventsAsync when it has finished processing the messages. (I actually pass my processing thread a single IEnumerable that hides the details of ProcessEventsAsync, concatenating the chunks together and completing the Tasks as needed when MoveNext is called.)
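The parenthetical above might look roughly like the following sketch. This is an assumed reconstruction, not the author's actual code: EventData comes from the older Microsoft.ServiceBus.Messaging package, and the work-queue item pairing each batch with a TaskCompletionSource is a hypothetical structure I've invented for illustration.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging; // EventData (older WindowsAzure.ServiceBus package)

static class BatchFlattener
{
    // One queue item per ProcessEventsAsync call: the batch plus the
    // completion source backing the Task that ProcessEventsAsync returned.
    public static IEnumerable<EventData> Flatten(
        BlockingCollection<(IEnumerable<EventData> Batch, TaskCompletionSource<bool> Done)> work)
    {
        // GetConsumingEnumerable blocks inside MoveNext until the next
        // chunk arrives, so the consumer sees one continuous stream.
        foreach (var item in work.GetConsumingEnumerable())
        {
            foreach (var ev in item.Batch)
                yield return ev;

            // The batch has been fully consumed; complete its Task so
            // EventProcessorHost can move on for this partition.
            item.Done.SetResult(true);
        }
    }
}
```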
So, in short: in ProcessEventsAsync, hand the work off to another thread using whatever mechanism you already know, or start a new Task with the TPL.
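A minimal version of that hand-off could be sketched as follows. The IEventProcessor members are from the EventProcessorHost library (Microsoft.ServiceBus.Messaging); the bounded queue and completion wiring are my assumptions, not the author's code.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;

class HandOffProcessor : IEventProcessor
{
    // Bounded so a traffic spike cannot exhaust memory (see below).
    private readonly BlockingCollection<(IEnumerable<EventData> Batch,
        TaskCompletionSource<bool> Done)> _work =
        new BlockingCollection<(IEnumerable<EventData>, TaskCompletionSource<bool>)>(boundedCapacity: 10);

    public Task OpenAsync(PartitionContext context)
    {
        // Dedicated processing thread for this partition.
        Task.Factory.StartNew(() =>
        {
            foreach (var item in _work.GetConsumingEnumerable())
            {
                foreach (var ev in item.Batch)
                {
                    // ... the real (possibly slow) per-event work goes here ...
                }
                // Complete the Task that ProcessEventsAsync handed back.
                item.Done.SetResult(true);
            }
        }, TaskCreationOptions.LongRunning);
        return Task.FromResult(true);
    }

    public Task ProcessEventsAsync(PartitionContext context, IEnumerable<EventData> messages)
    {
        // Return quickly: enqueue the whole batch and return a Task that the
        // processing thread completes when it has finished the batch.
        var done = new TaskCompletionSource<bool>();
        _work.Add((messages, done));
        return done.Task;
    }

    public Task CloseAsync(PartitionContext context, CloseReason reason)
    {
        _work.CompleteAdding();
        return Task.FromResult(true);
    }
}
```

Returning the TaskCompletionSource's Task (rather than a completed Task) means the host still waits for the batch before proceeding on this partition, which gives you natural back-pressure.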
Queuing all the messages individually inside ProcessEventsAsync is not bad; it is just not the most efficient way to pass a chunk of events to another thread.
If you choose to queue the events (or have a downstream queue in your processing code) and complete the Task per batch, make sure you bound the number of items held in the queue, to avoid running out of memory when the Event Hub delivers items faster than your code can process them during a traffic spike.
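As a standalone illustration of that bound (the names here are invented for the demo): a BlockingCollection constructed with a boundedCapacity makes Add block once the queue is full, back-pressuring the producer instead of growing memory without limit.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class BoundedQueueDemo
{
    static void Main()
    {
        // At most 1000 items are ever buffered in memory.
        var queue = new BlockingCollection<int>(boundedCapacity: 1000);

        // Producer (think: ProcessEventsAsync) blocks when the queue is full.
        var producer = Task.Run(() =>
        {
            for (int i = 0; i < 10_000; i++)
                queue.Add(i);        // blocks at capacity
            queue.CompleteAdding();
        });

        long sum = 0;
        foreach (var item in queue.GetConsumingEnumerable())
            sum += item;             // consumer drains the queue at its own pace

        producer.Wait();
        Console.WriteLine(sum);      // 49995000
    }
}
```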
Note for Java Event Hubs users (2016-10-27): since it was brought to my attention, there is documentation describing how onEvents is called. A slow onEvents is not as tragic as it would be with one stream per partition, but its speed does appear to affect the rate at which the next batch is received. So, depending on how much you care about keeping latency low, this can be relatively important to your scenario.