
How to use threads to handle many tasks

I have a C# requirement to process "very many" records individually (possibly > 100,000). Processing them sequentially turns out to be very slow, with each write taking a second or so (with the timeout set to 5 seconds).

I would like to try to process these records asynchronously using a fixed number of worker threads (I use the term "thread" here with caution, since I'm not sure whether I should be looking for a thread, a task, or something else).

I looked at ThreadPool, but I can't imagine that it can queue that volume of requests. My ideal pseudocode would look something like this:

public void ProcessRecords() {
    SetMaxNumberOfThreads(20);
    MyRecord rec;
    while ((rec = GetNextRecord()) != null) {
        var task = WaitForNextAvailableThreadFromPool(ProcessRecord(rec));
        task.Start();
    }
}

      

I will also need a mechanism by which the handler method can report back to the parent/calling class.

Can anyone point me in the right direction, perhaps with some sample code?



2 answers


A simple solution would be to use TPL Dataflow (shipped as the System.Threading.Tasks.Dataflow NuGet package), a higher-level abstraction over the TPL with built-in options for the degree of parallelism, etc. You just create a block (an ActionBlock in this case), Post everything to it, wait asynchronously for completion, and TPL Dataflow handles the rest for you:

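using System.Threading.Tasks.Dataflow;

// Run ProcessRecord on up to 20 records at the same time.
var block = new ActionBlock<MyRecord>(
    r => ProcessRecord(r),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });

MyRecord rec;
while ((rec = GetNextRecord()) != null)
{
    block.Post(rec);   // queue the record; the block starts processing immediately
}

block.Complete();        // signal that no more records will be posted
await block.Completion;  // wait until every queued record has been processed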

      

Another advantage is that the block starts working as soon as the first record arrives, and not only after all records have been received.
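To see the throttling and the early start in action, here is a small self-contained sketch. DemoRecord, the local GetNextRecord() and the Task.Delay standing in for ProcessRecord are all hypothetical placeholders, not part of the answer above; the method also tracks how many records are in flight at once so you can check that MaxDegreeOfParallelism is respected. It assumes a modern .NET where System.Threading.Tasks.Dataflow is available (on .NET Framework, add the NuGet package).

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical record type standing in for the asker's MyRecord.
public sealed class DemoRecord
{
    public DemoRecord(int id) { Id = id; }
    public int Id { get; }
}

public static class ActionBlockDemo
{
    // Streams `count` hypothetical records into an ActionBlock capped at
    // `maxParallel` concurrent calls. Returns how many records were processed
    // and the highest number that were ever in flight at the same time.
    public static async Task<(int Processed, int Peak)> RunAsync(int count, int maxParallel)
    {
        int processed = 0, current = 0, peak = 0;
        var gate = new object();

        var block = new ActionBlock<DemoRecord>(async r =>
        {
            lock (gate) { current++; if (current > peak) peak = current; }
            await Task.Delay(20); // hypothetical stand-in for ProcessRecord(r)
            lock (gate) { current--; processed++; }
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel });

        // Hypothetical GetNextRecord(): returns null once `count` records are produced.
        int next = 0;
        DemoRecord GetNextRecord() => next < count ? new DemoRecord(next++) : null;

        DemoRecord rec;
        while ((rec = GetNextRecord()) != null)
        {
            block.Post(rec); // work begins right away, before the loop finishes
        }

        block.Complete();        // no more records will be posted
        await block.Completion;  // wait for every posted record to finish
        return (processed, peak);
    }
}
```

Usage: `var (processed, peak) = await ActionBlockDemo.RunAsync(100, 20);` should report 100 processed records and a peak of at most 20. Because the delegate is async, the block keeps a slot occupied until the returned Task completes, so the cap also holds for I/O-bound work.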



If you need to report on each record, you can use a TransformBlock to do the actual processing and link it to an ActionBlock that does the updates:

var transform = new TransformBlock<MyRecord, Report>(r =>
{
    ProcessRecord(r);
    return GenerateReport(r);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });

// Default MaxDegreeOfParallelism is 1, so reports are handled one at a time.
var reporter = new ActionBlock<Report>(report =>
{
    RaiseEvent(report); // Or any other mechanism...
});

// PropagateCompletion completes reporter once transform has completed.
transform.LinkTo(reporter, new DataflowLinkOptions { PropagateCompletion = true });

MyRecord rec;
while ((rec = GetNextRecord()) != null)
{
    transform.Post(rec);
}

transform.Complete();
// Await the last block in the chain so that every report has been handled.
await reporter.Completion;
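A runnable sketch of the same pipeline, assuming the "report back to the caller" mechanism is simply an Action<T> callback supplied by the caller (a hypothetical choice; an event or IProgress<T> would work similarly). PipelineRecord, PipelineReport and the Task.Delay work are placeholders for MyRecord, Report and ProcessRecord/GenerateReport.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical types standing in for MyRecord and Report.
public sealed class PipelineRecord
{
    public PipelineRecord(int id) { Id = id; }
    public int Id { get; }
}

public sealed class PipelineReport
{
    public PipelineReport(int recordId, string status) { RecordId = recordId; Status = status; }
    public int RecordId { get; }
    public string Status { get; }
}

public static class ReportingPipeline
{
    // Processes `count` records with up to `maxParallel` concurrent workers and
    // hands each report to `onReport`. The reporter block keeps the default
    // MaxDegreeOfParallelism of 1, so onReport never runs concurrently with itself.
    public static async Task RunAsync(int count, int maxParallel, Action<PipelineReport> onReport)
    {
        var transform = new TransformBlock<PipelineRecord, PipelineReport>(async r =>
        {
            await Task.Delay(10); // hypothetical stand-in for ProcessRecord(r)
            return new PipelineReport(r.Id, "done");
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel });

        var reporter = new ActionBlock<PipelineReport>(onReport);

        // Completing transform will also complete reporter.
        transform.LinkTo(reporter, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < count; i++)
        {
            transform.Post(new PipelineRecord(i));
        }

        transform.Complete();
        await reporter.Completion; // finishes only after the last report is handled
    }
}
```

Because the callback runs on a single-consumer block, the caller can, for example, append to a plain List<T> inside it without extra locking.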

      



Have you thought about using parallel processing with Actions? That is, create a method that handles one record, add each call to that method as an Action to a list, and then run Parallel.ForEach over the list.

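Imports System.Collections.Generic
Imports System.Threading.Tasks

' Wrap each per-record call in an Action, then let Parallel.ForEach invoke them.
Dim list As New List(Of Action)
list.Add(New Action(Sub() MyMethod(myParameter)))
Parallel.ForEach(list, Sub(t) t.Invoke())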

      



It's in VB.NET, but I think you get the gist.
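For comparison, a C# sketch of the same Parallel.ForEach idea. Unlike the VB snippet, it passes ParallelOptions with MaxDegreeOfParallelism to cap the number of concurrent workers (the question asks for a fixed number), and it feeds Parallel.ForEach a lazy sequence instead of a pre-built list of Actions. GetRecords and the Thread.Sleep "write" are hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelForEachDemo
{
    // Hypothetical stand-in for GetNextRecord(): yields record ids lazily,
    // so the whole set never has to be materialized in a list first.
    private static IEnumerable<int> GetRecords(int count)
    {
        for (int i = 0; i < count; i++)
            yield return i;
    }

    // Processes every record with at most `maxParallel` concurrent calls and
    // returns how many were processed plus the peak concurrency observed.
    public static (int Processed, int Peak) Run(int count, int maxParallel)
    {
        int processed = 0, current = 0, peak = 0;
        var gate = new object();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(GetRecords(count), options, id =>
        {
            lock (gate) { current++; if (current > peak) peak = current; }
            Thread.Sleep(5); // hypothetical stand-in for the slow per-record write
            lock (gate) { current--; processed++; }
        });

        return (processed, peak);
    }
}
```

Note that Parallel.ForEach blocks the calling thread until all records are done and holds a thread for every in-flight record, which matters when each record mostly waits on I/O; the Dataflow approach with an async delegate avoids that.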







