Maintaining the Order of DataTable Rows in Parallel Processing

Below is the code snippet:

Parallel.ForEach(dataTable.AsEnumerable(), row => {

    // Code to process the data row to Dictionary<object,object>
    // Unique Column name is the Dictionary Key
    // ConcurrentDictionary is used for thread safety
});

      

Here I am using Parallel.ForEach to process each row of a DataTable into an object of type Dictionary<object,object>. The end result is of type List<Dictionary<object,object>>, achieved with an intermediate thread-safe structure, ConcurrentQueue<Dictionary<object,object>>. The source DataTable holds the data sorted in a given order, but this order is inevitably lost in parallel processing. Since order is important, I came up with the following workaround:

Parallel.For(0, RowCount, index => {

  int rowIndex = index;

  // Access the rows using the index
  // Final structure will be of type ConcurrentQueue<Custom>,
  // with the RowIndex assigned based on the original index
});

class Custom
{
  public int RowIndex { get; set; }

  public Dictionary<object,object> DataDictionary { get; set; }
}

      

The intermediate result, customObj, is of type ConcurrentQueue<Custom>. It is turned into the final ordered list using the following code:

customObj.OrderBy(x=>x.RowIndex).Select(y=>y.DataDictionary).ToList()

      

Below are my questions:

  • Is there a better way to achieve the same parallel processing while maintaining the original order, which is a critical business requirement?

  • In the final solution, do I need the local variable rowIndex? My understanding is that index is a parameter of the Parallel loop body and will not lead to a closure problem.

Any pointers?

+3




3 answers


How about this:

var items = new ConcurrentDictionary<DataRow, Dictionary<object,object>>();

Parallel.ForEach(dataTable.AsEnumerable(), row => {
    var result = ...;
    // ConcurrentDictionary has no public Add method; use TryAdd
    items.TryAdd(row, result);
});

// Walk the rows in their original order and look up each result
var finalResult = dataTable.Rows.Cast<DataRow>().Select(r => items[r]).ToList();
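For completeness, here is a minimal self-contained sketch of the same idea. The class name RowKeyedSketch, the method name ToOrderedList, and the column-name-to-value conversion are assumptions made for illustration; replace the conversion with your own row processing.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

public static class RowKeyedSketch
{
    public static List<Dictionary<object, object>> ToOrderedList(DataTable dataTable)
    {
        // Results are keyed by the DataRow instance they were built from.
        var items = new ConcurrentDictionary<DataRow, Dictionary<object, object>>();

        Parallel.ForEach(dataTable.AsEnumerable(), row =>
        {
            // Assumed conversion: column name -> cell value.
            var result = new Dictionary<object, object>();
            foreach (DataColumn column in row.Table.Columns)
            {
                result[column.ColumnName] = row[column];
            }
            items[row] = result;
        });

        // Re-read the rows sequentially, which restores the table order.
        return dataTable.Rows.Cast<DataRow>().Select(r => items[r]).ToList();
    }
}
```

The rows are only read inside the parallel loop, and the ordering comes from the final sequential pass over dataTable.Rows, so no sort is required.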

      

+1




You can use PLINQ with the ParallelEnumerable.AsOrdered method, which lets you treat the data source as if it were ordered, overriding the default of unordered.



In your example, you can use it like this:

var result = dataTable.AsEnumerable().AsParallel().AsOrdered()
                      .Select(/*Process the row to dictionary*/).ToList();
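A fuller sketch of the same query, with the placeholder filled in, might look like this. The class OrderedPlinqSketch and the helper RowToDictionary are hypothetical names, and the column-name-to-value conversion is only an assumed stand-in for your real processing.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class OrderedPlinqSketch
{
    // Hypothetical helper: copies one row into a dictionary keyed by column name.
    public static Dictionary<object, object> RowToDictionary(DataRow row)
    {
        var result = new Dictionary<object, object>();
        foreach (DataColumn column in row.Table.Columns)
        {
            result[column.ColumnName] = row[column];
        }
        return result;
    }

    public static List<Dictionary<object, object>> ToOrderedList(DataTable dataTable)
    {
        // AsOrdered makes PLINQ emit results in source order,
        // even though the rows are processed in parallel.
        return dataTable.AsEnumerable()
                        .AsParallel()
                        .AsOrdered()
                        .Select(row => RowToDictionary(row))
                        .ToList();
    }
}
```

With this approach there is no intermediate concurrent collection and no RowIndex to carry around; PLINQ does the reordering itself, at some buffering cost.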

      

+2




First, you can get the index in Parallel.ForEach itself, using the overload whose body receives it, instead of switching to Parallel.For:

Parallel.ForEach(dataTable.AsEnumerable(), (line, state, index) =>
{
    Console.WriteLine("{0} : {1}", index, line);
});

      

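As I see it, the main goal is to avoid OrderBy. To achieve this, create an array before the loop:

var lines = new YourClass[NumberOfElements];

      

After that, you can fill this array with any loop you like. Let's use Parallel.For:

Parallel.For(0, NumberOfElements, i =>
    {
        // Convert is a placeholder for your own row-processing code.
        // Each iteration writes only to its own slot, so no locking is needed.
        lines[i] = Convert(dataTable.Rows[i]);
    });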
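Put together, a minimal sketch of the pre-sized array approach could look like the following. IndexedArraySketch and ToOrderedArray are hypothetical names, and the element type Dictionary<object, object> with a column-name-to-value conversion is an assumption taken from the question rather than part of this answer.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Threading.Tasks;

public static class IndexedArraySketch
{
    public static Dictionary<object, object>[] ToOrderedArray(DataTable dataTable)
    {
        int rowCount = dataTable.Rows.Count;

        // One slot per row; slot i always holds the result for row i.
        var lines = new Dictionary<object, object>[rowCount];

        Parallel.For(0, rowCount, i =>
        {
            DataRow row = dataTable.Rows[i];

            // Assumed conversion: column name -> cell value.
            var item = new Dictionary<object, object>();
            foreach (DataColumn column in dataTable.Columns)
            {
                item[column.ColumnName] = row[column];
            }

            // i is a parameter of the loop body, so each invocation has its own copy.
            lines[i] = item;
        });

        return lines;
    }
}
```

Because i is passed to the body as a parameter, there is no need for a local copy such as rowIndex, and because the array index fixes the position, neither OrderBy nor a concurrent collection is needed.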

      

Edited as per @Panagiotis Kanavos's comments.

0








