Maintaining the Order of DataTable Rows in Parallel Processing
Below is the code snippet:
Parallel.ForEach(dataTable.AsEnumerable(),row => {
// Code to process the data row to Dictionary<object,object>
// Unique Column name is the Dictionary Key
// ConcurrentDictionary is used for thread safety
});
Here I am using Parallel.ForEach to process the rows of a DataTable into objects of type Dictionary<object,object>. The end result is of type List<Dictionary<object,object>>, built through an intermediate thread-safe structure, ConcurrentQueue<Dictionary<object,object>>. The source DataTable holds the data in a given sort order, but that order is inevitably lost in parallel processing. Since order is important, I came up with the following workaround:
Parallel.For(0,RowCount,index => {
int rowIndex = index;
// Access the rows using the Index
// Final structure will be of type ConcurrentQueue<Custom>,
// with the RowIndex assigned based on original index
});
class Custom
{
public int RowIndex { get; set; }
public Dictionary<object,object> DataDictionary {get; set;}
}
The end result is of type ConcurrentQueue<Custom> (customObj), which is then processed using the following code:
customObj.OrderBy(x=>x.RowIndex).Select(y=>y.DataDictionary).ToList()
Below are my questions:
- Is there a better way to achieve the same parallel processing while maintaining the original order, which is a critical business requirement?
- In the final solution, do I need the local variable rowIndex? My understanding is that index is a parameter of the Parallel loop body and will not lead to a closure problem.
Any pointers?
You can use PLINQ with the ParallelEnumerable.AsOrdered method, which, per its documentation, enables treatment of a data source as if it were ordered, overriding the default of unordered.
In your example, you can use it like this:
var result = dataTable.AsEnumerable().AsParallel().AsOrdered()
.Select(/*Process the row to dictionary*/).ToList();
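To make the one-liner above concrete, here is a minimal self-contained sketch. The class name OrderedConversion, the helper ToDictionary, and the sample columns Id and Name are all made up for illustration; the question does not show how a row is actually converted. It assumes the table is only read, never modified, while the query runs.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class OrderedConversion
{
    // Hypothetical row conversion: one dictionary entry per column,
    // keyed by column name (the question does not show the real one).
    static Dictionary<object, object> ToDictionary(DataRow row)
    {
        var dict = new Dictionary<object, object>();
        foreach (DataColumn column in row.Table.Columns)
            dict[column.ColumnName] = row[column];
        return dict;
    }

    // Rows are processed in parallel, but AsOrdered makes ToList
    // return the results in the source order of the table.
    public static List<Dictionary<object, object>> Convert(DataTable table)
    {
        return table.AsEnumerable()
                    .AsParallel()
                    .AsOrdered()
                    .Select(row => ToDictionary(row))
                    .ToList();
    }

    public static void Main()
    {
        // Sample data: Id equals the row position, so order is easy to check.
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        for (int i = 0; i < 1000; i++)
            table.Rows.Add(i, "row" + i);

        var result = Convert(table);

        // True when every result sits at the position of its source row.
        bool inOrder = result.Select((d, i) => (int)d["Id"] == i).All(ok => ok);
        Console.WriteLine(inOrder);
    }
}
```

With this approach there is no need for the Custom wrapper class or the final OrderBy, since the ordering is handled by PLINQ when the results are merged.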
First, you can get the index in Parallel.ForEach itself instead of switching to Parallel.For:
Parallel.ForEach(dataTable.AsEnumerable(), (line, state, index) =>
{
Console.WriteLine("{0} : {1}", index, line);
});
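As I see it, the main goal is to avoid OrderBy. To achieve this, create an array of the final size before the loop:
var lines = new YourClass[NumberOfElements];
After that, you can fill this array with whichever loop you prefer. Let's use Parallel.For:
Parallel.For(0, NumberOfElements, i =>
{
// Each iteration writes only to its own slot, so no locking is needed.
// ProcessRow is a placeholder for your row-to-YourClass conversion.
lines[i] = ProcessRow(dataTable.Rows[i]);
});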
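The pre-sized array idea can be sketched end to end as follows. This is only an illustration under assumptions: the class name IndexedConversion, the helper ToDictionary, and the sample Id column are invented here, and Dictionary<object,object> stands in for YourClass. It also assumes the DataTable is only read during the loop.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

public static class IndexedConversion
{
    // Hypothetical row conversion, keyed by column name.
    static Dictionary<object, object> ToDictionary(DataRow row)
    {
        var dict = new Dictionary<object, object>();
        foreach (DataColumn column in row.Table.Columns)
            dict[column.ColumnName] = row[column];
        return dict;
    }

    public static Dictionary<object, object>[] Convert(DataTable table)
    {
        // One slot per row; slot i always holds the result for row i.
        var results = new Dictionary<object, object>[table.Rows.Count];

        Parallel.For(0, results.Length, i =>
        {
            // i is a lambda parameter, so every invocation gets its own copy.
            // Each iteration writes a distinct element, so no lock is required.
            results[i] = ToDictionary(table.Rows[i]);
        });

        return results;
    }

    public static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        for (int i = 0; i < 1000; i++)
            table.Rows.Add(i);

        var results = Convert(table);

        // True when every result sits at the position of its source row.
        Console.WriteLine(results.Select((d, i) => (int)d["Id"] == i).All(ok => ok));
    }
}
```

Because the position in the array encodes the original order, neither a RowIndex property nor a sort afterwards is needed; results.ToList() gives the ordered list directly.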
Edited as per @Panagiotis Kanavos's comments.