Parallel Partitioning Algorithm in C#: How to Maximize Parallelism

I wrote a parallel algorithm in C# that splits an array into two lists: one containing the elements that satisfy a given predicate, the other containing the elements that do not. The algorithm is order-preserving.

Here is what I came up with, but I would like to know how to make the most of the available hardware concurrency.

    static void TestPLinqPartition(int cnt = 1000000)
    {
        Console.WriteLine("PLINQ Partition");
        var a = RandomSequenceOfValuesLessThan100(cnt).ToArray();
        var sw = new Stopwatch();
        sw.Start();
        // AsOrdered() makes PLINQ preserve the source order in the results.
        var ap = a.AsParallel().AsOrdered();
        List<int> partA = null;
        List<int> partB = null;
        // Each action builds one half of the partition; Parallel.Invoke runs them concurrently.
        Action actionA = () => { partA = (from x in ap where x < 25 select x).ToList(); };
        Action actionB = () => { partB = (from x in ap where !(x < 25) select x).ToList(); };
        Parallel.Invoke(actionA, actionB);
        sw.Stop();

        Console.WriteLine("Partition sizes = {0} and {1}", partA.Count, partB.Count);
        Console.WriteLine("Time elapsed = {0} msec", sw.ElapsedMilliseconds);
    }

      


2 answers


I would split the data into small chunks (for example using the Partitioner class) and give each chunk an index corresponding to its position in the source. For each indexed chunk I would create a Task that splits the chunk into one group that matches the predicate and one that does not, and returns both groups, together with the chunk's index, as the task's result. Then I would wait for all the tasks to complete, order the results by index, and .Concat() the matching groups (to avoid spending time actually copying all the data into one list), and do the same for the non-matching groups. You should be able to achieve an arbitrary degree of parallelism this way while preserving the relative order of the elements.
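As an illustration, here is a minimal sketch of that idea, using manual fixed-size chunking instead of the Partitioner class and assuming the question's x < 25 style predicate; the ChunkedPartition class, the chunk size, and the local names are invented for the example:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;

    static class ChunkedPartition
    {
        // Splits 'source' into a matching and a non-matching sequence while preserving
        // the original order, by processing fixed-size chunks in parallel tasks.
        public static (IEnumerable<int> Matching, IEnumerable<int> NonMatching) Run(
            int[] source, Func<int, bool> predicate, int chunkSize = 100000)
        {
            // One task per contiguous chunk; the chunk's start offset records its position.
            var tasks = new List<Task<Tuple<int, List<int>, List<int>>>>();
            for (int start = 0; start < source.Length; start += chunkSize)
            {
                int s = start;
                int end = Math.Min(s + chunkSize, source.Length);
                tasks.Add(Task.Run(() =>
                {
                    var hit = new List<int>();
                    var miss = new List<int>();
                    for (int i = s; i < end; i++)
                        (predicate(source[i]) ? hit : miss).Add(source[i]);
                    return Tuple.Create(s, hit, miss);
                }));
            }

            // Wait for every chunk, put the results back into source order, and Concat
            // the per-chunk groups instead of copying everything into one big list.
            var results = Task.WhenAll(tasks).Result.OrderBy(r => r.Item1).ToArray();
            var matching = results.Aggregate(Enumerable.Empty<int>(),
                (acc, r) => acc.Concat(r.Item2));
            var nonMatching = results.Aggregate(Enumerable.Empty<int>(),
                (acc, r) => acc.Concat(r.Item3));
            return (matching, nonMatching);
        }
    }

A call like ChunkedPartition.Run(a, x => x < 25) then yields two lazily concatenated sequences in the original element order; the number of tasks is controlled by the chunk size, and the thread pool decides how many run at once.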





If your lists are very long, you won't get much parallelism out of this (2x at most). Instead, I would recommend using Parallel.For with a thread-local Tuple<List<int>, List<int>> as the per-thread loop state; the Parallel.For API (its localInit/localFinally overload) lets you do this easily. You can combine the individual per-thread sublists at the end.

This version is embarrassingly parallel and causes almost no coherency traffic on the CPU bus, because there is no synchronization in the hot loop.

Edit: I want to emphasize that you cannot just use two lists shared by all threads, because that would cause enormous synchronization overhead. You need thread-local lists. Even ConcurrentQueue is not a good fit for this scenario, because it relies on interlocked operations that still generate coherency traffic.
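Here is a minimal sketch of that pattern, assuming the question's x < 25 predicate (the ThreadLocalPartition class and Split method names are invented for the example). Note that localFinally merges the per-thread sublists in whatever order the threads finish, so this sketch does not by itself preserve the original element order:

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    static class ThreadLocalPartition
    {
        public static (List<int> Low, List<int> High) Split(int[] a)
        {
            var low = new List<int>();
            var high = new List<int>();
            var gate = new object();

            Parallel.For<Tuple<List<int>, List<int>>>(
                0, a.Length,
                // localInit: each worker thread gets its own pair of lists, so the
                // hot loop below needs no locks and causes no coherency traffic.
                () => Tuple.Create(new List<int>(), new List<int>()),
                // body: classify the element into the thread-local lists.
                (i, state, local) =>
                {
                    if (a[i] < 25) local.Item1.Add(a[i]); else local.Item2.Add(a[i]);
                    return local;
                },
                // localFinally: runs once per worker thread; a short lock merges its
                // sublists into the shared result lists.
                local =>
                {
                    lock (gate)
                    {
                        low.AddRange(local.Item1);
                        high.AddRange(local.Item2);
                    }
                });

            return (low, high);
        }
    }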
