GPU-accelerated sort (~1 GB) and merge sort (~100 GB)

I am looking for a C++ library that does GPU-accelerated sort (of about 1 GB of data) and merge sort (of, say, about 100 GB of data, though the size doesn't matter since merge is a streaming algorithm). The license must be LGPL, BSD, or similar. I strongly prefer OpenCL because of its portability (but I'm also interested in links to CUDA libraries). Links to docs and blog posts on the subject are appreciated.

Some background (please correct me if I'm wrong):

A 2-way merge sort of 1 GB (i.e. 128,000,000 8-byte objects) will consume approximately log2(128,000,000) · 1 GB ≈ 27 GB of memory bandwidth, which takes about 1 second on a modern CPU with ~30 GB/s of sequential memory bandwidth. (Some kinds of merge seem to take much longer, since non-sequential memory access is 10 to 100 times slower.)

While I'm not familiar with modern GPUs, I suspect a 1 GB merge sort would take 0.2 seconds or less, since typical GPU memory bandwidth is around 150 GB/s, as on the AMD/ATI HD 58xx (see, for example, http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units#Evergreen_.28HD_5xxx.29_series )

That is at least a 5x speedup. (Transferring 1 GB over 16x PCI-E 2.0 takes about 0.125 s, but the PCI-E transfers can run in parallel with sorting; this may, however, require 2 GB or 3 GB of VRAM instead of 1 GB.)
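The estimates above can be reproduced with a small calculation. This is only a sketch under the same simplification as the question (one sweep over the data per merge pass); the function names are hypothetical, 1 GB is taken as 10^9 bytes, and the 8 GB/s PCI-E figure is the rate implied by the 0.125 s transfer time.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Bytes streamed through memory by a 2-way merge sort, counting one
// sweep over the whole data set per pass (the question's simplification).
double merge_sort_traffic_bytes(std::uint64_t n_elements, std::uint64_t elem_bytes) {
    // A 2-way merge sort needs ceil(log2(n)) passes over the data.
    double passes = std::ceil(std::log2(static_cast<double>(n_elements)));
    return passes * static_cast<double>(n_elements) * static_cast<double>(elem_bytes);
}

// Seconds needed to move `bytes` at `bandwidth_gb_per_s` (1 GB = 1e9 bytes).
double seconds_at_bandwidth(double bytes, double bandwidth_gb_per_s) {
    assert(bandwidth_gb_per_s > 0.0);
    return bytes / (bandwidth_gb_per_s * 1e9);
}
```

For 128,000,000 8-byte elements this gives about 27.6 GB of traffic, roughly 0.92 s at 30 GB/s and 0.18 s at 150 GB/s, which matches the 5x ratio. Passing the raw 1.024e9 bytes and 8 GB/s to `seconds_at_bandwidth` gives about 0.13 s for the PCI-E transfer.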

I suspect even more speedup is possible with a merge sort that is more than two-way, or with some GPU-friendly sort.
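To illustrate the "more than two-way" idea, here is a minimal CPU-side sketch of a k-way merge that combines all sorted runs in a single pass using a min-heap. It is not taken from any particular library; `kway_merge` is a hypothetical name, and the runs are held in memory only to keep the example short (a streaming version would read each run from disk in chunks).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Merge k sorted runs in one pass. The heap holds (value, run index)
// for the current head element of every run that is not yet exhausted.
std::vector<long long> kway_merge(const std::vector<std::vector<long long>>& runs) {
    typedef std::pair<long long, std::size_t> Entry;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > heap;
    std::vector<std::size_t> pos(runs.size(), 0);  // next unread index per run
    std::size_t total = 0;

    for (std::size_t r = 0; r < runs.size(); ++r) {
        assert(std::is_sorted(runs[r].begin(), runs[r].end()));  // precondition
        total += runs[r].size();
        if (!runs[r].empty()) heap.emplace(runs[r][0], r);
    }

    std::vector<long long> out;
    out.reserve(total);
    while (!heap.empty()) {
        Entry smallest = heap.top();  // smallest head among all runs
        heap.pop();
        out.push_back(smallest.first);
        std::size_t r = smallest.second;
        if (++pos[r] < runs[r].size()) heap.emplace(runs[r][pos[r]], r);
    }
    return out;
}
```

With k runs, one such merge replaces about log2(k) two-way passes over the data, at the cost of O(log k) heap comparisons per element, so it trades memory traffic for extra compute.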

1 answer


Have you looked at Thrust?

From the project page:



Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL). Thrust's high-level interface greatly enhances programmer productivity while enabling performance portability between GPUs and multicore CPUs. Interoperability with established technologies (such as CUDA, TBB, and OpenMP) facilitates integration with existing software. Develop high-performance applications rapidly with Thrust!

The license is Apache, so it should work for you.
