OpenCL 1.1 lazy strategy
I'm not that familiar with OpenCL, but I found a relevant discussion at NVIDIA.
I also thought I'd respond to the mention of CUDA, in case anyone comes across this later...
I don't think native CUDA can allocate more memory than is physically available on the device. In fact, even on large cards you often cannot allocate one large contiguous array, because the card has separate memory banks. For example, back in the C1060 days I recall hitting a limit of something like 1.5 GB on 3 GB cards. Could you provide some details on what you mean by CUDA allowing such large allocations?
If you are using ArrayFire (or Jacket), they have a basic notion of virtual memory: if you have many small allocations that together exceed what is available on the card, only the most important ones are stored on the device, while the rest are kept on the host until you need them.
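To make the idea concrete, here is a toy Python sketch of that kind of host/device paging. It is not ArrayFire's or Jacket's actual implementation, and I don't know how they decide what counts as "most important"; the `PagedPool` class, its method names, and the least-recently-used eviction rule are all my own assumptions for illustration. It only tracks buffer sizes and does not touch a real GPU.

```python
from collections import OrderedDict


class PagedPool:
    """Hypothetical model of a device pool that spills buffers to the host."""

    def __init__(self, device_capacity):
        self.device_capacity = device_capacity  # bytes available on the "card"
        self.device = OrderedDict()  # name -> size, least recently used first
        self.host = {}               # name -> size, buffers spilled to the host

    def _device_used(self):
        return sum(self.device.values())

    def _make_room(self, size):
        # A single buffer larger than the card can never be made resident.
        if size > self.device_capacity:
            raise MemoryError("single buffer is larger than the device")
        # Assumed policy: evict least recently used buffers until it fits.
        while self._device_used() + size > self.device_capacity:
            name, evicted_size = self.device.popitem(last=False)
            self.host[name] = evicted_size

    def allocate(self, name, size):
        """Create a new buffer on the device, spilling others if needed."""
        if name in self.device or name in self.host:
            raise KeyError(f"buffer {name!r} already exists")
        self._make_room(size)
        self.device[name] = size

    def touch(self, name):
        """Use a buffer, bringing it back to the device if it was spilled."""
        if name in self.device:
            self.device.move_to_end(name)  # mark as most recently used
            return
        size = self.host.pop(name)  # raises KeyError for unknown buffers
        self._make_room(size)
        self.device[name] = size


# Three 40-byte buffers on a 100-byte "card": 120 bytes requested in total.
pool = PagedPool(device_capacity=100)
for name in ("a", "b", "c"):
    pool.allocate(name, 40)
print(list(pool.device), pool.host)  # "a" was spilled to the host

pool.touch("a")  # "a" comes back, so the oldest resident buffer is spilled
print(list(pool.device), pool.host)
```

Note that each individual buffer still has to fit on the card; only the total can exceed it, which matches the "many small allocations" case above.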