How to add indices to a sparse domain in Chapel

I am filling a sparse array in Chapel with a loop that reads through a CSV file.

I am wondering what the best pattern for this is.

var dnsDom = {1..n_dims, 1..n_dims};
var spsDom: sparse subdomain(dnsDom);
for line in file_reader.lines() {
   var i = line[1]:int;
   var j = line[2]:int;
   spsDom += (i,j);
}


Is this an efficient way to do it? Or should I collect the indices in a temporary array of tuples and add them to spsDom every (say) 10,000 rows?

Thanks!


1 answer


The way you show in the snippet will expand the internal sparse-domain arrays on every `+=` operation. As you suggest, buffering the indices you read and then adding them in bulk will certainly perform better, thanks to several optimizations for adding a whole array of indices at once.

To do this, you can use `+=` where the right-hand side is an array:

spsDom += arrayOfIndices;
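For context, here is a minimal, self-contained sketch of that pattern (the domain bounds and index values are illustrative):

```chapel
// A small sketch: build an array of index tuples up front,
// then add them to the sparse domain in one bulk operation.
const dnsDom = {1..10, 1..10};
var spsDom: sparse subdomain(dnsDom);

// indices collected ahead of time (e.g. buffered from a file)
var arrayOfIndices = [(1,2), (3,4), (5,6)];

// a single += with an array right-hand side takes the bulk-add path,
// instead of growing the internal arrays once per index
spsDom += arrayOfIndices;
```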


This `+=` operator overload on sparse domains actually calls the main bulk-add method, `bulkAdd`. The method itself has several flags that can help you get even better performance in some cases. Note that the `+=` overload calls `bulkAdd` in the most conservative way possible, i.e. it assumes the array of indices can be in random order, can include duplicates, etc. If your input (in your case, the indices you read from the file) satisfies some requirements (Is it sorted? Is it free of duplicates?), you can call `bulkAdd` directly and pass the corresponding optimization flags.



See http://chapel.cray.com/docs/latest/builtins/internal/ChapelArray.html#ChapelArray.bulkAdd for the `bulkAdd` documentation.

Edit: here is a snippet built on top of the one in the question:

var dnsDom = {1..n_dims, 1..n_dims};
var spsDom: sparse subdomain(dnsDom);

// create an index buffer
config const indexBufferSize = 100;
var indexBufferDom = {0..#indexBufferSize};
var indexBuffer: [indexBufferDom] 2*int;

var count = 0;
for line in file_reader.lines() {

  indexBuffer[count] = (line[1]:int, line[2]:int);
  count += 1;

  // bulk add indices if the buffer is full
  if count == indexBufferSize {
    spsDom.bulkAdd(indexBuffer, dataSorted=true,
                                preserveInds=false,
                                isUnique=true);
    count = 0;
  }
}

// dump the final buffer that is (most likely) partially filled
spsDom.bulkAdd(indexBuffer[0..#count],  dataSorted=true,
                                        preserveInds=false,
                                        isUnique=true);


I haven't tested it, but I think it conveys the main idea. The flags passed to `bulkAdd` should yield better performance, but of course they are only valid if the input buffer really is sorted and free of duplicates. Also note that the first `bulkAdd` will be much faster than the subsequent ones, which will probably slow down as the method has to sift through the existing indices and shift them as needed. Thus, a larger buffer may provide better performance.
