How can I use multiple threads in Java to iterate over a collection so that no two threads ever process the same part of it?

I need to iterate over a large ArrayList (~50,000 records), and I need to use multiple threads to get it done quickly.

But each thread needs to start at a unique index so that no two threads process the same part of the list. The batchSize will be 100, so each thread will loop from startIndex to startIndex + 100.

Is there a way to achieve this? Note that I am only doing read operations here, not write operations. Each entry in the list is just a string, which is actually an SQL query that I then execute against the DB over JDBC.


3 answers


If you are only going to read the List and not mutate it, you can simply define your own Runnable that takes the List and a startIndex as constructor arguments. There is no danger in reading an ArrayList concurrently (even at the same indices) as long as no thread modifies it at the same time and the list was fully populated before the threads were started.
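A minimal sketch of that idea, assuming Java 8+ and a fixed thread pool from ExecutorService to run the Runnables (the answer does not say how they are started). The names BatchRunnableDemo, BatchTask, processAll and the processed list are hypothetical, and the JDBC call is only a placeholder comment:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BatchRunnableDemo {

    // Hypothetical Runnable: handles the range [startIndex, startIndex + batchSize)
    static class BatchTask implements Runnable {
        private final List<String> queries;
        private final int startIndex;
        private final int batchSize;
        private final List<String> processed; // records handled items (demo only)

        BatchTask(List<String> queries, int startIndex, int batchSize, List<String> processed) {
            this.queries = queries;
            this.startIndex = startIndex;
            this.batchSize = batchSize;
            this.processed = processed;
        }

        @Override
        public void run() {
            // The last batch may be shorter than batchSize
            int end = Math.min(startIndex + batchSize, queries.size());
            for (int i = startIndex; i < end; i++) {
                String query = queries.get(i);
                // Placeholder: execute the query over JDBC here
                processed.add(query);
            }
        }
    }

    // Hypothetical helper: one task per batch, run on a fixed-size pool
    static List<String> processAll(List<String> queries, int batchSize, int threads) {
        List<String> processed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int start = 0; start < queries.size(); start += batchSize) {
            pool.execute(new BatchTask(queries, start, batchSize, processed));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 1050; i++) {
            queries.add("SELECT " + i);
        }
        System.out.println(processAll(queries, 100, 4).size()); // 1050
    }
}
```

Submitting the tasks to the pool happens after the list is filled, and awaitTermination waits for all of them, so every worker sees the complete list and the caller sees every result.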

To be on the safe side, wrap the ArrayList by calling Collections.unmodifiableList() and pass that List to your Runnables. This way, you can be sure that the threads won't modify the backing ArrayList.
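A small sketch of what the wrapper does, assuming a plain ArrayList of query strings; the class name and the writeIsRejected helper are hypothetical. Writes through the wrapper throw UnsupportedOperationException, while reads go straight through to the backing list:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class UnmodifiableDemo {

    // Hypothetical helper: true if a write through the view is rejected
    static boolean writeIsRejected(List<String> view) {
        try {
            view.set(0, "DELETE FROM users");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<>(Arrays.asList("SELECT 1", "SELECT 2"));
        // Read-only view over the same ArrayList; no data is copied
        List<String> readOnly = Collections.unmodifiableList(queries);
        System.out.println(writeIsRejected(readOnly)); // true
        System.out.println(readOnly.get(1));            // SELECT 2
    }
}
```

The wrapper is a view, not a copy: it stops the worker threads from writing through it, but the main thread can still change the underlying ArrayList, so it should leave it alone while the workers run.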

Alternatively, you can create sublists in your main thread (List.subList()) so that you don't have to pass a startIndex to each thread. However, you still want to make the sublists unmodifiable before you do this. Six of one, half a dozen of the other.
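A sketch of the sublist approach, assuming a batch size of 100 and one plain Thread per batch; the partition helper and class name are hypothetical. Each batch is a read-only view created in the main thread, so nothing is copied and no startIndex is passed around:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SubListPartition {

    // Hypothetical helper: splits a list into read-only views of at most batchSize
    // elements. The views share storage with the source list; nothing is copied.
    static <T> List<List<T>> partition(List<T> source, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < source.size(); start += batchSize) {
            int end = Math.min(start + batchSize, source.size());
            batches.add(Collections.unmodifiableList(source.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            queries.add("SELECT " + i);
        }
        List<List<String>> batches = partition(queries, 100); // sizes 100, 100, 50

        List<Thread> workers = new ArrayList<>();
        for (List<String> batch : batches) {
            // Each thread receives its own disjoint batch; no startIndex needed
            Thread worker = new Thread(() -> {
                for (String query : batch) {
                    // Placeholder: execute the query over JDBC here
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println(batches.size()); // 3
    }
}
```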



Better yet would be to use Guava's ImmutableList; it is inherently thread-safe.

There are also parallel streams in Java 8, but be careful with this solution; they are powerful, but easy to get wrong.


If you are using Java 8, take a look at list.stream().parallel().
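A minimal sketch of the parallel-stream route, assuming Java 8+; the class name, the processAll helper and the concurrent set used to record results are hypothetical. The stream splits the ArrayList into disjoint chunks internally, so each element is passed to the lambda exactly once:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ParallelStreamDemo {

    // Hypothetical helper: the parallel stream splits the list into disjoint
    // chunks, so each element is passed to the lambda exactly once.
    static Set<String> processAll(List<String> queries) {
        Set<String> processed = ConcurrentHashMap.newKeySet(); // thread-safe set
        queries.stream().parallel().forEach(query -> {
            // Placeholder: execute the query over JDBC here
            processed.add(query);
        });
        return processed;
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            queries.add("SELECT " + i);
        }
        System.out.println(processAll(queries).size()); // 1000
    }
}
```

Parallel streams run on the shared ForkJoinPool.commonPool(), which by default has roughly as many threads as there are CPU cores. For blocking JDBC calls, an ExecutorService with an explicit thread count usually gives more control.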

For Java 7, use subList() to split your work into chunks. Each thread should then work only with its own sublist. For most lists, subList() is a very efficient operation that does not copy data. If the backing list is structurally modified (elements added or removed), you will typically get a ConcurrentModificationException.
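A single-threaded sketch of that failure mode, assuming java.util.ArrayList, whose sublists check for modification on each access; the class and helper names are hypothetical. Adding an element to the backing list invalidates a sublist created earlier:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class SubListCmeDemo {

    // Hypothetical helper: true if reading a sublist fails after the
    // backing ArrayList was structurally modified
    static boolean subListFailsAfterBackingChange() {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            queries.add("SELECT " + i);
        }
        List<String> firstBatch = queries.subList(0, 5); // view, not a copy
        queries.add("SELECT 10"); // structural change to the backing list
        try {
            firstBatch.get(0);
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(subListFailsAfterBackingChange()); // true
    }
}
```

Only structural changes trigger this; calling set() on the backing list does not. Across threads without synchronization this fail-fast check is best-effort, so it is better treated as a bug detector than as a guarantee.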



As for feeding the data to the threads, I suggest looking at the Executor API and Queues. Just put all the work items in a queue and let the executor process them.
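A sketch of the queue-plus-executor idea, assuming Java 8+; the class name and processAll helper are hypothetical, and ConcurrentLinkedQueue is just one possible queue choice. Every query goes into a shared queue, and each worker in a fixed thread pool keeps polling until the queue is empty; poll() hands each item to exactly one worker:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class QueueExecutorDemo {

    // Hypothetical helper: workers drain a shared queue until it is empty
    static Set<String> processAll(List<String> queries, int threads) {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>(queries);
        Set<String> processed = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                String query;
                // poll() returns null once the queue is empty
                while ((query = queue.poll()) != null) {
                    // Placeholder: execute the query over JDBC here
                    processed.add(query);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            queries.add("SELECT " + i);
        }
        System.out.println(processAll(queries, 4).size()); // 500
    }
}
```

An ExecutorService already has its own internal work queue, so submitting one task per batch of 100 is an equally valid way to use this API.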


Have a shared counter (the synchronized method below makes updates to it atomic):

int nextBatch = 0;

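Each time a thread takes a new batch, return the current value and then advance it:

public synchronized int getNextBatch() {
    if (nextBatch >= arraylist.size()) {
        // The end was reached
        return -1;
    }
    int start = nextBatch;  // hand out the current batch...
    nextBatch += batchSize; // ...and advance the counter for the next caller
    return start;
}
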
Each thread calls this method to get the range it should work on, and keeps going until it receives -1:

int start;
while ((start = getNextBatch()) != -1) {
    int end = Math.min(start + batchSize, arraylist.size());

    // Iterate over this thread's own range
    for (int i = start; i < end; i++) {
        Object obj = arraylist.get(i);
        // Do something with obj
    }
}
// The end was reached
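A self-contained variant of the counter approach, assuming java.util.concurrent.atomic.AtomicInteger in place of the synchronized method (a swapped-in, lock-free technique, not the answer's code). getAndAdd returns the previous value and advances the counter in one atomic step, so no two threads can claim the same start index; the class and helper names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicBatchDemo {

    // Hypothetical helper: threads claim batches from a shared atomic counter
    static Set<String> processAll(List<String> queries, int batchSize, int threadCount) {
        AtomicInteger nextBatch = new AtomicInteger(0);
        Set<String> processed = ConcurrentHashMap.newKeySet();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            Thread worker = new Thread(() -> {
                while (true) {
                    // Returns the old value: each thread gets a unique start index
                    int start = nextBatch.getAndAdd(batchSize);
                    if (start >= queries.size()) {
                        return; // the end was reached
                    }
                    int end = Math.min(start + batchSize, queries.size());
                    for (int i = start; i < end; i++) {
                        // Placeholder: execute the query over JDBC here
                        processed.add(queries.get(i));
                    }
                }
            });
            threads.add(worker);
            worker.start();
        }
        for (Thread worker : threads) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < 1050; i++) {
            queries.add("SELECT " + i);
        }
        System.out.println(processAll(queries, 100, 4).size()); // 1050
    }
}
```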

      
