Java: making chunks of a list for processing

I have a list with a lot of items. When processing this list, in some cases I want the list to be split into smaller sub-lists, and in some cases I want to process the entire list.

private void processList(List<X> entireList, int partitionSize)
{
    Iterator<X> entireListIterator = entireList.iterator();
    Iterator<List<X>> chunkOfEntireList = Iterators.partition(entireListIterator, partitionSize);
    while (chunkOfEntireList.hasNext()) {
        doSomething(chunkOfEntireList.next());
        if (chunkOfEntireList.hasNext()) {
            doSomethingOnlyIfTheresMore();
        }
    }
}

I am using com.google.common.collect.Iterators to create the partitions. So when I want to split the list into chunks of 100 items, I call

processList(entireList, 100);

For the cases where I don't want to split the list into chunks, I thought I could pass Integer.MAX_VALUE as partitionSize.

processList(entireList, Integer.MAX_VALUE);

But this causes my code to run out of memory. Can anyone help me? What am I missing? What does the iterator do internally, and how can I work around this?

EDIT: I also require that the inner if clause runs only when there are more chunks left to process, i.e. I need the iterator's hasNext() method.

3 answers


You are getting an out-of-memory error because Iterators.partition() internally fills an array of the specified partition length. The allocated array is always the full partition size, since the actual number of elements is not known until the iteration is complete. (The problem could have been avoided had they used an ArrayList internally; I assume the developers decided that arrays would offer better performance in general.)
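To make the cost concrete, here is a deliberately simplified, JDK-only sketch of an array-backed chunk reader. It is not Guava's source; EagerChunker and nextChunk are hypothetical names. The only point it illustrates is that the buffer is sized from the requested chunk size before a single element has been read.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class EagerChunker {

    /** Hypothetical helper: reads the next chunk into a buffer of exactly chunkSize slots. */
    @SuppressWarnings("unchecked")
    static <T> List<T> nextChunk(Iterator<T> source, int chunkSize) {
        // The buffer is allocated up front, before we know how many elements remain.
        Object[] buffer = new Object[chunkSize];
        int count = 0;
        while (count < chunkSize && source.hasNext()) {
            buffer[count++] = source.next();
        }
        // Trim the view to the slots that were actually filled.
        return (List<T>) Arrays.asList(buffer).subList(0, count);
    }

    public static void main(String[] args) {
        Iterator<Integer> numbers = Arrays.asList(1, 2, 3).iterator();
        System.out.println(nextChunk(numbers, 2));
        System.out.println(nextChunk(numbers, 2));
    }
}
```

With a chunk size of Integer.MAX_VALUE, the new Object[chunkSize] line alone asks for room for about two billion references, which will typically fail with an OutOfMemoryError no matter how short the source is.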

Using Lists.partition() instead avoids the problem, since it delegates to List.subList(), which is only a view of the underlying list:



private void processList(List<X> entireList, int partitionSize) {
    for (List<X> chunk : Lists.partition(entireList, partitionSize)) {
        doSomething(chunk);
    }
}
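The EDIT in the question asks to keep the "only if there's more" check. A for-each loop hides the iterator, but the same view-based idea works with an explicit one. The following is a JDK-only sketch under stated assumptions: chunkViews is a hypothetical stand-in for the partitioning call, and processList records a log of calls instead of invoking the question's doSomething() and doSomethingOnlyIfTheresMore().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ChunkViews {

    /** Splits the list into consecutive subList views of at most chunkSize elements. */
    static <T> List<List<T>> chunkViews(List<T> list, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        int size = list.size();
        for (int start = 0; start < size; ) {
            // Compare against the remaining count so start + chunkSize cannot overflow.
            int end = (size - start <= chunkSize) ? size : start + chunkSize;
            chunks.add(list.subList(start, end)); // a view, elements are not copied
            start = end;
        }
        return chunks;
    }

    /** Walks the chunks with an explicit iterator so hasNext() stays available. */
    static <T> List<String> processList(List<T> entireList, int chunkSize) {
        List<String> log = new ArrayList<>();
        Iterator<List<T>> chunks = chunkViews(entireList, chunkSize).iterator();
        while (chunks.hasNext()) {
            log.add("chunk " + chunks.next());   // stands in for doSomething(...)
            if (chunks.hasNext()) {
                log.add("more follows");         // stands in for doSomethingOnlyIfTheresMore()
            }
        }
        return log;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(processList(data, 2));
        System.out.println(processList(data, Integer.MAX_VALUE));
    }
}
```

Because every chunk is a subList view, passing Integer.MAX_VALUE simply yields one chunk covering the whole list, and the allocation scales with the number of chunks rather than with the requested size.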

      



Partitioning usually allocates a new list sized by the given partitionSize parameter, so an out-of-memory error is to be expected in this case. Why not use the original list directly when you only want a single chunk? Possible solutions:

  • Create a separate overloaded method that does not take a size.
  • Pass the size as -1 when you don't need partitioning. In the method, check the value, and if it is -1, let chunkOfEntireList iterate over just the original list.
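Both options above can be combined in one small helper. This is only a sketch: WholeOrChunked, chunksOf and the NO_PARTITION constant are hypothetical names, and the chunking itself uses plain subList views from the JDK rather than any particular library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class WholeOrChunked {

    // Hypothetical sentinel meaning "do not partition".
    static final int NO_PARTITION = -1;

    /** Overload without a size: the whole list is treated as a single chunk. */
    static <T> List<List<T>> chunksOf(List<T> entireList) {
        return chunksOf(entireList, NO_PARTITION);
    }

    static <T> List<List<T>> chunksOf(List<T> entireList, int partitionSize) {
        if (partitionSize == NO_PARTITION) {
            // Hand back the original list itself as the only chunk.
            return Collections.singletonList(entireList);
        }
        if (partitionSize <= 0) {
            throw new IllegalArgumentException("partitionSize must be positive or NO_PARTITION");
        }
        List<List<T>> chunks = new ArrayList<>();
        int size = entireList.size();
        for (int start = 0; start < size; ) {
            // Compare against the remaining count so the addition cannot overflow.
            int end = (size - start <= partitionSize) ? size : start + partitionSize;
            chunks.add(entireList.subList(start, end));
            start = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(chunksOf(data));
        System.out.println(chunksOf(data, 2));
    }
}
```

Since the result is an ordinary List, calling iterator() on it keeps the hasNext() check from the question available. One design detail to be aware of: in this sketch an empty list passed without a size still produces one (empty) chunk.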


Assuming your goal is parallelism, i.e. processing slices of your list in parallel, it might be better to consider something like MapReduce or Spark, a larger framework that also takes care of process management.

However, within a monolithic application you might want to consider node-local options, possibly including Java 8 Streams. Note the parallelStream() method, which is also available on your List<X>.
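As a minimal illustration of parallelStream(), the sketch below sums squares over a list. ParallelSquares and sumOfSquares are hypothetical names, and the squaring is a placeholder for whatever per-element work is actually needed; it assumes that work is independent per element and free of shared mutable state.

```java
import java.util.Arrays;
import java.util.List;

final class ParallelSquares {

    /** Squares each value and sums the results, letting the stream split the work. */
    static long sumOfSquares(List<Integer> values) {
        return values.parallelStream()
                .mapToLong(Integer::longValue) // widen first to reduce overflow risk
                .map(x -> x * x)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3, 4)));
    }
}
```

Here the stream decides how to split the list, so no partition size has to be chosen by hand. The trade-off is that the processing order is not guaranteed, so a step like "do something only if more chunks follow" has no direct equivalent.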
