Performance issues with newFixedThreadPool vs newSingleThreadExecutor

I am trying to test my client code, so I decided to write a multi-threaded program to benchmark it. I am trying to measure how much time (95th percentile) the method below takes:

attributes = deClient.getDEAttributes(columnsList);

Below is the multi-threaded code I wrote to benchmark the above method. I am seeing a large variation between my two scenarios:

1) First, with the multi-threaded code using 20 threads and running for 15 minutes, I get a 95th percentile of 37 ms. In this case I am using:

ExecutorService service = Executors.newFixedThreadPool(20);

2) But if I run the same program for 15 minutes using

ExecutorService service = Executors.newSingleThreadExecutor();

instead of

ExecutorService service = Executors.newFixedThreadPool(20);

I get a 95th percentile of 7 ms, which is much lower than the number I get when I run my code with newFixedThreadPool(20).

Can anyone tell me what could be causing such a large performance difference between newSingleThreadExecutor and newFixedThreadPool(20)? In both cases, I run my program for 15 minutes.

Below is my code:

public static void main(String[] args) {

    try {
        // create thread pool with given size
        //ExecutorService service = Executors.newFixedThreadPool(20);
        ExecutorService service = Executors.newSingleThreadExecutor();

        long startTime = System.currentTimeMillis();
        long endTime = startTime + (15 * 60 * 1000); // running for 15 minutes

        // 'threads' and 'serviceList' are fields defined elsewhere in this class
        for (int i = 0; i < threads; i++) {
            service.submit(new ServiceTask(endTime, serviceList));
        }

        // stop accepting new tasks and wait for the submitted tasks to finish
        service.shutdown();
        service.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Following is the class that implements the Runnable interface:

class ServiceTask implements Runnable {

    private static final Logger LOG = Logger.getLogger(ServiceTask.class.getName());
    private static Random random = new SecureRandom();

    public static volatile AtomicInteger countSize = new AtomicInteger();

    private final long endTime;
    private final LinkedHashMap<String, ServiceInfo> tableLists;

    public static ConcurrentHashMap<Long, Long> selectHistogram = new ConcurrentHashMap<Long, Long>();


    public ServiceTask(long endTime, LinkedHashMap<String, ServiceInfo> tableList) {
        this.endTime = endTime;
        this.tableLists = tableList;
    }

    @Override
    public void run() {

        try {

            while (System.currentTimeMillis() <= endTime) {

                double randomNumber = random.nextDouble() * 100.0;

                ServiceInfo service = selectRandomService(randomNumber);

                final String id = generateRandomId(random);
                final List<String> columnsList = getColumns(service.getColumns());

                List<DEAttribute<?>> attributes = null;

                DEKey bk = new DEKey(service.getKeys(), id);
                List<DEKey> list = new ArrayList<DEKey>();
                list.add(bk);

                Client deClient = new Client(list);

                final long start = System.nanoTime();

                attributes = deClient.getDEAttributes(columnsList);

                // elapsed time for this call, truncated to whole milliseconds
                final long end = System.nanoTime() - start;
                final long key = end / 1000000L;

                // atomically increment the count for this latency bucket
                boolean done = false;
                while (!done) {
                    Long oldValue = selectHistogram.putIfAbsent(key, 1L);
                    if (oldValue != null) {
                        done = selectHistogram.replace(key, oldValue, oldValue + 1);
                    } else {
                        done = true;
                    }
                }
                countSize.getAndAdd(attributes.size());

                handleDEAttribute(attributes);

                if (BEServiceLnP.sleepTime > 0L) {
                    Thread.sleep(BEServiceLnP.sleepTime);
                }
            }
        } catch (Exception e) {
            // log instead of silently swallowing failures inside the worker
            e.printStackTrace();
        }
    }
}
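
After awaitTermination returns, the 95th percentile is derived from selectHistogram. Roughly, the reporting step looks like this (a simplified sketch, not my exact code):

// Sketch: compute the 95th percentile from the latency histogram
// (bucket key = elapsed milliseconds, value = number of calls in that bucket).
static long percentile95() {
    long totalCalls = 0;
    for (long count : ServiceTask.selectHistogram.values()) {
        totalCalls += count;
    }
    long target = (long) Math.ceil(totalCalls * 0.95);

    long seen = 0;
    // walk the buckets in ascending latency order until 95% of the calls are covered
    for (Long bucketMs : new java.util.TreeSet<Long>(ServiceTask.selectHistogram.keySet())) {
        seen += ServiceTask.selectHistogram.get(bucketMs);
        if (seen >= target) {
            return bucketMs;
        }
    }
    return 0; // empty histogram
}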

      

Update:

Here is my processor spec. I am running my program on a Linux machine with two processors, defined as:

vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping        : 7
cpu MHz         : 2599.999
cache size      : 20480 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes hypervisor lahf_lm arat pln pts
bogomips        : 5199.99
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

      


1 answer


"Can anyone tell me what could be causing such high performance problems with newSingleThreadExecutor vs newFixedThreadPool(20)..."

If you are running many more tasks in parallel (20 in this case) than you have CPUs (and I doubt you have 20+ CPU cores), then yes, each individual task will take longer to complete. It is easier for the computer to run one task at a time than to keep switching between multiple threads running at the same time. Even if you limit the number of threads in the pool to the number of CPUs you have, each individual task will probably run slower, albeit slightly.
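
You can quickly confirm how many CPUs the JVM actually sees (your /proc/cpuinfo above suggests 2), for example:

// Number of processors visible to the JVM; with 2 cores, a 20-thread pool
// means roughly 10 runnable threads competing for each core.
int cpus = Runtime.getRuntime().availableProcessors();
System.out.println("Available CPUs: " + cpus);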

If, however, you compare the throughput (the time it takes to complete a given number of tasks) of thread pools of different sizes, you should see that the throughput with 20 threads is much higher. If you run 1000 tasks with 20 threads, they will overall finish much earlier than with just one thread. Each individual task may take longer, but they run in parallel. It probably won't be 20 times faster because of threading overhead and the like, but it might be something like 15 times faster.
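
To make the latency-versus-throughput distinction concrete, here is a minimal, self-contained sketch (doWork() is just a stand-in for deClient.getDEAttributes(), simulated as 5 ms of blocking work): it records each task's own elapsed time separately from the wall-clock time of the whole batch.

import java.util.concurrent.*;

public class ThroughputVsLatency {

    // Stand-in for the real client call being benchmarked.
    static void doWork() throws InterruptedException {
        Thread.sleep(5);
    }

    static void benchmark(ExecutorService pool, int tasks) throws Exception {
        long wallStart = System.nanoTime();
        CompletionService<Long> cs = new ExecutorCompletionService<Long>(pool);

        for (int i = 0; i < tasks; i++) {
            cs.submit(new Callable<Long>() {
                public Long call() throws Exception {
                    long start = System.nanoTime();
                    doWork();
                    return System.nanoTime() - start; // per-task latency
                }
            });
        }

        long totalLatencyNanos = 0;
        for (int i = 0; i < tasks; i++) {
            totalLatencyNanos += cs.take().get();
        }
        long wallNanos = System.nanoTime() - wallStart;
        pool.shutdown();

        System.out.printf("avg per-task latency: %d ms, total wall time: %d ms%n",
                totalLatencyNanos / tasks / 1000000L, wallNanos / 1000000L);
    }

    public static void main(String[] args) throws Exception {
        // Same 1000 tasks; compare per-task latency against total wall time.
        benchmark(Executors.newSingleThreadExecutor(), 1000);
        benchmark(Executors.newFixedThreadPool(20), 1000);
    }
}

With blocking work like this, the 20-thread pool finishes the batch far sooner even though no single call gets faster; with CPU-bound work on 2 cores, the per-task latency would also grow noticeably, which is closer to what your 37 ms vs 7 ms numbers show.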



You usually shouldn't worry about the speed of an individual task; instead, try to maximize the throughput of the job as a whole by adjusting the number of threads in your pool. How many threads to use depends largely on the amount of IO, the CPU cycles used by each task, locks, synchronized blocks, other applications running on the OS, and other factors.

To maximize throughput, people often start with a pool of 1 to 2 times the number of CPUs. If the tasks make a lot of IO requests or perform other blocking operations, add more threads. If they are more CPU-bound, reduce the number of threads to something closer to the number of available CPUs. If your application is competing for OS cycles with other, more critical applications on the server, even fewer threads may be required.
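
As a rough starting point, that heuristic could look like this (the ioBound flag and the multiplier are illustrative assumptions to tune from, not fixed rules):

// Rough starting point for sizing the pool; tune from here by measuring throughput.
int cpus = Runtime.getRuntime().availableProcessors();
boolean ioBound = true; // true if tasks spend most of their time blocked on IO

int poolSize = ioBound ? cpus * 2 : cpus; // start at 1-2x the CPU count
ExecutorService service = Executors.newFixedThreadPool(poolSize);

Then rerun the benchmark with a few different pool sizes and keep the one that gives the best wall-clock time for the whole run.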
