Running Java application in parallel on multicore cluster nodes

I have a Java program that calculates the semantic similarity between two documents. The program extracts documents from the specified file system and calculates the affinity. There are about 2,000,000 such documents.
I created 10 threads for this task and I assigned data patches to each of the threads. Ex. documents 1-20000 for the first stream and 20001-40000 for the next stream, etc.
I am currently running the above program on an 8 cpu. It will take a long time to complete the calculations.
I want to run a program on a 5 node Linux cluster where each node has 64 cores.

  • Are there any frameworks in Java like EXECUTOR Framework that can do this task?
  • Is there a way to calculate the maximum number of threads that can be created?
    Any pointers on how to resolve this or make it better would be appreciated.
+3


source to share


2 answers


Are there any frameworks in Java like EXECUTOR Framework that can do this task?

I suggest you take a look at the Akka framework for writing powerful concurrent and distributed applications. Akka uses the Actor model along with Software Transactional Memory to increase the level of abstraction and provide a better platform for building the right parallel and scalable applications.

Take a look at a step-by-step tutorial for more information on how to build a distributed application using the Akka framework.

In general, distributed applications are built in Java using Java-RMI , which internally uses Java's native serialization to transfer objects between nodes.



Is there a way to calculate the maximum number of threads that can be created?

The simple rule we use is set to a higher value than the available logical cores on the system . How much higher the value depends on the type of operations we are doing. For example, if the computation includes IO communication, then set the number of threads to 2 * available logical cores (not physical cores).

Other ideas we use are

  • Measure the processor load by increasing the number of threads one by one and stop when the processor load approaches 90-100%
  • Measure throughput and stop the point where throughput remains or starts to degrade
+4


source


Java Fork / Join framework is your friend. As this framework's opening statement says:

The fork / join framework is an implementation of the ExecutorService interface that helps you take advantage of multiple processors . these are designed to work, which can be broken down into smaller chunks recursively. The goal is to use all the available computing power for the performance of your application.



On how many threads you can appear - I think there is no such hard and fast rule, it depends. So you can try to start with a number like 5 or so and then continue to increase or decrease depending on the result.
In addition, you can analyze the existing maximum and minimum number of threads and also compare it with the CPU load, etc., and proceed as follows to understand how your system is behaving. If your application is deployed to an application server, check its streaming model and what they say about stream bandwidth.

0


source







All Articles