Libraries for splitting text files in Java

My program takes large CSV files and converts them to XML files. To get the best performance, I would like to split these files into smaller segments of, for example, 500 lines each. What Java libraries are available for splitting text files?

+1




3 answers


I don't understand what you gain by splitting the CSV file into smaller ones. With Java, you can read and process the file as it streams in; you don't have to read it all at once.
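A minimal sketch of what that could look like, assuming a simple CSV with no quoted fields or embedded commas. The class name `CsvToXmlStreamer` and the element names `rows`, `row` and `field` are made up for illustration; only one line is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: streams CSV rows straight into XML elements.
final class CsvToXmlStreamer {

    // Assumption: plain comma-separated values, one record per line,
    // no quoting rules. A real CSV parser would be needed otherwise.
    static void convert(Reader csv, Writer xml) throws IOException, XMLStreamException {
        XMLStreamWriter out = XMLOutputFactory.newInstance().createXMLStreamWriter(xml);
        try (BufferedReader in = new BufferedReader(csv)) {
            out.writeStartDocument();
            out.writeStartElement("rows");
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                out.writeStartElement("row");
                // limit -1 keeps trailing empty fields
                for (String field : line.split(",", -1)) {
                    out.writeStartElement("field");
                    out.writeCharacters(field); // escapes <, > and & for us
                    out.writeEndElement();
                }
                out.writeEndElement(); // </row>
            }
            out.writeEndElement(); // </rows>
            out.writeEndDocument();
            out.flush();
        } finally {
            out.close(); // does not close the underlying Writer
        }
    }
}
```

Wrapping a `FileReader` and a `FileWriter` around the input and output files would be enough to convert a file of any size this way.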



+4




What do you intend to do with this data?

If it is just record-by-record processing, then event-oriented (SAX or StAX) parsing is the way to go. An existing pipeline toolkit can also be used for record-by-record processing.



You can preprocess the file with a splitter function like this or this Splitter.java.
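For reference, a splitter of this kind can be quite small. The following is only a sketch and not the linked Splitter.java; the class name `LineSplitter` and the `chunk-00000.csv` naming scheme are assumptions. Note that it neither repeats a CSV header row in each chunk nor handles quoted fields that span several lines.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a text file into chunks of N lines.
final class LineSplitter {

    // Returns the chunk files in the order they were written.
    static List<Path> split(Path input, Path outputDir, int linesPerChunk) throws IOException {
        if (linesPerChunk <= 0) {
            throw new IllegalArgumentException("linesPerChunk must be positive");
        }
        Files.createDirectories(outputDir);
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            BufferedWriter out = null;
            try {
                String line;
                int linesInChunk = 0;
                while ((line = in.readLine()) != null) {
                    if (out == null) {
                        // Open the next chunk lazily so no empty file is created.
                        Path chunk = outputDir.resolve(
                                String.format("chunk-%05d.csv", chunks.size()));
                        chunks.add(chunk);
                        out = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                    }
                    out.write(line);
                    out.newLine();
                    if (++linesInChunk == linesPerChunk) {
                        out.close();
                        out = null;
                        linesInChunk = 0;
                    }
                }
            } finally {
                if (out != null) {
                    out.close(); // last, possibly partial, chunk
                }
            }
        }
        return chunks;
    }
}
```

Calling `LineSplitter.split(input, outputDir, 500)` would give the 500-line segments mentioned in the question.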

+2




How do you plan to distribute the work after splitting the files?

I did something similar with a framework called GridGain, a grid computing framework that lets you run tasks on a grid of computers.

In this case, you can use a cache provider like JBoss Cache to distribute the file across multiple nodes, give each node a start and end line number, and have it process that range. This is described in the following GridGain example: http://www.gridgainsystems.com/wiki/display/GG15UG/Affinity+MapReduce+with+JBoss+Cache

Alternatively, you can look at something like Hadoop and the Hadoop filesystem (HDFS) for moving a file between different nodes.

The same could be done on your local machine by loading the file into a cache and then assigning specific "chunks" of the file to be processed by separate threads. Grid computing is really only meant for very large problems, or for making scaling transparent to your application. You may need to watch for bottlenecks and I/O locks, but a simple thread pool to which you submit "jobs" after splitting the file might work.
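A rough sketch of the thread pool variant, assuming the lines are already loaded into a list in memory. The class name `ChunkedProcessor` and its parameters are invented for illustration; `job` stands for whatever per-chunk work you need, such as converting the chunk to an XML fragment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs one job per chunk of lines on a fixed thread pool.
final class ChunkedProcessor {

    // Results are returned in chunk order, regardless of which job finishes first.
    static <R> List<R> process(List<String> lines, int chunkSize, int threads,
                               Function<List<String>, R> job)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunkSize) {
                int end = Math.min(start + chunkSize, lines.size());
                // subList is a view; the jobs only read from it.
                List<String> chunk = lines.subList(start, end);
                Callable<R> task = () -> job.apply(chunk);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that chunk is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this beats a single streaming pass depends on where the time goes. If the conversion is I/O bound, extra threads are unlikely to help much.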

0








