Univocity - irregular csv parsing

I have irregular (albeit sequential) "csv" files that I need to parse. The content looks like this:

    Field1: Field1Text
    Field2: Field2Text

    Field3 (need to ignore)
    Field4 (need to ignore)

    Field5
    Field5Text

    // Cars - for example
    #,Col1,Col2,Col3,Col4,Col5,Col6
    #1,Col1Text,Col2Text,Col3Text,Col4Text,Col5Text,Col6Text
    #2,Col1Text,Col2Text,Col3Text,Col4Text,Col5Text,Col6Text
    #3,Col1Text,Col2Text,Col3Text,Col4Text,Col5Text,Col6Text


Ideally I would like to use the same approach as here.

Ultimately I want to get an object like:

    String field1;
    String field2;
    String field5;
    List<Car> cars;


I am currently having the following problems:

  • After adding some sample tests, I found that lines starting with a hash (#) are ignored. I don't want that; is there any way to escape them?
  • My intention was to use a BeanListProcessor for the cars section, process the other fields using separate row processors, and then combine the results into the object above. Am I missing any tricks here?


1 answer


Your first problem has to do with #, which is treated as a comment character by default. To prevent lines starting with # from being treated as comments, do the following:

    parserSettings.getFormat().setComment('\0');
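To see why the "#1,..." rows disappear, here is a simplified model in plain Java of what a comment character does to line-based input. It is an assumption made for illustration, not univocity's actual implementation: the hypothetical keepNonComments method drops any line whose first character equals the comment character, and treats '\0' as "no comment character", which is the effect the setting above is meant to have.

```java
import java.util.ArrayList;
import java.util.List;

public class CommentFilterDemo {
    // Simplified model (an assumption, not univocity's code): a line whose first
    // character equals the comment character is dropped; '\0' disables the check.
    static List<String> keepNonComments(List<String> lines, char comment) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            boolean isComment = comment != '\0' && !line.isEmpty() && line.charAt(0) == comment;
            if (!isComment) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "Field1: Field1Text",
            "#,Col1,Col2",
            "#1,Col1Text,Col2Text");
        System.out.println(keepNonComments(lines, '#'));  // only the Field1 line survives
        System.out.println(keepNonComments(lines, '\0')); // all three lines survive
    }
}
```

With '#' as the comment character both car lines are lost, which matches the symptom described in the question.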


As for the structure you are parsing, there is no built-in way to handle it, but it is easy to do with the API. The following will work:

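    import com.univocity.parsers.common.processor.BeanListProcessor;
    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;
    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    CsvParserSettings settings = new CsvParserSettings();
    settings.getFormat().setComment('\0'); //prevent lines starting with # from being parsed as comments

    //Creates a parser
    CsvParser parser = new CsvParser(settings);

    //Open the input
    parser.beginParsing(new File("/path/to/input.csv"), "UTF-8");

    //create BeanListProcessor for instances of Car, and initialize it.
    BeanListProcessor<Car> carProcessor = new BeanListProcessor<Car>(Car.class);
    carProcessor.processStarted(parser.getContext());

    List<Parent> parents = new ArrayList<Parent>(); //every parent read from the file
    Parent parent = null;
    boolean nextIsField5 = false; //"Field5" and its text are on separate lines
    String[] row;
    while ((row = parser.parseNext()) != null) { //read rows one by one.
        String first = row[0];
        if (first == null) { //nothing to inspect on this row
            continue;
        }
        if (first.startsWith("Field1:")) {  // when Field1 is found, create your parent instance
            if (parent != null) { //if you already have a parent instance, its cars have been read. Associate the list of cars to the instance
                parent.cars = new ArrayList<Car>(carProcessor.getBeans()); //copy the list of cars from the processor.
                carProcessor.getBeans().clear(); //clears the processor list
            }
            parent = new Parent(); //create a fresh parent instance
            parents.add(parent);
            parent.field1 = first.substring("Field1:".length()).trim(); //keep only the text after the label
        } else if (parent == null) {
            continue; //ignore anything that appears before the first Field1 line
        } else if (nextIsField5) {
            parent.field5 = first; //this row holds the text for Field5
            nextIsField5 = false;
        } else if (first.startsWith("Field2:")) {
            parent.field2 = first.substring("Field2:".length()).trim();
        } else if (first.equals("Field5")) {
            nextIsField5 = true; //the value comes on the next row
        } else if (first.startsWith("#") && !first.equals("#")) { //got a "Car" row (the "#,Col1,..." header row is skipped)
            carProcessor.rowProcessed(row, parser.getContext());
        }
    }

    //at the end, if there is a parent, give it the cars parsed since the last Field1
    if (parent != null) {
        parent.cars = new ArrayList<Car>(carProcessor.getBeans());
    }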
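The same sequential, stateful approach can be sketched without univocity at all, which may help when testing the dispatch logic in isolation. The plain-Java example below is a simplified stand-in, not a replacement for the parser: it splits lines on commas naively (no quote handling), its Parent, Car and parse names are hypothetical, and Car is trimmed to two columns for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SequentialSketch {
    // Hypothetical stand-ins for the question's types; Car keeps only two columns.
    static class Car {
        String id;
        String col1;
    }

    static class Parent {
        String field1;
        String field2;
        String field5;
        List<Car> cars = new ArrayList<>();
    }

    // Walks the lines once, choosing what to do from the first value of each line.
    // Naive comma split: an assumption that values never contain quoted commas.
    static List<Parent> parse(List<String> lines) {
        List<Parent> parents = new ArrayList<>();
        Parent parent = null;
        boolean nextIsField5 = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // blank separator lines carry no data
            }
            String[] row = line.split(",", -1);
            String first = row[0];
            if (first.startsWith("Field1:")) {
                parent = new Parent(); // registered immediately, so no flush after the loop
                parents.add(parent);
                parent.field1 = first.substring("Field1:".length()).trim();
            } else if (parent == null) {
                continue; // nothing is kept before the first Field1 line
            } else if (nextIsField5) {
                parent.field5 = first;
                nextIsField5 = false;
            } else if (first.startsWith("Field2:")) {
                parent.field2 = first.substring("Field2:".length()).trim();
            } else if (first.equals("Field5")) {
                nextIsField5 = true; // the value is on the following line
            } else if (first.startsWith("#") && !first.equals("#")) {
                Car car = new Car(); // map by position, in the spirit of @Parsed(index = ...)
                car.id = row[0];
                car.col1 = row.length > 1 ? row[1] : null;
                parent.cars.add(car);
            }
        }
        return parents;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "Field1: Field1Text", "Field2: Field2Text", "",
            "Field3 (need to ignore)", "Field5", "Field5Text", "",
            "// Cars - for example", "#,Col1,Col2",
            "#1,Col1Text,Col2Text", "#2,Col1Text,Col2Text");
        Parent p = parse(sample).get(0);
        System.out.println(p.field1 + " / " + p.field2 + " / " + p.field5 + " / " + p.cars.size());
    }
}
```

Running main prints "Field1Text / Field2Text / Field5Text / 2". Because each parent owns its own list of cars, nothing has to be copied or cleared when the next Field1 line appears.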




For BeanListProcessor to work, your Car class needs to be declared like this:

    import com.univocity.parsers.annotations.Parsed;

    public static final class Car {
        @Parsed(index = 0)
        String id;
        @Parsed(index = 1)
        String col1;
        @Parsed(index = 2)
        String col2;
        @Parsed(index = 3)
        String col3;
        @Parsed(index = 4)
        String col4;
        @Parsed(index = 5)
        String col5;
        @Parsed(index = 6)
        String col6;
    }


You could map columns by header name instead, but that requires more code. If the headers are always the same, you can simply assume the column positions are fixed.
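For comparison, mapping by header names means keeping a name-to-position lookup yourself. A minimal plain-Java sketch of that extra bookkeeping follows; indexHeaders and valueOf are hypothetical helpers for illustration, not part of univocity's API.

```java
import java.util.HashMap;
import java.util.Map;

public class HeaderMappingSketch {
    // Builds a name -> position lookup from a header row such as "#,Col1,Col2".
    static Map<String, Integer> indexHeaders(String[] header) {
        Map<String, Integer> positions = new HashMap<>();
        for (int i = 0; i < header.length; i++) {
            positions.put(header[i].trim(), i);
        }
        return positions;
    }

    // Reads a value by column name; returns null when the column or the value is missing.
    static String valueOf(String[] row, Map<String, Integer> positions, String column) {
        Integer i = positions.get(column);
        return (i == null || i >= row.length) ? null : row[i];
    }

    public static void main(String[] args) {
        // Columns deliberately out of order to show that lookup is by name, not position.
        Map<String, Integer> positions = indexHeaders("#,Col1,Col3,Col2".split(","));
        String[] row = "#1,a,c,b".split(",");
        System.out.println(valueOf(row, positions, "Col2")); // prints b
    }
}
```

The lookup is rebuilt only when a new header row is seen, so the per-row cost is one map access per column; with fixed positions, as in the Car class, even that step disappears.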

Hope it helps
