Reading java file with escape characters for newline

Question

I have a Unicode file that needs to be exported to a database (Vertica). The column separator is CTRL+B and the record separator is the newline character (\n). Whenever a column value itself contains a newline, CTRL+A precedes that newline as an escape character.

When I use BufferedReader.readLine() to read this file, the records with IDs 2 and 4 are each read as two records, whereas I want each of them to be read as a single record, as shown in the output below.

Here is a sample input file. | stands for CTRL + B and ^ stands for CTRL + A.

Input
ID|Name|Job Desc
----------------
1|xxxx|SO Job
2|YYYY|SO Careers^
Job
3|RRRRR|SO
4|ZZZZ^
 ZZ|SO Job
5|AAAA|YU

Output:
ID|Name|Job Desc
----------------
1|xxxx|SO Job
2|YYYY|SO Careers Job
3|RRRRR|SO
4|ZZZZ ZZ|SO Job
5|AAAA|YU

The file is huge, so I cannot use StringEscapeUtils. Any suggestions on this?

+3

java escaping bufferedreader unicode-escapes

Santhosh Apr 28 15 at 12:26

2 answers

Tim's answer is partially correct, but it still doesn't handle newlines escaped with CTRL+A.

Here is my solution, building on Tim's answer:

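import java.io.File;
import java.util.Scanner;
import java.util.regex.Pattern;

File f = new File("C:\\Users\\SV7104\\Desktop\\sampletest.txt");
// Split records on CTRL+B followed by a newline, ignoring surrounding whitespace
Scanner sc = new Scanner(f).useDelimiter(Pattern.compile("\\s*\\u0002\\n\\s*"));
while (sc.hasNext()) {
    // Replace each CTRL+A-escaped newline inside a record with a space
    System.out.println(sc.next().replaceAll("\\u0001\\n", " "));
}
sc.close();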

If there is a more efficient method, I would be interested to hear about it.
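For a very large file, one streaming option is to keep using BufferedReader.readLine() and glue a line onto the next one whenever it ends with CTRL+A. The following is only a minimal sketch under that assumption; the class EscapedRecordReader and the method forEachRecord are hypothetical names, not part of any library. Only one logical record is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper for illustration only.
final class EscapedRecordReader {
    private static final char ESCAPE = '\u0001'; // CTRL+A

    // Passes each logical record to sink, joining physical lines that end with CTRL+A.
    static void forEachRecord(Reader in, Consumer<String> sink) {
        try (BufferedReader reader = new BufferedReader(in)) {
            StringBuilder record = new StringBuilder();
            boolean pending = false; // true while an escaped newline is still open
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty() && line.charAt(line.length() - 1) == ESCAPE) {
                    // Escaped newline: drop CTRL+A, insert a space, keep accumulating
                    record.append(line, 0, line.length() - 1).append(' ');
                    pending = true;
                } else {
                    record.append(line);
                    sink.accept(record.toString());
                    record.setLength(0);
                    pending = false;
                }
            }
            if (pending) {
                // Input ended right after an escaped newline; emit what was collected
                sink.accept(record.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller could pass a FileReader (or an InputStreamReader with an explicit charset) and a sink that writes each record to the output file. A continuation line that starts with a space, as in record 4 of the sample, ends up with two spaces at the join unless the sink trims them.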

0

Santhosh Apr 28 15 at 19:50

Tim Biegeleisen · Accepted Answer · 2015-04-28T02:15:10+0000

You can use a Scanner with a custom delimiter. The delimiter used here is meant to match \n but not \u0001\n (where \u0001 represents CTRL+A):

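import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Scanner;
import java.util.regex.Pattern;

try {
    PrintWriter writer = new PrintWriter("dboutput.txt");
    Scanner sc = new Scanner(new File("dbinput.txt"));
    // Delimiter intended to match a newline that is not preceded by CTRL+A
    sc.useDelimiter(Pattern.compile("^(?!.*(\\u0001\\n)).*\\n$"));
    while (sc.hasNext()) {
        writer.println(sc.next());
    }
    sc.close();
    writer.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
}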
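The idea of splitting on \n but not on \u0001\n can also be written as a negative lookbehind. What follows is a sketch of that variant, not the code from the answer above; the class LookbehindDelimiterDemo and the method records are hypothetical names chosen for illustration, and it assumes records are separated by a bare \n.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
final class LookbehindDelimiterDemo {
    // A newline ends a record only if it is NOT immediately preceded by CTRL+A
    private static final Pattern RECORD_END = Pattern.compile("(?<!\\u0001)\\n");

    static List<String> records(Readable source) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(source)) {
            sc.useDelimiter(RECORD_END);
            while (sc.hasNext()) {
                // Inside a record, turn each CTRL+A-escaped newline into a space
                result.add(sc.next().replace("\u0001\n", " "));
            }
        }
        return result;
    }
}
```

The method collects records into a list only to keep the sketch short. For a huge file, each record would more sensibly be written out inside the loop, as the PrintWriter in the answer does.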
