Reading a file in Java with escaped newline characters
I have a Unicode file that needs to be exported to a database (Vertica). The column separator is CTRL+B and the record separator is the newline character (\n). Whenever a column value itself contains a newline, CTRL+A is placed before it as an escape character.
When I use BufferedReader.readLine() to read this file, the records with IDs 2 and 4 are each read as two records, whereas I want each of them to be read as a single record, as shown in the output.
Here is a sample input file. | stands for CTRL+B and ^ stands for CTRL+A.
Input:
ID|Name|Job Desc
----------------
1|xxxx|SO Job
2|YYYY|SO Careers^
Job
3|RRRRR|SO
4|ZZZZ^
ZZ|SO Job
5|AAAA|YU
Output:
ID|Name|Job Desc
----------------
1|xxxx|SO Job
2|YYYY|SO Careers Job
3|RRRRR|SO
4|ZZZZ ZZ|SO Job
5|AAAA|YU
The file is huge, so I cannot use StringEscapeUtils. Any suggestions on this?
You can use a Scanner with a custom delimiter. The delimiter used matches \n, but not \u0001\n (where \u0001 represents CTRL+A):
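import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Scanner;
import java.util.regex.Pattern;

try {
    PrintWriter writer = new PrintWriter("dboutput.txt");
    Scanner sc = new Scanner(new File("dbinput.txt"));
    // Custom delimiter, intended to match \n but not \u0001\n
    sc.useDelimiter(Pattern.compile("^(?!.*(\\u0001\\n)).*\\n$"));
    while (sc.hasNext()) {
        writer.println(sc.next());
    }
    sc.close();
    writer.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
}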
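As a hedged illustration of the delimiter idea described above (a newline counts as a record separator only when it is not preceded by CTRL+A), here is a minimal sketch that swaps in a negative lookbehind, `(?<!\u0001)\n`, instead of the pattern shown in the answer. The class name `LookbehindDelimiterSketch` and the method `readRecords` are made up for this example, and it assumes plain `\n` line endings. Note that each token still contains the `\u0001\n` pair, so the caller has to replace it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class LookbehindDelimiterSketch {
    // A newline that is NOT immediately preceded by CTRL+A (U+0001).
    private static final Pattern RECORD_END = Pattern.compile("(?<!\\u0001)\\n");

    // Hypothetical helper: splits the source into records, keeping escaped newlines inside a record.
    public static List<String> readRecords(Readable source) {
        List<String> records = new ArrayList<>();
        try (Scanner sc = new Scanner(source)) {
            sc.useDelimiter(RECORD_END);
            while (sc.hasNext()) {
                records.add(sc.next());
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "1|xxxx|SO Job\n2|YYYY|SO Careers\u0001\nJob\n3|RRRRR|SO\n";
        for (String record : readRecords(new StringReader(input))) {
            // The escape pair is still present in the token; turn it into a space here.
            System.out.println(record.replace("\u0001\n", " "));
        }
    }
}
```

For a real file, a `Scanner` built from a `File` or `Path` could be passed through the same loop; whether this performs acceptably on a very large input is untested here.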
Tim is partially correct in his answer, but it still doesn't handle the CTRL+A-escaped newlines.
Here is my solution for this (building on Tim's answer):
File f = new File("C:\\Users\\SV7104\\Desktop\\sampletest.txt");
// Split on CTRL+B followed by a newline, ignoring surrounding whitespace
Scanner sc = new Scanner(f).useDelimiter(Pattern.compile("\\s*\\u0002\\n\\s*"));
while (sc.hasNext()) {
    // Replace each CTRL+A-escaped newline with a space
    System.out.println(sc.next().replaceAll("\\u0001\\n", " "));
}
sc.close();
If there is a more efficient method, I would also be interested to hear about it.
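One possible streaming alternative, given only as a sketch: read the file line by line with BufferedReader and, whenever a line ends with CTRL+A, drop the escape, write a space, and continue with the next line instead of ending the record. Only one physical line is held in memory at a time. The names `EscapedNewlineJoiner`, `join` and `joinString` are hypothetical. The sketch assumes the escape character sits immediately before the line terminator and that the data contains no bare `\r`, since `readLine()` treats that as a line end too.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class EscapedNewlineJoiner {
    private static final char ESCAPE = '\u0001'; // CTRL+A

    // Copies in to out, replacing each CTRL+A-escaped newline with a single space.
    public static void join(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.isEmpty() && line.charAt(line.length() - 1) == ESCAPE) {
                // Escaped newline: the record continues on the next physical line.
                out.write(line, 0, line.length() - 1);
                out.write(' ');
            } else {
                // Unescaped newline: the record ends here.
                out.write(line);
                out.write('\n');
            }
        }
        out.flush();
    }

    // Convenience wrapper for in-memory input, used here for demonstration only.
    public static String joinString(String input) {
        StringWriter out = new StringWriter();
        try {
            join(new StringReader(input), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "2|YYYY|SO Careers\u0001\nJob\n4|ZZZZ\u0001\nZZ|SO Job\n";
        // Prints:
        // 2|YYYY|SO Careers Job
        // 4|ZZZZ ZZ|SO Job
        System.out.print(joinString(sample));
    }
}
```

For an actual file, `join` could be given a reader from `Files.newBufferedReader(path, charset)` and a buffered writer for the output, with the charset chosen to match the file's real encoding.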