Java: parsing information from a file sequentially
Let's say I have a file with a structure like this:
Line 0: 354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED
Line 1: 543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED
etc.
Now I would like to get the important information from each line of my example (the number and the text that follows it). The AA sequence is always present (and can be used as a marker to stop reading and skip to the next line), while the information part varies in length.
What's the best way to parse the information? A BufferedReader with if/else logic, or is there some kind of parser that you can tell to read a number of XYZ, then read everything into a string until it finds AA, then skip the rest of the line?
I would read the file line by line and match each line to a regex. Hopefully my comments in the code below will be detailed enough.
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The pattern to use
Pattern p = Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");
// Read file line by line; try-with-resources closes the reader afterwards
try (BufferedReader br = new BufferedReader(new FileReader(myFile))) {
    String line;
    while ((line = br.readLine()) != null) {
        // Match line against our pattern
        Matcher m = p.matcher(line);
        if (m.find()) {
            // Line is valid, process it however you want
            // m.group(1) contains the number
            // m.group(2) contains the text between number and AA
            // (including any whitespace directly before AA)
        } else {
            // Line has invalid format (pattern does not match)
        }
    }
}
Explanation of the regular expression (Pattern) that I used:
^([0-9]+)\s+(([^A]|A[^A])+)AA
^ matches the start of the line
([0-9]+) matches one or more digits (the number)
\s+ matches one or more whitespace characters
(([^A]|A[^A])+) matches a run of characters in which each character is either not an A, or is an A that is not followed by another A, so the run stops at the first AA
AA matches the terminating AA
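To make the explanation above concrete, here is a minimal sketch that runs the same pattern against one of the sample lines from the question. The class name RegexLineDemo and the helper parse are made up for illustration, and the trim() call is an addition that assumes you do not want the whitespace in front of AA.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexLineDemo {
    // Same pattern as explained above.
    static final Pattern LINE = Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");

    // Hypothetical helper: returns {number, text}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        // group(2) still ends with the whitespace in front of AA, so trim it.
        return new String[] { m.group(1), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] parts = parse(
            "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED");
        // prints: 354858 | Some String That Is Important
        System.out.println(parts[0] + " | " + parts[1]);
    }
}
```

Returning null for a non-matching line keeps the sketch short; in real code you may prefer an Optional or an exception.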
Update as response to comment:
If each line is preceded by a | character, the expression looks like this:
^\|([0-9]+)\s+(([^A]|A[^A])+)AA
In Java, the backslashes must be doubled inside a string literal:
"^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA"
The character | has special meaning in regular expressions (alternation) and must be escaped.
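As a quick check of the escaping, the sketch below compiles the pipe-prefixed pattern in two ways: with the hand-written \\| escape and with Pattern.quote("|"), which is a standard way to have the regex engine treat a string literally. The class name PipePrefixDemo, the helper number and the sample lines are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PipePrefixDemo {
    // Hand-escaped pipe, as in the answer above.
    static final Pattern ESCAPED =
        Pattern.compile("^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA");

    // Same pattern, but the literal pipe is produced by Pattern.quote.
    static final Pattern QUOTED =
        Pattern.compile("^" + Pattern.quote("|") + "([0-9]+)\\s+(([^A]|A[^A])+)AA");

    // Hypothetical helper: returns the leading number, or null if the line does not match.
    static String number(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "|354858 Some Text AA rest of the line";
        System.out.println(number(ESCAPED, line)); // prints: 354858
        System.out.println(number(QUOTED, line));  // prints: 354858
        // Without the leading pipe the line is rejected.
        System.out.println(number(ESCAPED, "354858 Some Text AA rest")); // prints: null
    }
}
```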
Telling you what works best for your problem is impossible without additional information.
One solution could be
String s = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
String[] split = s.substring(0, s.indexOf(" AA")).split(" ", 2);
System.out.println("split = " + Arrays.toString(split));
Output
split = [354858, Some String That Is Important]
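One thing to watch with this approach: if a line has no " AA" in it, indexOf returns -1 and substring(0, -1) throws a StringIndexOutOfBoundsException. A minimal sketch of a guarded version follows; the class name SplitDemo and the helper splitLine are invented for illustration, and returning an empty array for a line without the marker is just one possible choice.

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: returns {number, text}, or an empty array
    // when the line does not contain the " AA" marker.
    static String[] splitLine(String s) {
        int end = s.indexOf(" AA");
        if (end < 0) {
            return new String[0];
        }
        // Limit 2 splits only at the first space, so the text keeps its inner spaces.
        return s.substring(0, end).split(" ", 2);
    }

    public static void main(String[] args) {
        String ok = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
        // prints: [354858, Some String That Is Important]
        System.out.println(Arrays.toString(splitLine(ok)));
        // prints: []
        System.out.println(Arrays.toString(splitLine("a line without the marker")));
    }
}
```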
You can read the file line by line and cut off everything from charSequence (here AA) onwards:
final String charSequence = "AA";
String line;
BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream("yourfilename")));
try {
    while ((line = r.readLine()) != null) {
        int pos = line.indexOf(charSequence);
        if (pos > 0) {
            String myImportantStuff = line.substring(0, pos);
            // do something with your useful string
        }
    }
} finally {
    r.close();
}
Here's a solution for you:
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.text.ParseException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    InputStream source; // select a text source (should be a FileInputStream)
    {
        String fileContent = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED\n" +
                "543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
        source = new ByteArrayInputStream(fileContent.getBytes(StandardCharsets.UTF_8));
    }
    // Decode with the same charset that was used to encode the bytes
    try (BufferedReader stream = new BufferedReader(new InputStreamReader(source, StandardCharsets.UTF_8))) {
        Pattern pattern = Pattern.compile("^([0-9]+) (.*?) AA .*$");
        while (true) {
            String line = stream.readLine();
            if (line == null) {
                break;
            }
            Matcher matcher = pattern.matcher(line);
            if (matcher.matches()) {
                String someNumber = matcher.group(1);
                String someText = matcher.group(2);
                // do something with someNumber and someText
            } else {
                throw new ParseException(line, 0);
            }
        }
    } catch (IOException | ParseException e) {
        e.printStackTrace(); // TODO ...
    }
}
You can use a regex, but if you know that each line contains AA and you only want the content before it, you could just use substring(int, int) to get the part of the line before AA:
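// needs java.io.IOException, java.nio.file.*, java.util.List, java.util.stream.*
public List<String> read(Path path) throws IOException {
    // Files.lines keeps the file open, so close the stream when done
    try (Stream<String> lines = Files.lines(path)) {
        return lines.map(this::parseLine)
                    .collect(Collectors.toList());
    }
}

public String parseLine(String line) {
    int index = line.indexOf("AA");
    // Keep the whole line if it has no AA instead of throwing an exception
    return index < 0 ? line : line.substring(0, index);
}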
Here's a non-Java 8 version of read:
public List<String> read(Path path) throws IOException {
    List<String> content = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new FileReader(path.toFile()))) {
        String line;
        while ((line = reader.readLine()) != null) {
            content.add(parseLine(line));
        }
    }
    return content;
}