Java: parsing information from a file sequentially

Let's say I have a file with a structure like this:

Line 0:

354858 Some String That Is Important AA OTHER STUFF THAT SHOULD BE IGNORED

Line 1:

543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED

etc.

Now I would like to extract the number and the important string from each line of my example. The AA sequence is always present (so it can be used as a marker to stop and skip to the next line), while the important string varies in length.

What's the best way to parse this information? A BufferedReader with if/else logic, or is there some kind of parser that you can tell: read a few digits, then read everything into a string until you find AA, then skip the rest of the line?

+3




6 answers


I would read the file line by line and match each line to a regex. Hopefully my comments in the code below will be detailed enough.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The pattern to use
Pattern p = Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");

// Read file line by line; try-with-resources closes the reader when done
try (BufferedReader br = new BufferedReader(new FileReader(myFile))) {
  String line;
  while ((line = br.readLine()) != null) {
    // Match line against our pattern
    Matcher m = p.matcher(line);
    if (m.find()) {
      // Line is valid, process it however you want
      // m.group(1) contains the number
      // m.group(2) contains the text between number and AA (including the space before AA)
    } else {
      // Line has invalid format (pattern does not match)
    }
  }
}

      

Explanation of the regular expression (Pattern) that I used:

^([0-9]+)\s+(([^A]|A[^A])+)AA

^               matches the start of the line
([0-9]+)        matches one or more digits (the number, group 1)
\s+             matches one or more whitespace characters
(([^A]|A[^A])+) matches a run of characters, each either not an A or an A that is not followed by another A (group 2)
AA              matches the terminating AA
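As a quick check of the explanation above, here is a minimal sketch that runs the same pattern against one of the sample lines. The class name RegexLineDemo and the helper parse are made up for illustration, and trimming group 2 is an assumption about what you want (the group otherwise keeps the space in front of AA).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexLineDemo {
    // Same pattern as in the answer above
    private static final Pattern P =
            Pattern.compile("^([0-9]+)\\s+(([^A]|A[^A])+)AA");

    // Hypothetical helper: returns {number, text}, or null if the line does not match
    static String[] parse(String line) {
        Matcher m = P.matcher(line);
        if (!m.find()) {
            return null;
        }
        // group(2) still carries the space before AA, so trim it
        return new String[] { m.group(1), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] r = parse("354858 Some String That Is Important AA OTHER STUFF");
        // prints: 354858 | Some String That Is Important
        System.out.println(r[0] + " | " + r[1]);
    }
}
```

Note that "Another String ..." also works: the leading A is followed by n, so it is consumed by the A[^A] branch.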

      

Update in response to a comment:

If each line is preceded by a | character, the expression looks like this:

^\|([0-9]+)\s+(([^A]|A[^A])+)AA

      

In Java, the backslashes must be doubled inside the string literal:

"^\\|([0-9]+)\\s+(([^A]|A[^A])+)AA"

      

The | character has a special meaning in regular expressions (alternation) and must be escaped.
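If counting backslashes feels error-prone, one alternative is to let Pattern.quote produce the literal part. The sketch below is only an illustration: the class name PipePrefixDemo and the helper number are hypothetical, and it assumes the | is the very first character of the line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PipePrefixDemo {
    // Pattern.quote("|") yields a regex that matches a literal | character
    private static final Pattern P = Pattern.compile(
            "^" + Pattern.quote("|") + "([0-9]+)\\s+(([^A]|A[^A])+)AA");

    // Hypothetical helper: returns the number, or null if the line does not match
    static String number(String line) {
        Matcher m = P.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(number("|354858 Some String That Is Important AA OTHER STUFF"));
        // A line without the leading | no longer matches
        System.out.println(number("354858 Some String That Is Important AA OTHER STUFF"));
    }
}
```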

+1




Telling you what works best for your problem is impossible without additional information.

One solution could be

String s = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
String[] split = s.substring(0, s.indexOf(" AA")).split(" ", 2);
System.out.println("split = " + Arrays.toString(split));

      



Output

split = [354858, Some String That Is Important]
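One caveat with this approach: indexOf returns -1 when " AA" is missing, and substring then throws. A guarded variant might look like the sketch below. The class SplitDemo and the helper splitRecord are hypothetical names, and returning an empty array for a malformed line is just one possible choice.

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: returns [number, text], or an empty array when " AA" is missing
    static String[] splitRecord(String s) {
        int end = s.indexOf(" AA");
        if (end < 0) {
            return new String[0];
        }
        // Limit 2 keeps the spaces inside the important string intact
        return s.substring(0, end).split(" ", 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                splitRecord("543788 Another String That Is Important AA OTHER STUFF")));
        System.out.println(Arrays.toString(splitRecord("line without the marker")));
    }
}
```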

      

+1




You can read the file line by line and drop everything from the charSequence AA onward:

final String charSequence = "AA";
String line;
BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream("yourfilename")));
try {
    while ((line = r.readLine()) != null) {
       int pos = line.indexOf(charSequence);
       if (pos > 0) {
            String myImportantStuff = line.substring(0, pos);
            //do something with your useful string
       }
    }
} finally {
    r.close();
}

      

+1




Use the regex .+?(?=AA): a lazy match followed by a lookahead, so it captures everything before the first AA without consuming the AA itself.
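In Java, that expression could be applied roughly as follows. This is a sketch, not part of the original answer: the class LookaheadDemo and the helper beforeMarker are hypothetical, and the trim is an assumption to drop the space left in front of AA.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Lazy match up to (but not including) the first AA
    private static final Pattern BEFORE_AA = Pattern.compile(".+?(?=AA)");

    // Hypothetical helper: returns the part before AA, or null if there is no AA
    static String beforeMarker(String line) {
        Matcher m = BEFORE_AA.matcher(line);
        return m.find() ? m.group().trim() : null;
    }

    public static void main(String[] args) {
        // prints: 354858 Some String That Is Important
        System.out.println(beforeMarker("354858 Some String That Is Important AA OTHER STUFF"));
    }
}
```

Unlike the grouped pattern in the first answer, this returns the number and the text as one string, so you would still need to split them.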

0




Here's a solution for you:

public static void main(String[] args) {
    InputStream source; //select a text source (should be a FileInputStream)
    {
        String fileContent = "354858 Some String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED\n" +
                "543788 Another String That Is Important AA OTHER STUFF SOMESTUFF THAT SHOULD BE IGNORED";
        source = new ByteArrayInputStream(fileContent.getBytes(StandardCharsets.UTF_8));
    }

    // Decode with the same charset the bytes were encoded with
    try(BufferedReader stream = new BufferedReader(new InputStreamReader(source, StandardCharsets.UTF_8))) {
        Pattern pattern = Pattern.compile("^([0-9]+) (.*?) AA .*$");
        while(true) {
            String line = stream.readLine();
            if(line == null) {
                break;
            }
            Matcher matcher = pattern.matcher(line);
            if(matcher.matches()) {
                String someNumber = matcher.group(1);
                String someText = matcher.group(2);
                //do something with someNumber and someText
            } else {
                throw new ParseException(line, 0);
            }
        }
    } catch (IOException | ParseException e) {
        e.printStackTrace(); // TODO ...
    }
}

      

0




You can use a regex, but if you know that each line contains AA and you want the content before AA, you could just use substring(int,int) to get the part of the line before AA:

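// Needs java.nio.file.Files, java.util.stream.Stream and java.util.stream.Collectors
public List<String> read(Path path) throws IOException {
    // Files.lines keeps the file open, so close the stream when done
    try (Stream<String> lines = Files.lines(path)) {
        return lines.map(this::parseLine)
                    .collect(Collectors.toList());
    }
}

public String parseLine(String line){
    // Assumes every line contains AA; otherwise indexOf returns -1 and substring throws
    int index = line.indexOf("AA");
    return line.substring(0,index);
}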

      

Here's a non-Java 8 version of read:

public List<String> read(Path path) throws IOException {
    List<String> content = new ArrayList<>();

    try(BufferedReader reader = new BufferedReader(new FileReader(path.toFile()))){
        String line;
        while((line = reader.readLine()) != null){
            content.add(parseLine(line));
        }
    }

    return content;
}

      

0








