Parse content from specific tag in XML file (Java)

I have an XML file like the one below, and I need to use Java to generate a plain-text .txt file containing the text inside each <source> tag, one per line.

I read that I can use SAX to access individual tags, but that doesn't seem to work here, because a <source> tag may contain arbitrary nested tags, as in the example below.

What's the best way to do this? Would a regex work?

<?xml version="1.0" encoding="utf-8"?>
[...]
<source>
  <g id="_0">
    <g id="_1">First valid sentence</g>
  </g>
</source>
<source>Another valid string</source>

      

The .txt output should be something like this:

First valid sentence
Another valid string

      



1 answer


You can use the joox library to parse the XML. With its find() method you can get all the <source> elements, and then call getTextContent() on each one to retrieve its text, including the text of any nested tags. For example:

import java.io.File;
import java.io.IOException;
import org.xml.sax.SAXException;
import static org.joox.JOOX.$;

public class Main {

    public static void main(String[] args) throws SAXException, IOException {
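        // Parse the XML file whose path is given as the first argument
        $(new File(args[0]))
            // Select every <source> element in the document
            .find("source")
            // Print each element's text (nested tags included), one per line
            .forEach(elem -> System.out.println(elem.getTextContent().trim()));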
    }
}

      

I'm assuming a well-formed XML file, for example:



<?xml version="1.0" encoding="utf-8"?>
<root>
    <source>
        <g id="_0">
            <g id="_1">First valid sentence</g>
        </g>
    </source>
    <source>Another valid string</source>
</root>

      

And this gives:

First valid sentence
Another valid string
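If you'd rather not add a dependency, roughly the same thing can probably be done with the DOM parser that ships with the JDK. The following is only a sketch under the same assumption of a well-formed file. The class name SourceTextExtractor, the method extractSourceText and the two command-line arguments are made up for illustration. It also writes the lines to a .txt file, as the question asks, instead of printing them:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of joox or the original answer.
public class SourceTextExtractor {

    // Returns the trimmed text of every <source> element, in document order.
    static List<String> extractSourceText(InputStream xml)
            throws ParserConfigurationException, SAXException, IOException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xml);
        NodeList sources = doc.getElementsByTagName("source");
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < sources.getLength(); i++) {
            // getTextContent() concatenates the text of all descendants,
            // so any nested tags such as <g> are flattened away.
            lines.add(sources.item(i).getTextContent().trim());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Assumed arguments: args[0] = input XML path, args[1] = output .txt path.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // Writes one extracted string per line, encoded as UTF-8.
            Files.write(Paths.get(args[1]), extractSourceText(in), StandardCharsets.UTF_8);
        }
    }
}
```

Either way, a real XML parser is a safer choice than a regex here, because the nesting inside <source> can be arbitrary.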

      
