Combining two regular expressions to extract text in Java

I need to combine two regular expressions into one to process a text file (a userdoc).

INPUT:

<user>textxtxtxtx</user>

<unnecessarytag>unwanted info</unnecessarytag>

<info>infoinfoinfo. part 1.....multiline</info>

<unnecessarytag>unwanted info</unnecessarytag>

<info>infoinfoinfo. part 2.....multiline</info>

      

There will be many similar blocks in the file.

OUTPUT:

<user>textxtxtxtx</user>

<info>infoinfoinfo. part 1.....multiline</info>

<info>infoinfoinfo. part 2.....multiline</info>

      

The order must be maintained.

One user can have many info blocks, and the file contains many userdocs.

Code for this:

String out = String.join("\n", Files.readAllLines(Paths.get("text.txt")));

Pattern p = Pattern.compile("<user>(.*?)</user>");
Matcher m = p.matcher(out);

Pattern p1 = Pattern.compile("<info>([^<]*)</info>", Pattern.MULTILINE);
Matcher m1 = p1.matcher(out);

      

I was planning to write

while (m.find() && m1.find())
{
    String cp = m.group();
    String cp1 = m1.group();
    System.out.println(  cp + cp1 );
}

      

But this produces output in which each user has only one info block. How do I combine these two regular expressions into a pattern that supports the ab^n format (one user followed by n info blocks)?

+3




2 answers


Hello, why don't you turn this into XML and read it with JDOM2 or any other DOM implementation in Java? Your current approach may be error-prone. An XML query will also be simpler, more readable (from a code point of view), and generally more elegant.

To do this, you will need something like the following (I am using JDOM2):

import java.io.File;
import org.jdom2.Document;
import org.jdom2.input.SAXBuilder;
SAXBuilder saxBuilder = new SAXBuilder();
// modelPath is a string derived from the IPath of the file that stores the data
Document originalDoc = saxBuilder.build(new File(modelPath));

      

Handling the nodes is then pretty straightforward. You can either use the traditional parent -> children traversal or XPath expressions, a slightly more general approach that keeps working when the model structure is modified. Both approaches have pros and cons that I suggest you research and evaluate.

For this to work, your structure must change to something like this:



<?xml version="1.0" encoding="UTF-8"?>
<userdocs>
    <user name="textxtxtxtx">
       <info>...</info>
       <info>...</info>
       <info>...</info>
    </user>
    <user name="test2">
       <info>...</info>
       <info>...</info>
       <info>...</info>
    </user>
    <!-- etc... -->
</userdocs>

      

You can then extract elements from the document like this:

// xpath is an XPath expression such as "//user"; ns is the namespace used by the document
public static List<Element> getElements(String xpath, Document doc, Namespace ns) {
    XPathFactory xFactory = XPathFactory.instance();
    XPathExpression<Element> expr = xFactory.compile(xpath, Filters.element(), null, ns);
    return expr.evaluate(doc);
}

// a sample caller of the method
getElements("//user", doc, namespace)
        .forEach(el -> {
            // your processing
        });

// All it takes to retrieve a user such as `textxtxtxtx` with all of its
// info children is this expression: //user[@name='textxtxtxtx']
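
For readers who do not have JDOM2 on the classpath, roughly the same query can be sketched with the XPath API that ships with the JDK (javax.xml.xpath). Note that this swaps in a different library from the one used in this answer. The class name UserInfoXPath and the method infosFor are hypothetical, and the sketch assumes the file has already been restructured as shown above and that the user name contains no single quote.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class UserInfoXPath {

    /** Returns the text of every <info> child of the <user> whose name attribute equals userName. */
    public static List<String> infosFor(String xml, String userName) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Assumption: userName contains no single quote, so plain concatenation is safe here.
        String expression = "//user[@name='" + userName + "']/info";
        try {
            // A NODESET result is returned in document order, so the info order is preserved.
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> infos = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                infos.add(nodes.item(i).getTextContent());
            }
            return infos;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<userdocs>"
                + "<user name=\"textxtxtxtx\"><info>part 1</info><info>part 2</info></user>"
                + "<user name=\"test2\"><info>part 3</info></user>"
                + "</userdocs>";
        System.out.println(infosFor(xml, "textxtxtxtx"));
    }
}
```

The expression is the same //user[@name='...'] query as above with /info appended, so only the info children are returned.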

      

A list of XPath expressions and their meanings can be found here: Tester / Evaluator / Examples

+1




Wrap your search for info inside a search for user.

Pattern p = Pattern.compile("<user>(.*?)</user>");
Pattern p1 = Pattern.compile("<info>([^<]*)</info>", Pattern.MULTILINE);
Matcher m = p.matcher(out);
while (m.find()) {
    // the text captured between <user> and </user>
    String content = m.group(1);
    // the inner search sees only that text, so each <info> must lie inside it
    Matcher m2 = p1.matcher(content);
    while (m2.find()) {
        // do what needs to be done with m2.group(1)
    }
}
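
In the input shown in the question, the <info> elements come after </user> rather than inside it, so the outer match has to cover the whole block for the nested search to find them. The following is a minimal sketch of the same nested-matcher idea adapted to that layout. The names UserBlocks, infosByUser, BLOCK and INFO are hypothetical, and the sketch assumes that a block runs from one <user> tag to the next <user> tag or to the end of the input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UserBlocks {

    // Group 1: the user text. Group 2: everything up to the next <user> or the end of input.
    private static final Pattern BLOCK = Pattern.compile(
            "<user>(.*?)</user>(.*?)(?=<user>|\\z)", Pattern.DOTALL);

    // DOTALL lets '.' cross line breaks, so multi-line info bodies match.
    private static final Pattern INFO = Pattern.compile("<info>(.*?)</info>", Pattern.DOTALL);

    /** Maps each user to its info bodies; users and infos keep their order of appearance. */
    public static Map<String, List<String>> infosByUser(String text) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher block = BLOCK.matcher(text);
        while (block.find()) {
            List<String> infos = result.computeIfAbsent(block.group(1), k -> new ArrayList<>());
            // Search for <info> only inside the rest of this user's block.
            Matcher info = INFO.matcher(block.group(2));
            while (info.find()) {
                infos.add(info.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "<user>textxtxtxtx</user>",
                "<unnecessarytag>unwanted info</unnecessarytag>",
                "<info>part 1</info>",
                "<unnecessarytag>unwanted info</unnecessarytag>",
                "<info>part 2</info>");
        infosByUser(sample).forEach((user, infos) -> System.out.println(user + " -> " + infos));
    }
}
```

Grouping by user in a LinkedHashMap is one possible design; it gives each user all n of its info blocks, which is the ab^n shape the question asks for.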

      



You can also set the flag Pattern.DOTALL so that . also matches line terminators.
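
As a sketch of how DOTALL could be used to get the output format from the question with a single pattern, the two expressions can be joined with an alternation so that one matcher walks the text once and returns both kinds of element in order. The names UserInfoFilter, keepUserAndInfo and WANTED are hypothetical, and the sketch assumes the tags are never nested inside each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UserInfoFilter {

    // One pattern with two alternatives: a <user> element or an <info> element.
    // DOTALL lets '.' cross line breaks, so multi-line bodies match.
    private static final Pattern WANTED =
            Pattern.compile("<user>.*?</user>|<info>.*?</info>", Pattern.DOTALL);

    /** Returns every <user> and <info> element in the order it appears in the text. */
    public static List<String> keepUserAndInfo(String text) {
        List<String> kept = new ArrayList<>();
        Matcher m = WANTED.matcher(text);
        while (m.find()) {
            kept.add(m.group());
        }
        return kept;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "<user>textxtxtxtx</user>",
                "<unnecessarytag>unwanted info</unnecessarytag>",
                "<info>infoinfoinfo. part 1",
                "multiline</info>",
                "<unnecessarytag>unwanted info</unnecessarytag>",
                "<info>infoinfoinfo. part 2</info>");
        System.out.println(String.join("\n", keepUserAndInfo(sample)));
    }
}
```

Because a single matcher scans the text from left to right, the original order is maintained without coordinating two matchers.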

0

