
XPath - select text of selected child nodes

Given that I have the following xml:

<div id="Main">
    <div class="quote">
        This is a quote and I don't want this text
    </div> 
    <p>
        This is content.
    </p>
    <p>  
        This is also content and I want both of them
    </p>
</div>

      

Is there an XPath expression that selects the inner text of div#Main as a single node, while excluding the text of any div.quote?

I just need the text: "This is content. This is also content, and I want both of them."

Thank you in advance

Here is the code I use to test the XPath. I am using .NET with HtmlAgilityPack, but I believe the XPath should work in any language.

[Test]
public void TestSelectNode()
{
    // Arrange 
    var html = "<div id=\"Main\"><div class=\"quote\">This is a quote and I don't want this text</div><p>This is content.</p><p>This is also content and I want both of them</p></div>";
    var xPath = "//div/*[not(self::div and @class=\"quote\")]/text()";

    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // Action
    var node = doc.DocumentNode.SelectSingleNode(xPath);

    // Assert
    Assert.AreEqual("This is content.This is also content and I want both of them", node.InnerText);
}

      

The test fails because the XPath is still not correct.

Test 'XPathExperiments/TestSelectNode' failed:
    Expected values to be equal.

    Expected Value : "This is content.This is also content and I want both of them"
    Actual Value   : "This is content."

      

0




3 answers


I don't think there is an XPath expression that will give you this as a single node, because the values you are trying to get are not a single node. Is there any reason why you cannot do this?



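// Requires: using System.Text;
StringBuilder sb = new StringBuilder();

// Action: SelectNodes returns every match (null when nothing matches),
// whereas SelectSingleNode returns only the first one
var nodes = doc.DocumentNode.SelectNodes(xPath);
foreach (var node in nodes)
{
    sb.Append(node.InnerText);
}

// Assert
Assert.AreEqual("This is content.This is also content and I want both of them",
                sb.ToString());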
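
For readers without HtmlAgilityPack at hand, the same "select all, then concatenate" idea can be sketched with the built-in System.Xml classes. This is only an illustration under assumptions: the input must be well-formed XML (as the sample is), the class and method names are made up for this example, and the pieces are joined with a single space to match the text requested in the question.

```csharp
using System;
using System.Linq;
using System.Xml;

// Hypothetical helper, not part of the original question or answers.
public static class MainTextExtractor
{
    // Joins the trimmed text of every child of div#Main except div.quote.
    public static string Extract(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // Same predicate as in the question, anchored to the div with id "Main".
        XmlNodeList nodes = doc.SelectNodes(
            "//div[@id='Main']/*[not(self::div and @class='quote')]/text()");

        // XmlNodeList is non-generic, so Cast<XmlNode>() is needed before LINQ.
        return string.Join(" ", nodes.Cast<XmlNode>().Select(n => n.Value.Trim()));
    }
}
```

Trimming each text node before joining also copes with the indented, multi-line form of the XML shown at the top of the question.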

      

+2




You need the text of every child of the div that is not a div with the class quote:



div/*[not(self::div and @class="quote")]/text()
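
An expression like this matches one text node per remaining child, so the result depends on whether a single node or all nodes are requested. The sketch below uses System.Xml rather than HtmlAgilityPack (an assumption made to keep it dependency-free), and the class and method names are hypothetical. It reports the first match and the total number of matches for the expression used in the question's test.

```csharp
using System;
using System.Xml;

// Hypothetical helper that contrasts SelectSingleNode with SelectNodes.
public static class SingleVersusAll
{
    public static (string First, int Count) Probe(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        const string xPath = "//div/*[not(self::div and @class='quote')]/text()";

        // SelectSingleNode stops at the first match in document order.
        XmlNode first = doc.SelectSingleNode(xPath);

        // SelectNodes returns every match.
        XmlNodeList all = doc.SelectNodes(xPath);

        return (first.Value, all.Count);
    }
}
```

For the sample markup the first match is the text of the first <p> and there are two matches in total, which is consistent with the "Actual Value" reported by the failing test.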

      

0




Since you have <p> nodes in the <div>, I would use

div[@id='Main']/p/text()

which produces a list of the text nodes of the <p> elements in <div id="Main">.
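
If a single string really has to come out of one XPath expression, XPath 1.0 (the version System.Xml implements) offers concat(), but it has no string-join(), so each paragraph must be listed explicitly. The following is a sketch under that limitation: it assumes exactly two <p> elements, uses System.Xml instead of HtmlAgilityPack, and the class and method names are invented for the example.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper showing an XPath expression that evaluates to a string.
public static class XPathStringResult
{
    public static string Evaluate(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XPathNavigator nav = doc.CreateNavigator();

        // concat() takes a fixed argument list, so this only works when the
        // number of paragraphs is known in advance (two are assumed here).
        // normalize-space() trims and collapses whitespace in each paragraph.
        const string expression =
            "concat(normalize-space(//div[@id='Main']/p[1]), ' ', " +
            "normalize-space(//div[@id='Main']/p[2]))";

        // Evaluate returns object; a string-valued expression yields a string.
        return (string)nav.Evaluate(expression);
    }
}
```

For an unknown number of paragraphs, selecting the nodes and joining them in code, as in the first answer, remains the more general approach.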

0








