Select elements by an attribute value on self or an ancestor that is not "overridden" (for example, the lang attribute)

I am trying to emulate the interpretation of an attribute lang, similar to how lang works in HTML or xml:lang in XML.


Given the following XML snippet:

<xml lang="c"> c#0
    <para>c#1</para>
    <para>c#2</para>
    <para lang="d">
        <para lang="c">c#3</para>
        <para lang="d">
            <para lang="c">c#4</para>
        </para>
        <para lang="c"> c#5
            <para>c#6</para>
        </para>
    </para>
</xml>


I am having trouble formulating an XPath 1.0 expression that returns all nodes of a specific language, e.g. c. A node is of that language, in the same way as with the XPath function lang() for the attribute xml:lang, when:


  1. It has an attribute lang with value c ( //*[@lang = "c"] )
  2. -OR-
    2.1 One of its ancestors has an attribute lang with value c ( //*[ancestor::*/@lang = "c"] )
    2.2 -AND- the node itself has no attribute lang at all
    2.3 -AND NOT- if any of its ancestors has an attribute lang with a value other than c and is "nearer" than the ancestor with the attribute lang of value c (2.1 is "canceled").

For example, matching the XML above with c for lang gives 7 nodes, c#0 to c#6:

<xml lang="c"> c#0 ...              (direct match, lang="c")
<para>c#1</para>                    (parent has lang="c")
<para>c#2</para>                    (parent has lang="c")
<para lang="c">c#3</para>           (direct match, lang="c")
<para lang="c">c#4</para>           (direct match, lang="c")
<para lang="c"> c#5 ...             (direct match, lang="c")
<para>c#6</para>                    (parent has lang="c"; that parent is a descendant of
                                     another ancestor with lang="d")


I am having trouble expressing this in an XPath query. Even though I have gotten better with XPath over the last year, this one really stumps me.

No matter what I try, I have trouble expressing how an ancestor with a matching predicate overrides an ancestor with a non-matching predicate.

The example above is only half of the problem, because the attribute value should match not only in full but also as a prefix:

 starts-with(@lang, concat("c", "-"))
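For illustration only, here is a minimal sketch of that prefix test in isolation. It looks only at an element's own attribute and ignores inheritance from ancestors. The sample document and the helper name `ownLangMatches` are made up for this sketch.

```php
<?php
// Hypothetical sample: three elements with their own lang attribute, one without.
$xml = '<xml><a lang="c"/><b lang="c-x"/><c lang="cd"/><d/></xml>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xp = new DOMXPath($doc);

// Hypothetical helper: names of the elements whose OWN lang equals $lang
// or starts with "$lang-" (so "c" matches "c" and "c-x", but not "cd").
function ownLangMatches(DOMXPath $xp, string $lang): array
{
    // Assumption: $lang contains no double quote, because it is pasted into the expression.
    $expression = sprintf(
        '//*[@lang = "%1$s" or starts-with(@lang, concat("%1$s", "-"))]',
        $lang
    );

    $names = [];
    foreach ($xp->query($expression) as $node) {
        $names[] = $node->nodeName;
    }
    return $names;
}

$matches = ownLangMatches($xp, 'c');
echo implode(', ', $matches), "\n"; // prints "a, b"
```

The element without a lang attribute drops out because an empty node-set converts to the empty string, which neither equals "c" nor starts with "c-".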


But I would be happy to see the basic problem solved first. I am testing with PHP (Online demo):

header('Content-Type: text/plain');

$xml = <<<XML
<xml lang="c"> c#0
    <para>c#1</para>
    <para>c#2</para>
    <para lang="d">
        <para lang="c">c#3</para>
        <para lang="d">
            <para lang="c">c#4</para>
        </para>
        <para lang="c"> c#5
            <para>c#6</para>
        </para>
    </para>
</xml>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xp = new DOMXPath($doc);

// My attempt so far; it does not express the "nearest lang wins" rule correctly.
$expression = '//*[
    ancestor-or-self::*/@lang = "c"
    and (
        not(ancestor-or-self::*/@lang != "c")
        or (
            count(ancestor-or-self::*[@lang != "c"])
            < count(ancestor-or-self::*[@lang = "c"])
        )
    )
]';

$result = $xp->query($expression);
printResult($result);

// Print each matched node with its index and its depth in the tree.
function printResult($result)
{
    global $xp;

    if ($result) {
        printf("Result (%d Nodes):\n", $result->length);
        foreach ($result as $index => $node) {
            $depth = $xp->evaluate('count(ancestor::*)', $node);
            printf("#%d (%d): %s\n", $index, $depth, $node->ownerDocument->saveXML($node));
        }
    } else {
        printf("No Result, query failed.\n");
    }
}



2 answers


   //*[@lang = 'c' or not(@lang) and ancestor::*[@lang][1]/@lang = 'c']


This selects any element in the XML document that has an attribute lang with the value "c", or that does not have an attribute lang and whose nearest ancestor with a lang attribute has the value "c" for it.
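As a rough check of this rule, the sketch below runs such an expression through PHP's DOMXPath. The sample document is an assumption, rebuilt from the question's list of expected matches (c#0 to c#6). The d#... labels are made-up placeholders, added so that non-matching elements can be told apart.

```php
<?php
// Assumed sample document: c#N marks elements expected to match "c",
// d#N marks made-up elements that are expected not to match.
$xml = <<<XML
<xml lang="c">c#0
    <para>c#1</para>
    <para>c#2</para>
    <para lang="d">d#0
        <para lang="c">c#3</para>
        <para>d#1</para>
        <para lang="d">d#2
            <para lang="c">c#4</para>
        </para>
        <para lang="c">c#5
            <para>c#6</para>
        </para>
    </para>
</xml>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xp = new DOMXPath($doc);

// Own lang is "c", or there is no own lang and the nearest ancestor
// that carries a lang attribute has the value "c".
$expression = '//*[@lang = "c" or not(@lang) and ancestor::*[@lang][1]/@lang = "c"]';

$labels = [];
foreach ($xp->query($expression) as $node) {
    // The first text node of each element holds its label, e.g. "c#3".
    $labels[] = trim($xp->evaluate('string(text()[1])', $node));
}

echo implode(' ', $labels), "\n"; // prints "c#0 c#1 c#2 c#3 c#4 c#5 c#6"
```

The positional predicate `[1]` is applied on the reverse axis `ancestor`, so it picks the closest ancestor with a lang attribute rather than the outermost one. That is what lets a nearer lang="d" cancel a more distant lang="c".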


A simpler equivalent XPath expression:

   //*[ancestor-or-self::*[@lang][1]/@lang = 'c']


[Image: snapshot of the selection taken with the XPath Visualizer]



Expected XPath

//*[(descendant-or-self::*/@lang = 'c' and not(descendant-or-self::*/@lang != 'c')) or (ancestor-or-self::*/@lang = 'c' and not(ancestor-or-self::*/@lang != 'c'))]



xml     c#0 (lang: c)
para    c#1 (lang: c)
para    c#2 (lang: c)
para    c#3 (lang: c)



