Select elements by their own or an inherited attribute value, unless it is "overridden" (for example, the lang attribute)
I am trying to emulate the interpretation of a lang attribute, similar to lang in HTML or xml:lang in XML.
Given the following XML snippet:
<xml lang="c">
  c#0
  <para>c#1</para>
  <para>c#2</para>
  <para lang="d">
    d#0
    <para>d#1</para>
    <para lang="c">c#3</para>
    <para lang="d">
      d#2
      <para>d#3</para>
      <para lang="c">c#4</para>
    </para>
    <para lang="c">
      c#5
      <para>c#6</para>
    </para>
  </para>
</xml>
I am having trouble phrasing an XPath 1.0 expression that returns all nodes of a specific language, e.g. c. A node should match in the same way as with the XPath function lang() for the attribute xml:lang:
- It has an attribute lang with the value c (//*[@lang = "c"])
-OR-
- One of its ancestors has an attribute lang with the value c (//*[ancestor::*/@lang = "c"])
  -AND- the node itself has no attribute lang at all
  -AND NOT- any of its ancestors has an attribute lang with a value other than c and is "nearer" than the ancestor with lang c (the match through that ancestor is then "canceled").
Example: for the XML above and the lang value c, the match yields 7 nodes, c#0 to c#6:
<xml lang="c"> c#0 ... (direct match, lang="c")
<para>c#1</para> (parent has lang="c")
<para>c#2</para> (parent has lang="c")
<para lang="c">c#3</para> (direct match, lang="c")
<para lang="c">c#4</para> (direct match, lang="c")
<para lang="c"> c#5 ... (direct match, lang="c")
<para>c#6</para> (parent has lang="c"; that parent is itself a descendant of
an ancestor with lang="d")
I am having trouble expressing this in an XPath query. Even though I have gotten better with XPath over the last year, this one really has me stumped.
No matter what I try, I cannot express how a nearer ancestor with a matching predicate overrides a more distant ancestor with a non-matching predicate.
The examples given cover only half of the problem, because besides full attribute values there are also prefix matches:
starts-with(@lang, concat("c", "-"))
But I would be happy to see the basic problem solved first. I am testing with PHP:
<?php
header('Content-Type: text/plain');

$xml = <<<XML
<xml lang="c">
  c#0
  <para>c#1</para>
  <para>c#2</para>
  <para lang="d">
    d#0
    <para>d#1</para>
    <para lang="c">c#3</para>
    <para lang="d">
      d#2
      <para>d#3</para>
      <para lang="c">c#4</para>
    </para>
    <para lang="c">
      c#5
      <para>c#6</para>
    </para>
  </para>
</xml>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xp = new DOMXPath($doc);

$expression = '
//*[
    ancestor-or-self::*/@lang = "c"
    and (
        not(ancestor-or-self::*/@lang != "c")
        or (
            count(ancestor-or-self::*[@lang != "c"])
            < count(ancestor-or-self::*[@lang = "c"])
        )
    )
]';

$result = $xp->query($expression);
printResult($result);

function printResult($result)
{
    global $xp;

    if ($result) {
        printf("Result (%d Nodes):\n", $result->length);
        foreach ($result as $index => $node) {
            $depth = $xp->evaluate('count(ancestor::*)', $node);
            printf("#%d (%d): %s\n", $index, $depth, $node->ownerDocument->saveXML($node));
        }
    } else {
        printf("No Result, query failed.\n");
    }
}
Use:
//*[@lang='c'
or
not(@lang) and ancestor::*[@lang][1]/@lang = 'c'
]
This selects every element in the XML document that either has a lang attribute with the value "c", or has no lang attribute and whose nearest ancestor with a lang attribute has the value "c" there.
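As a quick check, here is a minimal sketch that runs the expression above against the sample document from the question, assuming PHP's DOM extension is available. The variable name $labels and the use of the first text node as a label are illustrative choices, not part of the original code.

```php
<?php
// Sample document from the question.
$xml = <<<XML
<xml lang="c">
  c#0
  <para>c#1</para>
  <para>c#2</para>
  <para lang="d">
    d#0
    <para>d#1</para>
    <para lang="c">c#3</para>
    <para lang="d">
      d#2
      <para>d#3</para>
      <para lang="c">c#4</para>
    </para>
    <para lang="c">
      c#5
      <para>c#6</para>
    </para>
  </para>
</xml>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xp = new DOMXPath($doc);

// ancestor::*[@lang][1] is the NEAREST ancestor that carries a lang attribute,
// because positions on a reverse axis are counted away from the context node.
$expression = "//*[@lang='c' or not(@lang) and ancestor::*[@lang][1]/@lang = 'c']";

// Illustrative: collect the trimmed first text node of each match as its label.
$labels = [];
foreach ($xp->query($expression) as $node) {
    $labels[] = trim($xp->evaluate('string(text()[1])', $node));
}

echo implode(', ', $labels), "\n";
// prints: c#0, c#1, c#2, c#3, c#4, c#5, c#6
```

The elements labelled d#0 to d#3 are skipped because their own or nearest inherited lang value is "d", while c#6 is kept because its nearest lang-carrying ancestor is the c#5 element.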
A simpler, equivalent XPath expression:
//*[ancestor-or-self::*[@lang][1]/@lang='c']
[Figure: screenshot of the resulting selection in the XPath Visualizer]
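The prefix case mentioned in the question (starts-with(@lang, concat("c", "-"))) can probably be handled the same way, by moving the value test into a predicate on the nearest lang-carrying element. The following is a sketch under assumptions: the small document with lang="c-x" and lang="cd" is hypothetical test data, and labels() is a hypothetical helper, neither is from the question.

```php
<?php
// Hypothetical sample with a sub-tagged value ("c-x") and a look-alike ("cd").
$xml = <<<XML
<xml lang="c-x">
  n0
  <para>n1</para>
  <para lang="d">
    n2
    <para>n3</para>
    <para lang="c">n4</para>
  </para>
  <para lang="cd">n5</para>
</xml>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xp = new DOMXPath($doc);

// Hypothetical helper: run an expression and return each match's first text label.
function labels(DOMXPath $xp, string $expression): array
{
    $out = [];
    foreach ($xp->query($expression) as $node) {
        $out[] = trim($xp->evaluate('string(text()[1])', $node));
    }
    return $out;
}

// The two exact-match expressions from this answer.
$long  = "//*[@lang='c' or not(@lang) and ancestor::*[@lang][1]/@lang = 'c']";
$short = "//*[ancestor-or-self::*[@lang][1]/@lang = 'c']";

// Assumed extension: also accept values that start with "c-".
$prefix = "//*[ancestor-or-self::*[@lang][1][@lang = 'c' or starts-with(@lang, 'c-')]]";

echo implode(', ', labels($xp, $long)), "\n";   // prints: n4
echo implode(', ', labels($xp, $short)), "\n";  // prints: n4
echo implode(', ', labels($xp, $prefix)), "\n"; // prints: n0, n1, n4
```

On this data the long and short forms select the same element, and the prefix variant additionally picks up the "c-x" element and the child that inherits from it, while "cd" is rejected because it neither equals "c" nor starts with "c-".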