How to get nested nodes using xml-> in clojure.data.zip?
I find xml-> extremely confusing. I have read the docs and examples, but I cannot figure out how to get the nested nodes of an XML document.
Suppose the following XML has been loaded into a zipper (as with xml-zip):
    <html>
      <body>
        <div class='one'>
          <div class='two'></div>
        </div>
      </body>
    </html>
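For anyone reproducing this, one way to build such a zipper from the sample is with clojure.xml/parse and clojure.zip/xml-zip, both of which ship with Clojure. This is a minimal sketch assuming the XML is held in a string; the namespace and the names sample-xml, parse-str and z are illustrative. Note that the zipper's root location is already the <html> element itself, not a parent of it.

```clojure
(ns example.build-zipper
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip]))

;; The sample document from the question, held in a string (illustrative name).
(def sample-xml
  "<html><body><div class='one'><div class='two'></div></div></body></html>")

;; clojure.xml/parse accepts an InputStream, so wrap the string's bytes.
(defn parse-str [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; xml-zip wraps the parsed {:tag :attrs :content} maps in a zipper.
(def z (zip/xml-zip (parse-str sample-xml)))

;; The root location is the <html> element itself.
(:tag (zip/node z)) ; => :html
```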
I am trying to get back the div whose class is 'two'.
I expected this to work:
    (xml-> z :html :body :div :div)
Or this:
    (xml-> z :html :body :div (attr= :class "two"))
Kind of like CSS selectors.
But it only returns the first level and does not search down the tree.
The only way I can make it work is:
    (xml-> z :html :body :div children leftmost?)
Is this what I have to do?
The whole reason I started using xml-> was convenience: I didn't want to navigate manually up, down, left, and right. If xml-> can't get nested nodes, I don't see what it adds over clojure.zip.
Thanks.
Two consecutive :div steps correspond to the same node; you need to go down a level (zip/down) between them. I also believe you forgot to extract the node itself with zip/node:
    (ns reagenttest.sample
      (:require
       [clojure.xml :as xml]          ; needed for xml/parse below
       [clojure.zip :as zip]
       [clojure.data.zip.xml :as data-zip]))

    (let [s "..."                     ; the XML from the question
          doc (xml/parse (java.io.ByteArrayInputStream. (.getBytes s)))]
      (prn (data-zip/xml-> (zip/xml-zip doc)
                           :html :body :div
                           zip/down
                           (data-zip/attr= :class "two")
                           zip/node)))
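To make the "go down, then take the node" advice concrete in plain clojure.zip terms, the sketch below reaches the inner div by hand with core functions only (no data.zip). It assumes the sample XML inlined as a single-line string; the namespace and the names z and inner-loc are illustrative.

```clojure
(ns example.manual-nav
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip]))

;; Parse the sample and wrap it in a zipper; the root location is <html>.
(def z
  (-> "<html><body><div class='one'><div class='two'></div></div></body></html>"
      (.getBytes "UTF-8")
      (java.io.ByteArrayInputStream.)
      xml/parse
      zip/xml-zip))

;; Each zip/down moves to the first child of the current location:
;; html -> body -> div.one -> div.two
(def inner-loc (-> z zip/down zip/down zip/down))

;; A location is not the element; zip/node extracts the element map.
(zip/node inner-loc)
;; => {:tag :div, :attrs {:class "two"}, :content nil}
```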
Alternatively, if you're not happy with xml->, you can build your own abstraction on top of it:
    (defn xml->find
      "Like xml->, but inserts zip/down between path steps and ends with zip/node."
      [loc & path]
      (let [new-path (conj (vec (butlast (interleave path (repeat zip/down)))) zip/node)]
        (apply data-zip/xml-> loc new-path)))
Now you can do this:
    (xml->find z :html :body :div :div)
    (xml->find z :html :body :div (data-zip/attr= :class "two"))
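To see exactly what xml->find hands to xml->, the path rewriting can be run on its own. This is a sketch that needs only core clojure.zip; expand-path is a hypothetical name for the expression inside xml->find's let.

```clojure
(ns example.expand-path
  (:require [clojure.zip :as zip]))

;; Same transformation as inside xml->find: put zip/down between every
;; pair of path steps, then append zip/node as the final step.
(defn expand-path [path]
  (conj (vec (butlast (interleave path (repeat zip/down)))) zip/node))

(expand-path [:html :body :div :div])
;; => [:html zip/down :body zip/down :div zip/down :div zip/node]
;;    (zip/down and zip/node print as function objects)
```

The result alternates the original steps with zip/down and always ends with zip/node, so the caller gets element maps back rather than zipper locations.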
You can solve this problem with tupelo.forest from the Tupelo library. tupelo.forest contains functions for searching and manipulating trees of data; it is something like Enlive on steroids. Here is a solution for your data:
    (dotest
      (with-forest (new-forest)
        (let [xml-str "<html>
                         <body>
                           <div class='one'>
                             <div class='two'></div>
                           </div>
                         </body>
                       </html>"
              enlive-tree (->> xml-str
                               java.io.StringReader.
                               en-html/xml-resource
                               only)
              root-hid (add-tree-enlive enlive-tree)

              ; Removing whitespace nodes is optional; just done to keep things neat
              blank-leaf-hid? (fn [hid] (ts/whitespace? (hid->value hid))) ; whitespace pred fn
              blank-leaf-hids (keep-if blank-leaf-hid? (all-leaf-hids))    ; find whitespace nodes
              >> (apply remove-hid blank-leaf-hids)                        ; delete whitespace nodes found

              ; Can search for inner `div` 2 ways
              result-1 (find-paths root-hid [:html :body :div :div]) ; explicit path from root
              result-2 (find-paths root-hid [:** {:class "two"}])    ; wildcard path that ends in :class "two"
              ]
          (is= result-1 result-2) ; both searches return the same path
          (is= (hid->bush root-hid)
               [{:tag :html}
                [{:tag :body}
                 [{:class "one", :tag :div}
                  [{:class "two", :tag :div}]]]])
          (is= (format-paths result-1)
               (format-paths result-2)
               [[{:tag :html}
                 [{:tag :body}
                  [{:class "one", :tag :div}
                   [{:class "two", :tag :div}]]]]])
          (is (val= (hid->elem (last (only result-1)))
                    {:attrs {:class "two", :tag :div}, :kids []})))))
There are many more examples in the unit tests and in a demo file.
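For readers who would rather not add a dependency, the "match anywhere in the tree" idea behind the :** wildcard can be approximated with core clojure.zip by visiting every location with zip/next. This is a minimal sketch, not Tupelo's implementation; find-all and parse-str are hypothetical helpers, and attribute matching here only checks the keys you pass in.

```clojure
(ns example.find-anywhere
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip]))

(defn parse-str [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

;; Walk every location depth-first with zip/next, stop at the end marker,
;; and keep element maps whose tag matches and whose attrs contain `attrs`.
(defn find-all [root tag attrs]
  (->> (iterate zip/next (zip/xml-zip root))
       (take-while (complement zip/end?))
       (map zip/node)
       (filter map?)                    ; skip text nodes (strings)
       (filter #(= tag (:tag %)))
       (filter #(= attrs (select-keys (:attrs %) (keys attrs))))))

(def doc
  (parse-str "<html><body><div class='one'><div class='two'></div></div></body></html>"))

(find-all doc :div {:class "two"})
;; => ({:tag :div, :attrs {:class "two"}, :content nil})
```

Passing an empty attribute map matches every element with the given tag, so (find-all doc :div {}) returns both divs.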