Clojure XML Stream Exception
I get an exception when parsing an XML file with clojure.data.xml
because the stream is closed before parsing completes.
I don't understand why doall
doesn't force the XML data to be evaluated before with-open
closes the reader (as this linked answer suggested).
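For comparison, the doall-inside-with-open pattern does work on a flat lazy sequence such as line-seq. Here is a minimal sketch over an in-memory StringReader. The namespace and the helpers lines-lazy and lines-eager are hypothetical names, not part of clojure.data.xml.

```clojure
(ns demo.flat-laziness
  (:import (java.io BufferedReader StringReader)))

;; Returns a lazy seq whose unread tail escapes with-open.
;; line-seq reads the first line eagerly, but every later line is read
;; only when the caller walks the seq, after the reader is already closed.
(defn lines-lazy [^String text]
  (with-open [r (BufferedReader. (StringReader. text))]
    (line-seq r)))

;; doall walks the whole (flat) seq while the reader is still open.
(defn lines-eager [^String text]
  (with-open [r (BufferedReader. (StringReader. text))]
    (doall (line-seq r))))

(lines-eager "a\nb\nc")
;; => ("a" "b" "c")

;; Walking the lazy version fails when it reaches line 2 on the closed reader.
(try
  (dorun (lines-lazy "a\nb\nc"))
  :realized
  (catch Exception _ :failed))
;; => :failed
```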
(:require [clojure.java.io :as io]
          [clojure.data.xml :as xml])

(defn file->xml [path]
  (with-open [rdr (-> path io/resource io/reader)]
    (doall (xml/parse rdr))))
Which throws an exception:
(file->xml "example.xml")
;-> XMLStreamException ParseError at [row,col]:[80,1926]
Message: Stream closed
  com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next
If I remove the with-open wrapper, it returns the XML data as expected (so the file is valid XML), but then the reader is no longer guaranteed to be closed.
Looking at (source xml/parse), I can see that it returns lazy results:
(defn parse
  "Parses the source, which can be an
  InputStream or Reader, and returns a lazy tree of Element records.
  Accepts key pairs with XMLInputFactory options, see http://docs.oracle.com/javase/6/docs/api/javax/xml/stream/XMLInputFactory.html
  and xml-input-factory-props for more information.
  Defaults coalescing true."
  [source & opts]
  (event-tree (event-seq source opts)))
That might be related, but my function is very similar to the "round-trip" example in the clojure.data.xml README.
What am I missing here?
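I was surprised by this behavior at first. xml/parse returns a single clojure.data.xml.Element (a map-like value), not a sequence, and doall only realizes the outermost sequence it is given. Called on an Element, it just walks the top-level map entries; the nested :content sequences stay lazy and are read from the reader only later, after with-open has already closed it.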
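The same shallow behavior can be reproduced without any XML. In this sketch (all names are hypothetical), doall realizes the outer sequence but leaves each inner lazy sequence untouched, which mirrors an Element whose nested :content is still lazy. A counter atom records how many inner elements have actually been computed.

```clojure
(ns demo.nested-laziness)

;; Hypothetical counter: how many inner elements have been computed so far.
(def computed (atom 0))

(defn inner-seq
  "Lazy seq of n numbers; bumps `computed` as each one is produced."
  [n]
  (map (fn [x] (swap! computed inc) x) (range n)))

;; doall realizes the OUTER seq only: the two inner seqs now exist as
;; objects, but none of their elements has been computed yet.
(def tree (doall (map (fn [_] (inner-seq 3)) (range 2))))

@computed                ;; => 0
(realized? (first tree)) ;; => false

;; Forcing every level needs an explicit nested walk.
(dorun (map doall tree))

@computed                ;; => 6
```

This is why a deep walk (such as the postwalk-based unlazy below) is needed to read the whole tree before the reader closes.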
Here's a solution that recursively converts the lazy values to ordinary vectors and maps, which forces the whole tree to be read while the reader is still open:
(ns tst.clj.core
  (:require
    [clojure.pprint :refer [pprint]]
    [clojure.java.io :as io]
    [clojure.data.xml :as xml]
    [clojure.walk :refer [postwalk]]))

;; Recursively replace lazy seqs with vectors and Element values with plain
;; maps. postwalk visits (and therefore realizes) every node of the tree.
(defn unlazy
  [coll]
  (let [unlazy-item (fn [item]
                      (cond
                        (sequential? item) (vec item)
                        (map? item)        (into {} item)
                        :else              item))]
    (postwalk unlazy-item coll)))

(defn file->xml [path]
  (with-open [rdr (-> path io/resource io/reader)]
    ;; Build the fully realized tree inside with-open, while rdr is still open.
    (let [lazy-vals  (xml/parse rdr)
          eager-vals (unlazy lazy-vals)]
      eager-vals)))
(pprint (file->xml "books.xml"))
{:tag :catalog,
 :attrs {},
 :content
 [{:tag :book,
   :attrs {:id "bk101"},
   :content
   [{:tag :author, :attrs {}, :content ["Gambardella, Matthew"]}
    {:tag :title, :attrs {}, :content ["XML Developer Guide"]}
    {:tag :genre, :attrs {}, :content ["Computer"]}
    {:tag :price, :attrs {}, :content ["44.95"]}
    {:tag :publish_date, :attrs {}, :content ["2000-10-01"]}
    {:tag :description,
     :attrs {},
     :content
     ["An in-depth look at creating applications\n with XML."]}]}
  {:tag :book,
   :attrs {:id "bk102"},
   :content
   [{:tag :author, :attrs {}, :content ["Ralls, Kim"]}
    {:tag :title, :attrs {}, :content ["Midnight Rain"]}
    {:tag :genre, :attrs {}, :content ["Fantasy"]}
    {:tag :price, :attrs {}, :content ["5.95"]}
    {:tag :publish_date, :attrs {}, :content ["2000-12-16"]}
    {:tag :description,
     :attrs {},
     :content
     ["A former architect battles corporate zombies,\n an evil sorceress, and her own childhood to become queen\n of the world."]}]}
  {:tag :book,
   :attrs {:id "bk103"},
   :content .....
Since clojure.data.xml.Element implements clojure.lang.IPersistentMap, (map? item) returns true for it, so the map branch of unlazy-item turns each element into an ordinary map.
Here is the sample data for books.xml.
Note: clojure.data.xml is different from clojure.xml. You may have to explore both libraries to find the one that best suits your needs.
You can also search the crossclj.info API docs as needed:
- https://crossclj.info/doc/org.clojure/clojure/latest/clojure.xml.html
- https://crossclj.info/doc/org.clojure/data.xml/0.2.0-alpha2/index.html
Update:
About a week after seeing this question, I ran into a similar XML parsing problem that needed unlazy. You can now find unlazy in the Tupelo library.