Clojure XML Stream Exception

I am getting an exception when parsing an XML file with clojure.data.xml, because the stream is closed before parsing is complete.

I don't understand why doall doesn't force the XML data to be evaluated before with-open closes the reader (as this linked answer suggested).

(:require [clojure.java.io :as io]
          [clojure.data.xml :as xml])

(defn file->xml [path] 
  (with-open [rdr (-> path io/resource io/reader)] 
    (doall (xml/parse rdr))))

      

Which throws an exception:

(file->xml "example.xml")
;-> XMLStreamException ParseError at [row,col]:[80,1926]
Message: Stream closed com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next

      

If I remove the with-open wrapper, it returns the XML data as expected (so the file is valid, although the reader is then not guaranteed to be closed).

I can see from (source xml/parse) that it returns lazy results:

(defn parse
  "Parses the source, which can be an
   InputStream or Reader, and returns a lazy tree of Element records. 
   Accepts key pairs with XMLInputFactory options, see http://docs.oracle.com/javase/6/docs/api/javax/xml/stream/XMLInputFactory.html
   and xml-input-factory-props for more information. 
   Defaults coalescing true."
   [source & opts]
     (event-tree (event-seq source opts)))

      

This may be related: my function is very similar to the "round-trip" example in the clojure.data.xml README.

What am I missing here?


1 answer


I was surprised to see this behavior. It looks like clojure.data.xml.Element (the return type) implements a "lazy map" type that is immune to the effects of doall.
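As a rough illustration of why doall may not help here, the sketch below uses a hypothetical Node record (a stand-in, not the real clojure.data.xml Element) whose :content is an unrealized lazy seq. The names Node, realized and lazy-children are assumptions made up for this example. Under those assumptions, doall on the record only walks its map entries and leaves the nested seq untouched:

```clojure
;; Hypothetical stand-in for a parsed element: a record whose :content is lazy.
(defrecord Node [tag attrs content])

;; Logs every child that actually gets realized.
(def realized (atom []))

(defn lazy-children
  "Returns an unrealized lazy seq; realizing an item logs it in `realized`."
  []
  (map (fn [i] (swap! realized conj i) i) (range 3)))

(def node (->Node :root {} (lazy-children)))

;; doall walks the record's map entries, not the seq stored under :content.
;; The result is bound with def so a REPL does not print (and realize) it.
(def forced (doall node))
(def after-doall-on-node @realized)

;; Forcing the nested seq itself realizes the children.
(doall (:content node))
(def after-doall-on-content @realized)

(println after-doall-on-node)     ; nothing has been realized at this point
(println after-doall-on-content)  ; all three children have been realized
```

If the real parse result behaves like this stand-in, the nested content would still be read from the reader after with-open has closed it, which would match the "Stream closed" error in the question.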

Here's a solution that converts the lazy values to normal maps:

(ns tst.clj.core
  (:require
    [clojure.pprint :refer [pprint]]
    [clojure.java.io :as io]
    [clojure.data.xml :as xml]
    [clojure.walk :refer [postwalk]]))

(defn unlazy
  "Recursively converts sequential values to vectors and map-like values
   (including records) to plain maps, realizing any lazy seqs on the way."
  [coll]
  (let [unlazy-item (fn [item]
                      (cond
                        (sequential? item) (vec item)
                        (map? item) (into {} item)
                        :else item))]
    (postwalk unlazy-item coll)))

(defn file->xml [path]
  (with-open [rdr (-> path io/resource io/reader)]
    ;; Realize the whole tree while the reader is still open.
    (unlazy (xml/parse rdr))))

(pprint (file->xml "books.xml"))

{:tag :catalog,
 :attrs {},
 :content
 [{:tag :book,
   :attrs {:id "bk101"},
   :content
   [{:tag :author, :attrs {}, :content ["Gambardella, Matthew"]}
    {:tag :title, :attrs {}, :content ["XML Developer Guide"]}
    {:tag :genre, :attrs {}, :content ["Computer"]}
    {:tag :price, :attrs {}, :content ["44.95"]}
    {:tag :publish_date, :attrs {}, :content ["2000-10-01"]}
    {:tag :description,
     :attrs {},
     :content
     ["An in-depth look at creating applications\n      with XML."]}]}
  {:tag :book,
   :attrs {:id "bk102"},
   :content
   [{:tag :author, :attrs {}, :content ["Ralls, Kim"]}
    {:tag :title, :attrs {}, :content ["Midnight Rain"]}
    {:tag :genre, :attrs {}, :content ["Fantasy"]}
    {:tag :price, :attrs {}, :content ["5.95"]}
    {:tag :publish_date, :attrs {}, :content ["2000-12-16"]}
    {:tag :description,
     :attrs {},
     :content
     ["A former architect battles corporate zombies,\n      an evil sorceress, and her own childhood to become queen\n      of the world."]}]}
  {:tag :book,
   :attrs {:id "bk103"},
   :content .....

      

Since clojure.data.xml.Element implements clojure.lang.IPersistentMap, (map? item) returns true for it.
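A small sketch of that check, using a hypothetical Elem record as a stand-in for the library's Element type (the names Elem, e and plain are assumptions for illustration only). It shows that a map-like record passes map? and that (into {} ...) copies it into a plain map:

```clojure
;; Hypothetical stand-in for clojure.data.xml.Element.
(defrecord Elem [tag attrs content])

(def e (->Elem :title {} ["Midnight Rain"]))

;; A record implements IPersistentMap, so map? accepts it.
(def elem-is-map? (map? e))

;; (into {} e) copies the fields into an ordinary map, dropping the record type.
(def plain (into {} e))

(println elem-is-map?)
(println (record? e) (record? plain))
(println plain)
```

This is the branch of unlazy that turns each element into an ordinary map, which is why the pretty-printed result above shows plain {:tag ... :attrs ... :content ...} maps.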

Here is sample data for books.xml.

Note: clojure.data.xml is different from clojure.xml. You may have to explore both libraries to find the one that best suits your needs.

You can also search the API docs on crossclj.info as needed.

Update:

About a week after I saw this question, I ran into a similar XML parsing problem that needed the unlazy function. You can now find unlazy in the Tupelo library.
