How to parse a GPX file using Haskell xml-conduit?
I would like to use xml-conduit to parse GPX files. So far I have the following:
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative
import Data.Text as T
import Text.XML
import Text.XML.Cursor

data Trkpt = Trkpt {
        trkptLat :: Text,
        trkptLon :: Text,
        trkptEle :: Text,
        trkptTime :: Text
    } deriving (Show)

trkptsFromFile path =
    gpxTrkpts . fromDocument <$> Text.XML.readFile def path

gpxTrkpts =
    child >=> element "{http://www.topografix.com/GPX/1/0}trk" >=>
    child >=> element "{http://www.topografix.com/GPX/1/0}trkseg" >=>
    child >=> element "{http://www.topografix.com/GPX/1/0}trkpt" >=>
    child >=> \e -> do
        let ele = T.concat $ element "{http://www.topografix.com/GPX/1/0}ele" e >>= descendant >>= content
        let time = T.concat $ element "{http://www.topografix.com/GPX/1/0}time" e >>= descendant >>= content
        let lat = T.concat $ attribute "lat" e
        let lon = T.concat $ attribute "lon" e
        return $ Trkpt lat lon ele time
Sample GPX file here.
I get strange results: the parsed text is mostly empty, with some sporadic actual values, even though the data in the original GPX file is valid. When there is an actual value, it appears in only one of the fields of the record.
I am pretty sure I am using the xml-conduit API incorrectly. What am I doing wrong?
Two issues. First, there is a typo in the namespace; it should be http://www.topografix.com/GPX/1/1. Second, your final Kleisli arrow (\e -> do -- etc.) acts on the children of the trkpt elements, not on the trkpt elements themselves. Here is a gpxTrkpts that should do what you want:
gpxTrkpts =
    child >=> element "{http://www.topografix.com/GPX/1/1}trk" >=>
    child >=> element "{http://www.topografix.com/GPX/1/1}trkseg" >=>
    child >=> element "{http://www.topografix.com/GPX/1/1}trkpt" >=>
    \e -> do
        let cs = child e
            ele = T.concat $ cs >>= element "{http://www.topografix.com/GPX/1/1}ele" >>= descendant >>= content
            time = T.concat $ cs >>= element "{http://www.topografix.com/GPX/1/1}time" >>= descendant >>= content
            lat = T.concat $ attribute "lat" e
            lon = T.concat $ attribute "lon" e
        return $ Trkpt lat lon ele time
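To try the two fixes without a file on disk, here is a minimal sketch that runs the same traversal over an inline document. The sample document, the helper `gpx` and the names `sampleGpx` and `sampleTrkpts` are made up for illustration, and the child lookup is written with the `$/` and `&/` operators from Text.XML.Cursor rather than `descendant`; it is one possible arrangement, not the only one.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad   ((>=>))
import           Data.Text       (Text)
import qualified Data.Text       as T
import qualified Data.Text.Lazy  as TL
import           Text.XML        (Name (..), def, parseText_)
import           Text.XML.Cursor

data Trkpt = Trkpt
  { trkptLat  :: Text
  , trkptLon  :: Text
  , trkptEle  :: Text
  , trkptTime :: Text
  } deriving (Show, Eq)

-- Hypothetical helper: qualify a local name with the GPX 1.1 namespace,
-- so the namespace string is written (and can be mistyped) only once.
gpx :: Text -> Name
gpx local = Name local (Just "http://www.topografix.com/GPX/1/1") Nothing

gpxTrkpts :: Cursor -> [Trkpt]
gpxTrkpts =
  child >=> element (gpx "trk") >=>
  child >=> element (gpx "trkseg") >=>
  child >=> element (gpx "trkpt") >=> \pt ->
    -- pt is the trkpt element itself: attributes are read from pt,
    -- while ele and time are looked up among its children.
    let field name = T.concat (pt $/ element (gpx name) &/ content)
    in [ Trkpt (T.concat (attribute "lat" pt))
               (T.concat (attribute "lon" pt))
               (field "ele")
               (field "time") ]

-- Made-up two-point document, not the sample file from the question.
sampleGpx :: TL.Text
sampleGpx = TL.concat
  [ "<gpx xmlns=\"http://www.topografix.com/GPX/1/1\" version=\"1.1\" creator=\"sketch\">"
  , "<trk><trkseg>"
  , "<trkpt lat=\"47.64\" lon=\"-122.32\"><ele>4.5</ele><time>2009-10-17T18:37:26Z</time></trkpt>"
  , "<trkpt lat=\"47.65\" lon=\"-122.33\"><ele>4.9</ele><time>2009-10-17T18:37:31Z</time></trkpt>"
  , "</trkseg></trk></gpx>"
  ]

sampleTrkpts :: [Trkpt]
sampleTrkpts = gpxTrkpts (fromDocument (parseText_ def sampleGpx))

main :: IO ()
main = mapM_ print sampleTrkpts
```

Changing the namespace in `gpx` to the 1/0 variant should make `sampleTrkpts` empty, since `element` only matches names whose namespace and local name both agree.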
@duplode pointed out the problem. Here are some more comments.
- How about using the gpx-conduit package?
- Here's some code that can help you debug parsing problems:
{-# LANGUAGE OverloadedStrings #-}
module Lib2 where

import qualified Data.Text as T
import Data.Text (Text)
import Text.XML
import Text.XML.Cursor
import qualified Filesystem.Path.CurrentOS as Path
import Control.Monad

-- Render a node as a short, human-readable label.
showNode :: Node -> String
showNode (NodeElement e) = "NodeElement " ++ T.unpack (nameLocalName $ elementName e)
showNode (NodeInstruction _) = "NodeInstruction ..."
showNode (NodeContent t) = "NodeContent " ++ show t
showNode (NodeComment _) = "NodeComment"

-- Run an axis against sample.xml and print every node it selects.
-- Path.decodeString comes from system-filepath, which older xml-conduit
-- releases use for readFile; newer releases should accept the plain
-- String path directly.
testParser :: (Cursor -> [Cursor]) -> IO ()
testParser parser = do
    content <- Text.XML.readFile def (Path.decodeString "sample.xml")
    let nodes = map node $ parser (fromDocument content)
    forM_ nodes $ \n -> putStrLn (showNode n)

Use it in ghci like this:

ghci> :set -XOverloadedStrings
ghci> :l Lib2
Lib2> testParser child
NodeContent "\n "
NodeElement metadata
NodeContent "\n "
NodeElement trk
NodeContent "\n "
NodeElement extensions
NodeContent "\n"
Lib2> testParser $ child >=> element "trk"
Lib2> testParser $ child >=> laxElement "trk"
NodeElement trk
Lib2> testParser $ child >=> laxElement "trk" >=> child >=> laxElement "trkseg"
NodeElement trkseg
Lib2> testParser $ child >=> laxElement "trk" >=> child >=> laxElement "trkseg" >=> child >=> laxElement "trkpt"
NodeElement trkpt
NodeElement trkpt
NodeElement trkpt
NodeElement trkpt
Lib2>
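The session above shows `element "trk"` selecting nothing while `laxElement "trk"` finds the element. A small sketch of why, run on a made-up inline document (the names `sampleDoc`, `sampleRoot` and `countMatches` are hypothetical): `element` compares the full name, namespace included, whereas `laxElement` compares only the local name.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Control.Monad   ((>=>))
import qualified Data.Text.Lazy  as TL
import           Text.XML        (def, parseText_)
import           Text.XML.Cursor

-- Made-up document whose elements all live in the GPX 1.1 default namespace.
sampleDoc :: TL.Text
sampleDoc =
  "<gpx xmlns=\"http://www.topografix.com/GPX/1/1\"><trk><trkseg/></trk></gpx>"

sampleRoot :: Cursor
sampleRoot = fromDocument (parseText_ def sampleDoc)

-- Number of nodes an axis selects, starting from the root element.
countMatches :: (Cursor -> [Cursor]) -> Int
countMatches axis = length (axis sampleRoot)

main :: IO ()
main = do
  -- Bare name: no namespace, so it does not equal the namespaced trk.
  print (countMatches (child >=> element "trk"))
  -- laxElement ignores the namespace and compares local names only.
  print (countMatches (child >=> laxElement "trk"))
  -- Fully qualified name in {namespace}local notation matches exactly.
  print (countMatches (child >=> element "{http://www.topografix.com/GPX/1/1}trk"))
```

`laxElement` is handy for exploring a document, but once the namespace is known the qualified `element` form is stricter, because it will not accidentally match a same-named element from another namespace (for example inside extensions).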