How to parse a GPX file using Haskell xml-conduit?

I would like to use xml-conduit to parse GPX files. So far I have the following:

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative
import Data.Text           as T
import Text.XML
import Text.XML.Cursor

data Trkpt = Trkpt {
  trkptLat :: Text,
  trkptLon :: Text,
  trkptEle :: Text,
  trkptTime :: Text
  } deriving (Show)

trkptsFromFile path =
  gpxTrkpts . fromDocument <$> Text.XML.readFile def path

gpxTrkpts =
  child >=> element "{http://www.topografix.com/GPX/1/0}trk" >=>
  child >=> element "{http://www.topografix.com/GPX/1/0}trkseg" >=>
  child >=> element "{http://www.topografix.com/GPX/1/0}trkpt" >=>
  child >=> \e -> do
    let ele  = T.concat $ element "{http://www.topografix.com/GPX/1/0}ele" e >>= descendant >>= content
    let time = T.concat $ element "{http://www.topografix.com/GPX/1/0}time" e >>= descendant >>= content
    let lat  = T.concat $ attribute "lat" e
    let lon  = T.concat $ attribute "lon" e
    return $ Trkpt lat lon ele time

      

Sample GPX file here.

I get strange results: the parsed text is mostly empty, with some sporadic actual values, even though the data in the original GPX file is valid. When there is an actual value, it appears in only one field of the record.

I am pretty sure I am using the xml-conduit API incorrectly. What am I doing wrong?


2 answers


Two issues. First, there is a typo in the namespace; it should be http://www.topografix.com/GPX/1/1. Second, your final Kleisli arrow (\e -> do -- etc.) acts on the children of the trkpt elements, not on the trkpt elements themselves. Here is a gpxTrkpts that should do what you want:



gpxTrkpts =
  child >=> element "{http://www.topografix.com/GPX/1/1}trk" >=>
  child >=> element "{http://www.topografix.com/GPX/1/1}trkseg" >=>
  child >=> element "{http://www.topografix.com/GPX/1/1}trkpt" >=>
  \e -> do
    let cs = child e
        ele  = T.concat $ cs >>= element "{http://www.topografix.com/GPX/1/1}ele" >>= descendant >>= content
        time = T.concat $ cs >>= element "{http://www.topografix.com/GPX/1/1}time" >>= descendant >>= content
        lat  = T.concat $ attribute "lat" e
        lon  = T.concat $ attribute "lon" e
    return $ Trkpt lat lon ele time
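
The same traversal can also be written with the axis operators from Text.XML.Cursor ($/, &/ and &|), which some find easier to read than a chain of child >=> steps. What follows is only a minimal sketch, assuming a GPX 1.1 file: the helper gpx, the function toTrkpt and the inline document sampleGpx are hypothetical names introduced for illustration and are not part of the original code.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Data.Text       (Text)
import qualified Data.Text       as T
import qualified Data.Text.Lazy  as TL
import           Text.XML        (Name (..), def, parseText_)
import           Text.XML.Cursor

data Trkpt = Trkpt
  { trkptLat  :: Text
  , trkptLon  :: Text
  , trkptEle  :: Text
  , trkptTime :: Text
  } deriving (Show, Eq)

-- Hypothetical helper: qualify a local name with the GPX 1.1 namespace.
gpx :: Text -> Name
gpx local = Name local (Just "http://www.topografix.com/GPX/1/1") Nothing

-- Build one record from a cursor positioned ON a trkpt element:
-- lat/lon are attributes of that element, ele/time are its children.
toTrkpt :: Cursor -> Trkpt
toTrkpt c = Trkpt
  { trkptLat  = T.concat (attribute "lat" c)
  , trkptLon  = T.concat (attribute "lon" c)
  , trkptEle  = T.concat (c $/ element (gpx "ele")  &/ content)
  , trkptTime = T.concat (c $/ element (gpx "time") &/ content)
  }

-- Walk root -> trk -> trkseg -> trkpt and convert each trkpt.
gpxTrkpts :: Cursor -> [Trkpt]
gpxTrkpts root =
  root $/ element (gpx "trk")
       &/ element (gpx "trkseg")
       &/ element (gpx "trkpt")
       &| toTrkpt

-- Assumed sample input (not the file from the question).
sampleGpx :: TL.Text
sampleGpx = TL.concat
  [ "<gpx version='1.1' xmlns='http://www.topografix.com/GPX/1/1'>"
  , "<trk><trkseg>"
  , "<trkpt lat='47.64' lon='-122.32'><ele>4.5</ele><time>2009-10-17T18:37:26Z</time></trkpt>"
  , "<trkpt lat='47.65' lon='-122.33'><ele>5.0</ele><time>2009-10-17T18:37:31Z</time></trkpt>"
  , "</trkseg></trk>"
  , "</gpx>"
  ]

samplePoints :: [Trkpt]
samplePoints = gpxTrkpts (fromDocument (parseText_ def sampleGpx))
```

Loading this in ghci and evaluating samplePoints should yield two records, each with all four fields filled. For a file on disk, the cursor would come from Text.XML.readFile as in the question.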

      



@duplode pointed out the problem. Here are some more comments.

  • How about using the gpx-conduit package?

  • Here's some code that can help you debug parsing problems:



{-# LANGUAGE OverloadedStrings #-}
module Lib2 where

import qualified Data.Text           as T
import Data.Text (Text)
import Text.XML
import Text.XML.Cursor
import qualified Filesystem.Path.CurrentOS as Path
import Control.Monad

-- Render a node as a short, human-readable label.
showNode :: Node -> String
showNode (NodeElement e)     = "NodeElement " ++ T.unpack (nameLocalName $ elementName e)
showNode (NodeInstruction _) = "NodeInstruction ..."
showNode (NodeContent t)     = "NodeContent " ++ show t
showNode (NodeComment _)     = "NodeComment"

-- Run an axis against sample.xml and print every node it selects.
-- Path.decodeString is needed for older xml-conduit versions, whose readFile
-- takes a system-filepath FilePath; newer versions take a plain String path.
testParser :: (Cursor -> [Cursor]) -> IO ()
testParser parser = do
  doc <- Text.XML.readFile def (Path.decodeString "sample.xml")
  let nodes = map node $ parser (fromDocument doc)
  forM_ nodes $ \n -> putStrLn (showNode n)

      

Use it in ghci like this:

ghci> :set -XOverloadedStrings
ghci> :l Lib2
Lib2> testParser child
NodeContent "\n  "
NodeElement metadata
NodeContent "\n  "
NodeElement trk
NodeContent "\n  "
NodeElement extensions
NodeContent "\n"

Lib2> testParser $ child >=> element "trk"
Lib2> testParser $ child >=> laxElement "trk"
NodeElement trk

Lib2> testParser $ child >=> laxElement "trk" >=> child >=> laxElement "trkseg"
NodeElement trkseg
Lib2> testParser $ child >=> laxElement "trk" >=> child >=> laxElement "trkseg" >=> child >=> laxElement "trkpt"
NodeElement trkpt
NodeElement trkpt
NodeElement trkpt
NodeElement trkpt
Lib2>
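
The empty result for element "trk" in the session above comes from namespaces: element compares the full Name, including the namespace, while laxElement compares only the local name (case-insensitively). The sketch below illustrates this on a tiny inline document. The names tinyGpx, root, bareMatches, qualifiedMatches and laxMatches are hypothetical and exist only for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text.Lazy  as TL
import           Text.XML        (def, parseText_)
import           Text.XML.Cursor

-- Assumed stand-in for sample.xml: one trk under a namespaced root.
tinyGpx :: TL.Text
tinyGpx =
  "<gpx xmlns='http://www.topografix.com/GPX/1/1'><trk><name>demo</name></trk></gpx>"

root :: Cursor
root = fromDocument (parseText_ def tinyGpx)

-- A bare name carries no namespace, so it does not equal the namespaced trk.
bareMatches :: Int
bareMatches = length (root $/ element "trk")

-- Clark notation "{namespace}local" gives the fully qualified name.
qualifiedMatches :: Int
qualifiedMatches =
  length (root $/ element "{http://www.topografix.com/GPX/1/1}trk")

-- laxElement ignores the namespace and matches on the local name only.
laxMatches :: Int
laxMatches = length (root $/ laxElement "trk")
```

Here bareMatches is 0, while qualifiedMatches and laxMatches are both 1. laxElement is convenient for exploring a document; the qualified form is stricter, since it will not match a same-named element from another namespace.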

      
