How do I wrap the entire list of matches from HXT with a datatype constructor?
I am currently studying HXT
using it for GPX analysis . An example is here . I have the following:
import Data.Time
import Text.XML.HXT.Core
data Gpx = Gpx [Trk] deriving (Show)
data Trk = Trk [TrkSeg] deriving (Show)
data TrkSeg = TrkSeg [TrkPt] deriving (Show)
data TrkPt = TrkPt Double Double deriving (Show)
parseGpx =
getChildren >>> isElem >>> hasName "gpx" >>>
getChildren >>> isElem >>> hasName "trk" >>>
parseGpxTrk >>> arr Gpx
parseGpxTrk = undefined
parseGpxTrkSegs = undefined
You can see that it is incomplete, but it should still be type-checking. Unfortunately, I already have the error:
Couldn't match type ‘Trk’ with ‘[Trk]’
Expected type: Trk -> Gpx
Actual type: [Trk] -> Gpx
In the first argument of ‘arr’, namely ‘Gpx’
In the second argument of ‘(>>>)’, namely ‘arr Gpx’
This error says that I am trying to pass every matched item from the arrow parseGpxTrk
through the constructor arr Gpx
, but I really want to pass the entire list of matches using the constructor arr Gpx
.
So how do I get HXT
(or arrows in general?) To pass matches as a list through my constructor arr Gpx
instead of passing every entry in the list through the constructor arr Gpx
?
source to share
Here is a solution that seems very good to me
{-# LANGUAGE Arrows #-} import Data.Maybe import Text.Read import Text.XML.HXT.Core import Control.Applicative data Gpx = Gpx [Trk] deriving (Show) data Trk = Trk [TrkSeg] deriving (Show) data TrkSeg = TrkSeg [TrkPt] deriving (Show) data TrkPt = TrkPt Double Double deriving (Show)
The hardest option is probably parseTrkPt
because for it to work properly, you need to handle the parsing String
before Double
, which might fail. I made the decision to return it Maybe TrkPt
and then process it further down the line:
elemsNamed :: ArrowXml cat => String -> cat XmlTree XmlTree elemsNamed name = isElem >>> hasName name parseTrkPt :: ArrowXml cat => cat XmlTree (Maybe TrkPt) parseTrkPt = elemsNamed "trkpt" >>> proc trkpt -> do lat <- getAttrValue "lat" -< trkpt lon <- getAttrValue "lon" -< trkpt returnA -< TrkPt <$> readMaybe lat <*> readMaybe lon
I also used the syntax proc
because I think it comes out much cleaner. TrkPt <$> readMaybe lat <*> readMaybe lon
is of type Maybe TrkPt
and will return Nothing
if any of the readMaybe
returns Nothing
. Now we can combine all the successful results:
parseTrkSeg :: (ArrowXml cat, ArrowList cat) => cat XmlTree TrkSeg parseTrkSeg = elemsNamed "trkseg" >>> (getChildren >>> parseTrkPt >>. catMaybes) >. TrkSeg
The parentheses are important here, it took me a while to figure this part out. Depending on where you place the parens you will get different results, for example [TrkSeg [TrkPt a b], TrkSeg [TrkPt c d]]
instead of [TrkSeg [TrkPt a b, TrkPt c d]]
. Similar patterns just follow alongside parsers:
parseTrk :: ArrowXml cat => cat XmlTree Trk parseTrk = elemsNamed "trk" >>> (getChildren >>> parseTrkSeg) >. Trk parseGpx :: ArrowXml cat => cat XmlTree Gpx parseGpx = elemsNamed "gpx" >>> (getChildren >>> parseTrk) >. Gpx
Then you can start it quite simply, although you still have to expand the root element:
main :: IO () main = do gpxs <- runX $ readDocument [withRemoveWS yes] "ana.gpx" >>> getChildren >>> parseGpx -- Pretty print the document forM_ gpxs $ \(Gpx trks) -> do putStrLn "GPX:" forM_ trks $ \(Trk segs) -> do putStrLn "\tTRK:" forM_ segs $ \(TrkSeg pts) -> do putStrLn "\t\tSEG:" forM_ pts $ \pt -> do putStr "\t\t\t" print pt
The trick is to use methods on a class ArrowList
, especially >.
one that has a type a b c -> ([c] -> d) -> a b d
. It concatenates the elements from ArrowList
, passes it to a function that converts it to a new type, then infers a new one ArrowList
to that new type d
.
If you want, you can even abstract this a bit for the last three parsers:
nestedListParser :: ArrowXml cat => String -> cat XmlTree a -> ([a] -> b) -> cat XmlTree b nestedListParser name subparser constructor = elemsNamed name >>> (getChildren >>> subparser) >. constructor parseTrkSeg :: (ArrowXml cat, ArrowList cat) => cat XmlTree TrkSeg parseTrkSeg = nestedListParser "trkseg" (parseTrkPt >>. catMaybes) TrkSeg parseTrk :: ArrowXml cat => cat XmlTree Trk parseTrk = nestedListParser "trk" parseTrkSeg Trk parseGpx :: ArrowXml cat => cat XmlTree Gpx parseGpx = nestedListParser "gpx" parseTrk Gpx
This can come in handy if you want to complete the rest of the grammar of the GPX file.
source to share