F#, FParsec, and calling a stream parser recursively

I am developing a multipart MIME parser using F# and FParsec. I develop iteratively, so this is very unrefined, fragile code that only solves my first immediate problem. Red, Green, Refactor.

I need to parse a stream rather than a string, which is really throwing me for a loop. Given that constraint, my understanding is that I need to call a parser recursively. How to do that is beyond my ken, at least with the way I have proceeded so far.

namespace MultipartMIMEParser

open FParsec
open System.IO

type private Post = { contentType : string
                    ; boundary    : string
                    ; subtype     : string
                    ; content     : string }

type MParser (s:Stream) =
  let ($) f x = f x
  let ascii = System.Text.Encoding.ASCII
  let str cs = System.String.Concat (cs:char list)
  let q = "\""
  let qP = pstring q
  let pSemicolon = pstring ";"
  let manyNoDoubleQuote = many $ noneOf q
  let enquoted = between qP qP manyNoDoubleQuote |>> str
  let skip = skipStringCI
  let pContentType = skip "content-type: "
                     >>. manyTill anyChar (attempt $ preturn () .>> pSemicolon)
                     |>> str
  let pBoundary = skip " boundary=" >>. enquoted
  let pSubtype = opt $ pSemicolon >>. skip " type=" >>. enquoted
  let pContent = many anyChar |>> str // TODO: The content parser needs to recurse on the stream.
  let pStream = pipe4 pContentType pBoundary pSubtype pContent
                      $ fun c b t s -> { contentType=c; boundary=b; subtype=t; content=s }
  let result s = match runParserOnStream pStream () "" s ascii with
                 | Success (r,_,_) -> r
                 | Failure (e,_,_) -> failwith (sprintf "%A" e)
  let r = result s
  member p.ContentType = r.contentType
  member p.Boundary = r.boundary
  member p.ContentSubtype = r.subtype
  member p.Content = r.content

      

The first line of an example POST:

content-type: Multipart/related; boundary="RN-Http-Body-Boundary"; type="multipart/related"

It spans a single line in the file. Further sub-parts in the content include content-type values that span multiple lines, so I know I will have to refine my parsers if I am to reuse them.

Somehow I need to call pContent with the (string?) result of pBoundary so that I can split the rest of the stream at the appropriate boundaries, and then somehow return multiple parts for the content of the post, each of which will be a separate post with its own headers and content (which will obviously have to be something other than a string). My head is spinning. This code already seems far too complex for parsing a single line.

Thank you very much for your understanding and wisdom!

1 answer


This is a snippet that can get you moving in the right direction.

Have your parsers all return values of the same base type. For this I prefer to use F# discriminated unions. If you really need to get the values into the Post type, walk the returned AST tree. That is just the way I would approach it.



#if INTERACTIVE
#r"""..\..\FParsecCS.dll"""    // ... edit path as appropriate to bin/debug, etc.
#r"""..\..\FParsec.dll"""
#endif

let packet = @"content-type: Multipart/related; boundary=""RN-Http-Body-Boundary""; type=""multipart/related""

--RN-Http-Body-Boundary
Message-ID: <25845033.1160080657073.JavaMail.webmethods@exshaw>
Mime-Version: 1.0
Content-Type: multipart/related; type=""application/xml"";
  boundary=""----=_Part_235_11184805.1160080657052""

------=_Part_235_11184805.1160080657052
Content-Type: Application/XML
Content-Transfer-Encoding: binary
Content-Location: RN-Preamble
Content-ID: <1430586.1160080657050.JavaMail.webmethods@exshaw>"

//XML document begins here...

type AST =
| Document of AST list
| Header of AST list
/// ie. Content-Type is the tag, and it consists of a list of key value pairs
| Tag of string * AST list  
| KeyValue of string * string
| Body of string

      

The AST DU above could represent a first pass over the example data you posted in your other question. It could be finer grained than that, but simpler is usually better. After all, the final destination in your example is the Post type, and you can get there with some simple pattern matching.
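As a rough illustration of that last step, here is a minimal sketch of walking the AST into a Post with pattern matching. It does not use FParsec at all; it assumes the parsers have already produced a Document whose first node is a Header. The helper names (tryValue, tryTag, toPost) and the key names "media-type", "boundary" and "type" are my own assumptions for the example, not something your parsers are required to emit.

```fsharp
// AST as defined above.
type AST =
    | Document of AST list
    | Header of AST list
    | Tag of string * AST list
    | KeyValue of string * string
    | Body of string

// Mirrors the Post record from the question.
type Post =
    { contentType : string
      boundary    : string
      subtype     : string
      content     : string }

// Hypothetical helper: value of the first KeyValue node with the given key.
let tryValue key (nodes : AST list) =
    nodes
    |> List.tryPick (function
        | KeyValue (k, v) when k = key -> Some v
        | _ -> None)

// Hypothetical helper: children of the first Tag node with the given name.
let tryTag name (nodes : AST list) =
    nodes
    |> List.tryPick (function
        | Tag (t, kvs) when t = name -> Some kvs
        | _ -> None)

// Build a Post from a Document that starts with a Header.
// Returns None when the tree does not have that shape.
// Missing keys and a missing Body fall back to the empty string.
let toPost ast =
    match ast with
    | Document (Header tags :: rest) ->
        tryTag "content-type" tags
        |> Option.map (fun kvs ->
            let get key = tryValue key kvs |> Option.defaultValue ""
            let body = rest |> List.tryPick (function Body b -> Some b | _ -> None)
            { contentType = get "media-type"   // assumed key name
              boundary    = get "boundary"
              subtype     = get "type"
              content     = defaultArg body "" })
    | _ -> None

// Usage: a hand-built tree standing in for parser output.
let sampleTag =
    Tag ("content-type",
         [ KeyValue ("media-type", "Multipart/related")
           KeyValue ("boundary", "RN-Http-Body-Boundary")
           KeyValue ("type", "multipart/related") ])

let sample = Document [ Header [ sampleTag ]; Body "remaining stream content" ]

let post = toPost sample
```

Returning an option keeps the "unexpected tree shape" case explicit instead of throwing. Nested parts could be handled the same way by letting the Document hold further Document nodes and calling toPost on each of them.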
