Haskell Regex Capture Group

I am using Text.Regex.TDFA with lazy ByteStrings to extract some information from a file.

I need to extract every byte from this line:

 27 FB D9 59 50 56 6C 8A

      

Here's what I tried (my line starts with a space):

(\\ ([0-9A-Fa-f]{2}))+

      

But I have two problems:

  • Only the last match is returned: [["27 FB D9 59 50 56 6C 8A", "8A", "8A"]]
  • I want the outer group to be non-capturing (e.g. ?: in other regex flavors)

Here is my minimal code:

import qualified Data.ByteString.Lazy.Char8 as L
import Text.Regex.TDFA

main :: IO ()
main = do
    let input = L.pack " 27 FB D9 59 50 56 6C 8A"
    -- [[ByteString]] target: one list per match, holding the whole match followed by its groups
    let entries = input =~ "(\\ ([0-9A-Fa-f]{2}))+" :: [[L.ByteString]]
    print entries

      



1 answer


When you attach a quantifier to a capture group, the engine keeps only the last iteration matched by that group. See rexegg.com/regex-capture.html#groupnumbers for a good explanation.
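To see this behaviour in isolation, here is a minimal sketch using the four-tuple result form of (=~), which returns the text before the match, the matched text, the text after it, and the capture groups. The helper name matchWithGroups is only illustrative, not part of regex-tdfa.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Text.Regex.TDFA

-- Match once and return (whole match, capture groups).
-- matchWithGroups is a hypothetical helper name used for illustration.
matchWithGroups :: L.ByteString -> (L.ByteString, [L.ByteString])
matchWithGroups input = (matched, groups)
  where
    -- Four-tuple target: (before, matched, after, groups)
    (_, matched, _, groups) =
      input =~ "( ([0-9A-Fa-f]{2}))+"
        :: (L.ByteString, L.ByteString, L.ByteString, [L.ByteString])

main :: IO ()
main = do
  let (whole, groups) = matchWithGroups (L.pack " 27 FB D9 59 50 56 6C 8A")
  print whole   -- the whole run of bytes is matched
  print groups  -- but each group holds only its final iteration
```

The whole match still spans all eight bytes; only the per-group captures are overwritten on each repetition.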

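In the first pass, use this regex, similar to the one you already have (it assumes case-insensitive matching). Note that regex-tdfa follows POSIX syntax and, as far as I know, does not recognize the \d shorthand, so write [0-9] instead when using it there: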

^([\dA-F]+) +([\dA-F]+) +(\d+) +([\dA-F]+)(( [\dA-F]{2})+)

This gives you six capture groups: the four leading fields, the whole run of bytes as the fifth, and the last byte alone as the sixth.

Use the fifth group as the input of a second pass that extracts every single byte (with the "global" option, i.e. collecting all matches rather than just the first):

([0-9A-Fa-f]{2})

Then each match will be returned separately.
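In regex-tdfa there is no "global" flag as such; one way to collect all matches is the getAllTextMatches wrapper. The sketch below assumes the input is already just the byte field (as in the question's sample line); extractBytes is an illustrative name, not a library function.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Text.Regex.TDFA

-- Second pass: return every two-digit hex byte in the field, in order.
-- extractBytes is a hypothetical helper name used for illustration.
extractBytes :: L.ByteString -> [L.ByteString]
extractBytes field = getAllTextMatches (field =~ "[0-9A-Fa-f]{2}")

main :: IO ()
main = print (extractBytes (L.pack " 27 FB D9 59 50 56 6C 8A"))
-- prints ["27","FB","D9","59","50","56","6C","8A"]
```

Because the pattern has no quantified group, each byte is its own match and nothing is overwritten. Applying it to a full line that still contains other hex-looking fields would pick those up too, which is why the first pass isolates the byte field.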

Note: you don't need to escape the space as you did in the original regex.
