Haskell Regex Capture Group
I am using Text.Regex.TDFA with lazy ByteStrings to extract some information from a file.
I need to extract every byte from this line:
27 FB D9 59 50 56 6C 8A
Here's what I tried (my line starts with a space):
(\\ ([0-9A-Fa-f]{2}))+
But I have two problems:
- Only the last repetition is returned: [["27 FB D9 59 50 56 6C 8A", "8A", "8A"]]
- I want the outer group to be non-capturing (like (?:...) in other regex flavors).
Here is my minimal code:
import qualified Data.ByteString.Lazy.Char8 as L
import Text.Regex.TDFA

main :: IO ()
main = do
  let input = L.pack " 27 FB D9 59 50 56 6C 8A"
  let entries = input =~ "(\\ ([0-9A-Fa-f]{2}))+" :: [[L.ByteString]]
  print entries
When you put a quantifier on a capturing group, the group only keeps the text of its last repetition, so the engine returns just the last byte. See rexegg.com/regex-capture.html#groupnumbers for a good explanation.
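A minimal sketch of the difference on the question's input: the first regex quantifies a capturing group, the second drops the quantifier so the engine finds one match per byte. The names input, repeated and perByte are only illustrative.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Text.Regex.TDFA ((=~))

-- The line from the question, including its leading space.
input :: L.ByteString
input = L.pack " 27 FB D9 59 50 56 6C 8A"

-- Quantified capturing group: a single match whose groups only hold
-- the text of the final repetition (the last byte).
repeated :: [[L.ByteString]]
repeated = input =~ "( ([0-9A-Fa-f]{2}))+"

-- No quantifier on the group: the regex matches once per byte, and
-- each inner list is [whole match, captured byte].
perByte :: [[L.ByteString]]
perByte = input =~ " ([0-9A-Fa-f]{2})"

main :: IO ()
main = do
  print repeated
  print perByte
```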
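In a first pass, use a regex similar to the one you already have. TDFA implements POSIX extended syntax, which has no \d shorthand, so the digit and hex ranges are spelled out (which also makes a case-insensitive option unnecessary):
^([0-9A-Fa-f]+) +([0-9A-Fa-f]+) +([0-9]+) +([0-9A-Fa-f]+)(( [0-9A-Fa-f]{2})+)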
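You will get the following groups: 1 to 4 hold the four leading fields, 5 holds the whole run of space-separated bytes, and 6 holds only the last byte (for the same reason as above).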
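A sketch of the first pass with Text.Regex.TDFA, assuming a line layout that this regex matches: four space-separated fields followed by the bytes. The sample line and the names firstPass, byteRun and sampleLine are hypothetical, made up for illustration.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Text.Regex.TDFA ((=~))

-- First-pass regex: four leading fields, then the run of " XX" bytes
-- captured as group 5. Explicit ranges are used instead of \d.
firstPass :: String
firstPass =
  "^([0-9A-Fa-f]+) +([0-9A-Fa-f]+) +([0-9]+) +([0-9A-Fa-f]+)(( [0-9A-Fa-f]{2})+)"

-- Return group 5 (the whole byte run) of the first match, if any.
-- Index 0 of each inner list is the full match, so group 5 is index 5.
byteRun :: L.ByteString -> Maybe L.ByteString
byteRun line = case line =~ firstPass :: [[L.ByteString]] of
  (groups : _) -> Just (groups !! 5)
  []           -> Nothing

-- Hypothetical line layout, invented for this example.
sampleLine :: L.ByteString
sampleLine = L.pack "00401000 1A 8 00 27 FB D9 59 50 56 6C 8A"

main :: IO ()
main = print (byteRun sampleLine)
```

The returned run keeps its leading space; the second pass does not care about it.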
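Use group 5 as the input of a second pass that extracts every single byte, collecting all matches instead of only the first. TDFA has no "global" option; the result type of =~ (for example [[L.ByteString]]) or getAllTextMatches selects all matches: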
([0-9A-Fa-f]{2})
Then each match will be returned separately.
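The second pass can be sketched with getAllTextMatches from the regex-base interface that Text.Regex.TDFA re-exports. Since the line in the question contains nothing but bytes, this pass alone already handles it, with no outer group needed. The function name hexBytes is made up for illustration.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Text.Regex.TDFA (AllTextMatches (..), (=~))

-- Second pass: no quantified group, so every two-digit hex byte is a
-- separate match. getAllTextMatches keeps just the matched text.
hexBytes :: L.ByteString -> [L.ByteString]
hexBytes run = getAllTextMatches (run =~ "[0-9A-Fa-f]{2}")

main :: IO ()
main = mapM_ L.putStrLn (hexBytes (L.pack " 27 FB D9 59 50 56 6C 8A"))
```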
Note: you don't need to escape the space as you did in your original regex.