Split ByteString by ByteString (instead of Word8 or Char)
I know that Haskell's Data.ByteString.Lazy already has a function for splitting a CSV on a single byte, like this:
split :: Word8 -> ByteString -> [ByteString]
But I want to split on a multi-character ByteString (like splitting on a String instead of a Char):
split :: ByteString -> ByteString -> [ByteString]
I need to parse a CSV-like text file that uses multi-character delimiters, and the individual delimiter characters also appear inside some fields, so picking one delimiter character and discarding the others would corrupt the imported data.
I have some ideas on how to do this, but they feel like a hack (for example: take three Word8s, check whether they form the delimiter, start a new field if they do, and repeat), and I imagine I would be reinventing the wheel anyway. Is there a way to do this without rebuilding the function from scratch?
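The bytestring documentation for breakSubstring contains a function that does what you ask (shown here with a qualified import so the names do not clash with the Prelude):
import qualified Data.ByteString as B
tokenise x y = h : if B.null t then [] else tokenise x (B.drop (B.length x) t)
    where (h,t) = B.breakSubstring x y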
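As a minimal sketch of how this could be packaged for the question's use case: breakSubstring lives in the strict Data.ByteString module, so the version below works on strict ByteStrings and, as an assumption about the poster's setup, converts lazy input with toStrict first (which loads the whole input into memory). The names splitOn, splitOnLazy and example are made up for illustration and are not part of the library.

```haskell
import           Data.ByteString       (ByteString)
import qualified Data.ByteString       as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy  as BL

-- | Split a strict ByteString on every occurrence of a multi-byte delimiter.
-- Hypothetical helper built on breakSubstring; an empty delimiter returns the
-- input unsplit, because dropping zero bytes would otherwise loop forever.
splitOn :: ByteString -> ByteString -> [ByteString]
splitOn delim input
  | B.null delim = [input]
  | otherwise    = go input
  where
    go s =
      let (h, t) = B.breakSubstring delim s
      in if B.null t
           then [h]                                   -- no further delimiter
           else h : go (B.drop (B.length delim) t)    -- skip delimiter, continue

-- | Same split for lazy input (assumption: the data fits in memory).
splitOnLazy :: ByteString -> BL.ByteString -> [ByteString]
splitOnLazy delim = splitOn delim . BL.toStrict

-- A single '|' inside a field is left alone; only "||" separates fields.
example :: [ByteString]
example = splitOn (BC.pack "||") (BC.pack "a|b||c|d")
```

Unlike tokenise, this sketch keeps a trailing empty field when the input ends with the delimiter, which is usually what a CSV-style import wants.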
There are several functions in bytestring for subsequence splitting:
breakSubstring :: ByteString -> ByteString -> (ByteString,ByteString)
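To make the return value concrete, here is a small sketch of what breakSubstring gives back when the delimiter is present and when it is not. The names found and missing and the sample inputs are only illustrative.

```haskell
import qualified Data.ByteString       as B
import qualified Data.ByteString.Char8 as BC

-- Delimiter present: the second component starts at the delimiter itself,
-- so the caller still has to drop it before continuing.
found :: (B.ByteString, B.ByteString)
found = B.breakSubstring (BC.pack "::") (BC.pack "key::value")
-- found == ("key", "::value")

-- Delimiter absent: the whole input comes back first, with an empty remainder.
missing :: (B.ByteString, B.ByteString)
missing = B.breakSubstring (BC.pack "::") (BC.pack "key=value")
-- missing == ("key=value", "")
```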
There are also:
- package bytestring-csv, http://hackage.haskell.org/package/bytestring-csv
- the split package, http://hackage.haskell.org/package/split, for Strings.