Split ByteString by ByteString (instead of Word8 or Char)

I know that Haskell's Data.ByteString.Lazy already has a function for splitting on a single byte, like this:

split :: Word8 -> ByteString -> [ByteString]

      

But I want to split on a multi-character ByteString (like splitting on a String instead of a Char):

split :: ByteString -> ByteString -> [ByteString]

      

I need to parse a CSV-like text file that uses multi-character delimiters. The individual delimiter characters also appear inside some fields, so splitting on just one of them and discarding the others would corrupt the imported data.

I had some ideas on how to do this, but they seem hacky (for example: take three Word8s, check whether they form the delimiter, start a new field if they do, and repeat), and I imagine I would just be reinventing the wheel. Is there a way to do this without writing the function from scratch?



2 answers


The bytestring documentation for breakSubstring contains a function that does what you ask:



tokenise x y = h : if null t then [] else tokenise x (drop (length x) t)
    where (h,t) = breakSubstring x y
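
As written, tokenise relies on null, drop and length being the ByteString versions, so it needs the right imports before it will compile. A minimal self-contained sketch of the same idea, using qualified imports of the strict Data.ByteString, might look like the following. The name splitOn and the empty-delimiter guard are my own additions, not part of bytestring.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- | Split a strict ByteString on every occurrence of a multi-byte delimiter.
-- 'splitOn' is a hypothetical name chosen for this sketch.
splitOn :: B.ByteString -> B.ByteString -> [B.ByteString]
splitOn delim input
  | B.null delim = [input]  -- guard: an empty delimiter would never make progress
  | otherwise    = go input
  where
    go s =
      let (h, t) = B.breakSubstring delim s  -- t starts at the delimiter, if found
      in if B.null t
           then [h]                                  -- no further delimiter
           else h : go (B.drop (B.length delim) t)   -- skip the delimiter, continue

-- Example: fields separated by the two-byte delimiter "||".
-- A single '|' inside a field is left alone.
example :: [B.ByteString]
example = splitOn (BC.pack "||") (BC.pack "a|b||c||")
-- example == map BC.pack ["a|b", "c", ""]
```

Like tokenise, this keeps a trailing empty field when the input ends with the delimiter.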

      



There are several functions in bytestring for subsequence splitting:

breakSubstring :: ByteString -> ByteString -> (ByteString,ByteString)
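
To illustrate what this returns: the first component is everything before the first match, and the second component starts at the match itself (or is empty if there is no match). As far as I know, breakSubstring is exported by the strict Data.ByteString modules rather than Data.ByteString.Lazy, so the sketch below also includes a hypothetical helper, breakSubstringLazy, that converts a lazy input to strict first. That conversion forces the whole input into memory, which may or may not be acceptable for your file sizes.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- Delimiter present: prefix, then the remainder beginning AT the delimiter.
found :: (B.ByteString, B.ByteString)
found = B.breakSubstring (BC.pack "::") (BC.pack "key::value")
-- found == (BC.pack "key", BC.pack "::value")

-- Delimiter absent: the whole input, then an empty remainder.
missing :: (B.ByteString, B.ByteString)
missing = B.breakSubstring (BC.pack "::") (BC.pack "key=value")
-- missing == (BC.pack "key=value", B.empty)

-- | Hypothetical helper (not part of bytestring): run breakSubstring on a
-- lazy ByteString by going through a strict copy of the input.
breakSubstringLazy :: B.ByteString -> BL.ByteString -> (BL.ByteString, BL.ByteString)
breakSubstringLazy pat src =
  let (h, t) = B.breakSubstring pat (BL.toStrict src)
  in (BL.fromStrict h, BL.fromStrict t)
```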

      



There also
