Split ByteString by ByteString (instead of Word8 or Char)

Question


I know Haskell's Data.ByteString.Lazy already has a function that splits CSV data on a single character, like this:

split :: Word8 -> ByteString -> [ByteString]

But I want to split on a multi-character ByteString (like splitting on a String instead of a Char):

split :: ByteString -> ByteString -> [ByteString]

I have multi-character delimiters in a CSV-like text file that I need to parse, and the individual characters also appear on their own in some fields, so picking just one delimiter character and discarding the others would contaminate the data import.

I've had some ideas on how to do this, but they seem hacky (for example: take three Word8s, test whether they are the delimiter combination, start a new field if they are, and repeat), and I imagine I would just be reinventing the wheel. Is there a way to do this without rebuilding the function from scratch?

+2

string text haskell csv bytestring

Daniel Quinlan 09 Sep '09 at 8:30


2 answers

There are several functions in bytestring for splitting on subsequences:

breakSubstring :: ByteString -> ByteString -> (ByteString,ByteString)
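As a small illustration of how breakSubstring behaves, the sketch below splits at the first occurrence of a multi-character delimiter. The delimiter "|~|" and the names firstField and noMatch are made up for this example and are not from the question.

```haskell
import qualified Data.ByteString.Char8 as B

-- breakSubstring pat src returns the part of src before the first
-- occurrence of pat, paired with the rest of src starting at pat.
-- The delimiter "|~|" is a hypothetical example.
firstField :: (B.ByteString, B.ByteString)
firstField = B.breakSubstring (B.pack "|~|") (B.pack "alpha|~|beta|~|gamma")
-- ("alpha", "|~|beta|~|gamma")

-- When the pattern does not occur, the whole input comes back as the
-- first component and the second component is empty.
noMatch :: (B.ByteString, B.ByteString)
noMatch = B.breakSubstring (B.pack "|~|") (B.pack "no delimiter here")
-- ("no delimiter here", "")
```

Note that the delimiter itself stays at the front of the second component, so a full split has to drop it before searching again.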

There is also:

the bytestring-csv package, http://hackage.haskell.org/package/bytestring-csv
the split package, http://hackage.haskell.org/package/split, for Strings.

+2

Don Stewart 09 Sep '09 at 10:48


sth · Accepted Answer · 2009-09-09T11:24:18+0000

The ByteString documentation for breakSubstring contains a function that does what you ask:

tokenise x y = h : if null t then [] else tokenise x (drop (length x) t)
    where (h,t) = breakSubstring x y
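For reference, here is a self-contained sketch of the same idea with qualified imports and type signatures. The names splitOnBS and splitOnLazy, the example delimiter "|~|", and the empty-delimiter guard are assumptions added for this sketch and are not part of the documentation snippet. breakSubstring is exported by the strict Data.ByteString module and not by Data.ByteString.Lazy, so the lazy wrapper converts to a strict ByteString first, which pulls the whole input into memory.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- Split a strict ByteString on every occurrence of a multi-byte delimiter.
-- splitOnBS is a hypothetical name; the logic follows tokenise above.
splitOnBS :: B.ByteString -> B.ByteString -> [B.ByteString]
splitOnBS delim src
  | B.null delim = [src]  -- guard added here: an empty delimiter would never make progress
  | otherwise    = go src
  where
    go s =
      let (h, t) = B.breakSubstring delim s
      in h : if B.null t
               then []
               else go (B.drop (B.length delim) t)  -- skip the delimiter itself

-- Wrapper for lazy ByteStrings (assumption: the input fits in memory,
-- because toStrict materialises it as a single strict chunk).
splitOnLazy :: BL.ByteString -> BL.ByteString -> [BL.ByteString]
splitOnLazy delim = map BL.fromStrict . splitOnBS (BL.toStrict delim) . BL.toStrict

-- Lone '|' and '~' characters inside a field are left untouched.
exampleFields :: [B.ByteString]
exampleFields = splitOnBS (BC.pack "|~|") (BC.pack "a|b|~|c~d|~|")
-- ["a|b", "c~d", ""]
```

A trailing delimiter yields a final empty field, as the example shows. For inputs too large to hold in memory, a chunk-aware search would be needed instead of the toStrict round trip.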
