Haskell Assignment - direction needed to split a string into words

We started working with Haskell a few weeks ago and just got our first assignment. I know SO doesn't like homework questions, so I won't ask how to do this. Instead, I would greatly appreciate it if someone could nudge me in the right direction. Since this might not be a specific question, would it be a better fit for discussions / communities?

Question: Tokenize String, that is: "Hello, World!" → ["Hello", "World"]

Coming from a Java background, I have to forget all about the usual way of doing this. The problem is that I am still very unfamiliar with Haskell. This is what I came up with:

module Main where

main :: IO()
main = do putStrLn "Type in a string:\n"
          x <- getLine
          putStrLn "The string entered was:"
          putStrLn x
          putStrLn "\n"
          print (tokenize x)

tokenize :: String -> [String]
tokenize [] = []
tokenize l = token l ++ tokenize l

token :: String -> String
token [] = []
token l = takeWhile (isAlphaNum) l


What is the first glaring mistake? Thanks.



5 answers

The first glaring mistake is

tokenize l = token l ++ tokenize l


(++) :: [a] -> [a] -> [a] appends two lists of the same type. Since token :: String -> String (and type String = [Char]), the type inferred for tokenize from this line is tokenize :: String -> String, which contradicts the declared String -> [String]. This is where you should use (:) :: a -> [a] -> [a] instead.


The next mistake on this line is that the recursive call is passed the same input l again, so you have infinite recursion, always doing the same thing without making progress. You have to remove the first token (and a bit more) from the input to get the argument for the recursive call.

Another problem is that your token assumes that the input starts with alphanumeric characters.

You also need a function that ensures this condition holds for whatever you pass to token.
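A minimal sketch of how these points could fit together, assuming that tokens are maximal runs of alphanumeric characters. The name tokenizeSketch is hypothetical and not from the question; isAlphaNum comes from Data.Char, which the code in the question would also need to import.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical name; one possible shape of a corrected tokenize.
tokenizeSketch :: String -> [String]
tokenizeSketch input =
  case dropWhile (not . isAlphaNum) input of   -- ensure the input starts with a token
    []   -> []                                 -- nothing left: stop recursing
    rest -> let word = takeWhile isAlphaNum rest
                remaining = dropWhile isAlphaNum rest
            in word : tokenizeSketch remaining -- (:) instead of (++), smaller input
```

With this definition, tokenizeSketch "Hello, World!" evaluates to ["Hello","World"].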




This line results in an infinite list (which is fine in itself, since Haskell is lazy and only builds the list on demand), because it recurses without changing the argument:

tokenize l = token l ++ tokenize l


We can visualize what happens when tokenize is called like:

tokenize l = token l ++ tokenize l
           = token l ++ (token l ++ tokenize l)
           = token l ++ (token l ++ (token l ++ tokenize l))
           = ...


To stop this, you need to change the argument to tokenize so that it recurses sensibly:

tokenize l = token l ++ tokenize <something goes here>
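To make the remark about laziness concrete, here is a small sketch. The function stuck is a hypothetical stand-in that, like the line above, recurses on an unchanged argument, but uses (:) so that it type-checks as a list of tokens. Only the prefix that is actually demanded gets built; asking for the whole list would never finish.

```haskell
import Data.Char (isAlphaNum)

token :: String -> String
token = takeWhile isAlphaNum

-- Hypothetical illustration: recursion that never shrinks its argument.
stuck :: String -> [String]
stuck l = token l : stuck l

-- Laziness lets us inspect a finite prefix of the infinite result.
firstThree :: [String]
firstThree = take 3 (stuck "Hello, World!")
```

Here firstThree is ["Hello","Hello","Hello"]: the same token over and over, which is why the argument has to change between calls.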




Since others have already pointed out your mistake, just a small hint: while you have already found the very useful function takeWhile, you should take a look at span, as it might be even more useful here.
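For illustration, span p xs returns the longest prefix of xs whose elements satisfy p, paired with the rest of the list, so a single call yields both the token and the input for the recursive call. A rough sketch, using the hypothetical name tokenizeSpan and assuming tokens are runs of alphanumeric characters:

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical name; shows span doing the work of takeWhile and dropWhile at once.
tokenizeSpan :: String -> [String]
tokenizeSpan [] = []
tokenizeSpan s =
  case span isAlphaNum s of
    ([], _ : rest) -> tokenizeSpan rest         -- s starts with a separator: skip it
    (word, rest)   -> word : tokenizeSpan rest  -- token found: keep it, continue with the rest
```

For example, span isAlphaNum "Hello, World!" gives ("Hello", ", World!"), and tokenizeSpan "Hello, World!" gives ["Hello","World"].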



There is something about this that looks like a parsing monad. However, since you are new to Haskell, it is unlikely that you will be able to understand how parsing monads work (or use them in your code) just yet. To give you the basics, consider what you want:

tokenize :: String -> [String]


This takes a String, chops it up into multiple parts, and generates a list of strings corresponding to the words in the input string. How might we go about this? We want a function that processes a single string and, at the first space character, adds what it has read so far to the sequence of words. But then you must process what is left (i.e., the rest of the string). For example, let's say you want to tokenize:

The brown fox jumped

You would first pull out "The" and then continue processing " brown fox jumped" (note the space at the beginning of the second string). You will be doing this recursively, so naturally you need a recursive function.

The natural solution that sticks out is to accumulate the set of strings you have seen so far, keep consuming the current input until you hit a space, and then add what you saw in the current string to the accumulator (this leads to an implementation where you mostly cons things and then reverse them occasionally).
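The accumulator idea described above could be sketched roughly as follows. This is only one reading of that description: it splits on space characters alone, and the names tokenizeAcc, go and flush are hypothetical. Both the list of finished words and the current word are built up in reverse, hence the calls to reverse.

```haskell
-- Hypothetical sketch of an accumulator-based tokenizer that splits on spaces.
tokenizeAcc :: String -> [String]
tokenizeAcc = go [] []
  where
    -- go wordsSoFar currentWord remainingInput; both accumulators are reversed
    go :: [String] -> String -> String -> [String]
    go acc cur [] = reverse (flush cur acc)
    go acc cur (c : cs)
      | c == ' '  = go (flush cur acc) [] cs
      | otherwise = go acc (c : cur) cs

    -- Push the current word onto the accumulator unless it is empty.
    flush :: String -> [String] -> [String]
    flush []  acc = acc
    flush cur acc = reverse cur : acc
```

For instance, tokenizeAcc "The brown fox jumped" gives ["The","brown","fox","jumped"]. Punctuation is not treated as a separator in this sketch, so the assignment's "Hello, World!" example would need a broader separator test than c == ' '.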



Your exercise seemed a little difficult to me, so I decided to solve it just for self-study. This is what I came up with:

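import Data.List
import Data.Maybe

-- Split on each delimiter in turn, feeding the pieces of one pass into the next.
splitByAnyOf :: [String] -> String -> [String]
splitByAnyOf yss xs =
  foldr (\ys acc -> concat $ map (splitBy ys) acc) [xs] yss

-- Split xs on occurrences of the delimiter ys.
splitBy :: String -> String -> [String]
splitBy ys xs =
  case (precedingElements ys xs, succeedingElements ys xs) of
    (Just "", Just s) -> splitBy ys s
    (Just p, Just "") -> [p]
    (Just p, Just s) -> p : splitBy ys s
    _ -> [xs]

-- Everything after the first occurrence of ys in xs, if there is one.
succeedingElements :: Eq a => [a] -> [a] -> Maybe [a]
succeedingElements ys xs =
  fromMaybe Nothing . find isJust $ map (stripPrefix ys) $ tails xs

-- Everything before the first occurrence of ys in xs, if there is one.
precedingElements :: Eq a => [a] -> [a] -> Maybe [a]
precedingElements ys xs =
  fromMaybe Nothing . find isJust $ map (stripSuffix ys) $ inits xs
  where
    stripSuffix ys xs =
      if ys `isSuffixOf` xs then Just $ take (length xs - length ys) xs
      else Nothing

main = do
  print $ splitBy "!" "Hello, World!"
  print $ splitBy ", " "Hello, World!"
  print $ splitByAnyOf [", ", "!"] "Hello, World!"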



["Hello, World"]
["Hello","World!"]
["Hello","World"]



