Haskell Assignment - direction needed to split a string into words
We started working on Haskell a few weeks ago and just got our first assignment. I know SO doesn't like homework, so I won't ask how to do this. Instead, I would greatly appreciate it if someone could nudge me in the right direction. Since this might not be a specific question, would it be a better fit for discussions / communities?
Question: Tokenize String, that is: "Hello, World!" → ["Hello", "World"]
Coming from a Java background, I have to forget all about the usual way of doing this. The problem is that I am still very unfamiliar with Haskell. This is what I came up with:
module Main where

import Data.Char (isAlphaNum)

main :: IO ()
main = do
  putStrLn "Type in a string:\n"
  x <- getLine
  putStrLn "The string entered was:"
  putStrLn x
  putStrLn "\n"
  print (tokenize x)

tokenize :: String -> [String]
tokenize [] = []
tokenize l = token l ++ tokenize l

token :: String -> String
token [] = []
token l = takeWhile (isAlphaNum) l
What is the first glaring mistake? Thanks.
The first glaring mistake is
tokenize l = token l ++ tokenize l
(++) :: [a] -> [a] -> [a] appends two lists of the same type. Since token :: String -> String (and type String = [Char]), the type inferred for tokenize from this line is tokenize :: String -> String, which conflicts with the declared String -> [String]. This is where you should use (:) :: a -> [a] -> [a].
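A small sketch of the difference, using made-up names (joined and consed are only illustrative): (++) glues the characters of two strings together, while (:) puts one whole string in front of a list of strings.

```haskell
-- (++) concatenates two lists of the same element type.
-- For two Strings the result is again a String, so the token boundary is lost.
joined :: String
joined = "Hello" ++ "World"

-- (:) prepends a single element to a list of such elements.
-- Here the element is a String, so the result is a list of tokens.
consed :: [String]
consed = "Hello" : ["World"]

main :: IO ()
main = do
  print joined  -- prints "HelloWorld"
  print consed  -- prints ["Hello","World"]
```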
The next mistake on this line is that the recursive call passes the same input l again, so you get infinite recursion, always doing the same thing without making progress. You have to remove the first token (and a bit more) from the input before passing it to the recursive call.
Another problem is that your token assumes the input starts with alphanumeric characters. You also need a function that establishes that condition for whatever you pass to token.
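One possible way to put the three points together, as a sketch rather than the intended solution: skip leading non-alphanumeric characters with dropWhile, split off one token with span, and recurse on the remainder using (:). The choice of span and dropWhile from the Prelude is an assumption here; the assignment may expect something else.

```haskell
import Data.Char (isAlphaNum)

-- Sketch: split a string into maximal runs of alphanumeric characters.
tokenize :: String -> [String]
tokenize input =
  -- First discard anything that cannot start a token.
  case dropWhile (not . isAlphaNum) input of
    ""   -> []
    rest ->
      -- span returns the longest alphanumeric prefix and what follows it.
      let (tok, remainder) = span isAlphaNum rest
      in tok : tokenize remainder  -- recurse on the shorter remainder

main :: IO ()
main = print (tokenize "Hello, World!")  -- prints ["Hello","World"]
```

Because each recursive call receives a strictly shorter string, the recursion terminates.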
This line results in an infinite list (which is fine in itself, since Haskell is lazy and the list is only built on demand), because it recurses without changing its argument:
tokenize l = token l ++ tokenize l
We can visualize what happens when tokenize is called like:
tokenize l = token l ++ tokenize l
= token l ++ (token l ++ tokenize l)
= token l ++ (token l ++ (token l ++ tokenize l))
= ...
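The expansion can be observed directly by taking a finite prefix of the result. In this sketch, repeatToken is a hypothetical stand-in for the question's definition, given the type String -> String so that (++) type-checks.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical stand-in for the non-terminating definition:
-- the recursive call receives the same argument every time.
repeatToken :: String -> String
repeatToken l = takeWhile isAlphaNum l ++ repeatToken l

main :: IO ()
main =
  -- Laziness lets us inspect the first 12 characters of the infinite result.
  putStrLn (take 12 (repeatToken "ab, cd"))  -- prints abababababab
```

Note that this only produces output when the first token is non-empty; for an input that starts with a non-alphanumeric character, the same expression would loop without ever yielding a character.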
To stop this, you need to change the argument to tokenize so that it recurses sensibly:
tokenize l = token l ++ tokenize <something goes here>
Some aspects of this look like a job for a parsing monad. However, since you are new to Haskell, it is unlikely that you will be able to understand how parsing monads work (or use them in your code) just yet. To give you the basics, consider what you want:
tokenize :: String -> [String]
This takes a String, splits it into parts, and produces a list of strings corresponding to the words in the input string. How can we picture this? We want a function that processes a single string and, at the first space character, adds what it has read so far to the sequence of words. But then you must process what is left (i.e., the rest of the string). For example, say you want to tokenize:
The brown fox jumped
You will first pull out "The" and then continue processing " brown fox jumped" (note the space at the beginning of the second string). You will be doing this recursively, so naturally you need a recursive function.
The natural solution that stands out is to carry an accumulator holding the set of strings you have produced so far, keep consuming the current input until you hit a space, and then add what you saw in the current word to the accumulator (this leads to an implementation where you mostly just accumulate and then occasionally reverse things).
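A minimal sketch of that accumulator idea, assuming words are separated only by the space character (so punctuation stays attached to a word). The names tokenizeAcc, go and flush are invented for illustration.

```haskell
-- Sketch: accumulate finished words and the current word while
-- walking the input one character at a time.
tokenizeAcc :: String -> [String]
tokenizeAcc = go [] []
  where
    -- done: finished words, newest first
    -- cur:  characters of the current word, in reverse order
    go done cur []     = reverse (flush done cur)
    go done cur (c:cs)
      | c == ' '  = go (flush done cur) [] cs  -- a space ends the current word
      | otherwise = go done (c : cur) cs       -- keep consuming the word

    -- Move the current word (if any) onto the list of finished words.
    flush done []  = done
    flush done cur = reverse cur : done

main :: IO ()
main = print (tokenizeAcc "The brown fox jumped")
-- prints ["The","brown","fox","jumped"]
```

Both accumulators are built by prepending, which is cheap for lists, so each one is reversed once when it is finished.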
Your exercise seemed a little difficult to me, so I decided to solve it just for self-study. This is what I came up with:
import Data.List
import Data.Maybe

splitByAnyOf yss xs =
  foldr (\ys acc -> concat $ map (splitBy ys) acc) [xs] yss

splitBy ys xs =
  case (precedingElements ys xs, succeedingElements ys xs) of
    (Just "", Just s) -> splitBy ys s
    (Just p, Just "") -> [p]
    (Just p, Just s) -> p : splitBy ys s
    _ -> [xs]

succeedingElements ys xs =
  fromMaybe Nothing . find isJust $ map (stripPrefix ys) $ tails xs

precedingElements ys xs =
  fromMaybe Nothing . find isJust $ map (stripSuffix ys) $ inits xs
  where
    stripSuffix ys xs =
      if ys `isSuffixOf` xs then Just $ take (length xs - length ys) xs
      else Nothing

main = do
  print $ splitBy "!" "Hello, World!"
  print $ splitBy ", " "Hello, World!"
  print $ splitByAnyOf [", ", "!"] "Hello, World!"
outputs:
["Hello, World"]
["Hello","World!"]
["Hello","World"]