R spell checker / tokenizer
I'm not sure whether R is the right tool for this, but here's my situation: I have a character vector full of strings.
id Words
1 'The'
2 'victory'
3 'wasgreat'
... ...
The original data had some encoding problems, and some of the strings ended up as several words run together
(e.g. 'My name is' -> 'Mynameis').
I need to leave the correct words alone and split the concatenated entries into their component words.
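To make the goal concrete, here is a minimal sketch of the behaviour I'm after. It assumes a tiny hypothetical word list (`dictionary`) and a made-up helper name (`segment_word`). It is only a naive dictionary lookup to illustrate the idea, not a real spell checker, and it returns the first valid split it finds rather than the most plausible one.

```r
# Hypothetical toy dictionary (an assumption for illustration);
# real data would need a proper word list.
dictionary <- c("the", "victory", "was", "great", "my", "name", "is")

# Split `s` into dictionary words. Known words and strings with no
# complete split are returned unchanged.
segment_word <- function(s, dict) {
  lower <- tolower(s)
  n <- nchar(lower)
  if (n == 0 || lower %in% dict) return(s)  # correct words are left alone

  # best[[i + 1]] holds one split of the first i characters, or NULL if none
  best <- vector("list", n + 1)
  best[[1]] <- character(0)

  for (end in seq_len(n)) {
    for (start in seq_len(end)) {
      prev <- best[[start]]
      piece <- substr(lower, start, end)
      if (!is.null(prev) && piece %in% dict) {
        # keep the original casing of the matched piece
        best[[end + 1]] <- c(prev, substr(s, start, end))
        break
      }
    }
  }

  result <- best[[n + 1]]
  if (is.null(result)) s else paste(result, collapse = " ")
}

words <- c("The", "victory", "wasgreat")
fixed <- vapply(words, segment_word, character(1),
                dict = dictionary, USE.NAMES = FALSE)
print(fixed)
```

With the toy dictionary above, 'wasgreat' should come back as 'was great' while 'The' and 'victory' stay untouched. Ambiguous splits and words missing from the dictionary are exactly where this approach falls short.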
I'm curious whether there is any function or package in R that solves this problem. I think there are several Python tools that handle this much better, but my Python skills are considerably weaker (bordering on nonexistent). Still, I would be willing to consider Python as an alternative.
Any suggestions?