R spell checker / tokenizer
I'm not sure whether R is the right tool for this, but here's my situation. I have a character vector full of strings.

id   Words
1    'The'
2    'victory'
3    'wasgreat'
...  ...
The original data had some encoding problems, and some of the strings ended up as several words concatenated together (e.g. 'My name is' -> 'Mynameis').
I need to leave the correct words alone and split the concatenated entries into their correct component words.
I'm curious whether there is a package or function in R that solves this problem. I think there are several Python programs that can handle this much better, but my Python skills are considerably weaker (bordering on nonexistent). Still, I'm willing to consider Python as an alternative.
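
To make the goal concrete, the desired behaviour can be sketched in Python roughly as follows. This is only an illustration under assumptions, not a reference to any existing package: `KNOWN_WORDS` is a hypothetical toy dictionary (a real run would load a full word list), and `segment` is a made-up helper name. It leaves known words untouched and otherwise looks for the split into the fewest dictionary words.

```python
from functools import lru_cache

# Hypothetical toy dictionary; a real run would load a full word list.
KNOWN_WORDS = frozenset({"the", "victory", "was", "great", "my", "name", "is"})


def segment(token, vocabulary=KNOWN_WORDS):
    """Split `token` into known words; return [token] if no full split exists.

    Assumes lowercasing does not change the string length (true for ASCII).
    """
    lowered = token.lower()
    if lowered in vocabulary:
        return [token]  # already a correct word: leave it alone

    @lru_cache(maxsize=None)
    def best_split(start):
        # Fewest-pieces split of lowered[start:] as (start, end) spans,
        # or None when the remainder cannot be covered by known words.
        if start == len(lowered):
            return ()
        best = None
        for end in range(start + 1, len(lowered) + 1):
            if lowered[start:end] in vocabulary:
                rest = best_split(end)
                if rest is not None:
                    candidate = ((start, end),) + rest
                    if best is None or len(candidate) < len(best):
                        best = candidate
        return best

    spans = best_split(0)
    if spans is None:
        return [token]  # unknown and unsplittable: keep as-is
    # Slice the original token so the original capitalisation is preserved.
    return [token[s:e] for s, e in spans]


words = ["The", "victory", "wasgreat"]
print([segment(w) for w in words])
# [['The'], ['victory'], ['was', 'great']]
```

Preferring the fewest pieces is a simple heuristic; with a large dictionary full of short words, a frequency-weighted score would likely give better splits.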