Unable to remove special characters in Pig
I have a text file that I want to load into Pig. The file has names on separate lines, and the data contains errors in the form of special characters, something like this:
Ja@@$s000on J@@a%^ke T!!ina Mel@ani
I want to remove the special characters from all names using a regex, ideally in a Pig job, and end up with this output:
Jason
Jake
Tina
Melani
Can someone please give me a regex that works in Pig, along with the command that applies it? I have not been able to get this working with the REGEX_EXTRACT and REGEX_EXTRACT_ALL functions.
Also, can someone explain what the number 1 means when we pass it to REGEX_EXTRACT as the argument after the regex?
Any help would be much appreciated.
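You can solve this with REPLACE, which treats its second argument as a regular expression and replaces every match with its third argument (here, the empty string).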
input.txt
Ja@@$s000on
J@@a%^ke T!!ina Mel@ani
Pig script:
A = LOAD 'input.txt' AS (line:chararray);
-- Delete every run of characters that is not a letter or whitespace.
B = FOREACH A GENERATE REPLACE(line, '([^a-zA-Z\\s]+)', '');
dump B;
Output:
(Jason)
(Jake Tina Melani)
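As far as I know, Pig's REPLACE delegates to Java's String.replaceAll, so you can check the pattern outside Pig before running the job. A minimal Java sketch under that assumption (the class name StripSpecialChars is just for illustration):

```java
import java.util.List;

public class StripSpecialChars {
    // Same pattern as the Pig script. In a Java (or Pig) string literal "\\s"
    // reaches the regex engine as \s, i.e. "any whitespace character".
    static final String PATTERN = "([^a-zA-Z\\s]+)";

    // Assumed equivalent of REPLACE(line, PATTERN, ''): every run of
    // non-letter, non-whitespace characters is replaced with nothing.
    static String clean(String line) {
        return line.replaceAll(PATTERN, "");
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Ja@@$s000on", "J@@a%^ke T!!ina Mel@ani");
        for (String line : lines) {
            // Print in the same (tuple) form that Pig's dump uses.
            System.out.println("(" + clean(line) + ")");
        }
    }
}
```

Keeping \s in the character class is what preserves the spaces between Jake, Tina and Melani; drop it and the three names would be glued together.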
There is no way to escape these characters when they appear inside the values of a tuple, bag, or map, but loading them as part of a plain string is no problem at all. Just declare that field as type chararray.
Please take a look here
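On the question about the number 1: REGEX_EXTRACT takes (string, regex, index), and index selects which capture group, i.e. which parenthesised part of the regex, is returned; 1 means the first group. The sketch below illustrates the idea with plain java.util.regex. extract is a hypothetical helper, and it assumes REGEX_EXTRACT returns the requested group from the first match it finds; the host:port input is only an illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexDemo {
    // Hypothetical stand-in for REGEX_EXTRACT(input, regex, index):
    // returns the text captured by group `index` of the first match, or null.
    static String extract(String input, String regex, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String input = "192.168.1.5:8020";
        String regex = "(.*):(.*)"; // group 1 = before the colon, group 2 = after it
        System.out.println(extract(input, regex, 1)); // first group
        System.out.println(extract(input, regex, 2)); // second group
    }
}
```

Group numbers count opening parentheses from left to right, so with this regex index 1 gives the part before the colon and index 2 the part after it.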