Unable to remove special characters in Pig

I have a text file that I want to load into Pig. The file has one name per line, and the data contains errors in the form of special characters, something like this:

Ja@@$s000on   
J@@a%^ke
T!!ina
Mel@ani


I want to remove the special characters from all names using a regex in a Pig job, and finally get this output:

Jason
Jake
Tina
Melani


Can someone please give me a regex that does this in Pig, along with the command to apply it? I cannot get the REGEX_EXTRACT and REGEX_EXTRACT_ALL functions to work.
Also, can someone explain what the number 1 means when we pass it to this function as an argument after the regex?

Any help would be much appreciated.
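About the number 1: in Pig, REGEX_EXTRACT(string, regex, index) returns the text captured by the group numbered index. Passing 1 therefore selects the first parenthesised group of the regex. Pig's built-in functions run on Java's java.util.regex, so the numbering can be illustrated with Matcher.group. The helper extractGroup below is hypothetical, a sketch of the idea rather than Pig's actual implementation. It assumes the first match found in the input is the one that counts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupIndexDemo {
    // Hypothetical helper: returns the text captured by group `index` of the
    // first match of `regex` in `input`, or null when nothing matches.
    public static String extractGroup(String input, String regex, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String hostPort = "192.168.1.5:8020";
        // Group 0 is the whole match; groups 1 and 2 are the parenthesised parts.
        System.out.println(extractGroup(hostPort, "(.*):(.*)", 0)); // 192.168.1.5:8020
        System.out.println(extractGroup(hostPort, "(.*):(.*)", 1)); // 192.168.1.5
        System.out.println(extractGroup(hostPort, "(.*):(.*)", 2)); // 8020
    }
}
```

Extracting a group picks out one piece of a match. It does not delete characters scattered through a name, so a replace-based approach is the better fit for cleaning names like the ones above.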

2 answers


You can use REPLACE with a regex to solve this problem.



input.txt  
Ja@@$s000on  
J@@a%^ke T!!ina Mel@ani  

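PigScript:
-- declare the field as chararray so REPLACE receives a string
A = LOAD 'input.txt' as (line:chararray);
-- delete every run of characters that is neither a letter nor whitespace
B = FOREACH A GENERATE REPLACE(line,'([^a-zA-Z\\s]+)','');
dump B;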

Output:  
(Jason)  
(Jake Tina Melani)  
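Pig's REPLACE takes a Java regular expression, so the pattern can be checked outside Pig with String.replaceAll. The sketch below assumes that REPLACE(line, regex, '') behaves like line.replaceAll(regex, ""). The class and method names are hypothetical. In both a Pig string literal and a Java string literal, "\\s" reaches the regex engine as \s (any whitespace). That is why the spaces between names survive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StripSpecialChars {
    // Same pattern as the Pig script: one or more characters that are
    // neither ASCII letters nor whitespace.
    public static final String NON_LETTERS = "([^a-zA-Z\\s]+)";

    // Assumed equivalent of REPLACE(line, '([^a-zA-Z\\s]+)', '').
    public static String clean(String line) {
        return line.replaceAll(NON_LETTERS, "");
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Ja@@$s000on", "J@@a%^ke T!!ina Mel@ani");
        List<String> cleaned = lines.stream()
                .map(StripSpecialChars::clean)
                .collect(Collectors.toList());
        // prints "Jason", then "Jake Tina Melani"
        cleaned.forEach(System.out::println);
    }
}
```

The class [a-zA-Z] covers only ASCII letters, so an accented name would lose its accented characters. In Java regex syntax, \p{L} matches any Unicode letter if that matters for your data.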



There is no way to escape these characters when they are part of the values in a tuple, bag, or map, but there is no problem at all loading them when they are part of a string. Just declare that field as type chararray.



Please take a look here
