Hive RegexSerDe does not give correct output
I am trying to parse the input line below with Hive's RegexSerDe, but I am not getting the expected result. I can't tell whether the problem is in my regex or in RegexSerDe itself. The regex works as expected in an online regex tester, but not in Hive's RegexSerDe. Can someone please help me understand what is wrong here?
I am using Apache Hive 0.9.0.
My input line:
1 :: Toy Story (1995) :: Adventure | Animation | Children | Comedy | Fantasy
My expected output:
1 Toy Story 1995 Adventure | Animation | Children | Comedy | Fantasy
My Hive table definition:
CREATE TABLE myMovie3(
  id STRING,
  name STRING,
  year STRING,
  category STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "^(.*?)::(.*)\(([0-9]*)\)::(.*)$",
  "output.format.string" = "%1$s %2$s %3$s %4$s")
STORED AS TEXTFILE;
The actual output I got from the regex:
hive> select * from mymovie3;
OK
1 Toy Story (1995)
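One way to narrow this down is to run the pattern through java.util.regex directly, which, as far as I know, is the engine RegexSerDe uses. The sketch below is not a confirmed diagnosis. The class name `RegexCheck` and the helper `groups` are made up for illustration, and the `\s*` around the delimiters is an assumption added only to tolerate the spaces in the sample line; it is not part of the original pattern. The second pattern shows what happens if the backslashes before the parentheses are lost before the regex engine sees them (for example through string-literal escaping in the table definition, which is an assumption here): the parentheses become an extra capture group and the year stays attached to the name.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCheck {
    // Pattern as the regex engine should see it: \( and \) are literal parentheses.
    // Assumption: \s* was added around "::" and "(" to tolerate the spaces in the
    // sample line; the original pattern does not contain it.
    static final Pattern ESCAPED =
        Pattern.compile("^(.*?)\\s*::\\s*(.*?)\\s*\\(([0-9]*)\\)\\s*::\\s*(.*)$");

    // Same pattern with the backslashes before the parentheses dropped.
    // "(([0-9]*))" is now two nested capture groups that can match the empty string.
    static final Pattern UNESCAPED =
        Pattern.compile("^(.*?)\\s*::\\s*(.*?)\\s*(([0-9]*))\\s*::\\s*(.*)$");

    // Returns all capture groups for a full-line match, or null if the line does not match.
    static String[] groups(Pattern p, String line) {
        Matcher m = p.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            out[i - 1] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String line =
            "1 :: Toy Story (1995) :: Adventure | Animation | Children | Comedy | Fantasy";
        System.out.println("escaped:   " + Arrays.toString(groups(ESCAPED, line)));
        System.out.println("unescaped: " + Arrays.toString(groups(UNESCAPED, line)));
    }
}
```

With the escaped pattern the four groups are the id, the name, the year, and the category list. With the unescaped pattern the first four groups are the id, the name including "(1995)", and two empty strings, which looks like the output Hive printed above. If that is what is happening, doubling the backslashes in "input.regex" would be the first thing to try.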