PIG: how to remove '::' in column name
I have a Pig relation like the one below:
FINAL = {input_md5::type: chararray, input_md5::name: chararray, input_md5::id: long, input_md5::age: chararray, test_1::type: chararray, test_2::name: chararray}
I am trying to store all the columns of the relation input_md5 in a Hive table, that is, all of input_md5::type: chararray, input_md5::name: chararray, input_md5::id: long, input_md5::age: chararray, but not test_1::type: chararray, test_2::name: chararray.
Is there any command in Pig that selects only the columns of input_md5? Something like below:
STORE= FOREACH FINAL GENERATE all input_md5::type .
I know that Pig has:
FOREACH FINAL GENERATE all input_md5::type as type
but I have a lot of columns, so I can't use "as" for each of them in my code.
Because when I try:
STORE= FOREACH FINAL GENERATE input_md5::type .. bus_input_md5::name;
Pig gives an error:
org.apache.hive.hcatalog.common.HCatException : 2007 : Invalid column position in partition schema : Expected column <type> at position 1, found column <input_md5::type>
Thanks in advance,
I fixed this issue; the fix is below.
Create a relation with some filter condition, as shown below:
DUMMY_RELATION= FILTER SOURCE_TABLE BY type== '';
(I used a column named type, but the filter can be on any column in the table; all that matters is that we get the table's schema.)
FINAL_DATASET= UNION DUMMY_RELATION,SCHEMA_1,SCHEMA_2;
(The new DUMMY_RELATION should be placed first in the union.) Now the column names no longer contain the :: operator, and they will match the column names of the Hive table, provided your source table (the one behind DUMMY_RELATION) and the target table have the same column order.
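To check that the prefixes are really gone before storing, Pig's DESCRIBE operator prints the schema of a relation. This is only a minimal sketch that reuses the relation names from the statements above and is meant to be appended after them; what it prints depends on your own schemas.

```pig
-- Run after the FILTER and UNION statements above.
-- SCHEMA_1 comes out of a join, so its fields may still show an 'alias::' prefix.
DESCRIBE SCHEMA_1;
-- According to the fix described above, this should show the plain field
-- names taken from DUMMY_RELATION, the first relation in the union.
DESCRIBE FINAL_DATASET;
```

If FINAL_DATASET still shows prefixed names, check that DUMMY_RELATION is listed first in the UNION and that every relation in it has the same number of columns in the same order.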
Thank you yourself :)
I implemented Neethu's example this way. It may have typos, but it shows how to implement the idea.
tableA = LOAD 'default.tableA' USING org.apache.hive.hcatalog.pig.HCatLoader();
tableB = LOAD 'default.tableB' USING org.apache.hive.hcatalog.pig.HCatLoader();
--load empty table
finalTable = LOAD 'default.finalTable' USING org.apache.hive.hcatalog.pig.HCatLoader();
--example operations that end up with '::' in column names
g = group tableB by (id);
j = JOIN tableA by id LEFT, g by group;
result = foreach j generate tableA::id, tableA::col2, g::tableB;
--union empty finalTable and result
result2 = union finalTable, result;
--and Bob's your uncle
STORE result2 INTO 'finalTable' USING org.apache.hive.hcatalog.pig.HCatStorer();
Thanks Neethu!