PIG: how to remove '::' from column names

I have a Pig relation like the one below:

FINAL = {input_md5::type: chararray,input_md5::name: chararray,input_md5::id: long,input_md5::age: chararray,test_1::type: chararray,test_2::name: chararray}

I am trying to store only the columns that come from the relation input_md5 (input_md5::type: chararray, input_md5::name: chararray, input_md5::id: long, input_md5::age: chararray) in a Hive table, leaving out test_1::type: chararray and test_2::name: chararray.

Is there any command in Pig that selects only the input_md5 columns? Something like the following:

STORE= FOREACH FINAL GENERATE all input_md5::type .

I know that Pig supports:

FOREACH FINAL GENERATE input_md5::type AS type

but I have a lot of columns, so I can't use AS in my code.

I need to get rid of the prefixes because when I try:

STORE= FOREACH FINAL GENERATE input_md5::type .. bus_input_md5::name;

Pig gives this error:

org.apache.hive.hcatalog.common.HCatException : 2007 : Invalid column position in partition schema : Expected column <type> at position 1, found column <input_md5::type>
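For reference, the AS workaround mentioned above, written out for the four input_md5 columns of FINAL, would look roughly like this. It is a sketch only: it assumes FINAL exists as described, and default.target_table is a hypothetical Hive table with the columns type, name, id, age in that order. It uses TO_STORE as the alias because STORE is a Pig keyword.

```pig
-- Rename each input_md5 column explicitly so the stored field names
-- carry no "input_md5::" prefix. Works, but every column is listed by hand.
TO_STORE = FOREACH FINAL GENERATE
    input_md5::type AS type,
    input_md5::name AS name,
    input_md5::id   AS id,
    input_md5::age  AS age;

-- 'default.target_table' is a hypothetical table name (assumption).
STORE TO_STORE INTO 'default.target_table' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

With many columns this list grows with the table, which is the tedium the question is trying to avoid.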

Thanks in advance,

2 answers


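I fixed the issue; here is how.

Create a relation from the source table with a filter condition, as shown below. Ideally the condition matches no rows, because any rows it returns are also unioned in and stored: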

DUMMY_RELATION= FILTER SOURCE_TABLE BY type== '';

(I used the column type here, but you can filter on any column of the table; all that matters is that DUMMY_RELATION carries the table's schema.)

Then union it with the other relations:

FINAL_DATASET= UNION DUMMY_RELATION,SCHEMA_1,SCHEMA_2;

(DUMMY_RELATION must be placed first in the union.) Now the column names no longer contain the :: operator, and they will match the column names of the Hive table as long as your source table (the one DUMMY_RELATION is built from) and the target table have the same column order.
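Applied to the FINAL relation from the question, the same idea might look like the sketch below. It assumes a hypothetical, empty Hive table default.target_table with the columns type, name, id, age in that order and compatible types. Instead of filtering a source table it loads the empty target table (as the second answer does), and it relies on UNION taking its field names from its first input. The range projection keeps only the four consecutive input_md5 columns, so test_1::type and test_2::name are dropped without listing every column.

```pig
-- Load the (empty) target table only to borrow its schema; its field
-- names have no '::' prefix. 'default.target_table' is hypothetical.
TARGET_EMPTY = LOAD 'default.target_table' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Keep only the input_md5 columns. Per the schema in the question they
-- are consecutive, so a range projection covers all of them.
ONLY_MD5 = FOREACH FINAL GENERATE input_md5::type .. input_md5::age;

-- TARGET_EMPTY goes first so the union output uses its field names.
UNIONED = UNION TARGET_EMPTY, ONLY_MD5;

STORE UNIONED INTO 'default.target_table' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

If the target table already holds rows, those rows are unioned back in and written again, so this sketch only suits an empty table.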

Answering my own question :)


I implemented Neethu's approach like this. It may have typos, but it shows how to apply the idea.

tableA = LOAD 'default.tableA' USING org.apache.hive.hcatalog.pig.HCatLoader();
tableB = LOAD 'default.tableB' USING org.apache.hive.hcatalog.pig.HCatLoader();

--load the empty target table; only its schema is needed
finalTable = LOAD 'default.finalTable' USING org.apache.hive.hcatalog.pig.HCatLoader();

--example operations that end up with '::' in column names
g = group tableB by (id);
j = JOIN tableA by id LEFT, g by group;
result = foreach j generate tableA::id, tableA::col2, g::tableB;

--union empty finalTable and result
result2 = union finalTable, result;

--bob your uncle
STORE result2 INTO 'finalTable' USING org.apache.hive.hcatalog.pig.HCatStorer();

      



Thanks Neethu!
