Storing Pig output into a Hive table in a single step

I would like to insert Pig output into Hive tables (the tables already exist in Hive with the exact schema). I only need to insert the output values into the table. I don't want the usual method, where I first store the output in a file, then read that file from Hive and insert it into the tables. I want to eliminate that extra hop.

Is this possible? If so, how can it be done?

Thanks

+3




2 answers


OK. Create an external Hive table with the required schema, located in some HDFS directory. For example:

create external table emp_records(id int,
                                  name String,
                                  city String)
                                  row format delimited
                                  fields terminated by '|'
                                  location '/user/cloudera/outputfiles/usecase1';

      

Just create the table as above; you don't need to load any file into that directory.



Now write a Pig script that reads data from some input directory, and store its output as shown below:

A = LOAD 'inputfile.txt' USING PigStorage(',') AS (id:int, name:chararray, city:chararray);
B = FILTER A BY id >= 678933;
C = FOREACH B GENERATE id, name, city;
STORE C INTO '/user/cloudera/outputfiles/usecase1' USING PigStorage('|');

      

Make sure that the destination location, the delimiter, and the final FOREACH clause in your Pig script match the Hive DDL schema.
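
One practical caveat with this approach: Pig's STORE normally refuses to write into an output directory that already exists, and the table's location directory may already be there (for instance after a previous run, or once the table has been created). A minimal sketch of one way around this, assuming it is acceptable to overwrite whatever the table currently holds, is to remove the directory with Pig's rmf command just before storing. The paths and relation names are the ones from the example above.

```pig
-- Sketch only: rmf deletes the target directory if it exists (no error if it
-- does not). This discards any data currently in the table's location.
rmf /user/cloudera/outputfiles/usecase1

A = LOAD 'inputfile.txt' USING PigStorage(',') AS (id:int, name:chararray, city:chararray);
B = FILTER A BY id >= 678933;
C = FOREACH B GENERATE id, name, city;

-- Pig recreates the directory and writes '|'-delimited part files into it,
-- matching the delimiter declared in the Hive DDL.
STORE C INTO '/user/cloudera/outputfiles/usecase1' USING PigStorage('|');
```

Because emp_records is an external table that only points at this location, a later query on it in Hive should read the newly written files. If you need to append rather than overwrite, this sketch does not apply.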

+6




Two approaches for storing Pig data in a Hive table are described below, using an "Employee" table as an example. (The prerequisite is that the Hive table has already been created.)

A = LOAD 'EMPLOYEE.txt' USING PigStorage(',') AS (EMP_NUM:int, EMP_NAME:chararray, EMP_PHONE:int);

      

Approach 1: Using HCatalog

-- store the Pig result in Hive using HCatalog (run the script with: pig -useHCatalog)
STORE A INTO 'Empdb.employee' USING org.apache.hive.hcatalog.pig.HCatStorer();
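
A point worth checking with this approach: as far as I know, HCatStorer matches Pig fields to Hive columns by name and expects lowercase field names, so uppercase aliases like the ones in the LOAD above may be rejected. A minimal sketch of renaming the fields first, assuming (hypothetically) that the Hive table Empdb.employee has columns emp_num, emp_name and emp_phone with matching types:

```pig
-- Sketch only: the Hive column names emp_num, emp_name and emp_phone are an
-- assumption; replace them with the columns of your own table.
A = LOAD 'EMPLOYEE.txt' USING PigStorage(',') AS (EMP_NUM:int, EMP_NAME:chararray, EMP_PHONE:int);

-- Rename each field so the Pig schema lines up with the Hive table's columns.
B = FOREACH A GENERATE EMP_NUM AS emp_num, EMP_NAME AS emp_name, EMP_PHONE AS emp_phone;

-- HCatStorer writes through the Hive metastore, so no HDFS path or delimiter
-- has to be specified here.
STORE B INTO 'Empdb.employee' USING org.apache.hive.hcatalog.pig.HCatStorer();
```

If the names and types in your LOAD already match the table, the extra FOREACH is unnecessary.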

      



(or)

Approach 2: Using HDFS Physical Location

-- store the Pig result directly in the Hive warehouse location on HDFS
STORE A INTO 'hdfs://<<nmhost>>:<<port>>/user/hive/warehouse/Empdb/employee/' USING PigStorage(',');

+3

