Importing a generic Pig constants file into other Pig files

Purpose: Define constants (%declare and %default) in a constants.pig file for code modularity, and import them into other Pig files.

According to the docs (http://pig.apache.org/docs/r0.12.0/cont.html#import-macros), %declare and %default are valid macro statements.

Problem: Pig cannot find the declared parameter.

Pig file: constants.pig

 %declare ACTIVE_VALUES 'UK';

      

Pig file: a.pig

 IMPORT 'constants.pig';

 A = LOAD 'a.csv' using PigStorage(',') AS (country_code:chararray, country_name:chararray);
 B = FILTER A BY country_code == '$ACTIVE_VALUES';
 dump B;

      

Input: a.csv

IN,India
US,United States
UK,United Kingdom

      

Error

Error before Pig is launched
----------------------------
ERROR 2997: Encountered IOException.      org.apache.pig.tools.parameters.ParameterSubstitutionException: Undefined parameter : ACTIVE_VALUES

 java.io.IOException: org.apache.pig.tools.parameters.ParameterSubstitutionException: Undefined parameter : ACTIVE_VALUES
at org.apache.pig.impl.PigContext.doParamSubstitution(PigContext.java:414)
at org.apache.pig.Main.runParamPreprocessor(Main.java:810)
at org.apache.pig.Main.run(Main.java:588)
at org.apache.pig.Main.main(Main.java:170)
Caused by: org.apache.pig.tools.parameters.ParameterSubstitutionException: Undefined parameter : ACTIVE_VALUES
at org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:355)
at org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:303)
at org.apache.pig.tools.parameters.PigFileParser.input(PigFileParser.java:67)
at org.apache.pig.tools.parameters.PigFileParser.Parse(PigFileParser.java:43)
at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:95)
at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:76)
at org.apache.pig.impl.PigContext.doParamSubstitution(PigContext.java:410)
... 3 more

      

My understanding of IMPORT is that the content of the imported Pig file is executed and becomes available to the calling Pig script. If so, the declared parameter should be available in the importing file.

Any insights or thoughts on having a generic Pig script file that declares constants and importing it into other Pig files to achieve code modularity?

Update:

A JIRA issue has already been raised for this. See the link below for details.



2 answers


The IMPORT keyword is used to import macros, not constants. %declare and %default are preprocessor statements, and their scope is the rest of the script in which they appear. If you declare a parameter in one script and import that script from another, it won't work because the parameter is out of scope.
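To make the scoping point concrete, here is a toy model in Python. It is a simplification and not Pig's actual preprocessor: each file's text is preprocessed on its own, %declare values are collected only from that same text, and IMPORT lines are left untouched (the stack trace above suggests substitution runs in Main.runParamPreprocessor, before the script is parsed). The names preprocess and UndefinedParameter are hypothetical, and only the simple %declare NAME 'value' form is handled. Run on a.pig alone it fails like the error above; run on constants.pig concatenated with a.pig it succeeds.

```python
import re

# Toy model (assumption): %declare lines and $NAME references are resolved
# within one file's text only; IMPORT lines are not expanded here.
DECLARE_RE = re.compile(r"^\s*%declare\s+(\w+)\s+'([^']*)'\s*;?\s*$")
# Parameter names start with a letter or underscore, so positional
# references such as $0 are not treated as parameters.
PARAM_RE = re.compile(r"\$([A-Za-z_]\w*)")


class UndefinedParameter(Exception):
    """Raised when a $NAME reference has no value in scope."""


def preprocess(text, params=None):
    # Start from externally supplied parameters (e.g. -param / -param_file)
    scope = dict(params or {})
    out = []
    for line in text.splitlines():
        m = DECLARE_RE.match(line)
        if m:
            # Record the constant; the %declare line itself is consumed
            scope[m.group(1)] = m.group(2)
            continue

        def resolve(match):
            name = match.group(1)
            if name not in scope:
                raise UndefinedParameter("Undefined parameter : " + name)
            return scope[name]

        out.append(PARAM_RE.sub(resolve, line))
    return "\n".join(out)


if __name__ == "__main__":
    constants_pig = "%declare ACTIVE_VALUES 'UK';"
    a_pig = ("IMPORT 'constants.pig';\n"
             "B = FILTER A BY country_code == '$ACTIVE_VALUES';")
    try:
        preprocess(a_pig)  # a.pig on its own: the constant is out of scope
    except UndefinedParameter as e:
        print(e)  # Undefined parameter : ACTIVE_VALUES
    # Same text with the constants prepended: the reference resolves
    print(preprocess(constants_pig + "\n" + a_pig))
```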

Both statements are valid inside a macro, as long as the declared variable is used inside that macro. If you need to define constants outside of a script for modularity, use a parameter file:

ACTIVE_VALUES = 'UK'

      

And then run your Pig script like this:

pig -param_file your_params_file.properties -f your_script.pig

      



If you really want to use IMPORT, you can create a macro that does the filtering with this constant value:

%declare ACTIVE_VALUES 'UK';

DEFINE my_custom_filter(A) RETURNS B {
   $B = FILTER $A BY $0 == '$ACTIVE_VALUES';
};

      

Then import it as you did in your script, but instead of calling FILTER directly, call your own macro:

IMPORT 'macro.pig';

A = LOAD 'a.csv' using PigStorage(',') AS (country_code:chararray, country_name:chararray);
B = my_custom_filter(A);
dump B;

      



Although it is a bit of a hack, another possible solution is to use a Python controller that concatenates the two files before compiling them. You can read about controllers here.

It might look something like this, and it changes your current structure the least:



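#!/usr/bin/python
# Run with the pig launcher, e.g. pig controller.py (Pig runs controllers under Jython)
from __future__ import with_statement
from org.apache.pig.scripting import Pig

def readfile(path):
    # Return the file's lines, keeping their trailing newlines
    with open(path, 'r') as infile:
        return infile.readlines()

constants = readfile('constants.pig')
script = readfile('a.pig')

# Compile constants + script as one program, so the %declare
# is in scope for the rest of the script
P = Pig.compile('\n'.join([''.join(constants), ''.join(script)]))

# Bind (no extra parameters) and run
result = P.bind({}).runSingle()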

      

However, you can also pass the variables you want to change in the dictionary that is the argument to the bind method. This works the same way as parameter substitution, and it is the approach I recommend.
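A minimal sketch of building that dictionary from the same parameter file, so the constants stay in one place. load_param_file is a hypothetical helper, not part of Pig; it assumes a simple NAME = value format with lines starting with # treated as comments, and it strips one pair of surrounding quotes so a value such as 'UK' drops into '$ACTIVE_VALUES' without doubled quotes. It avoids newer Python syntax since Pig runs controllers under Jython.

```python
def load_param_file(lines):
    """Parse NAME = value lines (hypothetical helper) into a dict for bind()."""
    params = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and '#' comment lines (assumed comment syntax)
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected NAME = value, got: " + line)
        value = value.strip()
        # Assumption: remove one pair of matching surrounding quotes,
        # because the script already quotes the reference ('$ACTIVE_VALUES')
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        params[name.strip()] = value
    return params
```

In the controller, you would open your_params_file.properties, pass its lines to load_param_file, and hand the resulting dict to bind, e.g. P.bind(params).runSingle(), instead of the empty dictionary.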







