Maybe this is off topic, but I used it in Java code with a parameter
array.

In MAIN (or UI, Input, etc.):

String[] params = new String[2];

params[0] = "date";
params[1] = "filter_regex";
runScript(params, server, inputPath, outputPath);

In runScript(String[] params, PigServer server, String inputPath, String
outputPath):

// registerQuery is an instance method, so call it on the server object.
server.registerQuery("data = LOAD '" + inputPath + "' USING "
    + "PigStorage('|') AS (date:chararray, comment:chararray);");
server.registerQuery("filtered = FILTER data BY date == '" + params[0]
    + "' AND comment == '" + params[1] + "';");
....
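
Since the query text is built by string concatenation, a value containing a single quote would break the statement. Here is a minimal sketch of a helper that quotes values before they are spliced in. The class and method names (PigFilterBuilder, quote, filterStatement) are made up for illustration, and it assumes backslash escaping inside single-quoted literals is accepted by your Pig version, so check that before relying on it.

```java
import java.util.Objects;

public class PigFilterBuilder {

    // Wrap a value in single quotes, escaping backslashes and single quotes.
    // Assumption: the target Pig version accepts \' and \\ inside string literals.
    public static String quote(String value) {
        Objects.requireNonNull(value, "value");
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Build the FILTER statement from the thread's example with quoted values.
    public static String filterStatement(String date, String comment) {
        return "filtered = FILTER data BY date == " + quote(date)
                + " AND comment == " + quote(comment) + ";";
    }

    public static void main(String[] args) {
        // Print the statement instead of sending it to a PigServer,
        // so the sketch runs without Pig on the classpath.
        System.out.println(filterStatement("2010-09-29", "it's late"));
    }
}
```

The resulting string could then be handed to server.registerQuery(...) in place of the hand-concatenated one.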


Just a thought...

Matt

-----Original Message-----
From: Saurav Datta
Sent: Wednesday, September 29, 2010 1:25 PM
To: pig-user@hadoop.apache.org
Subject: Re: Magic numbers in my pig scripts

Same here; I was about to suggest parameter substitution, reading the
values from a parameter file.

Here is how you reference the parameters year, month and date in the script:
A = load '/INPUTDIR/$year/$month/$date/input_test.dat' using
PigStorage(' ') as (field1, field2, field3) ;

Here is how you invoke the Pig script (in local mode, though):
pig -param_file param_file.cfg -x local testParamFile.pig


And below are the contents of param_file.cfg, which is in the same
directory:
year='2010'
month='09'
date='19'
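
To make the mechanics concrete, here is a rough Java sketch of what the substitution amounts to: read name=value lines, strip the surrounding quotes, and replace each $name in the script text. This is only an illustration and not Pig's actual preprocessor; the class ParamSubstitutionSketch and its methods are hypothetical, and features such as %declare, %default and command-valued parameters are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParamSubstitutionSketch {

    // Parse lines of the form name=value or name='value'.
    // Blank lines and lines starting with # are skipped.
    public static Map<String, String> parseParams(String fileContents) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String line : fileContents.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Not a name=value line: " + line);
            }
            String name = trimmed.substring(0, eq).trim();
            String value = trimmed.substring(eq + 1).trim();
            // Drop one pair of surrounding single quotes, if present.
            if (value.length() >= 2 && value.startsWith("'") && value.endsWith("'")) {
                value = value.substring(1, value.length() - 1);
            }
            params.put(name, value);
        }
        return params;
    }

    // Replace every $name in the script; fail loudly on an undefined name.
    public static String substitute(String script, Map<String, String> params) {
        Matcher m = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)").matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Undefined parameter: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = parseParams("year='2010'\nmonth='09'\ndate='19'\n");
        System.out.println(substitute(
                "A = load '/INPUTDIR/$year/$month/$date/input_test.dat';", params));
    }
}
```

Failing on an undefined name, instead of leaving $name in place, is a deliberate choice in this sketch so that a typo in the parameter file shows up before the job runs.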

We are using Pig 0.7.0
Let me know if this helps.

Regards,
Saurav
On Sep 29, 2010, at 10:15 AM, Aniket Mokashi wrote:

http://wiki.apache.org/pig/ParameterSubstitution
http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html

Also, Pig 0.8 can have RECORD_TYPE_ALPHA take runtime values (an alias
like filtered_stuff_threshold).
https://issues.apache.org/jira/browse/PIG-1434

Thanks,
Aniket

-----Original Message-----
From: Saurav Datta
Sent: Wednesday, September 29, 2010 1:06 PM
To: pig-user@hadoop.apache.org
Subject: Re: Magic numbers in my pig scripts

Hi Eric,

As I understand it, you would like to define the value of the filter at
run time, and this value would be taken from a file.
Am I correct?

Regards,
Saurav
On Sep 29, 2010, at 10:00 AM, Eric Wadsworth wrote:

Hi folks!

I'm brand new to this list, so apologies if this is an inappropriate
newbie question, or is otherwise incorrect, but here goes.

I'm working with a bunch of pig scripts, and we're adding new ones
almost daily. They are getting more and more complex. The problem is
exacerbated by the proliferation of magic numbers throughout them.
As a software engineer, these are driving me nuts! The code is quite
brittle. There seems to be no way to centralize logic or even values.

For a simple example:
filtered_stuff = FILTER stuff by record_type == 23;

I'd prefer:
filtered_stuff = FILTER stuff by record_type == RECORD_TYPE_ALPHA;

Where RECORD_TYPE_ALPHA is defined in some other file that the pig
script consumes.

Sounds rather like the old C-style header files would be in order...

Am I missing something obvious here? How do you guys handle this
problem? (We're using pig 6 and are just starting to transition to
pig 7.)

Thanks! --- Eric Wadsworth
