Hi Jeff,
It does not sound like you need properties (or a configuration). It sounds
like you want to pass arguments to your LoadFunc. You can create a LoadFunc
that takes an arbitrary number of String arguments. For example, the
default loader, PigStorage, takes 2 arguments: the first is a delimiter
(let's ignore the 2nd arg for now, it's advanced). So if you have a file
delimited by a colon rather than a tab, you can say this:

mystuff = load '/some/path' using PigStorage(':');

And this will cause the PigStorage(String delimiter) constructor to be
called. PigStorage will store the delimiter and use it to parse records.
The same constructor will be called on the client side (during parsing) and
on the server side (during mapper task initialization).
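
To make the constructor-argument pattern concrete, here is a minimal sketch. DelimitedLoaderSketch is a hypothetical name, not a Pig class; a real loader would extend org.apache.pig.LoadFunc and implement its methods, which is left out here so the example compiles without Pig on the classpath. It only shows the idea of String constructor arguments being stored and used later to parse records.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a loader. A real one would extend
// org.apache.pig.LoadFunc; that part is omitted to keep this self-contained.
class DelimitedLoaderSketch {
    private final String delimiter;

    // No-argument form, as in: using DelimitedLoaderSketch();
    // Falls back to a tab, mirroring the default described above.
    DelimitedLoaderSketch() {
        this("\t");
    }

    // One-argument form, as in: using DelimitedLoaderSketch(':');
    // The quoted script argument arrives as a plain String.
    DelimitedLoaderSketch(String delimiter) {
        this.delimiter = delimiter;
    }

    // Split one input line into fields using the stored delimiter.
    List<String> parse(String line) {
        // Pattern.quote treats the delimiter literally (not as a regex);
        // the -1 limit keeps trailing empty fields.
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        DelimitedLoaderSketch loader = new DelimitedLoaderSketch(":");
        System.out.println(loader.parse("a:b:c"));
    }
}
```

Since every argument is a String, anything that is not text (a number, a schema description) has to be parsed inside the constructor.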

Now, if you want users to be able to change arguments without modifying the
script, you can parametrize the script. So instead you could say

mystuff = load '/some/path' using PigStorage('$DELIM');

and call your script with "pig --param DELIM=':' myscript.pig". That way
you can change the delimiter at invocation time.
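
Conceptually, parameter substitution is a text replacement applied to the script before it is parsed. The sketch below models that idea only; ParamSubstitutionSketch and its substitute method are hypothetical names, and this is not Pig's actual preprocessor (which has more rules, such as defaults and escaping).

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of script parameter substitution, not Pig's own code.
class ParamSubstitutionSketch {
    // Matches $NAME where NAME is letters, digits or underscores.
    private static final Pattern PARAM =
        Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    // Replace every $NAME in the script with its value from params.
    static String substitute(String script, Map<String, String> params) {
        Matcher m = PARAM.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException(
                    "Undefined parameter: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String script =
            "mystuff = load '/some/path' using PigStorage('$DELIM');";
        System.out.println(
            substitute(script, Collections.singletonMap("DELIM", ":")));
    }
}
```

The point is that the loader never sees '$DELIM'. By the time the constructor is called, the script already contains the substituted value.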

If you do for some reason want to change job properties, you can use the -D
flag (e.g., '-Dpig.exec.mapPartAgg=true'). This will be available via the
Configuration *and* via Properties -- I don't want to get into the
differences because it's messy, but basically they are somewhat
interchangeable and you use whichever one is handy. If you are trying to
set a property from inside the code, you probably want to change it in the
JobConf.
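
Once your code has a Properties object in hand, reading a flag is ordinary java.util.Properties work. A minimal sketch, assuming a hypothetical helper class JobPropertySketch; the main method reads the JVM system properties, which is where a plain `java -Dkey=value` flag ends up, as a stand-in for whatever Properties object Pig hands you.

```java
import java.util.Properties;

// Hypothetical helper, not part of Pig: read a boolean flag with a default.
class JobPropertySketch {
    static boolean isEnabled(Properties props, String key,
                             boolean defaultValue) {
        String raw = props.getProperty(key);
        // A missing key falls back to the default; anything other than
        // "true" (case-insensitive) parses as false.
        return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        // Run with: java -Dpig.exec.mapPartAgg=true JobPropertySketch
        Properties props = System.getProperties();
        System.out.println(isEnabled(props, "pig.exec.mapPartAgg", false));
    }
}
```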

The difference between -p and -D is that -p sets a parameter that gets
substituted into the script, while -D is more of an environment setting
for the job.

Hope this helps,

-D

On Sun, Feb 24, 2013 at 5:07 PM, Jeff Yuan wrote:

Thanks for the pointers Prashant. I will take a look at PigStorage.

I have a system for storing metadata, so users don't have to specify it.

With respect to the properties, I guess my question is, are the ones
passed in from the command line via -p stored in Property or
Configuration from the UDFContext? What's the difference between
Property and Configuration?

Thanks.

On Sun, Feb 24, 2013 at 4:02 PM, Prashant Kommireddi wrote:

Hi Jeff,

How do you see your loader being used? Would users specify a schema file, or
would that be something your loader sets without the user being aware of it?
Can you pass it in as a constructor argument instead?

UDFContext could be used, like you said, to set/retrieve properties. You
might want to take a look at PigStorage, which does something very similar
(look for the method applySchema(Tuple tup)).

On Sun, Feb 24, 2013 at 3:33 PM, Jeff Yuan wrote:

I'm trying to write a loader, extending LoadFunc, to read a specific
file format.

My question: how do I pass properties to it (for example the schema of
the file type I'm loading)? Would it be using the -p parameter from
the cmdline when issuing the query?

The second part of the question is, how would I access the passed in
property/configuration from the code? So far I'm theorizing it's
something like this:

Properties p = udfc.getUDFProperties(this.getClass(), new String[]{ contextSignature });
Configuration conf = udfc.getJobConf();

Then get it from p or conf?

Thanks a lot for any pointers.

-Jeff

Discussion Overview
group: user @
categories: pig, hadoop
posted: Feb 24, '13 at 11:33p
active: Feb 25, '13 at 3:25a
posts: 4
users: 3
website: pig.apache.org