It's basically two steps.
The first step takes a ton of data and boils it down to ONE variable.
That variable then needs to be used in a number of places in the second step.
It doesn't make sense to create a temporary file like:
1, VAR
2, VAR
3, VAR
… but instead it seems cleaner to just use something like:
result = FOREACH input GENERATE $0 * VARIABLE;
… but the question is: how do I get the variable into Pig?
I don't see a straightforward way.
One thing I was thinking of doing is splitting the job into two Pig
scripts, then running the first, reading the variable from its output,
and passing it as a param into the second script.
Is this what pretty much everyone else does?
Maybe this should be in the FAQ.
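
To make the two-script idea concrete, here is a rough sketch of a driver,
assuming the first script stores its single result under a directory named
step1_out and the second script refers to the value as $VARIABLE. The
script names (first.pig, second.pig), the output path, and the helper
function names are all made up for illustration. The only Pig feature
relied on is the -param name=value command-line option.

```python
import subprocess


def parse_scalar(text):
    """Return the single value written by the first job (one row, one field)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 1:
        raise ValueError("expected exactly one output row, got %d" % len(lines))
    return lines[0]


def build_pig_command(script, params):
    """Build the argument list for: pig -param NAME=VALUE ... script"""
    cmd = ["pig"]
    for name, value in sorted(params.items()):
        cmd += ["-param", "%s=%s" % (name, value)]
    cmd.append(script)
    return cmd


def run_pipeline():
    """Run step one, read its result, and pass it to step two as a param.

    Assumes first.pig STOREs its one-row result into 'step1_out' (hypothetical
    path) and that the pig and hadoop commands are on the PATH.
    """
    subprocess.check_call(["pig", "first.pig"])
    # hadoop expands the glob itself, so no shell is needed here.
    raw = subprocess.check_output(["hadoop", "fs", "-cat", "step1_out/part-*"])
    value = parse_scalar(raw.decode("utf-8"))
    subprocess.check_call(build_pig_command("second.pig", {"VARIABLE": value}))
```

With this in place, calling run_pipeline() would run both steps, and
second.pig could contain a line such as
"result = FOREACH data GENERATE $0 * $VARIABLE;" with the value substituted
before the script runs.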
--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
Skype-in: *(415) 871-0687*