It's basically two steps.
The first step takes a ton of data and boils it down to ONE variable.
That variable needs to be used in a number of places in the next steps.
It doesn't make sense to create a temporary file like:
… but instead it seems cleaner to just use something like
result = FOREACH input GENERATE $0 * VARIABLE;
… but the question is: how do I get the variable into Pig?
I don't see a straightforward way.
One thing I was thinking of doing is splitting the job into two Pig scripts,
then running the first, getting the variable, and passing it as a param into
the remaining scripts.
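To make the idea concrete, here is a rough sketch of a small Python driver for that two-script approach. It relies on Pig's `-param NAME=value` command-line option; the script names (`step1.pig`, `step2.pig`), the local file `step1_result.txt`, and the helper names are all made up for illustration, and it assumes the first script's single-value output has already been copied to a local file (for example with `hadoop fs -getmerge`).

```python
import subprocess
from pathlib import Path


def read_scalar(path):
    """Return the first non-empty line of a local text file, stripped."""
    for line in Path(path).read_text().splitlines():
        if line.strip():
            return line.strip()
    raise ValueError(f"no value found in {path}")


def pig_command(script, **params):
    """Build the argv for: pig -param NAME=value ... script"""
    argv = ["pig"]
    for name, value in params.items():
        argv += ["-param", f"{name}={value}"]
    argv.append(script)
    return argv


def run_two_steps(step1="step1.pig", step2="step2.pig",
                  result_file="step1_result.txt"):
    # Step 1 (hypothetical script) boils the data down to one value.
    # Assumption: that value ends up in result_file on the local disk.
    subprocess.run(pig_command(step1), check=True)
    value = read_scalar(result_file)
    # Step 2 receives the value as the Pig parameter VARIABLE.
    subprocess.run(pig_command(step2, VARIABLE=value), check=True)
```

Inside the second script the value would then be referenced with parameter substitution, e.g. `result = FOREACH input GENERATE $0 * $VARIABLE;`. Calling `run_two_steps()` needs a working `pig` on the PATH; the two helper functions can be tried on their own.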
Is this what pretty much everyone else does?
Maybe this should be in the FAQ.