Grokbase Groups Pig user August 2011
FAQ
I have a complex algorithm that I'm mapping to Pig.

It's basically two steps.

The first step takes a ton of data and boils it down to ONE variable.

That variable needs to be used in a number of places in the next steps.

It doesn't make sense to create a temporary file like:

1, VAR
2, VAR
3, VAR

… but instead it seems cleaner to just use something like

result = FOREACH input GENERATE $0 * VARIABLE;

… but the question is how do I get the variable into Pig.

I don't see a straightforward way.

One thing I was thinking of doing is splitting the job into two Pig files,
then running the first, getting the variable, and passing it as a param into
the remaining scripts.

Is this what pretty much everyone else does?

Maybe this should be in the FAQ.

--

Founder/CEO Spinn3r.com

Location: *San Francisco, CA*
Skype: *burtonator*

Skype-in: *(415) 871-0687*


  • Dmitriy Ryaboy at Aug 18, 2011 at 6:40 am
    Kevin,
    Check out the "scalar" feature in Pig:
    https://pig.apache.org/docs/r0.9.0/basic.html (under "Casting Relations to
    Scalars")

    D
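
A minimal sketch of the scalar feature mentioned above, assuming a hypothetical input file 'input.txt' with one numeric value per line. The relation and field names (data, total, sum_value) and the choice of SUM as the reduction are illustrative only and are not taken from the original post.

```pig
-- Hypothetical input: one numeric value per line.
data = LOAD 'input.txt' AS (value:double);

-- Step 1: boil the whole data set down to a single-row relation.
-- SUM is only a stand-in for whatever the real first step computes.
grouped = GROUP data ALL;
total = FOREACH grouped GENERATE SUM(data.value) AS sum_value;

-- Step 2: reference the single-row relation as a scalar (relation.field)
-- inside an expression on another relation.
result = FOREACH data GENERATE value * total.sum_value;

STORE result INTO 'output';
```

The relation used as a scalar has to contain exactly one row; as far as I know, Pig fails the job at runtime if it holds more than one. This keeps both steps in a single script, so there is no need to run a first script and pass the value into a second one as a parameter.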

Discussion Overview
group: user @
categories: pig, hadoop
posted: Aug 18, '11 at 5:58a
active: Aug 18, '11 at 6:40a
posts: 2
users: 2
website: pig.apache.org

2 users in discussion

Kevin Burton: 1 post
Dmitriy Ryaboy: 1 post
