Basically, I want a way to see the schema of a relation defined in a Pig
script from outside of Pig, ideally without having to connect to Hadoop
to do so.

So for example, we take a random script...

a = LOAD 'blah' AS (one:int, two:chararray, three:int);
b = FOREACH a GENERATE one, two;

Ideally I want a way to get the result of DESCRIBE b; but from outside of
Pig.

One ugly way I can think of would be to create a temporary script, append
DESCRIBE b;, get rid of any STOREs and DUMPs, run the job locally, and
then take only the result.
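
The script-rewriting step described above could be sketched roughly like this in Python. This is only an illustration under stated assumptions: the helper name make_describe_script is hypothetical, and the regular expression assumes that every STORE or DUMP statement starts at the beginning of a line and contains no semicolon inside a quoted path.

```python
import re

# Matches a whole STORE or DUMP statement that begins a line, up to and
# including its terminating semicolon (assumption: no ';' inside quotes).
_OUTPUT_STMT = re.compile(
    r"^[ \t]*(?:STORE|DUMP)\b[^;]*;[ \t]*\n?",
    re.IGNORECASE | re.MULTILINE,
)


def make_describe_script(script: str, alias: str) -> str:
    """Return a copy of `script` without STORE/DUMP, ending in DESCRIBE alias;"""
    stripped = _OUTPUT_STMT.sub("", script)
    if not stripped.endswith("\n"):
        stripped += "\n"
    return stripped + f"DESCRIBE {alias};\n"


if __name__ == "__main__":
    original = (
        "a = LOAD 'blah' AS (one:int, two:chararray, three:int);\n"
        "b = FOREACH a GENERATE one, two;\n"
        "STORE b INTO 'out';\n"
    )
    print(make_describe_script(original, "b"))
```

A real Pig parser would be more robust than a regular expression; this only covers simple scripts like the one in the example.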

I was hoping there might be a nicer way to do it, OR, if not, how do I run
that sort of thing locally, forcing pig not to go onto my hadoop cluster?

I appreciate your help
Jon

  • Dmitriy Ryaboy at Dec 28, 2010 at 11:27 pm
    Do the ugly thing, and you can run in pig -x local for local mode (though
    you might run into trouble with Pig trying to verify existence of files).
    PigUnit does essentially the same thing by overriding the Pig parser and
    simply replacing the parsing code for STOREs :)

    D
  • Jonathan Coveney at Dec 28, 2010 at 11:39 pm
    Haha, that's funny, that's exactly what I ended up doing. Python does the job admirably. Now if only my Python UDFs would work :s
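
The thread does not show the Python that was actually used. A minimal sketch of the run-and-parse step might look like the following, assuming that a pig executable is on the PATH, that the rewritten script already ends in a DESCRIBE statement, and that DESCRIBE prints a line of the form "alias: {schema}" on standard output. The function names are hypothetical; only the pig -x local invocation comes from the reply above.

```python
import os
import subprocess
import tempfile
from typing import List, Optional


def pig_local_command(script_path: str) -> List[str]:
    """Build the command line for running a script in Pig's local mode."""
    return ["pig", "-x", "local", script_path]


def extract_schema(output: str, alias: str) -> Optional[str]:
    """Return the text after 'alias: ' on the first matching line, if any."""
    prefix = f"{alias}: "
    for line in output.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def describe_locally(script: str, alias: str) -> Optional[str]:
    """Write `script` to a temporary file, run it locally, parse the output."""
    with tempfile.NamedTemporaryFile("w", suffix=".pig", delete=False) as handle:
        handle.write(script)
        path = handle.name
    try:
        result = subprocess.run(
            pig_local_command(path), capture_output=True, text=True
        )
    finally:
        os.unlink(path)  # remove the temporary script either way
    return extract_schema(result.stdout, alias)
```

As noted in the reply, Pig may still try to verify that the input files exist, so describe_locally can return None even for a valid script.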

Discussion Overview
subject: Getting the results of DEFINE from outside of pig?
group: user @
categories: pig, hadoop
posted: Dec 28, '10 at 4:22p
active: Dec 28, '10 at 11:39p
posts: 3
users: 2
website: pig.apache.org
