Grokbase Groups Pig user March 2010
FAQ
I'd be happy to put these together into a NOOB faq =).

Please feel free to forward me to the docs where I might have missed this.

How do I generate a simple Tuple? I have a value, say a sum, and I want to
just generate a tuple that's ('TOTAL CATS', 2L). Basically, after all is
said and done, I want my output file to look like this:

<DATE>, <COUNT of one interesting value>, <COUNT of another interesting
value>,<COUNT of a third interesting value>

I've figured out how to get the interesting values to a single TUPLE, but I
want to get it to a point where I can create a tuple and then STORE it.

I'm a reasonably well-trained SQL developer, and I think a lot of people
coming to Pig will be SQL-conversant. It might be helpful (again, I'd help
once I know what I'm doing) to have examples that deal with CSV crunching
for SQL-minded folk, no?

Something to the effect of:

Here's your CSV, here's how you break it into Tuples, here's a bunch of
examples as if this was a table and you were trying to run reports.

This way, I think it would help map to familiar territory faster.

Anyways, thanks for listening to the rambling. I'm really digging this
stuff!

Cory


  • Dmitriy Ryaboy at Mar 1, 2010 at 4:50 pm
    Cory,
    FOREACH ... GENERATE is generally how you transform tuples into new ones.

    Here's an example script that may make things clearer (I'm writing it
    free-hand, so there may be syntax errors):

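    -- Assumes is_brown and can_swim are stored as 0/1 flags.
    raw_data = LOAD '/user/dmitriy/petshop/*.txt' USING PigStorage(',')
        AS (id:chararray, species:chararray, is_brown:int, can_swim:int);

    -- Compute the combined flag per row up front, so the grouped FOREACH
    -- below only needs plain aggregates.
    flagged = FOREACH raw_data GENERATE species, is_brown, can_swim,
        ((is_brown == 1 AND can_swim == 1) ? 1 : 0) AS brown_swimmer;

    by_species = GROUP flagged BY species;

    -- Inside the grouped FOREACH, fields are reached through the bag name.
    summary = FOREACH by_species GENERATE
        group AS species, COUNT(flagged) AS num_animals,
        SUM(flagged.is_brown) AS brown_ones,
        SUM(flagged.can_swim) AS swimming_ones,
        SUM(flagged.brown_swimmer) AS brown_swimming_pets;

    STORE summary INTO '/user/dmitriy/petshop_summary';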
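
    For the "simple tuple" part of the question, a minimal sketch along the
    same lines may help. It assumes a hypothetical input file pets.csv with a
    day column and 0/1 flag columns; the path, relation names, and field names
    are made up for illustration and are not from the original post.

    ```pig
    -- Hypothetical input: one CSV row per animal, e.g. 2010-02-28,cat,1,0
    pets = LOAD 'pets.csv' USING PigStorage(',')
           AS (day:chararray, species:chararray, is_brown:int, can_swim:int);

    -- A single labelled total: GROUP ... ALL puts every row into one group,
    -- so the FOREACH emits one tuple of the form (label, count).
    cats       = FILTER pets BY species == 'cat';
    all_cats   = GROUP cats ALL;
    total_cats = FOREACH all_cats GENERATE 'TOTAL CATS' AS label, COUNT(cats) AS n;
    STORE total_cats INTO 'total_cats' USING PigStorage(',');

    -- One output row per date with several counts, roughly the Pig form of:
    --   SELECT day, COUNT(*), SUM(is_brown), SUM(can_swim)
    --   FROM pets GROUP BY day
    by_day = GROUP pets BY day;
    daily  = FOREACH by_day GENERATE
                 group              AS day,
                 COUNT(pets)        AS num_animals,
                 SUM(pets.is_brown) AS brown_ones,
                 SUM(pets.can_swim) AS swimming_ones;
    STORE daily INTO 'daily_summary' USING PigStorage(',');
    ```

    A string constant in GENERATE is what supplies the label, and GROUP ... ALL
    plays the role of a SQL aggregate with no GROUP BY clause. Note that if the
    filter matches no rows, the GROUP ... ALL relation is empty and no total
    tuple is written.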

    (In reply to Cory Radcliff's message of Sun, Feb 28, 2010 at 11:29 PM.)

Discussion Overview
group: user @
categories: pig, hadoop
posted: Mar 1, '10 at 7:29a
active: Mar 1, '10 at 4:50p
posts: 2
users: 2
website: pig.apache.org

2 users in discussion

Dmitriy Ryaboy: 1 post
Cory Radcliff: 1 post
