Grokbase, Pig user group (user@pig.apache.org), May 2011: Null values while loading
Hi,

I am trying to use Pig to aggregate data from an application's log lines.

Most of the lines in the input file have the following tab-separated format:
A B C D E F

I am aggregating the data as follows:

A = load '$in_dir' using PigStorage('\t') as (A, B, C, D, E, F);
D = group A by (A, B, C, D, E, F);
E = FOREACH D GENERATE FLATTEN(group) as (A, B, C, D, E, F), COUNT(A) as hit;
STORE E INTO '$in_dir._1' using PigStorage('\t');

In some cases the input lines contain only A B C D (the E and F columns are missing).
Would the Pig script ignore such lines?

Thanks & Regards,
Arun


  • Jonathan Coveney at May 25, 2011 at 10:33 pm
    I believe it should null them out.
  • Alan Gates at May 25, 2011 at 10:36 pm
    No, but you can make it skip them by adding:

    B = filter A by E is not null;
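    A minimal sketch of where Alan's filter would sit in Arun's script, assuming the field names from the original post. The relation aliases (raw, complete, grouped, counts) are hypothetical, chosen so they do not clash with the field names A to F. Note that this drops the short lines entirely rather than keeping them with null values.

```pig
-- Load all six tab-separated columns, as in the original script
raw = LOAD '$in_dir' USING PigStorage('\t') AS (A, B, C, D, E, F);
-- Alan's filter: discard rows that have no E value (hypothetical alias 'complete')
complete = FILTER raw BY E IS NOT NULL;
-- Aggregate only the complete rows
grouped = GROUP complete BY (A, B, C, D, E, F);
-- The bag inside each group is named after the filtered relation
counts = FOREACH grouped GENERATE FLATTEN(group) AS (A, B, C, D, E, F), COUNT(complete) AS hit;
STORE counts INTO '$in_dir._1' USING PigStorage('\t');
```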

    Alan.
  • Arun Chandy Thomas at May 25, 2011 at 10:43 pm
    Thanks for the quick reply, but my question is a little different.
    I am sorry if I was not clear in my initial post.

    I want the Pig script to treat E and F as null if the values are not present in the input line.

    So basically, all lines should be loaded when running:
    A = load '$in_dir' using PigStorage('\t') as (A, B, C, D, E, F);
    irrespective of whether any of the fields are null or not.

    How can we achieve this?

    Thanks & Regards,
    Arun
  • Sven Krasser at May 25, 2011 at 11:25 pm
    Are the tabs for these columns still there? If so, those fields should
    hold empty strings, and something like this should work:

    Y = foreach X generate
    (A == '' ? null : A),
    (B == '' ? null : B),
    ...
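    Written out for all six columns, Sven's pattern might look like the sketch below. It assumes the blank fields really do arrive as empty strings (whether they do can depend on the loader and the Pig version) and declares every column as chararray, which is an assumption about the data; X and Y are the placeholder names from Sven's snippet.

```pig
-- Load with explicit chararray types (an assumption; adjust to the real data)
X = LOAD '$in_dir' USING PigStorage('\t')
    AS (A:chararray, B:chararray, C:chararray, D:chararray, E:chararray, F:chararray);
-- Map each empty string to null, keeping the original column names
Y = FOREACH X GENERATE
        (A == '' ? null : A) AS A,
        (B == '' ? null : B) AS B,
        (C == '' ? null : C) AS C,
        (D == '' ? null : D) AS D,
        (E == '' ? null : E) AS E,
        (F == '' ? null : F) AS F;
```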

    Otherwise, you could load the full line using TextLoader and then use
    STRSPLIT on it to extract your columns. That allows you to check if E
    and F are present.
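    The TextLoader/STRSPLIT route could be sketched as follows. The aliases (raw, parts, cols, grouped, counts) are hypothetical. STRSPLIT is given a limit of 6 so that trailing empty pieces should be kept, and SIZE on a tuple returns its number of fields, which is used here to decide whether E and F exist. Casting the pieces to chararray is an assumption about the column types.

```pig
-- Read each line whole, as a single chararray field
raw = LOAD '$in_dir' USING TextLoader() AS (line:chararray);
-- Split on tabs into at most 6 pieces
parts = FOREACH raw GENERATE STRSPLIT(line, '\t', 6) AS t;
-- Keep A to D directly; take E and F only if the tuple is long enough, else null
cols = FOREACH parts GENERATE
        (chararray) t.$0 AS A,
        (chararray) t.$1 AS B,
        (chararray) t.$2 AS C,
        (chararray) t.$3 AS D,
        (SIZE(t) > 4 ? (chararray) t.$4 : null) AS E,
        (SIZE(t) > 5 ? (chararray) t.$5 : null) AS F;
-- Same aggregation as the original script
grouped = GROUP cols BY (A, B, C, D, E, F);
counts = FOREACH grouped GENERATE FLATTEN(group) AS (A, B, C, D, E, F), COUNT(cols) AS hit;
STORE counts INTO '$in_dir._1' USING PigStorage('\t');
```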

    Best,
    -Sven

    --
    http://sites.google.com/site/krasser/
  • Olga Natkovich at May 25, 2011 at 11:58 pm
    Pig 0.9 will load missing trailing fields as nulls. With Pig 0.8, you can get the same behavior by providing type information in the schema of the load statement.
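    Following Olga's suggestion for Pig 0.8, a minimal sketch is the original script with a typed load schema. Declaring every column as chararray is an assumption; pick types that match the real data. The relation aliases (logs, grouped, counts) are hypothetical, chosen to avoid clashing with the field names.

```pig
-- Typed schema: per Olga, this lets Pig 0.8 fill missing trailing fields with null
logs = LOAD '$in_dir' USING PigStorage('\t')
    AS (A:chararray, B:chararray, C:chararray, D:chararray, E:chararray, F:chararray);
-- Aggregate as before; short lines should now carry null E and F
grouped = GROUP logs BY (A, B, C, D, E, F);
counts = FOREACH grouped GENERATE FLATTEN(group) AS (A, B, C, D, E, F), COUNT(logs) AS hit;
STORE counts INTO '$in_dir._1' USING PigStorage('\t');
```

    Pig's GROUP collects null keys together, so the short lines should be counted in groups whose E and F are null rather than being dropped.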

    Olga

    -----Original Message-----
    From: Arun Chandy Thomas
    Sent: Wednesday, May 25, 2011 3:43 PM
    To: user@pig.apache.org
    Subject: Re: Null values while loading

    Thanks for the quick reply, but my question is a little different.
    I am sorry if i am not clear in my initial post.

    I want the Pig script to consider E and F as null if the values are not present in the input line.

    So basically all the lines should be loaded while firing :
    A= load '$in_dir' using PigStorage('\t') as (A, B,C,D,E,F);
    irrespective of whether any of the fields are null or not.

    How can we achieve this?

    Thanks & Regards,
    Arun
    On May 25, 2011, at 3:35 PM, Alan Gates wrote:

    No, but you can make it by adding:

    B = filter A by E is not null;

    Alan.
    On May 25, 2011, at 3:22 PM, Arun Chandy Thomas wrote:

    Hi ,

    I am trying to use pig to aggregate data from an applications log lines.

    Most of the data in the input file have the following format:
    A B C D E F

    I am aggregating the data as follows:

    A= load '$in_dir' using PigStorage('\t') as (A, B,C,D,E,F);
    D = group A by (A, B,C,D,E,F);
    E = FOREACH D GENERATE FLATTEN(group) as (A, B,C,D,E,F ),COUNT(A) as hit
    STORE E INTO '$in_dir._1' using PigStorage('\t');

    In some cases i see the input lines are only : A B C D (E,F columns are missing)
    Would the pig script ignore such lines.

    Thanks & Regards,
    Arun

Discussion Overview
group: user @
categories: pig, hadoop
posted: May 25, '11 at 10:22p
active: May 25, '11 at 11:58p
posts: 6
users: 5
website: pig.apache.org
