FAQ
hey all,

Very new Pig user here. I think I'm trying to get something very simple done but getting a few errors. See me script below.Any guidance will be appreciated.Thanks.

I get errors such as Error during parsing. Invalid alias: serverin {time: double,count: double}
I am basically trying to duplicate the following SQL query:

select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
from TableA
group by 1, 2, 3



;My script is as follows:

a = LOAD 'Report' AS (
dt:chararray,
Server:chararray,
Type:chararray,
Ops:chararray,
UserID:chararray,
U_cnt:int,
U_tm:int,
U_min_tm:int,
U_max_tm:int,
U_avg_tm:float,
)


;--Remove Test Servers
remtest = filter a by not Server matches 'Test%'
;-- Filter to required columns
reqd = foreach remtest generate $1,$2,$3,$4,$5,$6
;--Groupby
G2 = group reqd by Server,Type,Ops
;--Sum the User Counts and Times
G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count
;--byServeroperation = order G3 by Server
;store G3 into 'Servertest'

;ingvay7

Search Discussions

  • Ingvay7 at Nov 13, 2012 at 4:34 pm
    hey all,

    Very new Pig user here. I think I'm trying to get something very simple done but getting a few errors. See me script below.Any guidance will be appreciated.Thanks.

    I get errors such as Error during parsing. Invalid alias: serverin {time: double,count: double}
    I am basically trying to duplicate the following SQL query:

    select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
    from TableA
    group by 1, 2, 3;



    My script is as follows:

    a = LOAD 'Report' AS (
    dt:chararray,
    Server:chararray,
    Type:chararray,
    Ops:chararray,
    UserID:chararray,
    U_cnt:int,
    U_tm:int,
    U_min_tm:int,
    U_max_tm:int,
    U_avg_tm:float,
    );


    --Remove Test Servers
    remtest = filter a by not Server matches 'Test%';
    -- Filter to required columns
    reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
    --Groupby
    G2 = group reqd by Server,Type,Ops;
    --Sum the User Counts and Times
    G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
    --byServeroperation = order G3 by Server;
    store G3 into 'Servertest';

    ingvay7
  • Pablomar at Nov 13, 2012 at 4:59 pm
    just taking a quick look, I see a couple of errors:
    1_ your LOAD hast one more comma. You need to delete the last one, after
    U_avg_tm:float

    2_ and then, the group by, I think you need parenthesis
    G2 = group reqd by (Server,Type,Ops);

    by the way, where is you alias serverin ?


    On Tue, Nov 13, 2012 at 10:36 AM, ingvay7@yahoo.com wrote:

    hey all,

    Very new Pig user here. I think I'm trying to get something very simple
    done but getting a few errors. See me script below.Any guidance will be
    appreciated.Thanks.

    I get errors such as Error during parsing. Invalid alias: serverin {time:
    double,count: double}
    I am basically trying to duplicate the following SQL query:

    select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
    from TableA
    group by 1, 2, 3;



    My script is as follows:

    a = LOAD 'Report' AS (
    dt:chararray,
    Server:chararray,
    Type:chararray,
    Ops:chararray,
    UserID:chararray,
    U_cnt:int,
    U_tm:int,
    U_min_tm:int,
    U_max_tm:int,
    U_avg_tm:float,
    );


    --Remove Test Servers
    remtest = filter a by not Server matches 'Test%';
    -- Filter to required columns
    reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
    --Groupby
    G2 = group reqd by Server,Type,Ops;
    --Sum the User Counts and Times
    G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
    --byServeroperation = order G3 by Server;
    store G3 into 'Servertest';

    ingvay7
  • Prashant Kommireddi at Nov 13, 2012 at 4:59 pm
    Hi,

    Can you paste the error message here?

    Sent from my iPhone
    On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" wrote:

    hey all,

    Very new Pig user here. I think I'm trying to get something very simple done but getting a few errors. See me script below.Any guidance will be appreciated.Thanks.

    I get errors such as Error during parsing. Invalid alias: serverin {time: double,count: double}
    I am basically trying to duplicate the following SQL query:

    select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
    from TableA
    group by 1, 2, 3;



    My script is as follows:

    a = LOAD 'Report' AS (
    dt:chararray,
    Server:chararray,
    Type:chararray,
    Ops:chararray,
    UserID:chararray,
    U_cnt:int,
    U_tm:int,
    U_min_tm:int,
    U_max_tm:int,
    U_avg_tm:float,
    );


    --Remove Test Servers
    remtest = filter a by not Server matches 'Test%';
    -- Filter to required columns
    reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
    --Groupby
    G2 = group reqd by Server,Type,Ops;
    --Sum the User Counts and Times
    G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
    --byServeroperation = order G3 by Server;
    store G3 into 'Servertest';

    ingvay7
  • Vishwanath at Nov 13, 2012 at 5:26 pm
    This is the error I got:

    ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.

    Updated code:

    a = LOAD 'Report' AS (
    dt:chararray,
    Server:chararray,
    Type:chararray,
    Ops:chararray,
    UserID:chararray,
    U_cnt:int,
    U_tm:int,
    U_min_tm:int,
    U_max_tm:int,
    U_avg_tm:float
    );


    --Remove Test Servers
    remtest = filter a by not Server matches 'Test%';

    -- Filter to required columns
    reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

    --Groupby
    G2 = group reqd by (Server,Type,Ops);

    --Sum the User Counts and Times
    G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

    store G3 into 'Servertest';







    ----- Original Message -----
    From: Prashant Kommireddi <prash1784@gmail.com>
    To: "user@pig.apache.org" <user@pig.apache.org>
    Cc:
    Sent: Tuesday, November 13, 2012 11:59 AM
    Subject: Re: Help with Script

    Hi,

    Can you paste the error message here?

    Sent from my iPhone
    On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" wrote:

    hey all,

    Very new Pig user here. I think I'm trying to get something very simple done but getting a few errors. See me script below.Any guidance will be appreciated.Thanks.

    I get errors such as  Error during parsing. Invalid alias: serverin {time: double,count: double}
    I am basically trying to duplicate the following SQL query:

    select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
    from TableA
    group by 1, 2, 3;



    My script is as follows:

    a = LOAD 'Report' AS (
    dt:chararray,
    Server:chararray,
    Type:chararray,
    Ops:chararray,
    UserID:chararray,
    U_cnt:int,
    U_tm:int,
    U_min_tm:int,
    U_max_tm:int,
    U_avg_tm:float,
    );


    --Remove Test Servers
    remtest = filter a by not Server matches 'Test%';
    -- Filter to required columns
    reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
    --Groupby
    G2 = group reqd by Server,Type,Ops;
    --Sum the User Counts and Times
    G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
    --byServeroperation = order G3 by Server;
    store G3 into 'Servertest';

    ingvay7
  • Ingvay7 at Nov 13, 2012 at 5:58 pm
    (Apologies for resending but corrected script below)


    This is the error I got:

    ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.

    Updated code:

    a = LOAD 'Report' AS (
    dt:chararray,
    Server:chararray,
    Type:chararray,
    Ops:chararray,
    UserID:chararray,
    U_cnt:int,
    U_tm:int,
    U_min_tm:int,
    U_max_tm:int,
    U_avg_tm:float
    );


    --Remove Test Servers
    remtest = filter a by not Server matches 'Test%';

    -- Filter to required columns
    reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

    --Groupby
    G2 = group reqd by (Server,Type,Ops);

    --Sum the User Counts and Times
    G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

    store G3 into 'Servertest';







    ----- Original Message -----
    From: Prashant Kommireddi <prash1784@gmail.com>
    To: "user@pig.apache.org" <user@pig.apache.org>
    Cc:
    Sent: Tuesday, November 13, 2012 11:59 AM
    Subject: Re: Help with Script

    Hi,

    Can you paste the error message here?

    Sent from my iPhone
    On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" wrote:

    hey all,

    Very new Pig user here. I think I'm trying to get something very simple done but getting a few errors. See me script below.Any guidance will be appreciated.Thanks.

    I get errors such as  Error during parsing. Invalid alias: serverin {time: double,count: double}
    I am basically trying to duplicate the following SQL query:

    select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
    from TableA
    group by 1, 2, 3;



    My script is as follows:

    a = LOAD 'Report' AS (
    dt:chararray,
    Server:chararray,
    Type:chararray,
    Ops:chararray,
    UserID:chararray,
    U_cnt:int,
    U_tm:int,
    U_min_tm:int,
    U_max_tm:int,
    U_avg_tm:float,
    );


    --Remove Test Servers
    remtest = filter a by not Server matches 'Test%';
    -- Filter to required columns
    reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
    --Groupby
    G2 = group reqd by Server,Type,Ops;
    --Sum the User Counts and Times
    G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
    --byServeroperation = order G3 by Server;
    store G3 into 'Servertest';

    ingvay7
  • Pablomar at Nov 13, 2012 at 6:37 pm
    what about something like this ?

    G3 = foreach G2 generate group,SUM(reqd.U_tm)as time,SUM(reqd.U_cnt)as
    count;


    On Tue, Nov 13, 2012 at 12:57 PM, ingvay7 wrote:

    (Apologies for resending but corrected script below)


    This is the error I got:

    ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the
    matching function for org.apache.pig.builtin.SUM as multiple or none of
    them fit. Please use an explicit cast.

    Updated code:

    a = LOAD 'Report' AS (
    dt:chararray,
    Server:chararray,
    Type:chararray,
    Ops:chararray,
    UserID:chararray,
    U_cnt:int,
    U_tm:int,
    U_min_tm:int,
    U_max_tm:int,
    U_avg_tm:float
    );


    --Remove Test Servers
    remtest = filter a by not Server matches 'Test%';

    -- Filter to required columns
    reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

    --Groupby
    G2 = group reqd by (Server,Type,Ops);

    --Sum the User Counts and Times
    G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

    store G3 into 'Servertest';







    ----- Original Message -----
    From: Prashant Kommireddi <prash1784@gmail.com>
    To: "user@pig.apache.org" <user@pig.apache.org>
    Cc:
    Sent: Tuesday, November 13, 2012 11:59 AM
    Subject: Re: Help with Script

    Hi,

    Can you paste the error message here?

    Sent from my iPhone
    On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" wrote:

    hey all,

    Very new Pig user here. I think I'm trying to get something very simple
    done but getting a few errors. See me script below.Any guidance will be
    appreciated.Thanks.
    I get errors such as Error during parsing. Invalid alias: serverin
    {time: double,count: double}
    I am basically trying to duplicate the following SQL query:

    select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)
    from TableA
    group by 1, 2, 3;



    My script is as follows:

    a = LOAD 'Report' AS (
    dt:chararray,
    Server:chararray,
    Type:chararray,
    Ops:chararray,
    UserID:chararray,
    U_cnt:int,
    U_tm:int,
    U_min_tm:int,
    U_max_tm:int,
    U_avg_tm:float,
    );


    --Remove Test Servers
    remtest = filter a by not Server matches 'Test%';
    -- Filter to required columns
    reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;
    --Groupby
    G2 = group reqd by Server,Type,Ops;
    --Sum the User Counts and Times
    G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;
    --byServeroperation = order G3 by Server;
    store G3 into 'Servertest';

    ingvay7
  • Prashant Kommireddi at Nov 13, 2012 at 6:41 pm
    SUM function requires that you specify the specific element from the
    grouping. In this case, U_tm and U_cnt are both within group/bags and need
    to be accessed as "reqd.U_tm" and "reqd.U_cnt".


    --Sum the User Counts and Times
    G3 = foreach G2 generate group,SUM(reqd.U_tm)as time,SUM(reqd.U_cnt)as
    count;

    On Nov 13, 2012, at 9:58 AM, ingvay7 wrote:

    (Apologies for resending but corrected script below)


    This is the error I got:

    ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the
    matching function for org.apache.pig.builtin.SUM as multiple or none of
    them fit. Please use an explicit cast.

    Updated code:

    a = LOAD 'Report' AS (
    dt:chararray,
    Server:chararray,
    Type:chararray,
    Ops:chararray,
    UserID:chararray,
    U_cnt:int,
    U_tm:int,
    U_min_tm:int,
    U_max_tm:int,
    U_avg_tm:float
    );


    --Remove Test Servers
    remtest = filter a by not Server matches 'Test%';

    -- Filter to required columns
    reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

    --Groupby
    G2 = group reqd by (Server,Type,Ops);

    --Sum the User Counts and Times
    G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

    store G3 into 'Servertest';







    ----- Original Message -----
    From: Prashant Kommireddi <prash1784@gmail.com>
    To: "user@pig.apache.org" <user@pig.apache.org>
    Cc:
    Sent: Tuesday, November 13, 2012 11:59 AM
    Subject: Re: Help with Script

    Hi,

    Can you paste the error message here?

    Sent from my iPhone

    On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" wrote:

    hey all,


    Very new Pig user here. I think I'm trying to get something very simple
    done but getting a few errors. See me script below.Any guidance will be
    appreciated.Thanks.


    I get errors such as Error during parsing. Invalid alias: serverin {time:
    double,count: double}

    I am basically trying to duplicate the following SQL query:


    select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)

    from TableA

    group by 1, 2, 3;




    My script is as follows:


    a = LOAD 'Report' AS (

    dt:chararray,

    Server:chararray,

    Type:chararray,

    Ops:chararray,

    UserID:chararray,

    U_cnt:int,

    U_tm:int,

    U_min_tm:int,

    U_max_tm:int,

    U_avg_tm:float,

    );



    --Remove Test Servers

    remtest = filter a by not Server matches 'Test%';

    -- Filter to required columns

    reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

    --Groupby

    G2 = group reqd by Server,Type,Ops;

    --Sum the User Counts and Times

    G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

    --byServeroperation = order G3 by Server;

    store G3 into 'Servertest';


    ingvay7
  • Ingvay7 at Nov 13, 2012 at 6:50 pm
    Thanks, Prashant and Pablomar. That fixed it!




    ----- Original Message -----
    From: Prashant Kommireddi <prash1784@gmail.com>
    To: "user@pig.apache.org" <user@pig.apache.org>
    Cc:
    Sent: Tuesday, November 13, 2012 1:40 PM
    Subject: Re: Help with Script

    SUM function requires that you specify the specific element from the
    grouping. In this case, U_tm and U_cnt are both within group/bags and need
    to be accessed as "reqd.U_tm" and "reqd.U_cnt".


    --Sum the User Counts and Times
    G3 = foreach G2 generate group,SUM(reqd.U_tm)as time,SUM(reqd.U_cnt)as
    count;

    On Nov 13, 2012, at 9:58 AM, ingvay7 wrote:

    (Apologies for resending but corrected script below)


    This is the error I got:

    ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the
    matching function for org.apache.pig.builtin.SUM as multiple or none of
    them fit. Please use an explicit cast.

    Updated code:

    a = LOAD 'Report' AS (
    dt:chararray,
    Server:chararray,
    Type:chararray,
    Ops:chararray,
    UserID:chararray,
    U_cnt:int,
    U_tm:int,
    U_min_tm:int,
    U_max_tm:int,
    U_avg_tm:float
    );


    --Remove Test Servers
    remtest = filter a by not Server matches 'Test%';

    -- Filter to required columns
    reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

    --Groupby
    G2 = group reqd by (Server,Type,Ops);

    --Sum the User Counts and Times
    G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

    store G3 into 'Servertest';







    ----- Original Message -----
    From: Prashant Kommireddi <prash1784@gmail.com>
    To: "user@pig.apache.org" <user@pig.apache.org>
    Cc:
    Sent: Tuesday, November 13, 2012 11:59 AM
    Subject: Re: Help with Script

    Hi,

    Can you paste the error message here?

    Sent from my iPhone

    On Nov 13, 2012, at 8:34 AM, "ingvay7@yahoo.com" wrote:

    hey all,


    Very new Pig user here. I think I'm trying to get something very simple
    done but getting a few errors. See me script below.Any guidance will be
    appreciated.Thanks.


    I get errors such as  Error during parsing. Invalid alias: serverin {time:
    double,count: double}

    I am basically trying to duplicate the following SQL query:


    select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt)

    from TableA

    group by 1, 2, 3;




    My script is as follows:


    a = LOAD 'Report' AS (

    dt:chararray,

    Server:chararray,

    Type:chararray,

    Ops:chararray,

    UserID:chararray,

    U_cnt:int,

    U_tm:int,

    U_min_tm:int,

    U_max_tm:int,

    U_avg_tm:float,

    );



    --Remove Test Servers

    remtest = filter a by not Server matches 'Test%';

    -- Filter to required columns

    reqd = foreach remtest generate $1,$2,$3,$4,$5,$6;

    --Groupby

    G2 = group reqd by Server,Type,Ops;

    --Sum the User Counts and Times

    G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count;

    --byServeroperation = order G3 by Server;

    store G3 into 'Servertest';


    ingvay7

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupuser @
categoriespig, hadoop
postedNov 13, '12 at 4:12p
activeNov 13, '12 at 6:50p
posts9
users3
websitepig.apache.org

People

Translate

site design / logo © 2021 Grokbase