FAQ
Excuse me, I may have missed an important part of the Pig documentation, so this may be a trivial question :) What is the best way to find the total number of tuples (rows) in a loaded relation? For example, after "a = LOAD 'sth' AS (key, value); b = GROUP a BY key; c = FOREACH b GENERATE key;" I want to know how many tuples were loaded into 'a' and how many are left in 'c'. One way might be to use a UDF. But does Pig have built-in support for counting this?

Thanks,

Michael

• Dmitriy Ryaboy at Feb 24, 2010 at 12:10 am

c = FOREACH b GENERATE group as key, COUNT(a);

will give you the number of rows in a per key.

a_all = group a ALL;
a_count = FOREACH a_all GENERATE COUNT(a);

will give you the total number of rows in a.

Does that answer your question?
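
Putting the two suggestions together for the script in the question, a minimal sketch might look like the following. The path 'sth' and the fields key and value come from the question; the aliases c_all and c_count and the field name total are made up here. Note that after GROUP ... BY the grouping field is named group, so the question's "GENERATE key" has to be written as "GENERATE group AS key".

```pig
-- Sketch only: c_all, c_count and the field name "total" are illustrative names.
a = LOAD 'sth' AS (key, value);

-- Total number of tuples loaded into a.
a_all = GROUP a ALL;
a_count = FOREACH a_all GENERATE COUNT(a) AS total;

-- One output tuple per distinct key; the grouping field is called "group".
b = GROUP a BY key;
c = FOREACH b GENERATE group AS key;

-- Total number of tuples left in c, i.e. the number of distinct keys.
c_all = GROUP c ALL;
c_count = FOREACH c_all GENERATE COUNT(c) AS total;

DUMP a_count;
DUMP c_count;
```

Depending on the Pig version, COUNT may skip tuples whose first field is null; COUNT_STAR, where available, counts every tuple, so it is worth checking the built-in function documentation for the release in use.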

• jiang licht at Feb 24, 2010 at 12:28 am

Thanks Dmitriy. That's not quite what I want. I want something like in SQL, where you can get the total count of tuples (or another value of interest) and use it like a variable. (Sorry, I don't know whether "variable" is the right word in Pig, but Pig passes command-line parameters as variables, right?) Such a variable would be convenient for quickly calculating statistics in Pig scripts. Though I also realize that variables may not be usable this way in Pig, so it might just be a misconception on my part...

Thanks,

Michael
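
For what it's worth, later Pig releases (0.8 and newer, as far as I know, so after this thread) let a single-row relation be referenced like a scalar, which is close to the SQL-style variable asked for here. A sketch under that assumption; the aliases per_key and share and the field names n, total and fraction are invented for illustration:

```pig
-- Assumes a Pig version that supports using a one-row relation as a scalar.
a = LOAD 'sth' AS (key, value);

-- a_count holds exactly one tuple with one field, the total row count.
a_all = GROUP a ALL;
a_count = FOREACH a_all GENERATE COUNT(a) AS total;

-- Per-key counts, as in the first reply.
b = GROUP a BY key;
per_key = FOREACH b GENERATE group AS key, COUNT(a) AS n;

-- a_count.total is read as a single value inside another relation's FOREACH.
share = FOREACH per_key GENERATE key, n, (double) n / (double) a_count.total AS fraction;
DUMP share;
```

The referenced relation has to contain a single row for this to make sense, which GROUP ... ALL followed by an aggregate provides for non-empty input.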

• Jeff Zhang at Feb 24, 2010 at 2:32 am

One way I can think of is to store the total number of tuples in one
specified place, and then load it in your UDF when you want to use it.

a_all = group a ALL;
a_count = FOREACH a_all GENERATE COUNT(a);
store a_count into 'your_store_place';
.....................

d = foreach c generate YourUDF($0);

--
Best Regards

Jeff Zhang
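
As an alternative to reading the stored count inside a UDF, the stored value could also be loaded back as a one-row relation and attached to every tuple with CROSS. This swaps in a different technique from the UDF suggested above and is only a sketch: 'your_store_place' is the placeholder path from the post, and the aliases t, ct and d and the field names n, total and fraction are assumptions.

```pig
-- Second stage, meant to run after the STORE of a_count has finished.
a = LOAD 'sth' AS (key, value);
b = GROUP a BY key;
c = FOREACH b GENERATE group AS key, COUNT(a) AS n;

-- The stored count is a single-row, single-column file.
t = LOAD 'your_store_place' AS (total:long);

-- CROSS with a one-row relation appends the total to every tuple of c.
ct = CROSS c, t;
d = FOREACH ct GENERATE key, n, (double) n / (double) total AS fraction;
DUMP d;
```

Because t has only one row, the CROSS does not multiply the number of tuples in c; it only widens each tuple by one field.
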
• jiang licht at Feb 24, 2010 at 6:18 am

If there are handy variables to carry values here and there, that'd be helpful :)

Thanks,

Michael

• jiang licht at Feb 24, 2010 at 6:39 am

I guess I'd take back some of those thoughts, considering that Pig is specifically designed to produce MapReduce jobs. Command-line parameters and those specified with %declare do not change their values during the life of the whole job (which may consist of multiple MapReduce tasks). Variables that can change value from time to time do not fit the MapReduce scheme, which suits applications in which data, once created, is usually read-only. One suggestion would be to allow variables that can be assigned only once and then carry the same value from the point of assignment to the end of the program; that is, once a variable is assigned a value, it becomes immutable. Of course, even this creates some difficulty, e.g. for optimization, since it may add extra data dependencies ...

Michael
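
To make the contrast concrete, here is a rough sketch of the fixed-for-the-whole-run parameters mentioned above. TOTAL is a hypothetical parameter name and 1000 a made-up value; the aliases per_key and share are also invented for illustration.

```pig
-- TOTAL is a hypothetical parameter and 1000 is a made-up value.
-- Instead of the %declare line, the value could be passed on the command line:
--   pig -param TOTAL=1000 script.pig
%declare TOTAL 1000

a = LOAD 'sth' AS (key, value);
b = GROUP a BY key;
per_key = FOREACH b GENERATE group AS key, COUNT(a) AS n;

-- The parameter is substituted textually before the script is compiled,
-- so it cannot depend on data computed earlier in the same script.
share = FOREACH per_key GENERATE key, (double) n / $TOTAL AS fraction;
DUMP share;
```

This is why such parameters behave like constants rather than variables: their values are known before any MapReduce job starts.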

Discussion Overview
group: user; categories: pig, hadoop; posted: Feb 23, '10 at 11:55p; active: Feb 24, '10 at 6:39a; posts: 6; users: 3; website: pig.apache.org