Thank you very much for your explanation.
Just to verify that I understood correctly,
suppose for example that myfile contains the following data:
1 3 4
3 4 6
7 8 2
4 5 9
9 3 5
6 6 2

All of this data will be sent to the Proj(0) operator, which gives as a result:
1
3
7
4
9
6

After that, all the data in myfile will be sent to the Filter operator, so
that the filter takes two inputs, the myfile data and the result of
Proj(0) > 5, which is:
7
9
6

regards

On Mon, Jan 24, 2011 at 10:08 PM, Alan Gates wrote:

The logical plan for your script will look like:

Load -> Filter -> Store

Filter will have an expression plan that looks like Proj($0) > const(5)
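
To make the expression plan concrete, here is a minimal sketch in Python that evaluates Proj($0) > const(5) once per tuple, using the six sample rows from the follow-up message. The class and function names (Proj, Const, GreaterThan, filter_op) are hypothetical and only model the idea; they are not Pig's actual operator classes.

```python
# Illustrative model only: these names are assumptions made for this sketch,
# not Pig's real (Java) operator classes.
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, Tuple

Row = Tuple[Any, ...]


@dataclass
class Proj:
    """Expression node: pick one column out of the current tuple."""
    column: int

    def eval(self, row: Row) -> Any:
        return row[self.column]


@dataclass
class Const:
    """Expression node: a constant, independent of the tuple."""
    value: Any

    def eval(self, row: Row) -> Any:
        return self.value


@dataclass
class GreaterThan:
    """Expression node: compare the results of two child expressions."""
    left: Any
    right: Any

    def eval(self, row: Row) -> bool:
        return self.left.eval(row) > self.right.eval(row)


def filter_op(rows: Iterable[Row], predicate: GreaterThan) -> Iterator[Row]:
    """Relational operator: emit each whole tuple whose predicate is true."""
    for row in rows:
        if predicate.eval(row):
            yield row


rows = [(1, 3, 4), (3, 4, 6), (7, 8, 2), (4, 5, 9), (9, 3, 5), (6, 6, 2)]
plan = GreaterThan(Proj(0), Const(5))   # models Proj($0) > const(5)
kept = list(filter_op(rows, plan))
print(kept)  # [(7, 8, 2), (9, 3, 5), (6, 6, 2)]
```

In this model the projection is evaluated inside the filter, one tuple at a time, and the filter emits the complete tuples that pass. So for the sample data the stored output would be the rows 7 8 2, 9 3 5 and 6 6 2, not just the first column.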

So yes, all your data will go through the filter operator. But keep in
mind that there is a filter operator in each map task, so your data will
not all go through any one instance of the operator (unless myfile is small).
Hope that helps.
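
The point about one filter instance per map task can be sketched as follows. This is a rough Python simulation under stated assumptions: the helper names (make_splits, run_map_task) are hypothetical, the split size of 2 rows is arbitrary (Hadoop derives input splits from the input's size, not from a row count), and the tasks run one after another here rather than in parallel on a cluster.

```python
# Rough simulation, not Pig or Hadoop code: each "map task" gets one input
# split and applies its own copy of the filter to that split only.
from typing import List, Tuple

Row = Tuple[int, ...]


def make_splits(rows: List[Row], split_size: int) -> List[List[Row]]:
    """Cut the input into consecutive chunks (assumed row-based for clarity)."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]


def run_map_task(split: List[Row]) -> List[Row]:
    """One map task: its private filter instance sees only this split."""
    return [row for row in split if row[0] > 5]   # models $0 > 5


rows = [(1, 3, 4), (3, 4, 6), (7, 8, 2), (4, 5, 9), (9, 3, 5), (6, 6, 2)]
splits = make_splits(rows, split_size=2)

# Each task filters independently; no single filter instance sees all rows.
per_task = [run_map_task(split) for split in splits]
for task_id, output in enumerate(per_task):
    print(f"map task {task_id}: in={splits[task_id]} out={output}")

# The stored result is the union of what every task emitted.
combined = [row for output in per_task for row in output]
print(combined)  # [(7, 8, 2), (9, 3, 5), (6, 6, 2)]
```

With a small enough input there would be a single split, which corresponds to the "unless myfile is small" case: one task, and therefore one filter instance, would see everything.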

Unfortunately, there is not any great architecture document on Pig.
Probably the best substitute is a paper we published in VLDB 2009, which
you can get here:
http://infolab.stanford.edu/~olston/publications/vldb09.pdf
Since this is almost 2 years old now, some of the specific information is
out of date, but the basic structure is still correct.

Alan.


On Jan 24, 2011, at 12:48 PM, Baraa Mohamad wrote:

Hello all,
I'm a new user of Pig, and I'm very interested in its architecture.
I have a question about the logical plan.

In the logical plan of this example (attached):
a = load 'myfile';
b = filter a by $0 > 5;
store b into 'myfilteredfile';


Will all the data in 'myfile' be sent in its entirety to the Proj(0)
operator and to the Filter operator?
More generally, what flows along the arrows in the logical plan?

What is the best documentation for understanding the architecture of Pig, not
only how to use it? I plan to use it in the medical domain, but
first I have to understand it deeply.

Thank you very much for your help.


Baraa MOHAMAD
PhD student in computer science
ISIMA-LIMOS
Université Blaise Pascal
Clermont-Ferrand
France
Tél: +33 658900080

Discussion Overview
group: dev @
categories: pig, hadoop
posted: Jan 24, '11 at 8:48p
active: Jan 24, '11 at 9:46p
posts: 5
users: 2
website: pig.apache.org

2 users in discussion

Baraa Mohamad: 3 posts Alan Gates: 2 posts

People

Translate

site design / logo © 2021 Grokbase