Hi All,

I'm looking into embedding Pig Latin in a host language using Pig trunk. So
far the basic features work fine for me, but I need to know how to get the
result tuples from a stored bag, because I need to process each tuple in the
result. According to the wiki page
http://wiki.apache.org/pig/TuringCompletePig , the 'getResults()' method
seems to do what I want, but I get an error like "returned PigStats
has no attribute 'getResults()' ..."

Any advice would be appreciated.

- Youngwoo


  • Richard Ding at Jan 18, 2011 at 6:56 pm
    The method you're looking for is PigStats.result(String alias), which returns an OutputStats object.

    Here is an example:

    R = Pig.compile(...).bind(...).runSingle()
    iter = R.result("G").iterator()
    while iter.hasNext():
        t = iter.next()
        ....
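
The hasNext()/next() loop above can also be wrapped in a Python generator, so that the per-tuple processing reads like an ordinary for loop. The following is a minimal sketch, not part of the Pig API: `iterate` and `FakeJavaIterator` are hypothetical names, and `FakeJavaIterator` is only a stand-in so the snippet runs without Pig. In an embedded script one would presumably pass the object returned by `R.result("G").iterator()` instead.

```python
def iterate(java_iter):
    """Yield items from any object exposing Java-style hasNext()/next()."""
    while java_iter.hasNext():
        yield java_iter.next()


class FakeJavaIterator(object):
    """Hypothetical stand-in for the iterator returned by OutputStats.iterator()."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def hasNext(self):
        return self._pos < len(self._items)

    def next(self):
        item = self._items[self._pos]
        self._pos += 1
        return item


# Each "tuple" here is a plain Python tuple; real Pig tuples are Java objects.
rows = FakeJavaIterator([("abcd",), ("efgh",)])
processed = [t[0].upper() for t in iterate(rows)]
print(processed)
```

The generator only relies on the two method names used in the example above, so it makes no assumption about the concrete iterator class.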

  • 김영우 at Jan 19, 2011 at 1:36 am
    Hi Richard,

    I tried that before, but it did not work. I seem to be missing something,
    but I can't tell what.

    Here is my test script:

    #!/usr/bin/python

    # need to explicitly import the Pig class
    from org.apache.pig.scripting import Pig

    output = 'outfile'
    #Pig.fs('-rmr /user/hanadmin/out*')
    p = Pig.compile("""
    records = LOAD '/user/hanadmin/DUAL.TXT' USING PigStorage() AS
    (input_line:chararray);
    r1 = FOREACH records GENERATE LOWER(records.input_line);

    STORE r1 INTO '$out';

    """)

    bs = p.bind({'out' : output})
    r = bs.runSingle()

    iter = r.result('r1').iterator()
    while iter.hasNext():
        t = iter.next()

    I got the following error:

    Backend error message
    ---------------------
    org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (abcD), 2nd :(Abcd)
        at org.apache.pig.impl.builtin.ReadScalars.exec(ReadScalars.java:111)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:203)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:299)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:323)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:161)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:186)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:299)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:323)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:335)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:287)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:260)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:255)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:58)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
        at org.apache.hadoop.mapred.Child.main(Child.java:211)

    Error before Pig is launched
    ----------------------------
    ERROR 2088: Unable to get results for: hdfs://dev1.daum.net/user/hanadmin/outfile:org.apache.pig.builtin.PigStorage
    Traceback (most recent call last):
      File "<iostream>", line 19, in <module>
        at org.apache.pig.tools.pigstats.OutputStats.iterator(OutputStats.java:156)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)

    org.apache.pig.backend.executionengine.ExecException: org.apache.pig.backend.executionengine.ExecException: ERROR 2088: Unable to get results for: hdfs://dev1.daum.net/user/hanadmin/outfile:org.apache.pig.builtin.PigStorage
    ================================================================================



    Thanks.

    - Youngwoo

  • 김영우 at Jan 19, 2011 at 6:37 am
    Richard,

    I found my mistake. Your example works well with 'normal' relations (bags),
    but in my test code Pig implicitly cast the relation to a scalar:

    r1 = FOREACH records GENERATE LOWER(records.input_line);
    STORE r1 INTO '$out';

    I need to store 'r1' as a bag. How can I do this?

    Thanks.

    - Youngwoo

  • Richard Ding at Jan 19, 2011 at 8:44 pm
    Youngwoo,

    It will work if you change the FOREACH statement to reference the field directly:

    r1 = FOREACH records GENERATE LOWER(input_line);

    Otherwise Pig thinks that your intent is to use the relation ('records') as a scalar (this is a new feature in 0.8).

    Thanks,
    - Richard
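
The "Scalar has more than one row" error in the earlier post follows from this rule: a relation used as a scalar must produce a single row. The sketch below is a toy model of that rule in plain Python, not Pig's implementation. `read_scalar` and `ScalarError` are hypothetical names, and returning None for an empty relation is an assumption of this sketch rather than verified Pig behavior.

```python
class ScalarError(Exception):
    """Raised when a relation used as a scalar has more than one row."""


def read_scalar(rows, field=0):
    """Toy model: read one field from a relation that must have a single row."""
    if len(rows) > 1:
        raise ScalarError(
            "Scalar has more than one row in the output. 1st : %r, 2nd : %r"
            % (rows[0], rows[1]))
    if not rows:
        return None  # assumption of this sketch, not checked against Pig
    return rows[0][field]


# Two rows, matching the values shown in the error message above.
records = [("abcD",), ("Abcd",)]

# Per-row field reference, analogous to LOWER(input_line): one output per row.
lowered = [row[0].lower() for row in records]
print(lowered)

# Scalar reference, analogous to LOWER(records.input_line): fails on two rows.
try:
    read_scalar(records)
    failed = False
except ScalarError as exc:
    failed = True
    print(exc)
```

In this model the per-row form works for any number of rows, while the scalar form only succeeds when the relation holds exactly one row.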


  • 김영우 at Jan 20, 2011 at 1:40 am
    Richard,

    Got it! Thanks for your quick reply.

    - Youngwoo


