Grokbase Groups Pig user October 2010
I have a Python script defined as:
import sys

for line in sys.stdin:
    if not line:
        break
    sys.stdout.write(line)

My data, test, looks like:
({(19199vzFj6+uRbJf,7388,9074,50|22598,1267739954,0.0020,365,1,1)},1L)


My Pig script is:

temp = STREAM test THROUGH GroupStreamer as
(test_bag:chararray,·num_entries: long );

When I run it, the job fails with:
===== Task Information Header =====
Command: TestStream.py (stdin-org.apache.pig.builtin.PigStreaming/stdout-org.apache.pig.builtin.PigStreaming)
Start time: Wed Oct 06 17:57:52 PDT 2010
===== * * * =====
/Users/felixgao/Documents/data/logs/TestStream.py: line 1: import: command not found
/Users/felixgao/Documents/data/logs/TestStream.py: line 9: syntax error near unexpected token `if'
/Users/felixgao/Documents/data/logs/TestStream.py: line 9: ` if not line:'
2010-10-06 17:57:52,690 [Thread-21] ERROR org.apache.pig.impl.streaming.ExecutableManager - 'TestStream.py ' failed with exit status: 2
2010-10-06 17:57:52,697 [Thread-14] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
org.apache.pig.backend.executionengine.ExecException: ERROR 2090: Received Error while processing the reduce plan: 'TestStream.py ' failed with exit status: 2
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:465)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:401)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:381)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:250)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)

What did I do wrong here?




Another question: if I specify the schema with field aliases, as in
temp = STREAM Test THROUGH GroupStreamer
as (test_grp_cnt:bag {test_none_zero: tuple(f1:chararray, f2:int, f3:int,
f4:chararray, f5:int, f6:double, f7:int, f8:int, f9:int)}, ·num_entries:
long );
I get:
2010-10-06 17:38:57,092 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " ";" "; "" at line 76, column 179.
Was expecting one of:
")" ...
"," ...

What is the correct way of specifying a bag of tuples based on my data
sample?

Thanks,

Felix


  • Alan Gates at Oct 8, 2010 at 7:11 pm
    I don't think Pig understands that this is a Python script. What
    happens if you put #!/bin/python (or whatever is appropriate on your
    system) at the beginning of your GroupStreamer? Alternatively, you
    could explicitly call python on this file in your command by saying

    STREAM test THROUGH `/bin/python GroupStreamer`

    Alan.
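
As a minimal sketch of the first suggestion, here is the same pass-through script with an interpreter line added. The `#!/usr/bin/env python` path is an assumption, not something taken from the thread; use whatever locates Python on the machines that run the tasks. The function name `passthrough` is also just illustrative. The file needs to be executable (for example via `chmod +x`) for the interpreter line to take effect when the file is launched directly.

```python
#!/usr/bin/env python
# Assumption: `env` can find a Python interpreter on every machine that runs
# the streaming task. Replace this line with the path that fits your system.
import sys


def passthrough(instream, outstream):
    """Copy every input line to the output unchanged."""
    # Iterating over the stream stops at end of input, so no explicit
    # empty-line check is needed.
    for line in instream:
        outstream.write(line)


if __name__ == "__main__":
    passthrough(sys.stdin, sys.stdout)
```

Without the first line, the file is handed to a shell, which is consistent with the "import: command not found" message in the log above.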

Discussion Overview
group: user @
categories: pig, hadoop
posted: Oct 7, '10 at 1:09a
active: Oct 8, '10 at 7:11p
posts: 2
users: 2
website: pig.apache.org
