Grokbase Groups Pig dev January 2010
I am loading some data stored in bzip2 format,
using Hadoop 0.20 and Pig 0.5.

Why can't I store anything? The output is zero bytes.

Thanks,

Below is a copy of the Grunt output:

store main_data into 'outputs/outfile' USING PigStorage();
2010-01-11 14:18:59,512 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2010-01-11 14:18:59,512 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2010-01-11 14:19:00,995 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2010-01-11 14:19:00,996 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting identity combiner class.
2010-01-11 14:19:01,004 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized
2010-01-11 14:19:01,005 [Thread-35] WARN org.apache.hadoop.mapred.JobClient
- Use GenericOptionsParser for parsing the arguments. Applications should
implement Tool for the same.
2010-01-11 14:19:01,507 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2010-01-11 14:19:01,583 [Thread-44] INFO org.apache.hadoop.mapred.MapTask -
numReduceTasks: 1
2010-01-11 14:19:01,585 [Thread-44] INFO org.apache.hadoop.mapred.MapTask -
io.sort.mb = 100
2010-01-11 14:19:01,849 [Thread-44] INFO org.apache.hadoop.mapred.MapTask -
data buffer = 79691776/99614720
2010-01-11 14:19:01,849 [Thread-44] INFO org.apache.hadoop.mapred.MapTask -
record buffer = 262144/327680
2010-01-11 14:19:01,885 [Thread-44] INFO org.apache.hadoop.mapred.MapTask -
Starting flush of map output
2010-01-11 14:19:01,994 [Thread-44] INFO org.apache.hadoop.mapred.MapTask -
Finished spill 0
2010-01-11 14:19:01,997 [Thread-44] INFO
org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0003_m_000000_0 is
done. And is in the process of commiting
2010-01-11 14:19:01,998 [Thread-44] INFO
org.apache.hadoop.mapred.LocalJobRunner -
2010-01-11 14:19:01,999 [Thread-44] INFO
org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0003_m_000000_0'
done.
2010-01-11 14:19:02,002 [Thread-44] INFO
org.apache.hadoop.mapred.LocalJobRunner -
2010-01-11 14:19:02,004 [Thread-44] INFO org.apache.hadoop.mapred.Merger -
Merging 1 sorted segments
2010-01-11 14:19:02,004 [Thread-44] INFO org.apache.hadoop.mapred.Merger -
Down to the last merge-pass, with 1 segments left of total size: 2252 bytes
2010-01-11 14:19:02,004 [Thread-44] INFO
org.apache.hadoop.mapred.LocalJobRunner -
2010-01-11 14:19:02,076 [Thread-44] INFO
org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0003_r_000000_0 is
done. And is in the process of commiting
2010-01-11 14:19:02,077 [Thread-44] INFO
org.apache.hadoop.mapred.LocalJobRunner -
2010-01-11 14:19:02,084 [Thread-44] INFO
org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0003_r_000000_0 is
allowed to commit now
2010-01-11 14:19:02,087 [Thread-44] INFO
org.apache.hadoop.mapred.FileOutputCommitter - Saved output of task
'attempt_local_0003_r_000000_0' to
file:/Users/me/Documents/pig/outputs/outfile
2010-01-11 14:19:02,088 [Thread-44] INFO
org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2010-01-11 14:19:02,088 [Thread-44] INFO
org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0003_r_000000_0'
done.
2010-01-11 14:19:06,511 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2010-01-11 14:19:06,511 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Successfully stored result in:
"file:/Users/me/Documents/pig/outputs/outfile"
2010-01-11 14:19:06,516 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Records written : 0
2010-01-11 14:19:06,516 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Bytes written : 0
2010-01-11 14:19:06,516 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Success!



grunt> dump main_data
2010-01-11 14:11:16,916 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2010-01-11 14:11:16,916 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2010-01-11 14:11:18,265 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2010-01-11 14:11:18,266 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting identity combiner class.
2010-01-11 14:11:18,273 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized
2010-01-11 14:11:18,276 [Thread-20] WARN org.apache.hadoop.mapred.JobClient
- Use GenericOptionsParser for parsing the arguments. Applications should
implement Tool for the same.
2010-01-11 14:11:18,778 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2010-01-11 14:11:19,038 [Thread-29] INFO org.apache.hadoop.mapred.MapTask -
numReduceTasks: 1
2010-01-11 14:11:19,039 [Thread-29] INFO org.apache.hadoop.mapred.MapTask -
io.sort.mb = 100
2010-01-11 14:11:19,357 [Thread-29] INFO org.apache.hadoop.mapred.MapTask -
data buffer = 79691776/99614720
2010-01-11 14:11:19,358 [Thread-29] INFO org.apache.hadoop.mapred.MapTask -
record buffer = 262144/327680
2010-01-11 14:11:19,415 [Thread-29] INFO org.apache.hadoop.mapred.MapTask -
Starting flush of map output
2010-01-11 14:11:19,547 [Thread-29] INFO org.apache.hadoop.mapred.MapTask -
Finished spill 0
2010-01-11 14:11:19,550 [Thread-29] INFO
org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_m_000000_0 is
done. And is in the process of commiting
2010-01-11 14:11:19,551 [Thread-29] INFO
org.apache.hadoop.mapred.LocalJobRunner -
2010-01-11 14:11:19,551 [Thread-29] INFO
org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_000000_0'
done.
2010-01-11 14:11:19,554 [Thread-29] INFO
org.apache.hadoop.mapred.LocalJobRunner -
2010-01-11 14:11:19,556 [Thread-29] INFO org.apache.hadoop.mapred.Merger -
Merging 1 sorted segments
2010-01-11 14:11:19,556 [Thread-29] INFO org.apache.hadoop.mapred.Merger -
Down to the last merge-pass, with 1 segments left of total size: 2252 bytes
2010-01-11 14:11:19,556 [Thread-29] INFO
org.apache.hadoop.mapred.LocalJobRunner -
2010-01-11 14:11:19,621 [Thread-29] INFO
org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_r_000000_0 is
done. And is in the process of commiting
2010-01-11 14:11:19,622 [Thread-29] INFO
org.apache.hadoop.mapred.LocalJobRunner -
2010-01-11 14:11:19,622 [Thread-29] INFO
org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0002_r_000000_0 is
allowed to commit now
2010-01-11 14:11:19,626 [Thread-29] INFO
org.apache.hadoop.mapred.FileOutputCommitter - Saved output of task
'attempt_local_0002_r_000000_0' to file:/tmp/temp-1212107061/tmp1152866073
2010-01-11 14:11:19,626 [Thread-29] INFO
org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2010-01-11 14:11:19,626 [Thread-29] INFO
org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_r_000000_0'
done.
2010-01-11 14:11:23,782 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2010-01-11 14:11:23,782 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Successfully stored result in: "file:/tmp/temp-1212107061/tmp1152866073"
2010-01-11 14:11:23,783 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Records written : 0
2010-01-11 14:11:23,783 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Bytes written : 0
2010-01-11 14:11:23,783 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Success!
(+9W99/BvvD+rdI54,1379,2)
(+9W99/BvvD+rdI54,1379,5)
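
The mismatch in the log above (the job reports "Success!" and "Records written : 0", yet dump prints two tuples) is easy to miss in a long run. A minimal sketch of a check that flags it, assuming the summary line format shown in this log; the function names are hypothetical and not part of Pig:

```python
import re

# Matches the summary line printed by Pig's MapReduceLauncher,
# e.g. "- Records written : 0" (format assumed from the log above).
RECORDS_WRITTEN = re.compile(r"Records written\s*:\s*(\d+)")


def records_written(log_text):
    """Return the record counts reported in a Pig log, in order of appearance."""
    return [int(n) for n in RECORDS_WRITTEN.findall(log_text)]


def store_looks_empty(log_text):
    """True if the log reports success but at least one job wrote zero records."""
    counts = records_written(log_text)
    return "Success!" in log_text and any(c == 0 for c in counts)


# Excerpt copied from the store output above.
sample = """\
2010-01-11 14:19:06,516 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Records written : 0
2010-01-11 14:19:06,516 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Success!
"""

print(records_written(sample))    # [0]
print(store_looks_empty(sample))  # True
```

A zero count can be legitimate (for example, a filter that matches nothing), so this is only a prompt to look closer, not proof of a bug.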

  • Jeff Zhang at Jan 12, 2010 at 1:18 am
    Hi Gao,

    It looks like you are running your script in local mode. Pig has some
    problems processing bzip files in local mode
    (http://issues.apache.org/jira/browse/PIG-752). Try running it in
    mapreduce mode; it should work.

    --
    Best Regards

    Jeff Zhang
  • Felix gao at Jan 12, 2010 at 3:27 am
    Following up on my previous email, I have noticed the following.

    I have a Pig script called Overlap that reads in a bunch of *.bz2 files.

    If I run the following command:
    java -cp $PIGDIR/pig.jar:/root/MyUDFs.jar:$HADOOPSITEPATH
    org.apache.pig.Main Overlap.pig

    the store doesn't write anything to HDFS but reports success with 0
    bytes written.

    When I issue this command:
    java -cp $PIGDIR/pig.jar:/root/MyUDFs.jar:$HADOOPSITEPATH
    org.apache.pig.Main -x mapreduce Overlap.pig
    the store command in my Pig script produces the output in the
    correct directory/file.

    Can someone explain what causes the 0 bytes written, and why it runs
    fine with -x mapreduce?
  • Dmitriy Ryaboy at Jan 12, 2010 at 5:49 am
    Both are caused by you running in local mode by default.
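
The two invocations in the follow-up differ only in the -x flag, so one way to avoid depending on the default is to always pass the mode explicitly. A minimal sketch of a command builder, assuming a Python wrapper around the same java invocation; the function name and the example paths are hypothetical:

```python
import shlex


def pig_command(script, mode, pig_jar="pig.jar", extra_classpath=()):
    """Build an argv list that runs a Pig script with an explicit -x mode."""
    if mode not in ("local", "mapreduce"):
        raise ValueError("mode must be 'local' or 'mapreduce'")
    # Classpath entries are joined with ':' as in the commands above.
    classpath = ":".join([pig_jar, *extra_classpath])
    return ["java", "-cp", classpath, "org.apache.pig.Main", "-x", mode, script]


# Hypothetical paths standing in for $PIGDIR and $HADOOPSITEPATH.
cmd = pig_command(
    "Overlap.pig",
    "mapreduce",
    pig_jar="/opt/pig/pig.jar",
    extra_classpath=["/root/MyUDFs.jar", "/etc/hadoop/conf"],
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True); building it as a list rather than a shell string avoids quoting problems with paths.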



Discussion Overview
group: dev @
categories: pig, hadoop
posted: Jan 11, '10 at 10:24p
active: Jan 12, '10 at 5:49a
posts: 4
users: 3
website: pig.apache.org
