Grokbase Groups Pig user April 2011
Hi to all,

I am a newbie and am just testing small scripts for training.

My question is about the result of the script below in local mode:

grunt> cat nested.txt
{(8,9),(0,1)},{(8,9),(1,1)}
{(2,3),(4,5)},{(2,3),(4,5)}
{(6,7),(3,7)},{(2,2),(3,7)}
grunt> A = LOAD 'nested.txt' AS (B1:bag{T1:tuple(t1:int,t2:int)},B2:bag{T2:tuple(f1:int,f2:int)});
grunt> DUMP A;
({(8,9),(0,1)},)
({(2,3),(4,5)},)
({(6,7),(3,7)},)

Why is B2 not displayed?

When I run the same script with PigPen, B2 is displayed, but this
time I get only one result instead of three. The screenshot is in the
attachment.


When I use the grunt shell, all the messages below are printed before
the result is displayed, and it takes too much time.
Should I pass a parameter to pig -x local to avoid this, or did I make
a mistake in my installation?

Thanks in advance.

grunt> A = LOAD 'nested.txt' AS (B1:bag{T1:tuple(t1:int,t2:int)},B2:bag{T2:tuple(f1:int,f2:int)});
grunt> DUMP A;
2011-04-29 15:37:44,954 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-04-29 15:37:44,954 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-04-29 15:37:44,955 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:44,959 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: A: Store(file:/tmp/temp643030084/tmp-1663465556:org.apache.pig.impl.io.InterStorage) - scope-48 Operator Key: scope-48)
2011-04-29 15:37:44,959 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-04-29 15:37:44,960 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2011-04-29 15:37:44,960 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2011-04-29 15:37:44,961 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:44,964 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:44,966 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-04-29 15:37:44,966 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-04-29 15:37:46,270 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-04-29 15:37:46,273 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,275 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-04-29 15:37:46,295 [Thread-57] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,300 [Thread-57] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,308 [Thread-57] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-04-29 15:37:46,308 [Thread-57] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2011-04-29 15:37:46,308 [Thread-57] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2011-04-29 15:37:46,402 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,407 [Thread-66] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-04-29 15:37:46,407 [Thread-66] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2011-04-29 15:37:46,407 [Thread-66] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2011-04-29 15:37:46,442 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,446 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,449 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,452 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,486 [Thread-66] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0005_m_000000_0 is done. And is in the process of commiting
2011-04-29 15:37:46,486 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,489 [Thread-66] INFO org.apache.hadoop.mapred.LocalJobRunner -
2011-04-29 15:37:46,489 [Thread-66] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0005_m_000000_0 is allowed to commit now
2011-04-29 15:37:46,489 [Thread-66] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:46,494 [Thread-66] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0005_m_000000_0' to file:/tmp/temp643030084/tmp-1663465556
2011-04-29 15:37:46,496 [Thread-66] INFO org.apache.hadoop.mapred.LocalJobRunner -
2011-04-29 15:37:46,496 [Thread-66] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0005_m_000000_0' done.
2011-04-29 15:37:46,776 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0005
2011-04-29 15:37:46,776 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-04-29 15:37:51,778 [main] WARN org.apache.pig.tools.pigstats.PigStatsUtil - Failed to get RunningJob for job job_local_0005
2011-04-29 15:37:51,778 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-04-29 15:37:51,778 [main] INFO org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
2011-04-29 15:37:51,778 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:

HadoopVersion  PigVersion  UserId     StartedAt            FinishedAt           Features
0.20.2         0.8.1       pehlivanz  2011-04-29 15:37:44  2011-04-29 15:37:51  UNKNOWN

Success!

Job Stats (time in seconds):
JobId           Alias  Feature   Outputs
job_local_0005  A      MAP_ONLY  file:/tmp/temp643030084/tmp-1663465556,

Input(s):
Successfully read records from: "file:///home/pehlivanz/PIG/pig-0.8.1/tutorial/scripts/testzp/nested.txt"

Output(s):
Successfully stored records in: "file:/tmp/temp643030084/tmp-1663465556"

Job DAG:
job_local_0005


2011-04-29 15:37:51,778 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:51,781 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2011-04-29 15:37:51,782 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-04-29 15:37:51,784 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2011-04-29 15:37:51,785 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1


  • Richard Ding at Apr 29, 2011 at 6:27 pm
    Before casting fields to the schema you specified, the loader needs to split each record into fields. For PigStorage (the default loader, which your script uses because it has no USING clause), the default field separator is '\t'. Since the data file doesn't use '\t' to mark the field boundary, the loader reads the whole record into a single field, which is why nothing is left over for B2.

    -Richard
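
One way to act on this is to rewrite the data file so the two bags on each line are separated by a tab instead of a comma. The sketch below is only an illustration and not part of the original thread: the function names and the output file name nested_tab.txt are made up, and it assumes every record looks like the three lines shown in the question. It replaces only the comma that sits outside all braces and parentheses, because the commas inside the bags and tuples have to stay. Passing ',' as the PigStorage delimiter is probably not a substitute here, since those inner commas would also be treated as field boundaries as far as I know.

```python
def comma_to_tab(line: str) -> str:
    """Replace each top-level comma in a record with a tab.

    Commas nested inside a bag {...} or a tuple (...) are left untouched.
    """
    depth = 0
    out = []
    for ch in line:
        if ch in "{(":
            depth += 1
        elif ch in "})":
            depth -= 1
        # A comma at depth 0 separates two fields of the record.
        out.append("\t" if ch == "," and depth == 0 else ch)
    return "".join(out)


def convert_file(src_path: str, dst_path: str) -> None:
    """Write a tab-separated copy of src_path to dst_path (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(comma_to_tab(line))


if __name__ == "__main__":
    # The three records from the question.
    records = [
        "{(8,9),(0,1)},{(8,9),(1,1)}",
        "{(2,3),(4,5)},{(2,3),(4,5)}",
        "{(6,7),(3,7)},{(2,2),(3,7)}",
    ]
    for record in records:
        # Splitting on tab now yields the two bags as separate fields.
        print(comma_to_tab(record).split("\t"))
```

After something like convert_file('nested.txt', 'nested_tab.txt'), the same LOAD ... AS (B1:..., B2:...) statement pointed at the new file should see two fields per record. I have not run this against Pig 0.8.1, so treat it as a starting point.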






    On 4/29/11 7:12 AM, "Zeynep PEHLIVAN" wrote:

    [original message quoted in full above]

Discussion Overview
group: user @
categories: pig, hadoop
posted: Apr 29, '11 at 2:13p
active: Apr 29, '11 at 6:27p
posts: 2
users: 2
website: pig.apache.org
