Grokbase Groups Pig user August 2010
Hey,



When I run my Pig script from Java through PigServer, a LIMIT statement is not being applied.



/code

myServer.registerQuery("flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
myServer.registerQuery("filtered = FILTER flow_firstcut BY sIP matches 'someIP';");
myServer.registerQuery("O = ORDER filtered BY bytes DESC;");
myServer.registerQuery("topTen = LIMIT O 10;");
myServer.store("topTen", outputFilePath);

/code



This produces a 699-line file. It should produce a 10-line file.



/code

myServer.registerQuery("flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
myServer.registerQuery("filtered = FILTER flow_firstcut BY sIP matches '" + parameters[1] + "';");
// ORDER step disabled for this run:
// myServer.registerQuery("O = ORDER filtered BY bytes DESC;");
myServer.registerQuery("topTen = LIMIT filtered 10;");
myServer.store("topTen", outputFilePath);

/code



This produces a 10-line file, as expected.



Is there a known bug I am unaware of, or can you not ORDER and then LIMIT? The documentation at

http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT

indicates that this is a valid sequence of calls.



Help?



Matt


  • Ashutosh Chauhan at Aug 5, 2010 at 4:54 pm
    Matt,

    Which version are you on? What happens if you run your query through
    grunt instead of PigServer?
    I tried a load-order-limit sequence on a small dataset in grunt and
    got the expected results.

    Ashutosh
  • Matthew Smith at Aug 5, 2010 at 5:57 pm
    No, I have not tried it in grunt. I want to use PigServer because it lets me pass parameters in from Java. I am using Pig 0.7.0.
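To illustrate the parameter passing mentioned here, the following is a minimal sketch in plain Java that builds the same four Pig Latin statements from a filter pattern and a limit. The class name `TopTenStatements` and its `build` method are hypothetical helpers, not part of Pig; the relation names (`data`, `flow_firstcut`, `filtered`, `O`, `topTen`) are taken from the code posted in this thread. The sketch only assembles strings, so it runs without Pig on the classpath, and it refuses patterns containing a single quote rather than assuming any particular escaping rule.

```java
import java.util.ArrayList;
import java.util.List;

class TopTenStatements {
    // Builds the FOREACH / FILTER / ORDER / LIMIT statements from the thread,
    // with the sIP pattern and the row limit supplied by the caller.
    static List<String> build(String sipPattern, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        // Assumption: rather than guess at quoting rules, reject values that
        // would terminate the single-quoted Pig Latin string early.
        if (sipPattern.indexOf('\'') >= 0) {
            throw new IllegalArgumentException("pattern must not contain a single quote");
        }
        List<String> statements = new ArrayList<>();
        statements.add("flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
        statements.add("filtered = FILTER flow_firstcut BY sIP matches '" + sipPattern + "';");
        statements.add("O = ORDER filtered BY bytes DESC;");
        statements.add("topTen = LIMIT O " + limit + ";");
        return statements;
    }

    public static void main(String[] args) {
        String pattern = args.length > 0 ? args[0] : "someIP";
        // Print each statement, one complete statement per line.
        for (String statement : build(pattern, 10)) {
            System.out.println(statement);
        }
    }
}
```

With Pig available, each returned string would be handed to `myServer.registerQuery(...)` in order, followed by `myServer.store("topTen", outputFilePath)` as in the original post. Keeping each statement on one line also avoids the wrapped string literals seen in the pasted code.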

  • Ashutosh Chauhan at Aug 5, 2010 at 7:11 pm
    To narrow down the problem space, can you try your query in grunt? If
    it works there, the problem has something to do with PigServer;
    otherwise it is related to Pig core itself.

    Ashutosh
  • Matthew Smith at Aug 5, 2010 at 9:54 pm
    While running the query in grunt, I ran into a different error. It appears to be looking for another file, but I have never run into this problem with grunt before. This environment was freshly installed this morning, before the grunt shell was started.

    I also checked my PigServer() Java code on the new install, and it still produces a 699 line file which is ORDERed but not LIMITed.

    Thoughts?


    grunt> A = LOAD '0' USING PigStorage('|') as (sIP:chararray,dIP:chararray,sPort:int, dPort:int,protocol:int, bytes:int, flags:chararray);
    grunt> B = FILTER A BY sIP matches '61.81.46.45';
    grunt> C = ORDER B BY bytes DESC;
    grunt> D = LIMIT C 10;
    grunt> DUMP D;




    2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
    2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for A
    2010-08-05 14:47:52,681 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
    2010-08-05 14:47:52,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1184504472/tmp-1623830760:org.apache.pig.builtin.BinStorage) - 1-54 Operator Key: 1-54)
    2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
    2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
    2010-08-05 14:47:52,911 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:52,934 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:52,935 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2010-08-05 14:47:54,187 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2010-08-05 14:47:54,228 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2010-08-05 14:47:54,246 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2010-08-05 14:47:54,434 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,455 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:47:54,734 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2010-08-05 14:47:54,754 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:47:54,821 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,827 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,839 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,841 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:55,245 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
    2010-08-05 14:47:56,352 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
    2010-08-05 14:47:56,354 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0001_m_000000_0 is allowed to commit now
    2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_m_000000_0' to file:/tmp/temp1184504472/tmp-842564749
    2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_m_000000_0' done.
    2010-08-05 14:47:59,754 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
    2010-08-05 14:47:59,754 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:59,754 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2010-08-05 14:48:00,873 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2010-08-05 14:48:00,890 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:00,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2010-08-05 14:48:00,891 [Thread-18] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2010-08-05 14:48:00,999 [Thread-18] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,003 [Thread-18] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,009 [Thread-18] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:01,009 [Thread-18] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:01,155 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:01,189 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,192 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,209 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
    2010-08-05 14:48:01,391 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0002
    2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
    2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
    2010-08-05 14:48:02,369 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
    2010-08-05 14:48:02,752 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
    2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_000000_0' done.
    2010-08-05 14:48:02,761 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:02,796 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
    2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 0 segments left of total size: 0 bytes
    2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:02,935 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:03,023 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
    2010-08-05 14:48:03,025 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0002_r_000000_0 is allowed to commit now
    2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:03,029 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0002_r_000000_0' to file:/tmp/temp1184504472/tmp-657784620
    2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
    2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_r_000000_0' done.
    2010-08-05 14:48:06,431 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
    2010-08-05 14:48:06,432 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:06,432 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2010-08-05 14:48:08,062 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2010-08-05 14:48:08,194 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,195 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2010-08-05 14:48:08,197 [Thread-33] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2010-08-05 14:48:08,475 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,478 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:08,792 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:09,024 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:09,027 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:09,028 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
    2010-08-05 14:48:09,290 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003
    2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
    2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
    2010-08-05 14:48:09,479 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:09,491 [Thread-42] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
    java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:613)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
    Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
    at org.apache.pig.impl.io.ReadToEndLoader.(WeightedRangePartitioner.java:108)
    ... 6 more
    2010-08-05 14:48:13,800 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2010-08-05 14:48:13,801 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
    2010-08-05 14:48:13,802 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1184504472/tmp-1623830760"
    2010-08-05 14:48:13,803 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
    2010-08-05 14:48:13,811 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:13,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias D
    Details at logfile: /home/matt/workspace/pig-0.7.0/pig_1281044613580.log

  • Ashutosh Chauhan at Aug 6, 2010 at 6:43 am
    This is most likely because B is empty. Run:

    grunt> dump A; -- to verify data is getting loaded as you are expecting.
    grunt> dump B; -- to verify that B is non-empty.

    Ashutosh
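One thing worth checking alongside the two dumps is the filter pattern itself. Pig's `matches` operator takes a Java-format regular expression, and, as far as the Pig Latin reference describes it, the whole field value has to match. The sketch below uses only `java.util.regex` to show how the pattern from the grunt session behaves under whole-string matching; the class name `MatchesCheck` and the sample values are hypothetical, and it does not reproduce Pig's own evaluation.

```java
import java.util.regex.Pattern;

class MatchesCheck {
    // Whole-string regex match: the entire value must match the pattern.
    static boolean matches(String value, String regex) {
        return Pattern.matches(regex, value);
    }

    public static void main(String[] args) {
        String unescaped = "61.81.46.45";              // '.' matches any character
        String literal = Pattern.quote("61.81.46.45"); // treats every character literally

        System.out.println(matches("61.81.46.45", unescaped));  // true
        System.out.println(matches("61x81y46z45", unescaped));  // true
        System.out.println(matches("61x81y46z45", literal));    // false
        System.out.println(matches("61.81.46.45 ", unescaped)); // false (trailing space)
    }
}
```

If the field carries stray whitespace, or the delimiter passed to PigStorage does not match the file, the filter can come back empty even though the address appears in the data, which is consistent with checking `dump A` and `dump B` first.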
    2010-08-05 14:48:00,891 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2010-08-05 14:48:00,891 [Thread-18] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2010-08-05 14:48:00,999 [Thread-18] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,003 [Thread-18] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,009 [Thread-18] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:01,009 [Thread-18] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:01,155 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,157 [Thread-27] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:01,157 [Thread-27] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:01,189 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,192 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,209 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
    2010-08-05 14:48:01,391 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0002
    2010-08-05 14:48:02,157 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
    2010-08-05 14:48:02,157 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
    2010-08-05 14:48:02,369 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
    2010-08-05 14:48:02,752 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
    2010-08-05 14:48:02,753 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:02,753 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_000000_0' done.
    2010-08-05 14:48:02,761 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:02,789 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:02,789 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:02,796 [Thread-27] INFO  org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
    2010-08-05 14:48:02,932 [Thread-27] INFO  org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 0 segments left of total size: 0 bytes
    2010-08-05 14:48:02,932 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:02,935 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:03,023 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
    2010-08-05 14:48:03,025 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:03,026 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:03,026 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0002_r_000000_0 is allowed to commit now
    2010-08-05 14:48:03,026 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:03,029 [Thread-27] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0002_r_000000_0' to file:/tmp/temp1184504472/tmp-657784620
    2010-08-05 14:48:03,031 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
    2010-08-05 14:48:03,031 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_r_000000_0' done.
    2010-08-05 14:48:06,431 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
    2010-08-05 14:48:06,432 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:06,432 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2010-08-05 14:48:08,062 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2010-08-05 14:48:08,194 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,195 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2010-08-05 14:48:08,197 [Thread-33] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2010-08-05 14:48:08,475 [Thread-33] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,478 [Thread-33] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,480 [Thread-33] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:08,480 [Thread-33] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:08,792 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,794 [Thread-42] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:08,794 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:09,024 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:09,027 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:09,028 [Thread-42] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
    2010-08-05 14:48:09,290 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003
    2010-08-05 14:48:09,443 [Thread-42] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
    2010-08-05 14:48:09,443 [Thread-42] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
    2010-08-05 14:48:09,479 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:09,491 [Thread-42] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
    java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
    Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
    ... 6 more
    2010-08-05 14:48:13,800 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2010-08-05 14:48:13,801 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
    2010-08-05 14:48:13,802 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1184504472/tmp-1623830760"
    2010-08-05 14:48:13,803 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
    2010-08-05 14:48:13,811 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:13,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias D
    Details at logfile: /home/matt/workspace/pig-0.7.0/pig_1281044613580.log

    -----Original Message-----
    From: Ashutosh Chauhan
    Sent: Thursday, August 05, 2010 3:10 PM
    To: pig-user@hadoop.apache.org
    Subject: Re: LIMIT Issue

    To cut down on the problem space, can you try your query in grunt? If
    it works there, the problem has something to do with PigServer;
    otherwise it's related to Pig core itself.

    Ashutosh
    On Thu, Aug 5, 2010 at 10:57, Matthew Smith wrote:
    No, I have not used it in grunt. I am looking to use PigServer because of the parameter passing that is doable through Java. I am using Pig 0.7.0.
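A minimal sketch of the parameter passing mentioned above: plain Java that builds the four Pig Latin statements from the original post, with the source IP and the limit supplied as arguments. The class and method names are hypothetical and nothing here calls Pig itself; each returned string would be handed to myServer.registerQuery(...) as in the original code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Pig): assembles the statements from the thread.
final class TopTalkersQuery {

    // Returns the statements in the order they would be registered.
    static List<String> statements(String sourceIp, int limit) {
        if (sourceIp.indexOf('\'') >= 0) {
            // A quote in the value would break out of the Pig string literal.
            throw new IllegalArgumentException("sourceIp must not contain a single quote");
        }
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<String> s = new ArrayList<>();
        s.add("flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
        // 'matches' takes a regular expression, so the dots in an IP match any
        // character; escaping them is left out of this sketch.
        s.add("filtered = FILTER flow_firstcut BY sIP matches '" + sourceIp + "';");
        s.add("O = ORDER filtered BY bytes DESC;");
        s.add("topTen = LIMIT O " + limit + ";");
        return s;
    }

    public static void main(String[] args) {
        // In the real program each line would go to myServer.registerQuery(line),
        // followed by myServer.store("topTen", outputFilePath).
        for (String line : statements("58.72.19.26", 10)) {
            System.out.println(line);
        }
    }
}
```

Keeping the statements in one list makes it easy to print exactly what was registered and paste the same lines into grunt for comparison.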

  • Matthew Smith at Aug 6, 2010 at 2:20 pm
    B is not empty:
    (58.72.19.26, 58.72.19.26,38627,22196,6,512, FS PA)
    (58.72.19.26, 36.65.53.83,44133,10957,6,646, FS PA)
    (58.72.19.26, 68.99.24.4,43951,11023,6,364, FS PA)
    (58.72.19.26, 9.7.68.69,18644,20524,17,228, FS PA)
    (58.72.19.26, 73.77.82.19,25,1024,6,194, FS PA)
    (58.72.19.26, 36.65.53.83,56380,71718,6,1003, FS PA)
    (58.72.19.26, 58.72.19.26,10221,44938,6,277, FS PA)
    (58.72.19.26, 77.52.5.64,69247,11023,6,389, FS PA)
    (58.72.19.26, 93.6.87.73,38149,1024,6,138, FS PA)
    (58.72.19.26, 58.72.19.26,11558,24292,6,812, FS PA)
    (58.72.19.26, 58.72.19.26,65668,71318,6,175, FS PA)
    (58.72.19.26, 68.99.24.4,61923,1024,6,1598, FS PA)
    (58.72.19.26, 60.41.59.65,22421,65796,6,1402, FS PA)
    (58.72.19.26, 58.72.19.26,69740,21873,6,322, S A)
    (58.72.19.26, 95.70.58.21,11058,1024,6,1453, FS PA)
    (58.72.19.26, 42.10.50.36,44863,11023,6,251, FS PA)
    (58.72.19.26, 57.6.91.5,25857,1024,6,1546, FS PA)
    (58.72.19.26, 68.99.24.4,54756,11023,6,219, FS PA)
    (58.72.19.26, 36.65.53.83,73335,43857,6,9, FS PA)
    (58.72.19.26, 95.70.58.21,32204,11023,6,1635, S A)
    (58.72.19.26, 76.48.82.73,46483,1024,6,127, FS PA)
    (58.72.19.26, 81.88.14.14,55609,1024,6,507, FS PA)
    (58.72.19.26, 1.54.61.21,65763,1024,6,370, FS PA)


    But after I do:
    grunt> C = ORDER B BY bytes DESC;
    grunt> Dump C;
    I get the same error as before: java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160


    Which would lead me to believe my ORDER is broken. Is there a conf I need to change?
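For reference, a sketch of what ORDER B BY bytes DESC followed by LIMIT 10 would be expected to return for the rows of B listed above. This is plain Java rather than Pig, only the bytes column is copied from the dump, and the class and method names are hypothetical; it is meant as a sanity check on the expected result, not as a description of how Pig executes the query.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical reference check (not Pig code): sort descending, then keep the first n.
final class OrderThenLimit {

    // The bytes column of the 23 rows of B, in the order they were dumped.
    static final int[] BYTES = {
        512, 646, 364, 228, 194, 1003, 277, 389, 138, 812, 175, 1598,
        1402, 322, 1453, 251, 1546, 219, 9, 1635, 127, 507, 370
    };

    // Equivalent in spirit to: C = ORDER B BY bytes DESC; D = LIMIT C n;
    static List<Integer> topN(int[] values, int n) {
        return Arrays.stream(values)
                .boxed()
                .sorted(Comparator.reverseOrder())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [1635, 1598, 1546, 1453, 1402, 1003, 812, 646, 512, 507]
        System.out.println(topN(BYTES, 10));
    }
}
```

If the Pig output for the same filter has more than ten rows, or a different first row, that points at the ORDER/LIMIT steps rather than at the data.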


    -----Original Message-----
    From: Ashutosh Chauhan
    Sent: Friday, August 06, 2010 2:43 AM
    To: Matthew Smith
    Cc: pig-user@hadoop.apache.org
    Subject: Re: LIMIT Issue

    This is most likely because B is empty. Do:

    grunt> dump A; -- to verify data is getting loaded as you are expecting.
    grunt> dump B; -- to verify that B is non-empty.

    Ashutosh
    On Thu, Aug 5, 2010 at 14:54, Matthew Smith wrote:
    While running grunt I ran into another error. I see it is looking for another file, but I have never run into this problem with grunt before. This environment was freshly installed this morning before the grunt shell was executed.

    I also checked my PigServer() Java code on the new install, and it still produces a 699 line file which is ORDERed but not LIMITed.

    Thoughts?


    grunt> A = LOAD '0' USING PigStorage('|') as (sIP:chararray,dIP:chararray,sPort:int, dPort:int,protocol:int, bytes:int, flags:chararray);
    grunt> B = FILTER A BY sIP matches '61.81.46.45';
    grunt> C = ORDER B BY bytes DESC;
    grunt> D = LIMIT C 10;
    grunt> DUMP D;




    2010-08-05 14:47:52,622 [main] INFO  org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
    2010-08-05 14:47:52,622 [main] INFO  org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for A
    2010-08-05 14:47:52,681 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
    2010-08-05 14:47:52,819 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1184504472/tmp-1623830760:org.apache.pig.builtin.BinStorage) - 1-54 Operator Key: 1-54)
    2010-08-05 14:47:52,895 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
    2010-08-05 14:47:52,895 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
    2010-08-05 14:47:52,911 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:52,934 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:52,935 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2010-08-05 14:47:54,187 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2010-08-05 14:47:54,228 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,229 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2010-08-05 14:47:54,246 [Thread-5] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2010-08-05 14:47:54,434 [Thread-5] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,455 [Thread-5] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,461 [Thread-5] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:47:54,461 [Thread-5] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:47:54,734 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2010-08-05 14:47:54,754 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,757 [Thread-14] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:47:54,757 [Thread-14] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:47:54,821 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,827 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,839 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,841 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:55,245 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
    2010-08-05 14:47:56,352 [Thread-14] INFO  org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
    2010-08-05 14:47:56,354 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:56,355 [Thread-14] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:47:56,355 [Thread-14] INFO  org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0001_m_000000_0 is allowed to commit now
    2010-08-05 14:47:56,355 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:56,358 [Thread-14] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_m_000000_0' to file:/tmp/temp1184504472/tmp-842564749
    2010-08-05 14:47:56,358 [Thread-14] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:47:56,358 [Thread-14] INFO  org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_m_000000_0' done.
    2010-08-05 14:47:59,754 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
    2010-08-05 14:47:59,754 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:59,754 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2010-08-05 14:48:00,873 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2010-08-05 14:48:00,890 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:00,891 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2010-08-05 14:48:00,891 [Thread-18] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2010-08-05 14:48:00,999 [Thread-18] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,003 [Thread-18] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,009 [Thread-18] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:01,009 [Thread-18] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:01,155 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,157 [Thread-27] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:01,157 [Thread-27] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:01,189 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,192 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,209 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
    2010-08-05 14:48:01,391 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0002
    2010-08-05 14:48:02,157 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
    2010-08-05 14:48:02,157 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
    2010-08-05 14:48:02,369 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
    2010-08-05 14:48:02,752 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
    2010-08-05 14:48:02,753 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:02,753 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_000000_0' done.
    2010-08-05 14:48:02,761 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:02,789 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:02,789 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:02,796 [Thread-27] INFO  org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
    2010-08-05 14:48:02,932 [Thread-27] INFO  org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 0 segments left of total size: 0 bytes
    2010-08-05 14:48:02,932 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:02,935 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:03,023 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
    2010-08-05 14:48:03,025 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:03,026 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:03,026 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0002_r_000000_0 is allowed to commit now
    2010-08-05 14:48:03,026 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:03,029 [Thread-27] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0002_r_000000_0' to file:/tmp/temp1184504472/tmp-657784620
    2010-08-05 14:48:03,031 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
    2010-08-05 14:48:03,031 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_r_000000_0' done.
    2010-08-05 14:48:06,431 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
    2010-08-05 14:48:06,432 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:06,432 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2010-08-05 14:48:08,062 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2010-08-05 14:48:08,194 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,195 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2010-08-05 14:48:08,197 [Thread-33] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2010-08-05 14:48:08,475 [Thread-33] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,478 [Thread-33] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,480 [Thread-33] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:08,480 [Thread-33] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:08,792 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,794 [Thread-42] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:08,794 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:09,024 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:09,027 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:09,028 [Thread-42] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
    2010-08-05 14:48:09,290 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003
    2010-08-05 14:48:09,443 [Thread-42] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
    2010-08-05 14:48:09,443 [Thread-42] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
    2010-08-05 14:48:09,479 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:09,491 [Thread-42] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
    java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
    Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
    ... 6 more
    2010-08-05 14:48:13,800 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2010-08-05 14:48:13,801 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
    2010-08-05 14:48:13,802 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1184504472/tmp-1623830760"
    2010-08-05 14:48:13,803 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
    2010-08-05 14:48:13,811 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:13,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias D
    Details at logfile: /home/matt/workspace/pig-0.7.0/pig_1281044613580.log

    -----Original Message-----
    From: Ashutosh Chauhan
    Sent: Thursday, August 05, 2010 3:10 PM
    To: pig-user@hadoop.apache.org
    Subject: Re: LIMIT Issue

    To cut down the problem space, can you try your query in grunt? If
    it works there, the problem has something to do with PigServer;
    otherwise it's related to Pig core itself.

    Ashutosh
    On Thu, Aug 5, 2010 at 10:57, Matthew Smith wrote:
    No, I have not tried it in grunt. I want to use PigServer because it lets me pass parameters in from Java. I am using Pig 0.7.0.

  • Ashutosh Chauhan at Aug 9, 2010 at 12:58 am
    It looks like a bug then. Do you have a script and a small enough
    dataset that reproduces the issue and that you can upload to JIRA? If
    so, go ahead and create a JIRA ticket with the script and data. Are you
    using local mode or mapreduce mode?

    Ashutosh
    On Fri, Aug 6, 2010 at 07:16, Matthew Smith wrote:
    B is not empty:
    (58.72.19.26, 58.72.19.26,38627,22196,6,512, FS PA)
    (58.72.19.26, 36.65.53.83,44133,10957,6,646, FS PA)
    (58.72.19.26, 68.99.24.4,43951,11023,6,364, FS PA)
    (58.72.19.26, 9.7.68.69,18644,20524,17,228, FS PA)
    (58.72.19.26, 73.77.82.19,25,1024,6,194, FS PA)
    (58.72.19.26, 36.65.53.83,56380,71718,6,1003, FS PA)
    (58.72.19.26, 58.72.19.26,10221,44938,6,277, FS PA)
    (58.72.19.26, 77.52.5.64,69247,11023,6,389, FS PA)
    (58.72.19.26, 93.6.87.73,38149,1024,6,138, FS PA)
    (58.72.19.26, 58.72.19.26,11558,24292,6,812, FS PA)
    (58.72.19.26, 58.72.19.26,65668,71318,6,175, FS PA)
    (58.72.19.26, 68.99.24.4,61923,1024,6,1598, FS PA)
    (58.72.19.26, 60.41.59.65,22421,65796,6,1402, FS PA)
    (58.72.19.26, 58.72.19.26,69740,21873,6,322, S A)
    (58.72.19.26, 95.70.58.21,11058,1024,6,1453, FS PA)
    (58.72.19.26, 42.10.50.36,44863,11023,6,251, FS PA)
    (58.72.19.26, 57.6.91.5,25857,1024,6,1546, FS PA)
    (58.72.19.26, 68.99.24.4,54756,11023,6,219, FS PA)
    (58.72.19.26, 36.65.53.83,73335,43857,6,9, FS PA)
    (58.72.19.26, 95.70.58.21,32204,11023,6,1635, S A)
    (58.72.19.26, 76.48.82.73,46483,1024,6,127, FS PA)
    (58.72.19.26, 81.88.14.14,55609,1024,6,507, FS PA)
    (58.72.19.26, 1.54.61.21,65763,1024,6,370, FS PA)


    But after I do:
    grunt> C = ORDER B BY bytes DESC;
    grunt> Dump C;
    I get the same error as before:
    java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160


    This leads me to believe my ORDER is broken. Is there a configuration setting I need to change?


    -----Original Message-----
    From: Ashutosh Chauhan
    Sent: Friday, August 06, 2010 2:43 AM
    To: Matthew Smith
    Cc: pig-user@hadoop.apache.org
    Subject: Re: LIMIT Issue

    This is most likely because B is empty. Run:

    grunt> dump A; -- to verify data is getting loaded as you are expecting.
    grunt> dump B; -- to verify that B is non-empty.

    Ashutosh
    On Thu, Aug 5, 2010 at 14:54, Matthew Smith wrote:
    While running grunt I ran into another error. I see it is looking for another file, but I have never run into this problem with grunt before. This environment was freshly installed this morning before the grunt shell was executed.

    I also checked my PigServer() Java code on the new install, and it still produces a 699 line file which is ORDERed but not LIMITed.

    Thoughts?


    grunt> A = LOAD '0' USING PigStorage('|') as (sIP:chararray,dIP:chararray,sPort:int, dPort:int,protocol:int, bytes:int, flags:chararray);
    grunt> B = FILTER A BY sIP matches '61.81.46.45';
    grunt> C = ORDER B BY bytes DESC;
    grunt> D = LIMIT C 10;
    grunt> DUMP D;




    2010-08-05 14:47:52,622 [main] INFO  org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
    2010-08-05 14:47:52,622 [main] INFO  org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for A
    2010-08-05 14:47:52,681 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
    2010-08-05 14:47:52,819 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1184504472/tmp-1623830760:org.apache.pig.builtin.BinStorage) - 1-54 Operator Key: 1-54)
    2010-08-05 14:47:52,895 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
    2010-08-05 14:47:52,895 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
    2010-08-05 14:47:52,911 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:52,934 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:52,935 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2010-08-05 14:47:54,187 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2010-08-05 14:47:54,228 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,229 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2010-08-05 14:47:54,246 [Thread-5] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2010-08-05 14:47:54,434 [Thread-5] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,455 [Thread-5] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,461 [Thread-5] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:47:54,461 [Thread-5] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:47:54,734 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2010-08-05 14:47:54,754 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,757 [Thread-14] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:47:54,757 [Thread-14] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:47:54,821 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,827 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,839 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:54,841 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:55,245 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
    2010-08-05 14:47:56,352 [Thread-14] INFO  org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
    2010-08-05 14:47:56,354 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:56,355 [Thread-14] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:47:56,355 [Thread-14] INFO  org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0001_m_000000_0 is allowed to commit now
    2010-08-05 14:47:56,355 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:56,358 [Thread-14] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_m_000000_0' to file:/tmp/temp1184504472/tmp-842564749
    2010-08-05 14:47:56,358 [Thread-14] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:47:56,358 [Thread-14] INFO  org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_m_000000_0' done.
    2010-08-05 14:47:59,754 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
    2010-08-05 14:47:59,754 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:47:59,754 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2010-08-05 14:48:00,873 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2010-08-05 14:48:00,890 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:00,891 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2010-08-05 14:48:00,891 [Thread-18] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2010-08-05 14:48:00,999 [Thread-18] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,003 [Thread-18] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,009 [Thread-18] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:01,009 [Thread-18] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:01,155 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,157 [Thread-27] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:01,157 [Thread-27] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:01,189 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,192 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:01,209 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
    2010-08-05 14:48:01,391 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0002
    2010-08-05 14:48:02,157 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
    2010-08-05 14:48:02,157 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
    2010-08-05 14:48:02,369 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
    2010-08-05 14:48:02,752 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
    2010-08-05 14:48:02,753 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:02,753 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_000000_0' done.
    2010-08-05 14:48:02,761 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:02,789 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:02,789 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:02,796 [Thread-27] INFO  org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
    2010-08-05 14:48:02,932 [Thread-27] INFO  org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 0 segments left of total size: 0 bytes
    2010-08-05 14:48:02,932 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:02,935 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:03,023 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
    2010-08-05 14:48:03,025 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:03,026 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
    2010-08-05 14:48:03,026 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0002_r_000000_0 is allowed to commit now
    2010-08-05 14:48:03,026 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:03,029 [Thread-27] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0002_r_000000_0' to file:/tmp/temp1184504472/tmp-657784620
    2010-08-05 14:48:03,031 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
    2010-08-05 14:48:03,031 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_r_000000_0' done.
    2010-08-05 14:48:06,431 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
    2010-08-05 14:48:06,432 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:06,432 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2010-08-05 14:48:08,062 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2010-08-05 14:48:08,194 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,195 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2010-08-05 14:48:08,197 [Thread-33] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2010-08-05 14:48:08,475 [Thread-33] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,478 [Thread-33] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,480 [Thread-33] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:08,480 [Thread-33] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:08,792 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:08,794 [Thread-42] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
    2010-08-05 14:48:08,794 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
    2010-08-05 14:48:09,024 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:09,027 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:09,028 [Thread-42] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
    2010-08-05 14:48:09,290 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003
    2010-08-05 14:48:09,443 [Thread-42] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
    2010-08-05 14:48:09,443 [Thread-42] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
    2010-08-05 14:48:09,479 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:09,491 [Thread-42] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
    java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
    Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
    ... 6 more
    2010-08-05 14:48:13,800 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    2010-08-05 14:48:13,801 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
    2010-08-05 14:48:13,802 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1184504472/tmp-1623830760"
    2010-08-05 14:48:13,803 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
    2010-08-05 14:48:13,811 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2010-08-05 14:48:13,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias D
    Details at logfile: /home/matt/workspace/pig-0.7.0/pig_1281044613580.log
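
As a reference point when comparing outputs, the result that `ORDER ... BY bytes DESC` followed by `LIMIT 10` is expected to produce can be sketched in plain Java, independent of Pig. This only illustrates the intended semantics and says nothing about how Pig executes the plan. The class `OrderThenLimit`, the `Flow` type, and the method names are made up for this sketch; the twelve rows are the `dIP` and `bytes` fields of the first twelve tuples in the `B` dump quoted above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OrderThenLimit {

    /** Hypothetical stand-in for one flow tuple; only dIP and bytes are kept. */
    static final class Flow {
        final String dIP;
        final int bytes;

        Flow(String dIP, int bytes) {
            this.dIP = dIP;
            this.bytes = bytes;
        }
    }

    /** Sort by bytes, largest first, then keep at most n rows. */
    static List<Flow> orderThenLimit(List<Flow> rows, int n) {
        List<Flow> sorted = new ArrayList<>(rows);   // leave the input untouched
        sorted.sort(Comparator.comparingInt((Flow f) -> f.bytes).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    /** First twelve rows of the B dump from the thread (dIP, bytes). */
    static List<Flow> sampleRows() {
        List<Flow> rows = new ArrayList<>();
        rows.add(new Flow("58.72.19.26", 512));
        rows.add(new Flow("36.65.53.83", 646));
        rows.add(new Flow("68.99.24.4", 364));
        rows.add(new Flow("9.7.68.69", 228));
        rows.add(new Flow("73.77.82.19", 194));
        rows.add(new Flow("36.65.53.83", 1003));
        rows.add(new Flow("58.72.19.26", 277));
        rows.add(new Flow("77.52.5.64", 389));
        rows.add(new Flow("93.6.87.73", 138));
        rows.add(new Flow("58.72.19.26", 812));
        rows.add(new Flow("58.72.19.26", 175));
        rows.add(new Flow("68.99.24.4", 1598));
        return rows;
    }

    public static void main(String[] args) {
        // Print the ten largest flows, one per line, tab-separated.
        for (Flow f : orderThenLimit(sampleRows(), 10)) {
            System.out.println(f.dIP + "\t" + f.bytes);
        }
    }
}
```

With these twelve rows the sketch keeps ten and drops the two smallest (175 and 138 bytes). A 699-line `topTen` output is therefore consistent with the LIMIT not being applied at all after the ORDER.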


Discussion Overview
group: user @
categories: pig, hadoop
posted: Aug 4, '10 at 10:07p
active: Aug 9, '10 at 12:58a
posts: 8
users: 2
website: pig.apache.org

2 users in discussion

Ashutosh Chauhan: 4 posts
Matthew Smith: 4 posts
