Grokbase Groups Pig user August 2010
While running the query in grunt I ran into a different error. It appears to be looking for a file that does not exist, but I have never run into this problem with grunt before. This environment was freshly installed this morning before the grunt shell was started.

I also checked my PigServer() Java code on the new install, and it still produces a 699-line file which is ORDERed but not LIMITed.

Thoughts?


grunt> A = LOAD '0' USING PigStorage('|') as (sIP:chararray,dIP:chararray,sPort:int, dPort:int,protocol:int, bytes:int, flags:chararray);
grunt> B = FILTER A BY sIP matches '61.81.46.45';
grunt> C = ORDER B BY bytes DESC;
grunt> D = LIMIT C 10;
grunt> DUMP D;




2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
2010-08-05 14:47:52,622 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for A
2010-08-05 14:47:52,681 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2010-08-05 14:47:52,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1184504472/tmp-1623830760:org.apache.pig.builtin.BinStorage) - 1-54 Operator Key: 1-54)
2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2010-08-05 14:47:52,895 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2010-08-05 14:47:52,911 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:52,934 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:52,935 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-08-05 14:47:54,187 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-08-05 14:47:54,228 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,229 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-08-05 14:47:54,246 [Thread-5] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-08-05 14:47:54,434 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,455 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:47:54,461 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:47:54,734 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2010-08-05 14:47:54,754 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:47:54,757 [Thread-14] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:47:54,821 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,827 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,839 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,841 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:55,245 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2010-08-05 14:47:56,352 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2010-08-05 14:47:56,354 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0001_m_000000_0 is allowed to commit now
2010-08-05 14:47:56,355 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_m_000000_0' to file:/tmp/temp1184504472/tmp-842564749
2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:47:56,358 [Thread-14] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_m_000000_0' done.
2010-08-05 14:47:59,754 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
2010-08-05 14:47:59,754 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:59,754 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-08-05 14:48:00,873 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-08-05 14:48:00,890 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:00,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-08-05 14:48:00,891 [Thread-18] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-08-05 14:48:00,999 [Thread-18] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,003 [Thread-18] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,009 [Thread-18] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:01,009 [Thread-18] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:01,155 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:01,157 [Thread-27] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:01,189 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,192 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,209 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2010-08-05 14:48:01,391 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0002
2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2010-08-05 14:48:02,157 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2010-08-05 14:48:02,369 [Thread-27] INFO org.apache.hadoop.mapred.MapTask - Starting flush of map output
2010-08-05 14:48:02,752 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:02,753 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_000000_0' done.
2010-08-05 14:48:02,761 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:02,789 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:02,796 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 0 segments left of total size: 0 bytes
2010-08-05 14:48:02,932 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:02,935 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:03,023 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
2010-08-05 14:48:03,025 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0002_r_000000_0 is allowed to commit now
2010-08-05 14:48:03,026 [Thread-27] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:03,029 [Thread-27] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0002_r_000000_0' to file:/tmp/temp1184504472/tmp-657784620
2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2010-08-05 14:48:03,031 [Thread-27] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_r_000000_0' done.
2010-08-05 14:48:06,431 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2010-08-05 14:48:06,432 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:06,432 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-08-05 14:48:08,062 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-08-05 14:48:08,194 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,195 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-08-05 14:48:08,197 [Thread-33] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-08-05 14:48:08,475 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,478 [Thread-33] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:08,480 [Thread-33] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:08,792 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:08,794 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:09,024 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:09,027 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:09,028 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2010-08-05 14:48:09,290 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003
2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2010-08-05 14:48:09,443 [Thread-42] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2010-08-05 14:48:09,479 [Thread-42] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:09,491 [Thread-42] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:613)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
at org.apache.pig.impl.io.ReadToEndLoader.(WeightedRangePartitioner.java:108)
... 6 more
2010-08-05 14:48:13,800 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-08-05 14:48:13,801 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2010-08-05 14:48:13,802 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1184504472/tmp-1623830760"
2010-08-05 14:48:13,803 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2010-08-05 14:48:13,811 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:13,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias D
Details at logfile: /home/matt/workspace/pig-0.7.0/pig_1281044613580.log
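
One way to follow up on the "Input path does not exist" line in the trace above is to check what that `file:` URI points at on the local disk. The following is only a minimal sketch: the class name `SamplePathCheck` and the helper `toLocalPath` are hypothetical, and the URI is copied verbatim from the log. It does not explain why Pig looked there; it only reports whether the sample file and its parent directory exist.

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical diagnostic helper, not part of Pig or Hadoop. */
final class SamplePathCheck {

    /** Converts a file: URI, as printed in the log, into a local filesystem path. */
    static Path toLocalPath(String fileUri) {
        return Paths.get(URI.create(fileUri));
    }

    public static void main(String[] args) {
        // URI copied from the InvalidInputException message in the log.
        Path sample = toLocalPath("file:/user/matt/pigsample_19823722_1281044888160");
        Path parent = sample.getParent();

        System.out.println("sample file exists: " + Files.exists(sample));
        System.out.println("parent directory:   " + parent);
        System.out.println("parent exists:      " + Files.isDirectory(parent));
    }
}
```

If the parent directory itself is missing, that at least narrows the question to where the sample file was written versus where the third job tried to read it.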

-----Original Message-----
From: Ashutosh Chauhan
Sent: Thursday, August 05, 2010 3:10 PM
To: pig-user@hadoop.apache.org
Subject: Re: LIMIT Issue

To cut down on the problem space, can you try your query in grunt? If
it works there, the problem would be something to do with PigServer;
otherwise it's related to Pig core itself.

Ashutosh
On Thu, Aug 5, 2010 at 10:57, Matthew Smith wrote:
No, I have not used it in grunt. I want to use PigServer because Java makes parameter passing possible. I am using Pig 0.7.0.
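
As an illustration of the parameter passing mentioned here, the sketch below builds the statements from the original post out of Java arguments. It is only a sketch under assumptions: the class `TopNStatements` and its `build` method are hypothetical names, the alias `data` is assumed to be defined elsewhere (the thread does not show its LOAD), and nothing here calls Pig itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds the thread's Pig Latin statements from Java parameters. */
final class TopNStatements {

    /** Returns the statements in the order they would be registered. */
    static List<String> build(String sourceIp, int limit) {
        // The value ends up inside a Pig string literal, so reject single quotes.
        if (sourceIp.contains("'")) {
            throw new IllegalArgumentException("sourceIp must not contain a single quote");
        }
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<String> statements = new ArrayList<>();
        // Assumes an alias named "data" already exists.
        statements.add("flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
        statements.add("filtered = FILTER flow_firstcut BY sIP matches '" + sourceIp + "';");
        statements.add("O = ORDER filtered BY bytes DESC;");
        statements.add("topTen = LIMIT O " + limit + ";");
        return statements;
    }

    public static void main(String[] args) {
        for (String statement : build("61.81.46.45", 10)) {
            System.out.println(statement);
        }
    }
}
```

Each returned string would then be handed to `myServer.registerQuery(...)` as in the original post, followed by `myServer.store("topTen", outputFilePath)`.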

-----Original Message-----
From: Ashutosh Chauhan
Sent: Thursday, August 05, 2010 12:54 PM
To: pig-user@hadoop.apache.org
Subject: Re: LIMIT Issue

Matt,

Which version are you on? What happens if you run your query through
grunt instead of PigServer?
I tried a load-order-limit sequence on a small dataset in grunt and
got the expected results.

Ashutosh
On Wed, Aug 4, 2010 at 15:07, Matthew Smith wrote:
Hey,



When run from Java, a LIMIT statement is not being executed.



/code

myServer.registerQuery("flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");

myServer.registerQuery("filtered = FILTER flow_firstcut BY sIP matches 'someIP';");

myServer.registerQuery("O = ORDER filtered BY bytes DESC;");

myServer.registerQuery("topTen = LIMIT O 10;");

myServer.store("topTen", outputFilePath);

/code



This produces a 699-line file. It should produce a 10-line file.



/code

myServer.registerQuery("flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");

myServer.registerQuery("filtered = FILTER flow_firstcut BY sIP matches '" + parameters[1] + "';");

// ORDER step commented out for this test
//myServer.registerQuery("O = ORDER filtered BY bytes DESC;");

myServer.registerQuery("topTen = LIMIT filtered 10;");

myServer.store("topTen", outputFilePath);

/code



This produces a 10-line file.
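
The line counts reported above (699 versus 10) can be checked programmatically rather than by eye. A minimal sketch follows, assuming the stored result is a single plain-text file; the class `OutputLineCheck` and its methods are hypothetical names, and the path passed on the command line is whatever output file is being inspected.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Hypothetical helper for checking whether a stored result respects a LIMIT. */
final class OutputLineCheck {

    /** Counts the lines in one plain-text file. */
    static long countLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.count();
        }
    }

    /** Returns true when the file holds no more rows than the LIMIT allows. */
    static boolean respectsLimit(Path file, long limit) throws IOException {
        return countLines(file) <= limit;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);        // output file to inspect
        long limit = Long.parseLong(args[1]);  // e.g. 10 for "LIMIT O 10"
        System.out.println(countLines(file) + " lines, limit respected: "
                + respectsLimit(file, limit));
    }
}
```

If the store call writes a directory of part files rather than one file, the check would need to be run per part file and the counts summed.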



Is there a known bug I am unaware of, or can you not ORDER and then LIMIT?

http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT

indicates that this is a valid sequence of calls.



Help?



Matt

Discussion Overview
group: user @
categories: pig, hadoop
posted: Aug 4, '10 at 10:07p
active: Aug 9, '10 at 12:58a
posts: 8
users: 2
website: pig.apache.org

2 users in discussion

Ashutosh Chauhan: 4 posts
Matthew Smith: 4 posts
