B is not empty:
(58.72.19.26, 58.72.19.26,38627,22196,6,512, FS PA)
(58.72.19.26, 36.65.53.83,44133,10957,6,646, FS PA)
(58.72.19.26, 68.99.24.4,43951,11023,6,364, FS PA)
(58.72.19.26, 9.7.68.69,18644,20524,17,228, FS PA)
(58.72.19.26, 73.77.82.19,25,1024,6,194, FS PA)
(58.72.19.26, 36.65.53.83,56380,71718,6,1003, FS PA)
(58.72.19.26, 58.72.19.26,10221,44938,6,277, FS PA)
(58.72.19.26, 77.52.5.64,69247,11023,6,389, FS PA)
(58.72.19.26, 93.6.87.73,38149,1024,6,138, FS PA)
(58.72.19.26, 58.72.19.26,11558,24292,6,812, FS PA)
(58.72.19.26, 58.72.19.26,65668,71318,6,175, FS PA)
(58.72.19.26, 68.99.24.4,61923,1024,6,1598, FS PA)
(58.72.19.26, 60.41.59.65,22421,65796,6,1402, FS PA)
(58.72.19.26, 58.72.19.26,69740,21873,6,322, S A)
(58.72.19.26, 95.70.58.21,11058,1024,6,1453, FS PA)
(58.72.19.26, 42.10.50.36,44863,11023,6,251, FS PA)
(58.72.19.26, 57.6.91.5,25857,1024,6,1546, FS PA)
(58.72.19.26, 68.99.24.4,54756,11023,6,219, FS PA)
(58.72.19.26, 36.65.53.83,73335,43857,6,9, FS PA)
(58.72.19.26, 95.70.58.21,32204,11023,6,1635, S A)
(58.72.19.26, 76.48.82.73,46483,1024,6,127, FS PA)
(58.72.19.26, 81.88.14.14,55609,1024,6,507, FS PA)
(58.72.19.26, 1.54.61.21,65763,1024,6,370, FS PA)


But after I do:
grunt> C = ORDER B BY bytes DESC;
grunt> Dump C;
I get the same error as before:
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160


Which would lead me to believe my ORDER is broken. Is there a conf I need to change?
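
As a reference point for what a working ORDER followed by LIMIT should return on the rows dumped above, here is a minimal plain-Java sketch. It does not use Pig at all; the class name OrderLimitSketch and the method topBytes are made up for illustration, and the values are the sixth field (bytes) of the 23 tuples shown for B, copied in dump order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class OrderLimitSketch {
    // bytes column (6th field) of the 23 tuples dumped for B, in dump order
    static final int[] BYTES = {
        512, 646, 364, 228, 194, 1003, 277, 389, 138, 812, 175, 1598,
        1402, 322, 1453, 251, 1546, 219, 9, 1635, 127, 507, 370
    };

    // Plain-Java stand-in for: C = ORDER B BY bytes DESC; D = LIMIT C n;
    static List<Integer> topBytes(int[] bytes, int n) {
        return Arrays.stream(bytes)
                .boxed()
                .sorted(Comparator.reverseOrder()) // descending, like DESC
                .limit(n)                          // keep only the first n rows
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [1635, 1598, 1546, 1453, 1402, 1003, 812, 646, 512, 507]
        System.out.println(topBytes(BYTES, 10));
    }
}
```

If Pig's ORDER step ran to completion, the bytes column of the first ten tuples of C would be expected to match this list.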


-----Original Message-----
From: Ashutosh Chauhan
Sent: Friday, August 06, 2010 2:43 AM
To: Matthew Smith
Cc: pig-user@hadoop.apache.org
Subject: Re: LIMIT Issue

This is most likely because B is empty. Do:

grunt> dump A; -- to verify data is getting loaded as you are expecting.
grunt> dump B; -- to verify that B is non-empty.

Ashutosh
On Thu, Aug 5, 2010 at 14:54, Matthew Smith wrote:
While running grunt I ran into another error. I see it is looking for another file, but I have never run into this problem with grunt before. This environment was freshly installed this morning before the grunt shell was executed.

I also checked my PigServer() Java code on the new install, and it still produces a 699-line file which is ORDERed but not LIMITed.

Thoughts?


grunt> A = LOAD '0' USING PigStorage('|') as (sIP:chararray,dIP:chararray,sPort:int, dPort:int,protocol:int, bytes:int, flags:chararray);
grunt> B = FILTER A BY sIP matches '61.81.46.45';
grunt> C = ORDER B BY bytes DESC;
grunt> D = LIMIT C 10;
grunt> DUMP D;
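
One thing worth checking when a FILTER like the one above returns nothing: as far as I know, Pig's matches operator uses Java regular expressions and has to match the whole field value, much like String.matches. The sketch below is plain Java, not Pig; the class MatchesSketch and the helper pigMatches are hypothetical names, and treating null as "no match" is a simplification.

```java
public class MatchesSketch {
    // Assumption: Pig's `matches` behaves like java.lang.String.matches,
    // i.e. the regex must match the entire field value.
    static boolean pigMatches(String field, String regex) {
        return field != null && field.matches(regex);
    }

    public static void main(String[] args) {
        String regex = "61.81.46.45";
        System.out.println(pigMatches("61.81.46.45", regex));   // true: exact value
        System.out.println(pigMatches(" 61.81.46.45", regex));  // false: leading space breaks the full match
        System.out.println(pigMatches("61x81y46z45", regex));   // true: an unescaped '.' matches any character
        System.out.println(pigMatches("61.81.46.45", "61\\.81\\.46\\.45")); // true: dots escaped
    }
}
```

So stray whitespace around a field produced by the loader would be enough to make B empty, and unescaped dots make the pattern looser than a literal IP comparison.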




2010-08-05 14:47:52,622 [main] INFO  org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
2010-08-05 14:47:52,622 [main] INFO  org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for A
2010-08-05 14:47:52,681 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2010-08-05 14:47:52,819 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1184504472/tmp-1623830760:org.apache.pig.builtin.BinStorage) - 1-54 Operator Key: 1-54)
2010-08-05 14:47:52,895 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2010-08-05 14:47:52,895 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2010-08-05 14:47:52,911 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:52,934 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:52,935 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-08-05 14:47:54,187 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-08-05 14:47:54,228 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,229 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-08-05 14:47:54,246 [Thread-5] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-08-05 14:47:54,434 [Thread-5] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,455 [Thread-5] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,461 [Thread-5] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:47:54,461 [Thread-5] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:47:54,734 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2010-08-05 14:47:54,754 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,757 [Thread-14] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:47:54,757 [Thread-14] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:47:54,821 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,827 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,839 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:54,841 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:55,245 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2010-08-05 14:47:56,352 [Thread-14] INFO  org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2010-08-05 14:47:56,354 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:56,355 [Thread-14] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:47:56,355 [Thread-14] INFO  org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0001_m_000000_0 is allowed to commit now
2010-08-05 14:47:56,355 [Thread-14] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:56,358 [Thread-14] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0001_m_000000_0' to file:/tmp/temp1184504472/tmp-842564749
2010-08-05 14:47:56,358 [Thread-14] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:47:56,358 [Thread-14] INFO  org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_m_000000_0' done.
2010-08-05 14:47:59,754 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
2010-08-05 14:47:59,754 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:47:59,754 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-08-05 14:48:00,873 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-08-05 14:48:00,890 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:00,891 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-08-05 14:48:00,891 [Thread-18] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-08-05 14:48:00,999 [Thread-18] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,003 [Thread-18] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,009 [Thread-18] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:01,009 [Thread-18] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:01,155 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,157 [Thread-27] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:01,157 [Thread-27] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:01,189 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,192 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:01,209 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2010-08-05 14:48:01,391 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0002
2010-08-05 14:48:02,157 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2010-08-05 14:48:02,157 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2010-08-05 14:48:02,369 [Thread-27] INFO  org.apache.hadoop.mapred.MapTask - Starting flush of map output
2010-08-05 14:48:02,752 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_m_000000_0 is done. And is in the process of commiting
2010-08-05 14:48:02,753 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:02,753 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_000000_0' done.
2010-08-05 14:48:02,761 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:02,789 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:02,789 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:02,796 [Thread-27] INFO  org.apache.hadoop.mapred.Merger - Merging 1 sorted segments
2010-08-05 14:48:02,932 [Thread-27] INFO  org.apache.hadoop.mapred.Merger - Down to the last merge-pass, with 0 segments left of total size: 0 bytes
2010-08-05 14:48:02,932 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:02,935 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:03,023 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
2010-08-05 14:48:03,025 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:03,026 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner -
2010-08-05 14:48:03,026 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task attempt_local_0002_r_000000_0 is allowed to commit now
2010-08-05 14:48:03,026 [Thread-27] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:03,029 [Thread-27] INFO  org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_local_0002_r_000000_0' to file:/tmp/temp1184504472/tmp-657784620
2010-08-05 14:48:03,031 [Thread-27] INFO  org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2010-08-05 14:48:03,031 [Thread-27] INFO  org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_r_000000_0' done.
2010-08-05 14:48:06,431 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2010-08-05 14:48:06,432 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:06,432 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2010-08-05 14:48:08,062 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2010-08-05 14:48:08,194 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,195 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2010-08-05 14:48:08,197 [Thread-33] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-08-05 14:48:08,475 [Thread-33] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,478 [Thread-33] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,480 [Thread-33] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:08,480 [Thread-33] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:08,792 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:08,794 [Thread-42] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2010-08-05 14:48:08,794 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2010-08-05 14:48:09,024 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:09,027 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:09,028 [Thread-42] INFO  org.apache.hadoop.mapred.MapTask - io.sort.mb = 100
2010-08-05 14:48:09,290 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0003
2010-08-05 14:48:09,443 [Thread-42] INFO  org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2010-08-05 14:48:09,443 [Thread-42] INFO  org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2010-08-05 14:48:09,479 [Thread-42] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:09,491 [Thread-42] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0003
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_19823722_1281044888160
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
... 6 more
2010-08-05 14:48:13,800 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-08-05 14:48:13,801 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2010-08-05 14:48:13,802 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1184504472/tmp-1623830760"
2010-08-05 14:48:13,803 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2010-08-05 14:48:13,811 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-08-05 14:48:13,814 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias D
Details at logfile: /home/matt/workspace/pig-0.7.0/pig_1281044613580.log

-----Original Message-----
From: Ashutosh Chauhan
Sent: Thursday, August 05, 2010 3:10 PM
To: pig-user@hadoop.apache.org
Subject: Re: LIMIT Issue

To cut down the problem space, can you try your query in grunt? If
it works there, the problem has something to do with PigServer;
otherwise it's related to Pig core itself.

Ashutosh
On Thu, Aug 5, 2010 at 10:57, Matthew Smith wrote:
No, I have not used it in grunt. I am looking to use PigServer because of the parameter passing that is doable through Java. I am using Pig 0.7.0.

-----Original Message-----
From: Ashutosh Chauhan
Sent: Thursday, August 05, 2010 12:54 PM
To: pig-user@hadoop.apache.org
Subject: Re: LIMIT Issue

Matt,

Which version are you on? What happens if you run your query through
grunt instead of PigServer?
I tried a load-order-limit sequence on a small dataset in grunt and
got the expected results.

Ashutosh
On Wed, Aug 4, 2010 at 15:07, Matthew Smith wrote:
Hey,



While running in Java a LIMIT statement is not getting executed.



/code

myServer.registerQuery("flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
myServer.registerQuery("filtered = FILTER flow_firstcut BY sIP matches 'someIP';");
myServer.registerQuery("O = ORDER filtered BY bytes DESC;");
myServer.registerQuery("topTen = LIMIT O 10;");
myServer.store("topTen", outputFilePath);

/code



This produces a 699 line file. It should produce a 10 line file.



/code

myServer.registerQuery("flow_firstcut = FOREACH data GENERATE sIP, dIP, sPort, dPort, protocol, bytes, flags;");
myServer.registerQuery("filtered = FILTER flow_firstcut BY sIP matches '"+parameters[1]+"';");
//myServer.registerQuery("O = ORDER filtered BY bytes DESC;");
myServer.registerQuery("topTen = LIMIT filtered 10;");
myServer.store("topTen", outputFilePath);

/code



This produces a 10 line file.



Is there a known bug I am unaware of or can you not order then limit?

http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#LIMIT

indicates that this is a valid sequence of calls.



Help?



Matt

Discussion Overview
group: user @
categories: pig, hadoop
posted: Aug 4, '10 at 10:07p
active: Aug 9, '10 at 12:58a
posts: 8
users: 2
website: pig.apache.org

2 users in discussion

Ashutosh Chauhan: 4 posts Matthew Smith: 4 posts
