UPDATE: I ran my code in the Amazon cloud (aws.amazon.com) and the
script worked as intended over the same data set. This leads me to believe
that the issue is with pig-0.7.0 or my configuration. I would, however,
like to not pay for something that is free :D. Any other ideas would be
most welcome.
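
If it helps narrow this down, a minimal local-mode test along these lines
(hypothetical file name and contents, not my real input) should show whether
ORDER fails on any input at all:

-- tiny.txt: a small hand-made file with one integer per line
tiny = LOAD 'tiny.txt' AS (n:int);
sorted = ORDER tiny BY n DESC;
dump sorted;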



@Thejas

I changed the script to:

start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);

target = FILTER start BY sip matches '51.37.8.63';

just_bytes= FOREACH target GENERATE bytes;

fail = ORDER just_bytes BY bytes DESC;

not_reached = LIMIT fail 10;

dump not_reached;



and received the same error as before. I then changed the script to:



start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);

target = FILTER start BY sip matches '51.37.8.63';

stored = STORE target INTO 'myoutput';

second_start = LOAD 'myoutput/part-m-00000' USING PigStorage('\t') AS
(sip:chararray, dip:chararray, sport:int, dport:int, protocol:int,
packets:int, bytes:int, flags:chararray, startTime:long, endTime:long);

fail = ORDER second_start BY bytes DESC;

not_reached = LIMIT fail 10;

dump not_reached;



and received the same error.
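
In case the parsing question matters, I could also check for null values in
the bytes column with something like this (just a sketch, reusing the same
relations as above):

bad_bytes = FILTER target BY bytes is null;
dump bad_bytes;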



@Mridul

I am using local mode at the moment. I don't understand the second
question.



Thanks,

Matt







From: Thejas M Nair
Sent: Thursday, August 19, 2010 5:34 PM
To: pig-user@hadoop.apache.org; Matthew Smith
Subject: Re: ORDER Issue (repost to avoid spam filters)



I think 0.7 had an issue where order-by used to fail if the input was
empty, but that does not seem to be the case here.
I am wondering if there is a parsing/data-format issue that is causing the
bytes column to be empty, though I am not aware of an empty/null value in
the sort column causing issues.
Can you try dumping just the bytes column?
Another thing you can try is to store the output of the filter and load the
data again before doing the order-by.
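
For example, something along these lines (just a sketch, reusing your target
relation from the script):

just_bytes = FOREACH target GENERATE bytes;
dump just_bytes;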

Please let us know what you find.

Thanks,
Thejas




On 8/19/10 11:35 AM, "Matthew Smith" wrote:

All,



I am running pig-0.7.0 and have run into an issue with the ORDER command.
I have tried running Pig out of the box on two separate Linux distributions
(Ubuntu 10.04 and openSUSE 11.2) and the same issue occurred on both. I run
these commands in a script file:



start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);



target = FILTER start BY sip matches '51.37.8.63';



fail = ORDER target BY bytes DESC;



not_reached = LIMIT fail 10;



dump not_reached;





The error is listed below. I then run:





start = LOAD 'inputData' USING PigStorage('|') AS (sip:chararray,
dip:chararray, sport:int, dport:int, protocol:int, packets:int,
bytes:int, flags:chararray, startTime:long, endTime:long);



target = FILTER start BY sip matches '51.37.8.63';



dump target;





This script produces a large list of sips matching the filter. What am
I doing wrong that prevents Pig from ORDERing these records? I have been
wrestling with this issue for a week now. Any help would be greatly
appreciated.







Best,



Matthew



/ERROR



java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_24118161_1282155871461
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:135)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:527)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:613)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/user/matt/pigsample_24118161_1282155871461
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
        at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
        at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:115)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:108)
        ... 6 more
