Grokbase Groups Pig user May 2011
Hello,

I'm running into a weird problem that I'm hoping you can help me with.

I'm basically just loading an access log, grouping, ordering, and then
dumping the data.

I can load the log, group, and order when I'm in local mode, but when I try
to do the same on the Hadoop cluster I always get an error from the 'order
by' clause.

Here are the relevant bits:

REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar
DEFINE logloader org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader();

logs = LOAD '/logs/2011/05/12/16/localhost.access.log_hadoop01_2011-05-12_16-18-30.log'
    USING logloader AS (remoteHost:CHARARRAY, hyphen:CHARARRAY,
        hyphen2:CHARARRAY, time:CHARARRAY, method:CHARARRAY, uri:CHARARRAY,
        protocol:CHARARRAY, statusCode:CHARARRAY, responseSize:CHARARRAY,
        treferer:CHARARRAY, agent:CHARARRAY);

grp = GROUP logs BY treferer;

out = FOREACH grp GENERATE group, COUNT($1) as ref_cnt;

out2 = ORDER out BY ref_cnt;

dump out2;

In the cluster I get the following:
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/hdfs/pigsample_1861447257_1305315373876
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:139)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:638)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/hdfs/pigsample_1861447257_1305315373876
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:231)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:248)
    at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:153)
    at org.apache.pig.impl.io.ReadToEndLoader.<init>(WeightedRangePartitioner.java:112)
    ... 6 more

....

2011-05-13 12:36:19,386 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2011-05-13 12:36:19,429 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias out2


Any help on this would be appreciated.


  • Thejas M Nair at May 13, 2011 at 11:15 pm
    The exception stack contains LocalJobRunner, which is strange: it means the job ran in Hadoop's local mode instead of being submitted to the cluster.
    Have you specified the command-line option "-x mapreduce"? Is the Hadoop conf dir on the classpath?
    -Thejas
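
The two checks suggested in this reply can be sketched as a small shell script. This is only a sketch under assumptions: the conf directory `/etc/hadoop/conf`, the script name `access_log.pig`, and the helper `has_cluster_conf` are hypothetical and not taken from the thread. `PIG_CLASSPATH` and `-x mapreduce` are the standard Pig launcher settings for adding classpath entries and selecting cluster execution.

```shell
#!/bin/sh
# Assumed location of the cluster's client configuration; adjust to your install.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"

# Hypothetical helper: succeeds if the directory holds a core-site.xml,
# the file where the default filesystem (hdfs://...) is normally configured.
has_cluster_conf() {
    [ -f "$1/core-site.xml" ]
}

if has_cluster_conf "$HADOOP_CONF_DIR"; then
    # The pig launcher script adds PIG_CLASSPATH entries to its classpath.
    export PIG_CLASSPATH="$HADOOP_CONF_DIR"
    # -x mapreduce asks for cluster execution explicitly.
    # access_log.pig is a placeholder name for the script in question.
    echo "run: pig -x mapreduce access_log.pig"
else
    # Without the cluster's settings, Hadoop's built-in defaults are the
    # local filesystem (file:/) and the local job runner.
    echo "no core-site.xml found in $HADOOP_CONF_DIR" >&2
fi
```

If the conf directory is missing from the classpath, paths such as `file:/home/hdfs/pigsample_...` and a `LocalJobRunner` frame in the stack trace are the kind of symptoms one would expect, which matches the trace in the question.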




Discussion Overview
group: user @
categories: pig, hadoop
posted: May 13, '11 at 7:37p
active: May 13, '11 at 11:15p
posts: 2
users: 2
website: pig.apache.org

2 users in discussion

Irooniam: 1 post Thejas M Nair: 1 post
