Grokbase Groups Pig user June 2009

  • Parmod Mehta at Jun 27, 2009 at 12:56 am
    pig 0.20.0
    hadoop 0.18.0

    I am running into an exception when I try to run my Pig script in mapreduce mode; the same script works fine in local mode.

    register my.jar;
    define scoreEval ScoreEval();
    aData = load 'input/a1.txt';
    bData = load 'input/b1.txt';
    crossproduct = cross aData , bData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    store scoreTuple into 'output';

    When I run this in Hadoop mode:

    pig testCrossProduct.pig

    I get the following exception trace. Any clues? Thanks.

    2009-06-26 16:21:27,328 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
    2009-06-26 16:21:28,046 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
    2009-06-26 16:21:28,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POJoinPackage
    2009-06-26 16:21:29,312 [Thread-7] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2009-06-26 16:21:34,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2009-06-26 16:21:49,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
    2009-06-26 16:22:09,312 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed
    2009-06-26 16:22:09,328 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    Details at logfile: C:\pigscripts\pig_1246058487187.log


    Exception trace:

    ERROR 2998: Unhandled internal error. java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.PigException.(PigException.java:191)
    at org.apache.pig.backend.BackendException.(ExecException.java:103)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

    java.lang.Exception: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.PigException.(PigException.java:191)
    at org.apache.pig.backend.BackendException.(ExecException.java:103)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:191)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 10
    at org.apache.pig.PigServer.registerQuery(PigServer.java:295)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:195)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    ... 5 more
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStackTraceElement(Launcher.java:546)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromStrings(Launcher.java:383)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromString(Launcher.java:284)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:187)
    ... 11 more
  • Alan Gates at Jun 27, 2009 at 12:59 am
    It looks like you are running Java 1.5. You need 1.6 for Pig 0.2.0.

    Alan.
  • Parmod Mehta at Jun 29, 2009 at 5:10 pm
    Thanks, Alan! Yes, that was the problem. Now I have a general question.

    I have two tab-delimited input files, FileA and FileB, in different formats. I
    want to compare every line of FileA with every record in FileB (basically a
    cross product) and compare them using heuristics. Here is my Pig script:

    register my.jar;
    define scoreEval ScoreEval();
    define scoreFilter ScoreFilter();
    storeData = load 'input/FileA.txt';
    amgData = load 'input/FileB.txt';
    crossproduct = cross storeData, amgData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    filteredScore = filter scoreTuple by scoreFilter($0);
    store filteredScore into 'output';

    The statement crossproduct = cross storeData, amgData; (to which I can add
    parallel to spread the work over multiple reduce instances) creates the cross
    product, which is evaluated by my UDF and then filtered by another UDF.

    When running on a Hadoop cluster with a block size of, for example, 50 MB, the
    input splits will be mapped to different data nodes in the cluster. If that is
    the case, would I not be missing cross products?

    FileA is split into a1, a2, a3, ...
    FileB is split into b1, b2, b3, ...

    and on one node I only have a1 x b1 and am missing a1 x b2, etc.
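
The concern above can be made concrete with a small sketch. It is plain Python, not Pig, and it makes no claim about how Pig actually executes cross. The record names and the split_records helper are hypothetical and exist only for illustration. It shows that pairing only same-numbered splits (a1 x b1, a2 x b2, ...) would indeed lose pairs, whereas pairing every split of FileA with every split of FileB recovers the complete cross product.

```python
from itertools import product


def split_records(records, size):
    """Cut a list into consecutive chunks that stand in for input splits."""
    return [records[i:i + size] for i in range(0, len(records), size)]


# Hypothetical records; real inputs would be lines of FileA and FileB.
file_a = ["x1", "x2", "x3", "x4"]
file_b = ["y1", "y2", "y3", "y4"]

splits_a = split_records(file_a, 2)  # plays the role of a1, a2
splits_b = split_records(file_b, 2)  # plays the role of b1, b2

# The complete cross product of the two files.
full = set(product(file_a, file_b))

# Only same-numbered splits are paired: a1 x b1, a2 x b2.
same_index = set()
for sa, sb in zip(splits_a, splits_b):
    same_index.update(product(sa, sb))

# Every split of A is paired with every split of B.
all_pairs = set()
for sa in splits_a:
    for sb in splits_b:
        all_pairs.update(product(sa, sb))

# Pairs that the same-numbered pairing never produces.
missing = full - same_index

print(len(full), len(same_index), len(missing))
print(all_pairs == full)
```

So the question comes down to whether the execution engine brings every split of one input together with every split of the other, not whether the inputs are split at all.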





    On Fri, Jun 26, 2009 at 5:59 PM, Alan Gates wrote:

    It looks like you are running Java 1.5. You need 1.6 for Pig 0.2.0.

    Alan.


    On Jun 26, 2009, at 4:39 PM, Parmod Mehta wrote:

    pig 0.20.0
    hadoop 0.18.0

    I am running into a an exception when I try to my pig script in mapreduce
    mode which incidentally works fine in the local mode

    register my.jar;
    define scoreEval ScoreEval();
    aData = load 'input/a1.txt';
    bData = load 'input/b1.txt';
    crossproduct = cross aData , bData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    store scoreTuple into 'output';

    When running this in hadoop mode

    pig testCrossProduct.pig

    I run into this exception trace. Any clues? thanks

    2009-06-26 16:21:27,328 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine
    - Connecting to hadoop file system at: hdfs://localhost:9000
    2009-06-26 16:21:28,046 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
    Connecting
    to map-reduce job tracker at: localhost:9001
    2009-06-26 16:21:28,406 [main] INFO

    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
    - Rewrite: POPackage->POForEach to POJoinPackage
    2009-06-26 16:21:29,312 [Thread-7] WARN
    org.apache.hadoop.mapred.JobClient
    - Use GenericOptionsParser for parsing the arguments. Applications should
    implement Tool for the same.
    2009-06-26 16:21:34,312 [main] INFO

    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 0% complete
    2009-06-26 16:21:49,312 [main] INFO

    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 50% complete
    2009-06-26 16:22:09,312 [main] ERROR

    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - Map reduce job failed
    2009-06-26 16:22:09,328 [main] ERROR org.apache.pig.tools.grunt.Grunt -
    ERROR 2997: Unable to recreate exception from backed error:
    java.lang.NoSuchMethodError: java.io.IOException: method
    <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    Details at logfile: C:\pigscripts\pig_1246058487187.log


    Exception trace:

    ERROR 2998: Unhandled internal error. java.lang.NoSuchMethodError:
    java.io.IOException: method
    <init>(Ljava/lang/String;Ljava/lang/Throwable;)V
    not found
    at org.apache.pig.PigException.<init>(PigException.java:244)
    at org.apache.pig.PigException.<init>(PigException.java:191)
    at
    org.apache.pig.backend.BackendException.<init>(BackendException.java:101)
    at

    org.apache.pig.backend.executionengine.ExecException.<init>(ExecException.java:103)
    at
    org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at

    org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at

    org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at

    org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at

    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at

    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at
    java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at
    java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at

    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at

    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at
    java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at
    java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at

    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at

    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at
    java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at
    java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at
    java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at
    java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at
    java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at
    java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at

    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at

    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at
    java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at
    java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at
    java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at
    java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at

    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at

    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at
    java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at
    java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at
    java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at
    java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at

    org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at

    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at

    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at
    org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

    java.lang.Exception: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.PigException.<init>(PigException.java:244)
    at org.apache.pig.PigException.<init>(PigException.java:191)
    at org.apache.pig.backend.BackendException.<init>(BackendException.java:101)
    at org.apache.pig.backend.executionengine.ExecException.<init>(ExecException.java:103)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:191)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 10
    at org.apache.pig.PigServer.registerQuery(PigServer.java:295)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:195)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    ... 5 more
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStackTraceElement(Launcher.java:546)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromStrings(Launcher.java:383)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromString(Launcher.java:284)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:187)
    ... 11 more
  • Alan Gates at Jul 1, 2009 at 4:29 pm
    The cross product is not done in the map. The way Pig handles this is
    that it loads each input in the various maps and then sends all of the
    data to a single reduce that does the cross product. The foreach and
    filter operators are then applied to the result of the cross in that
    same reduce.

    There are algorithms to do cross products in parallel, but Pig does
    not currently use any of them. This means that as your data gets
    large, your process will slow down because the whole cross runs in
    that single reduce.

    Alan.
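
For readers wondering what a parallel cross product could look like, here is a minimal sketch of one common idea: split each input into buckets and let every (A bucket, B bucket) pair be handled by its own reduce task. This is only an illustration under stated assumptions, not how Pig implements CROSS. The class name `GridCross`, the round-robin bucketing, and the `rows`/`cols` parameters are all hypothetical, and the nested loops merely simulate independent reduce tasks in one JVM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GridCross {
    /**
     * Simulates a grid of rows x cols reduce tasks. Cell (i, j) sees only
     * bucket i of A and bucket j of B, yet the union of all cells is the
     * complete cross product, so no pair is missed.
     */
    static List<String> cross(List<String> a, List<String> b, int rows, int cols) {
        List<List<String>> aBuckets = bucket(a, rows);
        List<List<String>> bBuckets = bucket(b, cols);
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // One (i, j) iteration stands in for one independent reduce task.
                for (String x : aBuckets.get(i)) {
                    for (String y : bBuckets.get(j)) {
                        out.add(x + "\t" + y);
                    }
                }
            }
        }
        return out;
    }

    /** Distributes records round-robin into n buckets (hypothetical partitioning). */
    static List<List<String>> bucket(List<String> records, int n) {
        List<List<String>> buckets = new ArrayList<List<String>>();
        for (int k = 0; k < n; k++) {
            buckets.add(new ArrayList<String>());
        }
        for (int idx = 0; idx < records.size(); idx++) {
            buckets.get(idx % n).add(records.get(idx));
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a1", "a2", "a3");
        List<String> b = Arrays.asList("b1", "b2", "b3", "b4");
        for (String pair : cross(a, b, 2, 2)) {
            System.out.println(pair);
        }
    }
}
```

The trade-off in such a scheme is replication: each A record has to be shipped to `cols` tasks and each B record to `rows` tasks, so the shuffle grows as the grid gets larger.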
    On Jun 29, 2009, at 10:09 AM, Parmod Mehta wrote:

    Thanks Alan! Yeah, that was the problem. Now I have a general question.

    I have two tab-delimited input files, FileA and FileB, of different
    formats. I want to compare every line of FileA with every record in
    FileB, basically a cross product, and compare them using heuristics.
    Here is my Pig script:

    register my.jar;
    define scoreEval ScoreEval();
    define scoreFilter ScoreFilter();
    storeData = load 'input/FileA.txt';
    amgData = load 'input/FileB.txt';
    crossproduct = cross storeData, amgData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    filteredScore = filter scoreTuple by ScoreFilter($0);
    store filteredScore into 'output';


    The statement crossproduct = cross storeData, amgData; creates the
    cross product (I can use PARALLEL to spread it over multiple reduce
    instances), which gets evaluated using my UDF and then filtered using
    another UDF.

    When running on a Hadoop cluster with a block size of, for example,
    50M, the input splits will be mapped to different data nodes in the
    cluster. If that is the case, would I not be missing cross products?

    FileA mapped into a1, a2, a3....
    FileB mapped into b1, b2, b3....

    and on one node I only have a1 x b1 and am missing a1 x b2, etc.





    On Fri, Jun 26, 2009 at 5:59 PM, Alan Gates wrote:

    It looks like you are running Java 1.5. You need 1.6 for Pig 0.2.0.

    Alan.
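
The `NoSuchMethodError` in the trace fits this diagnosis: the two-argument constructor `IOException(String, Throwable)` was added in Java 6, so code compiled against it fails at run time on a 1.5 JVM. A minimal sketch of a check you could run with the same `java` binary that the Hadoop tasks use (the class name `IOExceptionCtorCheck` is made up for this example):

```java
import java.io.IOException;

class IOExceptionCtorCheck {
    /** Returns true if this JVM's IOException has the (String, Throwable) constructor. */
    static boolean hasMessageCauseConstructor() {
        try {
            // Reflection lookup, so this class itself still loads on an older JVM.
            IOException.class.getConstructor(String.class, Throwable.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("IOException(String, Throwable) available: "
                + hasMessageCauseConstructor());
    }
}
```

If the second line prints `false`, the JVM predates Java 6. Since the script worked in local mode, it is presumably the JVM used by the Hadoop task processes that needs checking, not only the one that launches Pig.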


  • Parmod Mehta at Jul 1, 2009 at 5:06 pm
    Thanks, Alan.

    So the cross product won't scale even if we use PARALLEL to set the
    number of reducers, e.g. to 10?
    On Wed, Jul 1, 2009 at 9:27 AM, Alan Gates wrote:

    The cross product is not done in the map. The way Pig handles this is that it
    loads each input in the various maps and then sends all of the data to a
    single reduce that does the cross product. The foreach and filter operators
    are then applied to the result of the cross in that same reduce.

    There are algorithms to do cross products in parallel, but pig does not
    currently use any of them. This means that as your data gets large, your
    process will slow down because of the single threading.

    Alan.
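
To make the description above concrete, here is a toy Python model of it. This is not Pig's actual code: the split contents and the function names `map_phase` and `single_reduce_cross` are made up for illustration. Each simulated map task reads one split and tags its records with the input they came from, everything is shipped to one reducer, and that reducer pairs every FileA record with every FileB record. Because the lone reducer sees all records, no pair is lost to the way the inputs happened to be split.

```python
def map_phase(splits, tag):
    """Simulate the map tasks: one task per split, each tagging its records."""
    for split in splits:
        for record in split:
            yield (tag, record)


def single_reduce_cross(tagged_records):
    """Simulate the single reducer: it receives every record from both inputs."""
    tagged_records = list(tagged_records)
    left = [record for tag, record in tagged_records if tag == "A"]
    right = [record for tag, record in tagged_records if tag == "B"]
    # Pair every record of the first input with every record of the second.
    return [(a, b) for a in left for b in right]


# Hypothetical input splits, standing in for blocks of FileA and FileB.
file_a_splits = [["a1", "a2"], ["a3"]]
file_b_splits = [["b1"], ["b2", "b3"]]

# "Shuffle": all map output goes to the same reducer.
shuffled = list(map_phase(file_a_splits, "A")) + list(map_phase(file_b_splits, "B"))
pairs = single_reduce_cross(shuffled)
```

In this sketch `pairs` holds all 3 x 3 combinations, including `("a1", "b2")`, even though `a1` and `b2` were read by different map tasks. The price is that all of the pairing work happens in one place.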


    On Jun 29, 2009, at 10:09 AM, Parmod Mehta wrote:

    Thanks Alan! Yeah, that was the problem. Now I have a general question.
    I have two tab-delimited input files, FileA and FileB, of different formats.
    I want to compare every line of FileA with every record in FileB (basically a
    cross product) and compare them using heuristics. Here is my pig script:

    register my.jar;
    define scoreEval ScoreEval();
    define scoreFilter ScoreFilter();
    storeData = load 'input/FileA.txt';
    amgData = load 'input/FileB.txt';
    crossproduct = cross storeData, amgData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    filteredScore = filter scoreTuple by ScoreFilter($0);
    store filteredScore into 'output';


    The statement crossproduct = cross storeData, amgData; (where I can use
    PARALLEL to spawn multiple reduce instances) creates the cross product, which
    gets evaluated using my UDF and then filtered using another UDF.

    When running on a Hadoop cluster with a block size of, for example, 50 MB, the
    input splits will be mapped to different data nodes in the cluster. If that is
    the case, would I not be missing cross products?

    FileA mapped into a1, a2, a3....
    FileB mapped into b1, b2, b3....

    and on one node I would only have a1 x b1 and be missing a1 x b2, etc.






    On Fri, Jun 26, 2009 at 5:59 PM, Alan Gates wrote:

    It looks like you are running Java 1.5. You need 1.6 for Pig 0.2.0.
    Alan.


  • Alan Gates at Jul 1, 2009 at 7:03 pm
    At this point, no. In a cross product every record of one input has to be
    combined with every record of the other. The simplest way to accomplish this
    is to send every record to a single reducer. One fairly simple improvement
    would be to implement cross by splitting one file across the reducers
    and having each reducer open the entire second file and do the
    cross there. This would give some parallelism, though it would mean a
    lot of network traffic. I'm sure there are more sophisticated
    algorithms available as well.

    Alan.
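
The improvement described above could be sketched roughly as follows. This is a small Python simulation, not something Pig implements: the names `partition` and `parallel_cross`, the round-robin split, and the sample records are all assumptions for illustration. The first input is dealt out across the reducers, and each reducer reads the whole second input and crosses it with its own slice. The counter shows where the extra network traffic comes from: the second input is read once per reducer.

```python
def partition(records, num_reducers):
    """Deal the records of the first input round-robin across the reducers."""
    parts = [[] for _ in range(num_reducers)]
    for i, record in enumerate(records):
        parts[i % num_reducers].append(record)
    return parts


def parallel_cross(file_a, file_b, num_reducers):
    """Cross file_a with file_b, one slice of file_a per (simulated) reducer."""
    pairs = []
    records_shipped = 0
    # On a cluster these iterations would run in parallel, one per reducer;
    # here they run one after another.
    for a_part in partition(file_a, num_reducers):
        # Each reducer receives its slice of A plus a full copy of B.
        records_shipped += len(a_part) + len(file_b)
        pairs.extend((a, b) for a in a_part for b in file_b)
    return pairs, records_shipped


# Hypothetical inputs.
file_a = ["a1", "a2", "a3", "a4", "a5"]
file_b = ["b1", "b2", "b3"]

pairs, records_shipped = parallel_cross(file_a, file_b, num_reducers=3)
```

With three reducers the result still contains all 5 x 3 pairs, but 14 records are shipped (5 from the first input plus 3 copies of the 3-record second input) instead of the 8 a single reducer would receive. That is the parallelism-for-traffic trade-off mentioned above.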
    On Jun 26, 2009, at 4:39 PM, Parmod Mehta wrote:

    pig 0.20.0
    hadoop 0.18.0

    I am running into an exception when I try to run my pig script in
    mapreduce mode, which incidentally works fine in local mode

    register my.jar;
    define scoreEval ScoreEval();
    aData = load 'input/a1.txt';
    bData = load 'input/b1.txt';
    crossproduct = cross aData , bData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    store scoreTuple into 'output';

    When running this in hadoop mode

    pig testCrossProduct.pig

    I run into this exception trace. Any clues? thanks

    2009-06-26 16:21:27,328 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
    2009-06-26 16:21:28,046 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
    2009-06-26 16:21:28,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POJoinPackage
    2009-06-26 16:21:29,312 [Thread-7] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2009-06-26 16:21:34,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2009-06-26 16:21:49,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
    2009-06-26 16:22:09,312 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed
    2009-06-26 16:22:09,328 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    Details at logfile: C:\pigscripts\pig_1246058487187.log


    Exception trace:

    ERROR 2998: Unhandled internal error. java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.PigException.<init>(PigException.java:244)
    at org.apache.pig.PigException.<init>(PigException.java:191)
    at org.apache.pig.backend.BackendException.<init>(BackendException.java:101)
    at org.apache.pig.backend.executionengine.ExecException.<init>(ExecException.java:103)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

    java.lang.Exception: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.PigException.<init>(PigException.java:244)
    at org.apache.pig.PigException.<init>(PigException.java:191)
    at org.apache.pig.backend.BackendException.<init>(BackendException.java:101)
    at org.apache.pig.backend.executionengine.ExecException.<init>(ExecException.java:103)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:191)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 10
    at org.apache.pig.PigServer.registerQuery(PigServer.java:295)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:195)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    ... 5 more
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStackTraceElement(Launcher.java:546)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromStrings(Launcher.java:383)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromString(Launcher.java:284)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:187)
    ... 11 more
  • Parmod Mehta at Jul 1, 2009 at 11:36 pm
    This could be done in pig or map/reduce?
    On Wed, Jul 1, 2009 at 12:02 PM, Alan Gates wrote:

    At this point, no. In a cross product every record has to be combined with
    every other record. The simplest way to accomplish this is to send every
    record to a single reducer. One fairly simple improvement would be to
    implement cross by splitting one file across the reducers and having it open
    the entire second file in each reducer and do the cross there. This would
    give some parallelism, though it would mean a lot of network traffic. I'm
    sure there are more sophisticated algorithms available as well.

    Alan.
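
The improvement described above (split one input across the reducers, let each reducer read the whole second input) can be sketched in memory as follows. This is only an illustration under stated assumptions: the class and method names (`PartitionedCross`, `split`, `crossSlice`, `cross`) are hypothetical, plain strings stand in for Pig tuples, and a loop stands in for the reducers. It is not Pig's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory sketch; not Pig's implementation.
class PartitionedCross {
    /** Splits the left input round-robin into n slices, one per simulated reducer. */
    static <T> List<List<T>> split(List<T> left, int n) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < left.size(); i++) {
            parts.get(i % n).add(left.get(i));
        }
        return parts;
    }

    /** One simulated reducer: crosses its slice of the left input with the WHOLE right input. */
    static List<String[]> crossSlice(List<String> slice, List<String> right) {
        List<String[]> out = new ArrayList<>();
        for (String a : slice) {
            for (String b : right) {
                out.add(new String[] {a, b});
            }
        }
        return out;
    }

    /** Union of all reducer outputs; this equals the full cross product. */
    static List<String[]> cross(List<String> left, List<String> right, int reducers) {
        List<String[]> all = new ArrayList<>();
        for (List<String> slice : split(left, reducers)) {
            all.addAll(crossSlice(slice, right));
        }
        return all;
    }

    public static void main(String[] args) {
        List<String[]> pairs =
            cross(Arrays.asList("a1", "a2", "a3"), Arrays.asList("b1", "b2"), 2);
        for (String[] p : pairs) {
            System.out.println(p[0] + " x " + p[1]);
        }
    }
}
```

Each slice is paired with every record of the second input, so no pair is lost and none is produced twice. The cost is that the second input is read once per reducer, which is the network traffic mentioned above.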


    On Jul 1, 2009, at 9:54 AM, Parmod Mehta wrote:

    Thanks Alan
    So the cross product won't scale even if we use PARALLEL to set the number
    of reducers, e.g. to 10?

    On Wed, Jul 1, 2009 at 9:27 AM, Alan Gates wrote:

    The cross product is not done in the map. The way Pig handles this is that
    it loads each input in the various maps and then sends all of the data to a
    single reduce that does the cross product. The foreach and filter operators
    are then applied to the result of the cross in that same reduce.

    There are algorithms to do cross products in parallel, but pig does not
    currently use any of them. This means that as your data gets large, your
    process will slow down because of the single threading.

    Alan.
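
The reason no pairs go missing under the scheme described above is that the maps only load and forward records; the pairing happens in the one reduce, which sees the output of every map. A minimal in-memory sketch of that flow, assuming hypothetical names (`SingleReduceCross`, `Tagged`, `mapSplit`, `reduce`) and plain strings in place of Pig tuples; it is not Pig's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Pig's implementation.
class SingleReduceCross {
    /** A record tagged with the input it came from (0 = FileA, 1 = FileB). */
    static final class Tagged {
        final int input;
        final String record;

        Tagged(int input, String record) {
            this.input = input;
            this.record = record;
        }
    }

    /** "Map" step: tags every record of one input split; no pairing happens here. */
    static List<Tagged> mapSplit(int input, List<String> split) {
        List<Tagged> out = new ArrayList<>();
        for (String r : split) {
            out.add(new Tagged(input, r));
        }
        return out;
    }

    /** "Reduce" step: receives the output of every map and pairs FileA with FileB. */
    static List<String> reduce(List<Tagged> allMapOutput) {
        List<String> a = new ArrayList<>();
        List<String> b = new ArrayList<>();
        for (Tagged t : allMapOutput) {
            (t.input == 0 ? a : b).add(t.record);
        }
        List<String> pairs = new ArrayList<>();
        for (String ra : a) {
            for (String rb : b) {
                pairs.add(ra + "\t" + rb);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Four splits handled by four independent "maps", all feeding one "reduce".
        List<Tagged> shuffled = new ArrayList<>();
        shuffled.addAll(mapSplit(0, Arrays.asList("a1")));
        shuffled.addAll(mapSplit(1, Arrays.asList("b1")));
        shuffled.addAll(mapSplit(0, Arrays.asList("a2")));
        shuffled.addAll(mapSplit(1, Arrays.asList("b2")));
        for (String pair : reduce(shuffled)) {
            System.out.println(pair);
        }
    }
}
```

Because a1 and b2 meet in the reduce even though they were read by different maps, the pair a1 x b2 is still produced. The same single reduce is also why the job slows down as the inputs grow.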


    On Jun 29, 2009, at 10:09 AM, Parmod Mehta wrote:

    Thanks Alan! Yeah, that was the problem. Now I have a general question.
    I have two tab-delimited input files, FileA and FileB, of different formats.
    I want to compare every line of FileA with every record in FileB, basically
    a cross product, using heuristics. Here is my Pig script:

    register my.jar;
    define scoreEval ScoreEval();
    define scoreFilter ScoreFilter();
    storeData = load 'input/FileA.txt';
    amgData = load 'input/FileB.txt';
    crossproduct = cross storeData, amgData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    filteredScore = filter scoreTuple by ScoreFilter($0);
    store filteredScore into 'output';


    crossproduct = cross storeData, amgData (I can use PARALLEL to run multiple
    reduce instances) creates the cross product, which gets evaluated using my
    UDF and then filtered using another UDF.

    When running on a Hadoop cluster with a block size of, for example, 50M, the
    input splits will be mapped to different data nodes in the cluster. If that is
    the case, would I not be missing cross products?

    FileA mapped into a1, a2, a3....
    FileB mapped into b1, b2, b3....

    and on one node I only have a1Xb1 and missing a1Xb2 etc. etc.






    On Fri, Jun 26, 2009 at 5:59 PM, Alan Gates <gates@yahoo-inc.com> wrote:

    It looks like you are running Java 1.5. You need 1.6 for Pig 0.2.0.
    Alan.
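
The `NoSuchMethodError` in the trace names the constructor `IOException(String, Throwable)`, which exists in Java 6 and later but not in Java 5, consistent with the diagnosis above. One way to check a given JVM is a small reflection probe like the sketch below. The class name `IoExceptionCtorCheck` is hypothetical and not part of Pig or Hadoop; to be meaningful it would have to be run with the same JVM that the Hadoop tasks use.

```java
import java.io.IOException;
import java.lang.reflect.Constructor;

// Hypothetical helper for diagnosis; not part of Pig or Hadoop.
class IoExceptionCtorCheck {
    /** Returns true if this JVM's java.io.IOException has the (String, Throwable) constructor. */
    static boolean hasCauseConstructor() {
        try {
            Constructor<IOException> ctor =
                IOException.class.getConstructor(String.class, Throwable.class);
            return ctor != null;
        } catch (NoSuchMethodException e) {
            // Pre-Java-6 class libraries end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("IOException(String, Throwable) available: " + hasCauseConstructor());
    }
}
```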


    On Jun 26, 2009, at 4:39 PM, Parmod Mehta wrote:

    pig 0.20.0
    hadoop 0.18.0

    I am running into an exception when I try to run my Pig script in mapreduce
    mode, which incidentally works fine in local mode.

    register my.jar;
    define scoreEval ScoreEval();
    aData = load 'input/a1.txt';
    bData = load 'input/b1.txt';
    crossproduct = cross aData , bData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    store scoreTuple into 'output';

    When running this in hadoop mode

    pig testCrossProduct.pig

    I run into this exception trace. Any clues? thanks

    2009-06-26 16:21:27,328 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
    2009-06-26 16:21:28,046 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
    2009-06-26 16:21:28,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POJoinPackage
    2009-06-26 16:21:29,312 [Thread-7] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2009-06-26 16:21:34,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2009-06-26 16:21:49,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
    2009-06-26 16:22:09,312 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed
    2009-06-26 16:22:09,328 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    Details at logfile: C:\pigscripts\pig_1246058487187.log


    Exception trace:

    ERROR 2998: Unhandled internal error. java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.PigException.<init>(PigException.java:244)
    at org.apache.pig.PigException.<init>(PigException.java:191)
    at org.apache.pig.backend.BackendException.<init>(BackendException.java:101)
    at org.apache.pig.backend.executionengine.ExecException.<init>(ExecException.java:103)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

    java.lang.Exception: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.PigException.<init>(PigException.java:244)
    at org.apache.pig.PigException.<init>(PigException.java:191)
    at org.apache.pig.backend.BackendException.<init>(BackendException.java:101)
    at org.apache.pig.backend.executionengine.ExecException.<init>(ExecException.java:103)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)

    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:191)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 10
    at org.apache.pig.PigServer.registerQuery(PigServer.java:295)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:195)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    ... 5 more
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStackTraceElement(Launcher.java:546)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromStrings(Launcher.java:383)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromString(Launcher.java:284)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:187)
    ... 11 more

  • Alan Gates at Jul 1, 2009 at 11:43 pm
    Either. Someone could provide a patch to pig so that it does cross
    this way. Someone could also write a job in MR to do this.

    Alan.
    On Jul 1, 2009, at 4:36 PM, Parmod Mehta wrote:

    This could be done in pig or map/reduce?
    On Wed, Jul 1, 2009 at 12:02 PM, Alan Gates wrote:

    At this point, no. In a cross product every record has to be combined with
    every other record. The simplest way to accomplish this is to send every
    record to a single reducer. One fairly simple improvement would be to
    implement cross by splitting one file across the reducers and having each
    reducer open the entire second file and do the cross there. This would
    give some parallelism, though it would mean a lot of network traffic. I'm
    sure there are more sophisticated algorithms available as well.
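
    The improvement described above can be sketched in plain Python, outside of
    Pig or Hadoop. This is only an illustration under stated assumptions: the
    function names (partition, reducer_cross, parallel_cross) and the
    round-robin split are hypothetical, not part of Pig, and the "reducers" are
    simulated by an ordinary loop.

```python
from itertools import product


def partition(records, num_reducers):
    """Deal records round-robin into num_reducers fragments (assumed scheme)."""
    fragments = [[] for _ in range(num_reducers)]
    for i, record in enumerate(records):
        fragments[i % num_reducers].append(record)
    return fragments


def reducer_cross(fragment, whole_second_file):
    """One simulated reducer: pair its fragment of A with every record of B."""
    return [(a, b) for a in fragment for b in whole_second_file]


def parallel_cross(file_a, file_b, num_reducers):
    """Union of all reducer outputs; every reducer reads all of file_b."""
    output = []
    for fragment in partition(file_a, num_reducers):
        output.extend(reducer_cross(fragment, file_b))
    return output


file_a = ["a1", "a2", "a3", "a4", "a5"]
file_b = ["b1", "b2", "b3"]
pairs = parallel_cross(file_a, file_b, num_reducers=2)

# No pair is lost: the result matches a plain single-process cross product.
assert sorted(pairs) == sorted(product(file_a, file_b))
```

    The cost mentioned above shows up in parallel_cross: file_b is read once
    per reducer, so the second file crosses the network num_reducers times.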

    Alan.


    On Jul 1, 2009, at 9:54 AM, Parmod Mehta wrote:

    Thanks Alan.
    So cross product won't scale even if we use PARALLEL to set the number of
    reducers, e.g. to 10?

    On Wed, Jul 1, 2009 at 9:27 AM, Alan Gates <gates@yahoo-inc.com>
    wrote:

    The cross product is not done in the map. The way pig handles this is that
    it loads each input in the various maps, and then sends all of the data to
    a single reduce that does the cross product. The foreach and filter
    operators are then applied to the result of the cross in that same reduce.

    There are algorithms to do cross products in parallel, but pig does not
    currently use any of them. This means that as your data gets large, your
    process will slow down because of the single threading.
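
    As a rough illustration of the flow described above (not Pig's actual
    implementation), the following plain-Python sketch tags records in
    simulated map tasks and hands everything to one reduce step. The names
    map_phase and single_reduce, the "A"/"B" tags, and the stand-in score and
    filter functions are all assumptions made for the example.

```python
def map_phase(split, tag):
    """Each simulated map task reads one split and tags records by source."""
    return [(tag, record) for record in split]


def single_reduce(tagged_records, score, keep):
    """The lone reducer sees every record, so it can form every pair."""
    left = [r for tag, r in tagged_records if tag == "A"]
    right = [r for tag, r in tagged_records if tag == "B"]
    scored = (score(a, b) for a in left for b in right)  # cross, then foreach
    return [s for s in scored if keep(s)]                 # then filter


# Hypothetical splits: FileA and FileB each land in two map tasks.
splits_a = [["a1", "a2"], ["a3"]]
splits_b = [["b1"], ["b2", "b3"]]

shuffled = []
for split in splits_a:
    shuffled.extend(map_phase(split, "A"))
for split in splits_b:
    shuffled.extend(map_phase(split, "B"))

# Stand-ins for the UDFs: the score is the pair itself, the filter keeps all.
result = single_reduce(shuffled, score=lambda a, b: (a, b), keep=lambda s: True)

assert ("a1", "b2") in result  # pairs spanning different splits are present
assert len(result) == 3 * 3    # every record of A met every record of B
```

    Because all records arrive at the same reduce, no combination such as
    a1Xb2 is missed; the price is that the whole n x m pairing runs in one place.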

    Alan.


    On Jun 29, 2009, at 10:09 AM, Parmod Mehta wrote:

    thanks alan! yeah that was the problem. Now I have a general question.
    I have two tab delimited input files FileA and FileB of different formats.
    I want to compare every line of FileA with every record in FileB, basically
    a cross product, and compare them using heuristics. Here is my pig script

    register my.jar;
    define scoreEval ScoreEval();
    define scoreFilter ScoreFilter();
    storeData = load 'input/FileA.txt';
    amgData = load 'input/FileB.txt';
    crossproduct = cross storeData, amgData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    filteredScore = filter scoreTuple by ScoreFilter($0);
    store filteredScore into 'output';


    The statement crossproduct = cross storeData, amgData; (where I can use
    parallel to span multiple reduce instances) creates the cross product,
    which gets evaluated using my UDF and then filtered using another UDF.

    When running on a hadoop cluster with a block size of, for example, 50M,
    input splits will be mapped to different data nodes in the cluster. If
    that is the case, would I not be missing cross products?

    FileA mapped into a1, a2, a3....
    FileB mapped into b1, b2, b3....

    and on one node I would only have a1Xb1 and be missing a1Xb2 etc.






    On Fri, Jun 26, 2009 at 5:59 PM, Alan Gates <gates@yahoo-inc.com>
    wrote:

    It looks like you are running Java 1.5. You need 1.6 for Pig 0.2.0.
    Alan.


    On Jun 26, 2009, at 4:39 PM, Parmod Mehta wrote:

    pig 0.20.0

    hadoop 0.18.0
    I am running into an exception when I try to run my pig script in
    mapreduce mode, which incidentally works fine in local mode

    register my.jar;
    define scoreEval ScoreEval();
    aData = load 'input/a1.txt';
    bData = load 'input/b1.txt';
    crossproduct = cross aData , bData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    store scoreTuple into 'output';

    When running this in hadoop mode

    pig testCrossProduct.pig

    I run into this exception trace. Any clues? thanks


  • Dmitriy Ryaboy at Jul 1, 2009 at 11:57 pm
    Parmod,
    Alan describes a Map/Reduce algorithm, but anything that can be done with MR
    can be done with Pig if you don't want to be switching between the two (you
    may pay overhead for Pig processing, though).

    To do this in Pig you can simply write a UDF that fetches the smaller file
    and does the cross locally, and call it inside FOREACH GENERATE.

    Use the distributed cache to put the smaller file on all your nodes (you can
    do this through pig using DEFINE -- see the docs for details).
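
    The shape of such a UDF can be sketched in plain Python. This is not Pig's
    UDF API: the class name LocalCross, the lazy-loading approach, and the use
    of a temporary file to stand in for the node-local copy of the smaller
    file are all assumptions made for illustration.

```python
import os
import tempfile


class LocalCross:
    """Stand-in for a UDF: loads the small file once, then crosses per record."""

    def __init__(self, small_file_path):
        self.small_file_path = small_file_path
        self._small = None  # loaded lazily on the first call, then reused

    def __call__(self, record):
        if self._small is None:
            with open(self.small_file_path) as handle:
                self._small = [line.rstrip("\n") for line in handle]
        return [(record, other) for other in self._small]


# Hypothetical local copy of the smaller file, as a node would see it after
# the file has been shipped to it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("b1\nb2\nb3\n")
    small_path = tmp.name

cross_udf = LocalCross(small_path)
big_split = ["a1", "a2"]  # the records one task happens to read
pairs = [pair for record in big_split for pair in cross_udf(record)]
os.remove(small_path)

assert pairs == [("a1", "b1"), ("a1", "b2"), ("a1", "b3"),
                 ("a2", "b1"), ("a2", "b2"), ("a2", "b3")]
```

    Each task only needs its own slice of the larger input plus the full
    smaller file, so the work spreads across as many tasks as the larger
    input has splits.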

    -Dmitriy
    On Wed, Jul 1, 2009 at 4:36 PM, Parmod Mehta wrote:

    This could be done in pig or map/reduce?
    On Wed, Jul 1, 2009 at 12:02 PM, Alan Gates wrote:

    At this point, no. In a cross product every record has to be combined with
    every other record. The simplest way to accomplish this is to send every
    record to a single reducer. One fairly simple improvement would be to
    implement cross by splitting one file across the reducers and having each
    reducer open the entire second file and do the cross there. This would
    give some parallelism, though it would mean a lot of network traffic. I'm
    sure there are more sophisticated algorithms available as well.

    Alan.


    On Jul 1, 2009, at 9:54 AM, Parmod Mehta wrote:

    Thanks Alan.
    So cross product won't scale even if we use PARALLEL to set the number of
    reducers, e.g. to 10?

    On Wed, Jul 1, 2009 at 9:27 AM, Alan Gates wrote:

    The cross product is not done in the map. The way pig handles this is that
    it loads each input in the various maps, and then sends all of the data to
    a single reduce that does the cross product. The foreach and filter
    operators are then applied to the result of the cross in that same reduce.

    There are algorithms to do cross products in parallel, but pig does not
    currently use any of them. This means that as your data gets large, your
    process will slow down because of the single threading.

    Alan.


    On Jun 29, 2009, at 10:09 AM, Parmod Mehta wrote:

    thanks alan! yeah that was the problem. Now I have a general question.
    I have two tab delimited input files FileA and FileB of different formats.
    I want to compare every line of FileA with every record in FileB, basically
    a cross product, and compare them using heuristics. Here is my pig script

    register my.jar;
    define scoreEval ScoreEval();
    define scoreFilter ScoreFilter();
    storeData = load 'input/FileA.txt';
    amgData = load 'input/FileB.txt';
    crossproduct = cross storeData, amgData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    filteredScore = filter scoreTuple by ScoreFilter($0);
    store filteredScore into 'output';


    The statement crossproduct = cross storeData, amgData; (where I can use
    parallel to span multiple reduce instances) creates the cross product,
    which gets evaluated using my UDF and then filtered using another UDF.

    When running on a hadoop cluster with a block size of, for example, 50M,
    input splits will be mapped to different data nodes in the cluster. If
    that is the case, would I not be missing cross products?

    FileA mapped into a1, a2, a3....
    FileB mapped into b1, b2, b3....

    and on one node I would only have a1Xb1 and be missing a1Xb2 etc.






    On Fri, Jun 26, 2009 at 5:59 PM, Alan Gates <gates@yahoo-inc.com>
    wrote:

    It looks like you are running Java 1.5. You need 1.6 for Pig 0.2.0.
    Alan.


    On Jun 26, 2009, at 4:39 PM, Parmod Mehta wrote:

    pig 0.20.0

    hadoop 0.18.0
    I am running into an exception when I try to run my pig script in
    mapreduce mode, which incidentally works fine in local mode

    register my.jar;
    define scoreEval ScoreEval();
    aData = load 'input/a1.txt';
    bData = load 'input/b1.txt';
    crossproduct = cross aData , bData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    store scoreTuple into 'output';

    When running this in hadoop mode

    pig testCrossProduct.pig

    I run into this exception trace. Any clues? thanks

    2009-06-26 16:21:27,328 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
    2009-06-26 16:21:28,046 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
    2009-06-26 16:21:28,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POJoinPackage
    2009-06-26 16:21:29,312 [Thread-7] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2009-06-26 16:21:34,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2009-06-26 16:21:49,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
    2009-06-26 16:22:09,312 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed
    2009-06-26 16:22:09,328 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    Details at logfile: C:\pigscripts\pig_1246058487187.log


    Exception trace:

    ERROR 2998: Unhandled internal error. java.lang.NoSuchMethodError:
    java.io.IOException: method
    <init>(Ljava/lang/String;Ljava/lang/Throwable;)V
    not found
    at org.apache.pig.PigException.<init>(PigException.java:244)
    at org.apache.pig.PigException.<init>(PigException.java:191)
    at org.apache.pig.backend.BackendException.<init>(BackendException.java:101)
    at org.apache.pig.backend.executionengine.ExecException.<init>(ExecException.java:103)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
    java.lang.Exception: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.PigException.<init>(PigException.java:244)
    at org.apache.pig.PigException.<init>(PigException.java:191)
    at org.apache.pig.backend.BackendException.<init>(BackendException.java:101)
    at org.apache.pig.backend.executionengine.ExecException.<init>(ExecException.java:103)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:191)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 10
    at org.apache.pig.PigServer.registerQuery(PigServer.java:295)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:195)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    ... 5 more
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStackTraceElement(Launcher.java:546)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromStrings(Launcher.java:383)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromString(Launcher.java:284)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:187)
    ... 11 more

  • Parmod Mehta at Jul 2, 2009 at 12:11 am
    Basically, this is what you are saying:

    register myJar.jar;
    define cross Cross();
    define evaluate Evaluate();
    biggerData = load 'C:/pigscripts/biggerfile.txt';
    crossedData = foreach biggerData {
    ------
    generate cross();
    }
    evaluatedData = foreach crossedData generate Evaluate(*);
    dump evaluatedData;

    So in the Cross UDF I load the smaller file, which I have put on the
    distributed cache, do a cross between each tuple from biggerData and the
    smaller file, and then evaluate crossedData? Or could the evaluation also be
    done inside the Cross UDF?
    On Wed, Jul 1, 2009 at 4:57 PM, Dmitriy Ryaboy wrote:

    Parmod,
    Alan describes a Map/Reduce algorithm, but anything that can be done with
    MR can be done with Pig if you don't want to switch between the two (you
    may pay some overhead for Pig processing, though).

    To do this in Pig you can simply write a UDF that fetches the smaller file
    and does the cross locally, and call it inside FOREACH GENERATE.

    Use the distributed cache to put the smaller file on all your nodes (you
    can do this through Pig using DEFINE -- see the docs for details).

    -Dmitriy
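
The idea Dmitriy describes can be sketched in plain Java without any Pig or Hadoop dependencies. This is only an illustration under stated assumptions: `LocalCross` and `crossWith` are hypothetical names, the small input is represented as an in-memory list rather than a file fetched from the distributed cache, and a real UDF would wrap the same loop in Pig's own UDF API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the UDF idea: the small input is held in memory
// and each record of the big input is paired with every one of its rows.
class LocalCross {
    private final List<String> smallRows;

    LocalCross(List<String> smallRows) {
        // Copy so later changes to the caller's list do not affect the cross.
        this.smallRows = new ArrayList<String>(smallRows);
    }

    // Called once per record of the big input; returns that record's
    // share of the cross product as (big, small) pairs.
    List<String[]> crossWith(String bigRecord) {
        List<String[]> pairs = new ArrayList<String[]>();
        for (String small : smallRows) {
            pairs.add(new String[] { bigRecord, small });
        }
        return pairs;
    }

    public static void main(String[] args) {
        LocalCross cross = new LocalCross(Arrays.asList("b1", "b2", "b3"));
        for (String big : Arrays.asList("a1", "a2")) {
            for (String[] pair : cross.crossWith(big)) {
                System.out.println(pair[0] + " x " + pair[1]);
            }
        }
    }
}
```

Because each big-side record is handled independently, this work can run in the map phase on as many tasks as the big input has splits, provided the small input fits in memory on every node.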
    On Wed, Jul 1, 2009 at 4:36 PM, Parmod Mehta wrote:

    Could this be done in Pig or Map/Reduce?
    On Wed, Jul 1, 2009 at 12:02 PM, Alan Gates wrote:

    At this point, no. In a cross product every record has to be combined with
    every other record. The simplest way to accomplish this is to send every
    record to a single reducer. One fairly simple improvement would be to
    implement cross by splitting one file across the reducers and having each
    reducer open the entire second file and do the cross there. This would
    give some parallelism, though it would mean a lot of network traffic. I'm
    sure there are more sophisticated algorithms available as well.

    Alan.
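
A minimal simulation of the improvement Alan outlines, assuming both inputs are small in-memory lists and the reducers are modeled as list partitions. `PartitionedCross`, `split`, and `crossPartition` are hypothetical names for illustration and are not part of Pig.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simulates "split one file across the reducers, read all of the
// second file in each reducer" with plain lists.
class PartitionedCross {
    // Deal the rows of input A round-robin into numReducers partitions.
    static List<List<String>> split(List<String> rows, int numReducers) {
        List<List<String>> parts = new ArrayList<List<String>>();
        for (int i = 0; i < numReducers; i++) {
            parts.add(new ArrayList<String>());
        }
        for (int i = 0; i < rows.size(); i++) {
            parts.get(i % numReducers).add(rows.get(i));
        }
        return parts;
    }

    // Work done by one simulated reducer: its slice of A against ALL of B.
    static List<String> crossPartition(List<String> slice, List<String> allOfB) {
        List<String> out = new ArrayList<String>();
        for (String a : slice) {
            for (String b : allOfB) {
                out.add(a + "x" + b);
            }
        }
        return out;
    }

    // Union of all reducers' outputs; equals the full cross product.
    static List<String> cross(List<String> a, List<String> b, int numReducers) {
        List<String> result = new ArrayList<String>();
        for (List<String> slice : split(a, numReducers)) {
            result.addAll(crossPartition(slice, b));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a1", "a2", "a3");
        List<String> b = Arrays.asList("b1", "b2");
        System.out.println(cross(a, b, 2));
    }
}
```

No pair is lost because every row of A lands in exactly one partition and every partition sees all of B. The cost Alan mentions shows up as B being read once per reducer.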


    On Jul 1, 2009, at 9:54 AM, Parmod Mehta wrote:

    Thanks, Alan.
    So the cross product won't scale even if we use PARALLEL to set the number
    of reducers, e.g. to 10?

    On Wed, Jul 1, 2009 at 9:27 AM, Alan Gates wrote:

    The cross product is not done in the map. The way Pig handles this is that
    it loads each input in the various maps and then sends all of the data to a
    single reduce that does the cross product. The foreach and filter operators
    are then applied to the result of the cross in that same reduce.

    There are algorithms to do cross products in parallel, but Pig does not
    currently use any of them. This means that as your data gets large, your
    process will slow down because of the single threading.

    Alan.


    On Jun 29, 2009, at 10:09 AM, Parmod Mehta wrote:

    Thanks Alan! Yeah, that was the problem. Now I have a general question.
    I have two tab-delimited input files, FileA and FileB, of different formats.
    I want to compare every line of FileA with every record in FileB (basically
    a cross product) and compare them using heuristics. Here is my Pig script:
    register my.jar;
    define scoreEval ScoreEval();
    define scoreFilter ScoreFilter();
    storeData = load 'input/FileA.txt';
    amgData = load 'input/FileB.txt';
    crossproduct = cross storeData, amgData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    filteredScore = filter scoreTuple by ScoreFilter($0);
    store filteredScore into 'output';


    The statement crossproduct = cross storeData, amgData; (where I can use
    PARALLEL to span multiple reduce instances) creates the cross product,
    which is then evaluated using my UDF and filtered using another UDF.

    When running on a Hadoop cluster with a block size of, for example, 50M,
    the input splits will be mapped to different data nodes in the cluster. If
    that is the case, would I not be missing cross products?

    FileA mapped into a1, a2, a3....
    FileB mapped into b1, b2, b3....

    and on one node I only have a1Xb1 and missing a1Xb2 etc. etc.






    On Fri, Jun 26, 2009 at 5:59 PM, Alan Gates <gates@yahoo-inc.com> wrote:

    It looks like you are running Java 1.5. You need 1.6 for Pig 0.2.0.
    Alan.
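
The `NoSuchMethodError` in the trace is consistent with this diagnosis: the constructor `IOException(String, Throwable)` was added in Java 6, so code compiled against it fails on a 1.5 runtime. A small check like the following, run with the same `java` binary the Hadoop task trackers use, shows whether the constructor is available. The class name `IOExceptionCtorCheck` is hypothetical.

```java
import java.io.IOException;

// Reports whether the running JVM provides IOException(String, Throwable),
// the constructor named in the NoSuchMethodError above.
class IOExceptionCtorCheck {
    static boolean hasCauseConstructor() {
        try {
            IOException.class.getConstructor(String.class, Throwable.class);
            return true;
        } catch (NoSuchMethodException e) {
            // Expected on Java 1.5, where only IOException() and
            // IOException(String) exist.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("IOException(String, Throwable) present: " + hasCauseConstructor());
    }
}
```

Local mode and the cluster can use different JVMs, which would explain a script that works locally but fails in mapreduce mode.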


    On Jun 26, 2009, at 4:39 PM, Parmod Mehta wrote:

    pig 0.20.0

    hadoop 0.18.0
    I am running into an exception when I try to run my Pig script in
    mapreduce mode; it incidentally works fine in local mode.

    register my.jar;
    define scoreEval ScoreEval();
    aData = load 'input/a1.txt';
    bData = load 'input/b1.txt';
    crossproduct = cross aData , bData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    store scoreTuple into 'output';

    When running this in hadoop mode

    pig testCrossProduct.pig

    I run into this exception trace. Any clues? thanks

    2009-06-26 16:21:27,328 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
    2009-06-26 16:21:28,046 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
    2009-06-26 16:21:28,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POJoinPackage
    2009-06-26 16:21:29,312 [Thread-7] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2009-06-26 16:21:34,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2009-06-26 16:21:49,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
    2009-06-26 16:22:09,312 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed
    2009-06-26 16:22:09,328 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    Details at logfile: C:\pigscripts\pig_1246058487187.log

    Exception trace:

    ERROR 2998: Unhandled internal error. java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.PigException.<init>(PigException.java:244)
    at org.apache.pig.PigException.<init>(PigException.java:191)
    at org.apache.pig.backend.BackendException.<init>(BackendException.java:101)
    at org.apache.pig.backend.executionengine.ExecException.<init>(ExecException.java:103)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
    java.lang.Exception: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.PigException.<init>(PigException.java:244)
    at org.apache.pig.PigException.<init>(PigException.java:191)
    at org.apache.pig.backend.BackendException.<init>(BackendException.java:101)
    at org.apache.pig.backend.executionengine.ExecException.<init>(ExecException.java:103)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at
    java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at


    org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at


    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at
    org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at


    org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at
    org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
    at


    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:191)
    at


    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at


    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at


    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at
    org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    at
    org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at


    org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at


    org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    ERROR 2997: Unable to recreate exception from backed error:
    java.lang.NoSuchMethodError: java.io.IOException: method
    <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002:
    Unable
    to
    store alias 10
    at org.apache.pig.PigServer.registerQuery(PigServer.java:295)
    at
    org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at


    org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at


    org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    Caused by: org.apache.pig.backend.executionengine.ExecException:
    ERROR
    2997:
    Unable to recreate exception from backed error:
    java.lang.NoSuchMethodError:
    java.io.IOException: method
    <init>(Ljava/lang/String;Ljava/lang/Throwable;)V
    not found
    at


    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:195)
    at


    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at


    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at


    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at
    org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    ... 5 more
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at


    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStackTraceElement(Launcher.java:546)
    at


    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromStrings(Launcher.java:383)
    at


    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromString(Launcher.java:284)
    at


    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:187)
    ... 11 more

  • Dmitriy Ryaboy at Jul 2, 2009 at 12:19 am
    You want to put things in the distributed cache before you get to the tasks,
    so you want to use a DEFINE for that.
    You should be able to do the cross and evaluate in a single foreach, and
    possibly a single UDF.
    You probably don't want to call your function "cross" to avoid naming
    ambiguity.

    -D

    On Wed, Jul 1, 2009 at 5:11 PM, Parmod Mehta wrote:

    Basically this is what you are saying:

    register myJar.jar;
    define cross Cross();
    define evaluate Evaluate();
    biggerData = load 'C:/pigscripts/biggerfile.txt';
    crossedData = foreach biggerData {
    ------
    generate cross();
    }
    evaluatedData = foreach crossedData generate Evaluate(*);
    dump evaluatedData;

    In the Cross UDF I load the smaller file and put it on the distributed
    cache, do a cross between the tuple from biggerData and the smaller file,
    and then evaluate the crossedData? Or could this be done inside the
    Cross UDF?

    On Wed, Jul 1, 2009 at 4:57 PM, Dmitriy Ryaboy <dvryaboy@cloudera.com>
    wrote:
    Parmod,
    Alan describes a Map/Reduce algorithm, but anything that can be done with
    MR can be done with Pig if you don't want to be switching between the two
    (you may pay overhead for Pig processing, though).

    To do this in Pig you can simply write a UDF that fetches the smaller file
    and does the cross locally, and call it inside FOREACH GENERATE.

    Use the distributed cache to put the smaller file on all your nodes (you
    can do this through pig using DEFINE -- see the docs for details).

    -Dmitriy
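
The suggestion above (keep the smaller file local to each task and cross every incoming record against it) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not code from the thread: the class `LocalCross`, its constructor argument, and the string-array pair representation are all hypothetical, and the sketch deliberately avoids Pig and Hadoop types so it runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig or of any code posted in this thread.
// It keeps the smaller input in memory and crosses each record of the larger
// input against it, which is the work the suggested UDF would do per tuple.
class LocalCross {
    private final List<String> smallRecords;

    // smallFileLines: lines of the smaller file, read once per task. In a
    // real job they would come from the node-local copy of the cached file.
    LocalCross(List<String> smallFileLines) {
        this.smallRecords = new ArrayList<>(smallFileLines);
    }

    // Pairs one record of the larger input with every cached record.
    List<String[]> cross(String bigRecord) {
        List<String[]> pairs = new ArrayList<>(smallRecords.size());
        for (String small : smallRecords) {
            pairs.add(new String[] { bigRecord, small });
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> small = new ArrayList<>();
        small.add("b1");
        small.add("b2");
        LocalCross udf = new LocalCross(small);
        for (String[] pair : udf.cross("a1")) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

In an actual Pig UDF this loop would presumably live in the `exec` method of a class extending `org.apache.pig.EvalFunc` and return a bag of tuples. The scoring step could be applied inside the same loop, so the cross and the evaluation happen in a single FOREACH.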

    On Wed, Jul 1, 2009 at 4:36 PM, Parmod Mehta <parmod.mehta@gmail.com>
    wrote:
    Could this be done in Pig or map/reduce?

    On Wed, Jul 1, 2009 at 12:02 PM, Alan Gates wrote:

    At this point, no. In a cross product every record has to be combined with
    every other record. The simplest way to accomplish this is to send every
    record to a single reducer. One fairly simple improvement would be to
    implement cross by splitting one file across the reducers and having each
    reducer open the entire second file and do the cross there. This would
    give some parallelism, though it would mean a lot of network traffic. I'm
    sure there are more sophisticated algorithms available as well.

    Alan.
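
The improvement described above (split one input across the reducers, let each reducer read the whole second input) can be simulated in a few lines of Java. This is only a sketch under assumptions: the class `PartitionedCross`, the round-robin partitioning, and the tab-joined output are hypothetical choices for illustration, and the "reducers" are loop iterations rather than Hadoop tasks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the partitioned cross described above.
class PartitionedCross {
    // Assigns the records of the split input to reducers round-robin.
    static List<List<String>> partition(List<String> records, int reducers) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < reducers; i++) {
            parts.add(new ArrayList<String>());
        }
        for (int i = 0; i < records.size(); i++) {
            parts.get(i % reducers).add(records.get(i));
        }
        return parts;
    }

    // Work done by one reducer: its slice of A against all of B.
    static List<String> crossSlice(List<String> sliceOfA, List<String> allOfB) {
        List<String> out = new ArrayList<>();
        for (String a : sliceOfA) {
            for (String b : allOfB) {
                out.add(a + "\t" + b);
            }
        }
        return out;
    }

    // Union of all reducer outputs; every (a, b) pair appears exactly once
    // because each record of A lands in exactly one slice.
    static List<String> cross(List<String> a, List<String> b, int reducers) {
        List<String> out = new ArrayList<>();
        for (List<String> slice : partition(a, reducers)) {
            out.addAll(crossSlice(slice, b));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>();
        a.add("a1"); a.add("a2"); a.add("a3");
        List<String> b = new ArrayList<>();
        b.add("b1"); b.add("b2");
        for (String pair : cross(a, b, 2)) {
            System.out.println(pair);
        }
    }
}
```

The cost mentioned in the message shows up in `crossSlice`: every reducer needs all of B, so B is transferred once per reducer.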


    On Jul 1, 2009, at 9:54 AM, Parmod Mehta wrote:

    Thanks Alan.
    So the cross product won't scale even if we use PARALLEL to set the number
    of reducers, e.g. to 10?

    On Wed, Jul 1, 2009 at 9:27 AM, Alan Gates <gates@yahoo-inc.com>
    wrote:
    The cross product is not done in the map. The way Pig handles this is that
    it loads each input in the various maps, and then sends all of the data to
    a single reduce that does the cross product. The foreach and filter
    operators are then applied to the result of the cross in that same reduce.

    There are algorithms to do cross products in parallel, but Pig does not
    currently use any of them. This means that as your data gets large, your
    process will slow down because of the single threading.

    Alan.


    On Jun 29, 2009, at 10:09 AM, Parmod Mehta wrote:

    Thanks Alan! Yeah, that was the problem. Now I have a general question.
    I have two tab-delimited input files, FileA and FileB, of different
    formats. I want to compare every line of FileA with every record in FileB
    (basically a cross product) and compare them using heuristics. Here is my
    Pig script:

    register my.jar;
    define scoreEval ScoreEval();
    define scoreFilter ScoreFilter();
    storeData = load 'input/FileA.txt';
    amgData = load 'input/FileB.txt';
    crossproduct = cross storeData, amgData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    filteredScore = filter scoreTuple by ScoreFilter($0);
    store filteredScore into 'output';


    crossproduct = cross storeData, amgData (I can use PARALLEL to span
    multiple reduce instances) creates the cross product, which gets evaluated
    using my UDF and then filtered using another UDF.

    When running on a Hadoop cluster with a block size of, for example, 50M,
    input splits will be mapped to different data nodes in the cluster. If
    that is the case, would I not be missing cross products?

    FileA mapped into a1, a2, a3....
    FileB mapped into b1, b2, b3....

    and on one node I only have a1Xb1 and am missing a1Xb2, etc.
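
The concern above can be put into numbers. The sketch below is an assumption-laden illustration (the class `SplitCrossCount` and the split sizes are made up): it compares how many pairs would be produced if each split of FileA were crossed only with the matching split of FileB against the size of the full cross product.

```java
// Hypothetical arithmetic check; split sizes are record counts per split.
class SplitCrossCount {
    // Pairs produced if split i of A were crossed only with split i of B.
    static long coLocatedPairs(int[] aSplitSizes, int[] bSplitSizes) {
        long pairs = 0;
        int n = Math.min(aSplitSizes.length, bSplitSizes.length);
        for (int i = 0; i < n; i++) {
            pairs += (long) aSplitSizes[i] * bSplitSizes[i];
        }
        return pairs;
    }

    // Pairs in the complete cross product: |A| * |B|.
    static long fullCrossPairs(int[] aSplitSizes, int[] bSplitSizes) {
        long a = 0;
        long b = 0;
        for (int size : aSplitSizes) a += size;
        for (int size : bSplitSizes) b += size;
        return a * b;
    }

    public static void main(String[] args) {
        int[] a = { 2, 2 };   // a1 and a2 hold 2 records each
        int[] b = { 3, 3 };   // b1 and b2 hold 3 records each
        System.out.println("per-split pairs: " + coLocatedPairs(a, b));
        System.out.println("full cross pairs: " + fullCrossPairs(a, b));
    }
}
```

With two splits of 2 records in FileA and two splits of 3 records in FileB, a per-split cross would yield 2*3 + 2*3 = 12 pairs, while the full cross has 4*6 = 24. A map-side, per-split cross would therefore lose pairs, which is consistent with the explanation elsewhere in this thread that Pig does not cross in the map but sends all of the data to a single reduce.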






    On Fri, Jun 26, 2009 at 5:59 PM, Alan Gates <gates@yahoo-inc.com>
    wrote:

    It looks like you are running Java 1.5. You need 1.6 for Pig 0.2.0.

    Alan.
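
For context on this diagnosis: the `<init>(Ljava/lang/String;Ljava/lang/Throwable;)V` in the reported NoSuchMethodError is the JVM descriptor of the constructor `IOException(String, Throwable)`, which was added to `java.io.IOException` in Java 6. Code compiled against Java 6 that calls it fails on a 1.5 runtime with exactly this error. A minimal sketch, with a hypothetical class name, that exercises the constructor and prints the running Java version:

```java
import java.io.IOException;

// Hypothetical check class; not part of Pig.
class IOExceptionCause {
    // Builds an IOException through the two-argument constructor, which
    // exists only in Java 6 and later.
    static IOException wrap(String message, Throwable cause) {
        return new IOException(message, cause);
    }

    public static void main(String[] args) {
        IOException e = wrap("backend failure", new RuntimeException("root cause"));
        System.out.println(e.getMessage() + " <- " + e.getCause().getMessage());
        System.out.println("java.version = " + System.getProperty("java.version"));
    }
}
```

Since the failure in this thread occurred inside the reduce task, the Java version that matters is the one used by the Hadoop task JVMs, not only the one that launches the pig client.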



  • Parmod Mehta at Jul 2, 2009 at 7:39 pm
    How do you get access to the cached file in a UDF? Using
    org.apache.hadoop.filecache.DistributedCache? If so, how can I get
    access to a JobConf instance for DistributedCache.getCacheFiles(conf)?

    Also, do I need to change the Hadoop config?

    mapred.cache.archives=<location>
    mapred.create.symlink=yes

    On Wed, Jul 1, 2009 at 5:19 PM, Dmitriy Ryaboy wrote:

    You want to put things in the distributed cache before you get to the
    tasks,
    so you want to use a DEFINE for that.
    You should be able to do the cross and evaluate in a single foreach, and
    possibly a single UDF.
    You probably don't want to call your function "cross" to avoid naming
    ambiguity.

    -D
    On Wed, Jul 1, 2009 at 5:11 PM, Parmod Mehta wrote:

    Basically this is what you saying

    register myJar.jar;
    define cross Cross();
    define evaluate Evaluate();
    biggerData = load 'C:/pigscripts/biggerfile.txt';
    crossedData = foreach biggerData {
    ------
    generate cross();
    }
    evaluatedData = foreach crossedData generate Evaluate(*);
    dump evaluatedData;

    In the Cross UDF I load the smaller file and put the smaller file on the
    distributed cache. do a cross between the tupple from the biggerData and
    smaller file and evaluate the crossedData? Or this could be done inside the
    Cross UDF?

    On Wed, Jul 1, 2009 at 4:57 PM, Dmitriy Ryaboy <dvryaboy@cloudera.com
    wrote:
    Parmod,
    Alan describes a Map/Reduce algorithm, but anything that can be done
    with
    MR
    can be done with Pig if you don't want to be switching between the two (you
    may pay overhead for Pig processing, though).

    To do this in Pig you can simply write a UDF that fetches the smaller file
    and does the cross locally, and call it inside FOREACH GENERATE

    Use the distributed cache to put the smaller file on all your nodes
    (you
    can
    do this through pig using DEFINE -- see the docs for details).

    -Dmitriy

    On Wed, Jul 1, 2009 at 4:36 PM, Parmod Mehta <parmod.mehta@gmail.com>
    wrote:
    This could be done in pig or map/reduce?

    On Wed, Jul 1, 2009 at 12:02 PM, Alan Gates <gates@yahoo-inc.com>
    wrote:
    At this point, no. In a cross product every record has to be combined
    with every other record. The simplest way to accomplish this is to send
    every record to a single reducer. One fairly simple improvement would be
    to implement cross by splitting one file across the reducers and having
    it open the entire second file in each reducer and do the cross there.
    This would give some parallelism, though it would mean a lot of network
    traffic. I'm sure there are more sophisticated algorithms available as
    well.

    Alan.
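
    A minimal Java sketch of the partitioned approach described above. This is not how Pig implements CROSS; the class and method names are hypothetical, and in-memory lists stand in for the two files. One input is split across n simulated reducers, and each one crosses its slice with the entire second input. The union of the per-reducer outputs holds the same pairs as a single full cross, so no combinations go missing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Pig or Hadoop.
final class PartitionedCross {

    /** Plain cross product: every record of left paired with every record of right. */
    static List<String> cross(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(l + "\t" + r);
            }
        }
        return out;
    }

    /**
     * Splits left round-robin into n partitions (one per simulated reducer)
     * and crosses each partition with the whole of right.
     */
    static List<String> partitionedCross(List<String> left, List<String> right, int n) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            partitions.add(new ArrayList<String>());
        }
        for (int i = 0; i < left.size(); i++) {
            partitions.get(i % n).add(left.get(i));
        }
        List<String> out = new ArrayList<>();
        for (List<String> partition : partitions) {
            // Each iteration is the work a single reducer would do independently.
            out.addAll(cross(partition, right));
        }
        return out;
    }
}
```

    The cost noted in the message still applies: every reducer has to read the whole second input, which is where the extra network traffic comes from.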


    On Jul 1, 2009, at 9:54 AM, Parmod Mehta wrote:

    Thanks Alan.
    So the cross product won't scale even if we use PARALLEL to set the
    number of reducers, e.g. to 10?

    On Wed, Jul 1, 2009 at 9:27 AM, Alan Gates <gates@yahoo-inc.com>
    wrote:
    The cross product is not done in the map. The way pig handles this is
    that it loads each input in the various maps, and then sends all of the
    data to a single reduce that does the cross product. The foreach and
    filter operators are then applied to the result of the cross in that
    same reduce.

    There are algorithms to do cross products in parallel, but pig does not
    currently use any of them. This means that as your data gets large, your
    process will slow down because of the single threading.

    Alan.


    On Jun 29, 2009, at 10:09 AM, Parmod Mehta wrote:

    thanks alan! yeah that was the problem. Now I have a general question.
    I have two tab delimited input files FileA and FileB of different
    formats. I want to compare every line of FileA with every record in
    FileB, basically a cross product, and compare them using heuristics.
    Here is my pig script:
    register my.jar;
    define scoreEval ScoreEval();
    define scoreFilter ScoreFilter();
    storeData = load 'input/FileA.txt';
    amgData = load 'input/FileB.txt';
    crossproduct = cross storeData, amgData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    filteredScore = filter scoreTuple by ScoreFilter($0);
    store filteredScore into 'output';


    crossproduct = cross storeData, amgData (I can use parallel to span
    multiple reduce instances); creates the cross product which gets
    evaluated using my UDF and then filtered using another UDF.

    When running on a hadoop cluster with a block size of, for example, 50M,
    input splits will be mapped to different data nodes in the cluster. If
    that is the case, would I not be missing cross products?

    FileA mapped into a1, a2, a3....
    FileB mapped into b1, b2, b3....

    and on one node I only have a1Xb1 and missing a1Xb2 etc. etc.






    On Fri, Jun 26, 2009 at 5:59 PM, Alan Gates <gates@yahoo-inc.com> wrote:

    It looks like you are running Java 1.5. You need 1.6 for Pig 0.2.0.
    Alan.
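
    The diagnosis fits the NoSuchMethodError in the trace: the IOException(String, Throwable) constructor was added in Java 6, so code compiled against it fails on a Java 1.5 runtime. A small sketch for checking a given JVM (the class name is hypothetical) looks the constructor up by reflection:

```java
import java.io.IOException;

// Hypothetical helper; reports whether the running JVM provides
// the IOException(String, Throwable) constructor introduced in Java 6.
final class IoExceptionCtorCheck {

    static boolean hasCauseConstructor() {
        try {
            IOException.class.getConstructor(String.class, Throwable.class);
            return true;
        } catch (NoSuchMethodException e) {
            // This is the situation on a Java 1.5 runtime.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("IOException(String, Throwable) present: " + hasCauseConstructor());
    }
}
```

    Since the failure in the trace occurs inside the reduce task, the check would need to run with the same JVM the Hadoop task trackers use, which may differ from the one on the client.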


    On Jun 26, 2009, at 4:39 PM, Parmod Mehta wrote:

    pig 0.20.0

    hadoop 0.18.0
    I am running into an exception when I try to run my pig script in
    mapreduce mode, which incidentally works fine in the local mode

    register my.jar;
    define scoreEval ScoreEval();
    aData = load 'input/a1.txt';
    bData = load 'input/b1.txt';
    crossproduct = cross aData , bData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    store scoreTuple into 'output';

    When running this in hadoop mode

    pig testCrossProduct.pig

    I run into this exception trace. Any clues? thanks

    2009-06-26 16:21:27,328 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
    2009-06-26 16:21:28,046 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
    2009-06-26 16:21:28,406 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POJoinPackage
    2009-06-26 16:21:29,312 [Thread-7] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2009-06-26 16:21:34,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2009-06-26 16:21:49,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
    2009-06-26 16:22:09,312 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed
    2009-06-26 16:22:09,328 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    Details at logfile: C:\pigscripts\pig_1246058487187.log



  • Dmitriy Ryaboy at Jul 2, 2009 at 8:19 pm
    If mapred.create.symlink is on, the file you put in the distributed cache
    will be available in the task working directory by the name you provide
    after the # sign; it can be picked up from there by your UDF or streaming
    command.

    To keep the UDF general, you can provide the file name to it in a
    constructor; the constructor would get called when you issue a define on
    your UDF, like this:

    DEFINE myUdf my.package.pig.udfs.MyUDF('this_is_a_file.txt')
    CACHE('hdfs://path/to/file#this_is_a_file.txt')

    I don't know of a good way to get at a JobConf in a UDF.
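
    A rough Java sketch of the constructor-plus-file-name idea, kept free of Pig classes so it stands alone. CachedFileCross and crossWith are hypothetical names; a real UDF would wrap this logic in Pig's UDF class rather than use it directly. It assumes the file is present in the task working directory under the name passed to the constructor, and it reads the file lazily on the first call, on the assumption that the constructor may also run in places where the file is not available.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not a Pig UDF, only the file-handling and cross logic.
final class CachedFileCross {
    private final String fileName;      // name of the file in the working directory
    private List<String> smallRecords;  // loaded lazily on the first call

    CachedFileCross(String fileName) {
        this.fileName = fileName;
    }

    /** Pairs one record of the bigger input with every line of the smaller file. */
    List<String> crossWith(String bigRecord) throws IOException {
        if (smallRecords == null) {
            smallRecords = Files.readAllLines(Paths.get(fileName), StandardCharsets.UTF_8);
        }
        List<String> out = new ArrayList<>(smallRecords.size());
        for (String small : smallRecords) {
            out.add(bigRecord + "\t" + small);
        }
        return out;
    }
}
```

    Holding the smaller file in memory only works while it fits in the task's heap; the scoring step could be applied to each pair inside the loop instead of materializing the whole list.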

    Hope this helps,
    -Dmitriy
    On Thu, Jul 2, 2009 at 12:38 PM, Parmod Mehta wrote:

    How do you get access to the cached file in a UDF? Using
    org.apache.hadoop.filecache.DistributedCache? If so, how can I get
    access to a JobConf instance for DistributedCache.getCacheFiles(conf)?

    Also, do I need to change the hadoop config:

    mapred.cache.archives=<location>
    mapred.create.symlink=yes


    On Wed, Jul 1, 2009 at 5:19 PM, Dmitriy Ryaboy <dvryaboy@cloudera.com
    wrote:
    You want to put things in the distributed cache before you get to the
    tasks,
    so you want to use a DEFINE for that.
    You should be able to do the cross and evaluate in a single foreach, and
    possibly a single UDF.
    You probably don't want to call your function "cross" to avoid naming
    ambiguity.

    -D

    On Wed, Jul 1, 2009 at 5:11 PM, Parmod Mehta <parmod.mehta@gmail.com>
    wrote:
    Basically this is what you saying

    register myJar.jar;
    define cross Cross();
    define evaluate Evaluate();
    biggerData = load 'C:/pigscripts/biggerfile.txt';
    crossedData = foreach biggerData {
    ------
    generate cross();
    }
    evaluatedData = foreach crossedData generate Evaluate(*);
    dump evaluatedData;

    In the Cross UDF I load the smaller file and put the smaller file on
    the
    distributed cache. do a cross between the tupple from the biggerData
    and
    smaller file and evaluate the crossedData? Or this could be done inside the
    Cross UDF?

    On Wed, Jul 1, 2009 at 4:57 PM, Dmitriy Ryaboy <dvryaboy@cloudera.com
    wrote:
    Parmod,
    Alan describes a Map/Reduce algorithm, but anything that can be done
    with
    MR
    can be done with Pig if you don't want to be switching between the
    two
    (you
    may pay overhead for Pig processing, though).

    To do this in Pig you can simply write a UDF that fetches the smaller file
    and does the cross locally, and call it inside FOREACH GENERATE

    Use the distributed cache to put the smaller file on all your nodes
    (you
    can
    do this through pig using DEFINE -- see the docs for details).

    -Dmitriy

    On Wed, Jul 1, 2009 at 4:36 PM, Parmod Mehta <parmod.mehta@gmail.com
    wrote:
    This could be done in pig or map/reduce?

    On Wed, Jul 1, 2009 at 12:02 PM, Alan Gates <gates@yahoo-inc.com>
    wrote:
    At this point, no. In a cross product every record has to be
    combined
    with
    every other record. The simplest way to accomplish this is to
    send
    every
    record to a single reducer. One fairly simple improvement would
    be
    to
    implement cross by splitting one file across the reducers and
    having
    it
    open
    the entire second file in each reducer and do the cross there.
    This
    would
    give some parallelism, though it would mean a lot of network
    traffic.
    I'm
    sure there are more sophisticated algorithms available as well.

    Alan.


    On Jul 1, 2009, at 9:54 AM, Parmod Mehta wrote:

    Thanks Alan
    cross product won't even scale even if we use PARALLEL to set
    the
    number
    of
    reducers e.g. to 10?

    On Wed, Jul 1, 2009 at 9:27 AM, Alan Gates <gates@yahoo-inc.com
    wrote:
    The cross product is not done in the map. The way pig handles
    this
    is
    is
    loads each input in the various maps, and then sends all of the
    data
    to
    a
    single reduce that does the cross product. The foreach and
    filter
    operators
    are then applied to the result of the cross in that same
    reduce.
    There are algorithms to do cross products in parallel, but pig
    does
    not
    currently use any of them. This means that as your data gets
    large,
    your
    process will slow down because of the single threading.

    Alan.


    On Jun 29, 2009, at 10:09 AM, Parmod Mehta wrote:

    thanks alan! yeah that was the problem. Now I have a general
    question.
    I have two tab delimited input files FileA and FileB of
    different
    formats.
    I
    want to compare every line of FileA with every record in FileB
    basically
    cross product and compare them using heuristics. Here is my
    pig
    script
    register my.jar;
    define scoreEval ScoreEval();
    define scoreFilter ScoreFilter();
    storeData = load 'input/FileA.txt';
    amgData = load 'input/FileB.txt';
    crossproduct = cross storeData, amgData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    filteredScore = filter scoreTuple by ScoreFilter($0);
    store filteredScore into 'output';


    crossproduct = cross storeData, amgData (I can use parallel to
    span
    multiple
    reduce instances); creates the cross product which gets
    evaluated
    using
    my
    UDF and then filtered using another UDF.

    When running on a hadoop cluster for a block size of for
    example
    50M
    input
    splits will be mapped to different data nodes in the cluster.
    If
    that
    is
    the
    case then would I not be missing cross products?

    FileA mapped into a1, a2, a3....
    FileB mapped into b1, b2, b3....

    and on one node I only have a1Xb1 and missing a1Xb2 etc. etc.






    On Fri, Jun 26, 2009 at 5:59 PM, Alan Gates <
    gates@yahoo-inc.com>
    wrote:

    It looks like you are running Java 1.5. You need 1.6 for Pig
    0.2.0.
    Alan.


    On Jun 26, 2009, at 4:39 PM, Parmod Mehta wrote:

    pig 0.20.0

    hadoop 0.18.0
    I am running into a an exception when I try to my pig script
    in
    mapreduce
    mode which incidentally works fine in the local mode

    register my.jar;
    define scoreEval ScoreEval();
    aData = load 'input/a1.txt';
    bData = load 'input/b1.txt';
    crossproduct = cross aData , bData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    store scoreTuple into 'output';

    When running this in hadoop mode

    pig testCrossProduct.pig

    I run into this exception trace. Any clues? thanks

    2009-06-26 16:21:27,328 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine
    - Connecting to hadoop file system at: hdfs://localhost:9000
    2009-06-26 16:21:28,046 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine
    -
    Connecting
    to map-reduce job tracker at: localhost:9001
    2009-06-26 16:21:28,406 [main] INFO


    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
    - Rewrite: POPackage->POForEach to POJoinPackage
    2009-06-26 16:21:29,312 [Thread-7] WARN
    org.apache.hadoop.mapred.JobClient
    - Use GenericOptionsParser for parsing the arguments.
    Applications
    should
    implement Tool for the same.
    2009-06-26 16:21:34,312 [main] INFO


    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 0% complete
    2009-06-26 16:21:49,312 [main] INFO


    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 50% complete
    2009-06-26 16:22:09,312 [main] ERROR


    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - Map reduce job failed
    2009-06-26 16:22:09,328 [main] ERROR
    org.apache.pig.tools.grunt.Grunt
    -
    ERROR 2997: Unable to recreate exception from backed error:
    java.lang.NoSuchMethodError: java.io.IOException: method
    <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    Details at logfile: C:\pigscripts\pig_1246058487187.log


    Exception trace:

    ERROR 2998: Unhandled internal error.
    java.lang.NoSuchMethodError:
    java.io.IOException: method
    <init>(Ljava/lang/String;Ljava/lang/Throwable;)V
    not found
    at
    org.apache.pig.PigException.<init>(PigException.java:244)
    at
    org.apache.pig.PigException.<init>(PigException.java:191)
    at

    org.apache.pig.backend.BackendException.<init>(BackendException.java:101)
    at


    org.apache.pig.backend.executionengine.ExecException.<init>(ExecException.java:103)
    at
    org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
    java.lang.Exception: java.lang.NoSuchMethodError: java.io.IOException:
    method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.PigException.<init>(PigException.java:244)
    at org.apache.pig.PigException.<init>(PigException.java:191)
    at org.apache.pig.backend.BackendException.<init>(BackendException.java:101)
    at org.apache.pig.backend.executionengine.ExecException.<init>(ExecException.java:103)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:191)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    ERROR 2997: Unable to recreate exception from backed error:
    java.lang.NoSuchMethodError: java.io.IOException: method
    <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to
    store alias 10
    at org.apache.pig.PigServer.registerQuery(PigServer.java:295)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997:
    Unable to recreate exception from backed error: java.lang.NoSuchMethodError:
    java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V
    not found
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:195)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    ... 5 more
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStackTraceElement(Launcher.java:546)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromStrings(Launcher.java:383)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromString(Launcher.java:284)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:187)
    ... 11 more

  • Parmod Mehta at Jul 6, 2009 at 9:21 pm
    DEFINE A `smallFile.txt`
    cache('hdfs://localhost:9000/user/myuser/input/smallFile.txt#smallFile.txt');
    define crossProduct CrossProduct('smallFile.txt');

    I added this to my hadoop-site.xml:

    <property>
      <name>mapred.create.symlink</name>
      <value>yes</value>
    </property>

    In my Pig script logs I see:

    java.lang.Exception: org.apache.pig.backend.executionengine.ExecException:
    ERROR 2078: Caught error from UDF: CrossProduct [Caught exception processing
    input row [smallFile.txt (The system cannot find the file specified)]]
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:229)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:272)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:198)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:217)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
    Caused by: java.io.IOException: Caught exception processing input row
    [smallFile.txt (The system cannot find the file specified)]
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:66)
    at CrossProduct.exec(Unknown Source)
    at CrossProduct.exec(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
    ... 9 more
    Caused by: java.io.FileNotFoundException: smallFile.txt (The system cannot
    find the file specified)
    ... 15 more


    Here is my UDF code. Don't pay attention to the logic, though; it is not
    coded yet.

    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.impl.util.WrappedIOException;

    public class CrossProduct extends EvalFunc<String> {

        // Name of the side file, passed in through DEFINE.
        private String fileName;

        public CrossProduct(String fileName) {
            this.fileName = fileName;
        }

        @Override
        public String exec(Tuple input) throws IOException {
            // if nothing to compare then score is 0
            if (input == null || input.size() == 0) {
                return "0";
            }

            try {
                StringBuilder builder = new StringBuilder();
                builder.append(input.toString());

                // Append every line of the side file to the input tuple.
                BufferedReader d = new BufferedReader(
                        new InputStreamReader(new FileInputStream(fileName)));
                try {
                    String line;
                    while ((line = d.readLine()) != null) {
                        builder.append(line);
                    }
                } finally {
                    d.close();
                }

                return builder.toString();
            } catch (Exception e) {
                throw WrappedIOException.wrap("Caught exception processing input row ", e);
            }
        }
    }

    On Thu, Jul 2, 2009 at 1:19 PM, Dmitriy Ryaboy wrote:

    If mapred.create.symlink is on, the file you put in the distributed cache
    will be available in the task working directory by the name you provide
    after the # sign; it can be picked up from there by your UDF or streaming
    command.

    To keep the UDF general, you can provide the file name to it in a
    constructor; the constructor would get called when you issue a define on
    your UDF, like this:

    DEFINE myUdf my.package.pig.udfs.MyUDF('this_is_a_file.txt')
    CACHE('hdfs://path/to/file#this_is_a_file.txt')

    I don't know of a good way to get at a JobConf in a UDF.

    Hope this helps,
    -Dmitriy
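
A minimal sketch of the "pick the file up from the task working directory" idea, using only the Java standard library. The class name `SideFileLines` is hypothetical and not part of Pig or Hadoop; it assumes the symlink name given after the # sign is passed in as `fileName` and that a relative name resolves against the task's working directory. It reads the file once and reuses the lines, rather than reopening the file for every input row.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a Pig or Hadoop class): loads a side file once
// and hands out the cached lines on every later call.
final class SideFileLines {
    private final String fileName;
    private List<String> lines; // null until the first successful load

    SideFileLines(String fileName) {
        this.fileName = fileName;
    }

    synchronized List<String> get() throws IOException {
        if (lines == null) {
            List<String> loaded = new ArrayList<String>();
            // A relative name is resolved against the process working directory.
            BufferedReader reader = new BufferedReader(new FileReader(fileName));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    loaded.add(line);
                }
            } finally {
                reader.close();
            }
            lines = Collections.unmodifiableList(loaded);
        }
        return lines;
    }
}
```

A UDF could hold one such object as a field, created in its constructor from the name given in DEFINE, and call `get()` from `exec`.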

    On Thu, Jul 2, 2009 at 12:38 PM, Parmod Mehta <parmod.mehta@gmail.com> wrote:
    How do you get access to the cached file in a UDF? Using
    org.apache.hadoop.filecache.DistributedCache? If so, how can I get
    access to a JobConf instance for DistributedCache.getCacheFiles(conf)?

    Also, do I need to change the Hadoop config?

    mapred.cache.archives=<location>
    mapred.create.symlink=yes


    On Wed, Jul 1, 2009 at 5:19 PM, Dmitriy Ryaboy <dvryaboy@cloudera.com> wrote:
    You want to put things in the distributed cache before you get to the
    tasks, so you want to use a DEFINE for that.
    You should be able to do the cross and evaluate in a single foreach, and
    possibly a single UDF.
    You probably don't want to call your function "cross", to avoid naming
    ambiguity.

    -D

    On Wed, Jul 1, 2009 at 5:11 PM, Parmod Mehta <parmod.mehta@gmail.com> wrote:
    Basically, this is what you are saying:

    register myJar.jar;
    define cross Cross();
    define evaluate Evaluate();
    biggerData = load 'C:/pigscripts/biggerfile.txt';
    crossedData = foreach biggerData {
    ------
    generate cross();
    }
    evaluatedData = foreach crossedData generate Evaluate(*);
    dump evaluatedData;

    In the Cross UDF I load the smaller file and put the smaller file on the
    distributed cache, do a cross between the tuple from biggerData and the
    smaller file, and then evaluate the crossedData? Or could this be done
    inside the Cross UDF?

    On Wed, Jul 1, 2009 at 4:57 PM, Dmitriy Ryaboy <dvryaboy@cloudera.com> wrote:
    Parmod,
    Alan describes a Map/Reduce algorithm, but anything that can be done with
    MR can be done with Pig if you don't want to be switching between the two
    (you may pay overhead for Pig processing, though).

    To do this in Pig you can simply write a UDF that fetches the smaller file
    and does the cross locally, and call it inside FOREACH GENERATE.

    Use the distributed cache to put the smaller file on all your nodes (you
    can do this through Pig using DEFINE -- see the docs for details).

    -Dmitriy

    On Wed, Jul 1, 2009 at 4:36 PM, Parmod Mehta <parmod.mehta@gmail.com> wrote:
    Could this be done in Pig or in map/reduce?

    On Wed, Jul 1, 2009 at 12:02 PM, Alan Gates <gates@yahoo-inc.com> wrote:
    At this point, no. In a cross product every record has to be combined with
    every other record. The simplest way to accomplish this is to send every
    record to a single reducer. One fairly simple improvement would be to
    implement cross by splitting one file across the reducers and having each
    reducer open the entire second file and do the cross there. This would
    give some parallelism, though it would mean a lot of network traffic. I'm
    sure there are more sophisticated algorithms available as well.

    Alan.
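
The improvement described above (split one input across the reducers, let each reducer read the whole second input) can be sketched in plain Java. This is only an in-memory illustration under stated assumptions: `PartitionedCross` and its methods are hypothetical names, the round-robin split stands in for the shuffle, and no Hadoop or Pig API is involved. Every record of A lands in exactly one slice and each slice sees all of B, so no pair is lost and none is produced twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory illustration, not Pig's implementation.
final class PartitionedCross {

    // Round-robin split of the records into n slices (one per "reducer").
    static List<List<String>> partition(List<String> records, int n) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            parts.get(i % n).add(records.get(i));
        }
        return parts;
    }

    // Work of a single "reducer": its slice of A crossed with all of B.
    static List<String> crossSlice(List<String> sliceOfA, List<String> allOfB) {
        List<String> out = new ArrayList<>();
        for (String a : sliceOfA) {
            for (String b : allOfB) {
                out.add(a + "x" + b);
            }
        }
        return out;
    }

    // Union of the per-slice results; in a real job the slices would run in parallel.
    static List<String> cross(List<String> a, List<String> b, int reducers) {
        List<String> out = new ArrayList<>();
        for (List<String> slice : partition(a, reducers)) {
            out.addAll(crossSlice(slice, b));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a1", "a2", "a3");
        List<String> b = Arrays.asList("b1", "b2");
        System.out.println(cross(a, b, 2));
    }
}
```

The network cost mentioned in the message shows up here as `allOfB` being passed in full to every slice.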


    On Jul 1, 2009, at 9:54 AM, Parmod Mehta wrote:

    Thanks Alan.
    So the cross product won't scale even if we use PARALLEL to set the number
    of reducers, e.g. to 10?

    On Wed, Jul 1, 2009 at 9:27 AM, Alan Gates <gates@yahoo-inc.com> wrote:
    The cross product is not done in the map. The way Pig handles this is that
    it loads each input in the various maps and then sends all of the data to
    a single reduce that does the cross product. The foreach and filter
    operators are then applied to the result of the cross in that same reduce.
    There are algorithms to do cross products in parallel, but Pig does not
    currently use any of them. This means that as your data gets large, your
    process will slow down because of the single threading.

    Alan.


    On Jun 29, 2009, at 10:09 AM, Parmod Mehta wrote:

    Thanks Alan! Yeah, that was the problem. Now I have a general question.
    I have two tab-delimited input files, FileA and FileB, of different
    formats. I want to compare every line of FileA with every record in FileB
    (basically a cross product) and compare them using heuristics. Here is my
    Pig script:

    register my.jar;
    define scoreEval ScoreEval();
    define scoreFilter ScoreFilter();
    storeData = load 'input/FileA.txt';
    amgData = load 'input/FileB.txt';
    crossproduct = cross storeData, amgData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    filteredScore = filter scoreTuple by ScoreFilter($0);
    store filteredScore into 'output';

    "crossproduct = cross storeData, amgData" (where I can use PARALLEL to
    span multiple reduce instances) creates the cross product, which gets
    evaluated using my UDF and then filtered using another UDF.

    When running on a Hadoop cluster with a block size of, for example, 50M,
    input splits will be mapped to different data nodes in the cluster. If
    that is the case, would I not be missing cross products?

    FileA mapped into a1, a2, a3....
    FileB mapped into b1, b2, b3....

    and on one node I only have a1Xb1 and am missing a1Xb2 etc.





    On Fri, Jun 26, 2009 at 5:59 PM, Alan Gates <gates@yahoo-inc.com> wrote:

    It looks like you are running Java 1.5. You need 1.6 for Pig 0.2.0.
    Alan.
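
The descriptor in the error, `<init>(Ljava/lang/String;Ljava/lang/Throwable;)V` on java.io.IOException, is the constructor IOException(String, Throwable), which was added in Java 6; that is why a Java 1.5 runtime reports NoSuchMethodError. A small check along these lines can confirm what a given JVM provides. The class name `IoExceptionCtorCheck` is made up for this sketch, and it would need to be run with the same JVM that the Hadoop task processes use, since the trace shows the error being raised inside the task.

```java
import java.io.IOException;
import java.lang.reflect.Constructor;

// Hypothetical helper, not part of Pig: reports whether the running JVM
// has the IOException(String, Throwable) constructor.
final class IoExceptionCtorCheck {

    static boolean hasMessageCauseConstructor() {
        try {
            Constructor<IOException> ctor =
                IOException.class.getConstructor(String.class, Throwable.class);
            return ctor != null;
        } catch (NoSuchMethodException e) {
            // This is the situation the NoSuchMethodError in the trace points to.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("IOException(String, Throwable) available: "
            + hasMessageCauseConstructor());
    }
}
```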


    On Jun 26, 2009, at 4:39 PM, Parmod Mehta wrote:

    pig 0.20.0
    hadoop 0.18.0

    I am running into an exception when I try to run my Pig script in
    mapreduce mode, which incidentally works fine in local mode.

    register my.jar;
    define scoreEval ScoreEval();
    aData = load 'input/a1.txt';
    bData = load 'input/b1.txt';
    crossproduct = cross aData , bData;
    scoreTuple = foreach crossproduct generate scoreEval(*);
    store scoreTuple into 'output';

    When running this in hadoop mode

    pig testCrossProduct.pig

    I run into this exception trace. Any clues? thanks

    2009-06-26 16:21:27,328 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine
    - Connecting to hadoop file system at: hdfs://localhost:9000
    2009-06-26 16:21:28,046 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.HExecutionEngine
    - Connecting to map-reduce job tracker at: localhost:9001
    2009-06-26 16:21:28,406 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
    - Rewrite: POPackage->POForEach to POJoinPackage
    2009-06-26 16:21:29,312 [Thread-7] WARN
    org.apache.hadoop.mapred.JobClient
    - Use GenericOptionsParser for parsing the arguments. Applications should
    implement Tool for the same.
    2009-06-26 16:21:34,312 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 0% complete
    2009-06-26 16:21:49,312 [main] INFO
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - 50% complete
    2009-06-26 16:22:09,312 [main] ERROR
    org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
    - Map reduce job failed
    2009-06-26 16:22:09,328 [main] ERROR
    org.apache.pig.tools.grunt.Grunt
    - ERROR 2997: Unable to recreate exception from backed error:
    java.lang.NoSuchMethodError: java.io.IOException: method
    <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    Details at logfile: C:\pigscripts\pig_1246058487187.log


    Exception trace:

    ERROR 2998: Unhandled internal error. java.lang.NoSuchMethodError:
    java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V
    not found
    at org.apache.pig.PigException.<init>(PigException.java:244)
    at org.apache.pig.PigException.<init>(PigException.java:191)
    at org.apache.pig.backend.BackendException.<init>(BackendException.java:101)
    at org.apache.pig.backend.executionengine.ExecException.<init>(ExecException.java:103)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
    java.lang.Exception: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.PigException.<init>(PigException.java:244)
    at org.apache.pig.PigException.<init>(PigException.java:191)
    at org.apache.pig.backend.BackendException.<init>(BackendException.java:101)
    at org.apache.pig.backend.executionengine.ExecException.<init>(ExecException.java:103)
    at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:425)
    at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:446)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.instantiateFunc(POUserFunc.java:97)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.readObject(POUserFunc.java:410)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.ArrayList.readObject(ArrayList.java:591)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at java.util.HashMap.readObject(HashMap.java:1067)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1809)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1908)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1832)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1719)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1305)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
    at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.configure(PigMapReduce.java:177)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:240)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2209)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:191)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 10
    at org.apache.pig.PigServer.registerQuery(PigServer.java:295)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:82)
    at org.apache.pig.Main.main(Main.java:354)
    Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: java.lang.NoSuchMethodError: java.io.IOException: method <init>(Ljava/lang/String;Ljava/lang/Throwable;)V not found
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:195)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:143)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:133)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:695)
    at org.apache.pig.PigServer.execute(PigServer.java:686)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:291)
    ... 5 more
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStackTraceElement(Launcher.java:546)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromStrings(Launcher.java:383)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromString(Launcher.java:284)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:187)
    ... 11 more

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jun 26, '09 at 11:39p
active: Jul 6, '09 at 9:21p
posts: 15
users: 3
website: pig.apache.org
