Mohamed Riadh Trad, Jul 1, 2010 at 2:33 pm
Hi,
Has anyone addressed the compatibility of org.apache.hadoop.mapreduce.lib.input.TextInputFormat with Hadoop streaming?
The new API throws the following exception when launching pipes jobs with org.apache.hadoop.mapreduce.lib.input.TextInputFormat as the input format instead of org.apache.hadoop.mapred.TextInputFormat:
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: class org.apache.hadoop.mapreduce.lib.input.TextInputFormat not org.apache.hadoop.mapred.InputFormat
My problem with the deprecated classes lies in mapred.min.split.size and the number of map tasks.
I need to generate N map tasks over splits of approximately the same size. However, with mapred.min.split.size set to 20 MB, I get splits ranging from 6 MB to 64 MB.
Any suggestions?
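
For reference, here is a minimal sketch of how I understand the old org.apache.hadoop.mapred.FileInputFormat to pick split sizes: splitSize = max(minSize, min(goalSize, blockSize)), with goalSize being the total input size divided by the requested number of maps (mapred.map.tasks), and the last chunk of each file allowed to run up to 10% over. This is a standalone illustration, not Hadoop code. The class name SplitSizeSketch, the 200 MB file, and the 64 MB block size are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch (hypothetical class, no Hadoop dependency) of the
// split-size rule as I understand it from the old mapred FileInputFormat.
public class SplitSizeSketch {
    // Slack factor: the remainder of a file is kept as one split
    // as long as it is at most 1.1 x splitSize.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(goalSize, blockSize))
    public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Cut one file into splits of splitSize; whatever is left becomes the last split.
    public static List<Long> splitLengths(long fileLength, long splitSize) {
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            lengths.add(remaining);
        }
        return lengths;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        long fileLength = 200 * MB; // assumed input size
        long blockSize = 64 * MB;   // assumed HDFS block size
        long minSize = 20 * MB;     // mapred.min.split.size from the question

        // Try a small and a larger requested number of maps (both assumed values).
        for (long requestedMaps : new long[] {2, 10}) {
            long goalSize = fileLength / requestedMaps;
            long splitSize = computeSplitSize(goalSize, minSize, blockSize);
            System.out.print("maps=" + requestedMaps + " splits(MB):");
            for (long len : splitLengths(fileLength, splitSize)) {
                System.out.print(" " + (len / MB));
            }
            System.out.println();
        }
    }
}
```

With these assumed numbers, requesting 2 maps gives a goal size above the block size, so the splits are 64, 64, 64 and 8 MB, which looks like the uneven sizes I am seeing. Requesting 10 maps brings the goal size down to 20 MB and yields ten 20 MB splits. If this reading is right, the minimum split size alone only sets a lower bound, and the requested number of maps would have to be raised as well to get even splits.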
Trad Mohamed Riadh, M.Sc, Ing.
PhD. student
INRIA-TELECOM PARISTECH
Office: 11-15
Phone: (33)-1 39 63 59 33
Fax: (33)-1 39 63 56 74
Email: Riadh.Trad(a)inria.fr
Home page:
http://www-rocq.inria.fr/who/Mohamed.Trad/