I am writing a MapReduce job using the MapRunnable interface.
The input format is SequenceFileInputFormat.
Each sequence-file record is a key-value pair of type <Text, Text> (Text: org.apache.hadoop.io.Text).
The key Text object holds a small string, whereas the value Text object holds a large XML string.
The value Text object can hold as much as 100 to 300 MB of data.
I convert the value Text object to a String using the value.toString() method.
This throws an OutOfMemoryError when the value is large.
Is there another way to convert a large Text object to a Java String?
Alternatively, can I limit the number of records the RecordReader passes to the run method, so that total memory usage stays bounded?
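One possible direction (my sketch, not something stated in the post): value.toString() materializes a second, String-sized copy of the whole document, which is what blows the heap. Hadoop's Text exposes its UTF-8 backing array via getBytes() and getLength(), so the bytes can instead be wrapped in a ByteArrayInputStream and fed to a streaming (SAX) parser, avoiding the full-document String entirely. The sketch below uses only JDK types; a plain byte array stands in for Text so it is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class StreamXmlSketch {
    // Parse XML straight from a byte region, never building a
    // whole-document String. With Hadoop Text you would pass
    // value.getBytes() and value.getLength() here (the backing
    // array may be over-allocated, so the length matters).
    static String elementText(byte[] backing, int validLength) throws Exception {
        InputStream in = new ByteArrayInputStream(backing, 0, validLength);
        final StringBuilder seen = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int len) {
                // SAX hands over character data in small chunks,
                // so memory stays bounded regardless of document size.
                seen.append(ch, start, len);
            }
        });
        return seen.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] backing = "<root><item>hello</item></root>".getBytes(StandardCharsets.UTF_8);
        System.out.println(elementText(backing, backing.length)); // prints: hello
    }
}
```

The same wrapping would work with any InputStream-based consumer (e.g. a StAX reader), so the 100-300 MB value is only ever traversed, not duplicated.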
Grokbase › Groups › Hadoop › common-user › October 2009