I am trying to create Lucene indexes using the contrib/index/hadoop-0.19.1-index.jar that ships with Hadoop.
Since it runs as a MapReduce job, I expected it to handle large inputs quickly.
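For reference, I am invoking it roughly like this. The paths and counts are placeholders for my setup, and the UpdateIndex driver options are from the contrib/index README as best I recall (the driver class may also already be set as the jar's Main-Class, in which case it can be omitted):

    hadoop jar contrib/index/hadoop-0.19.1-index.jar \
        org.apache.hadoop.contrib.index.main.UpdateIndex \
        -inputPaths /user/me/input \
        -outputPath /user/me/index-out \
        -indexPath /user/me/index \
        -numShards 4 \
        -numMapTasks 8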
It processes small amounts of data (< 5 MB) very quickly.
Now I have given it 5 GB of input data, and the fun starts :)
The job goes out of memory, so I increased the child heap by setting "mapred.child.java.opts" to -Xmx1000m in hadoop-default.xml.
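The override I added looks like this (as I understand it, site-specific overrides normally belong in hadoop-site.xml rather than hadoop-default.xml, so perhaps that is part of the problem):

    <!-- JVM options passed to each map/reduce child task -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1000m</value>
    </property>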
The processing then ran smoothly for about 1.5 hours, completing 30% of the job, after which the master node hung.
Is there any way to get contrib/index/hadoop-0.19.1-index.jar working on inputs of this size?
Is there a memory leak in the jar?
Can you suggest some alternatives?