I've heard that HDFS starts to slow down after it has been running for a long time, and I believe I've experienced this. So I was thinking of setting up a cron job that runs every week to shut down HDFS and start it up again.
In concept, it would be something like:
0 0 * * 0 $HADOOP_HOME/bin/stop-dfs.sh; $HADOOP_HOME/bin/start-dfs.sh
But I'm wondering if there is a safer way to do this. In particular:
* What if a map/reduce job is running when this cron job fires? Is there a way to suspend jobs while HDFS restarts?
* Should I also restart the mapred daemons?
* Should I wait some time after "stop-dfs.sh" for things to settle down before executing "start-dfs.sh"? Or should I run a command to verify that HDFS has actually stopped before I run the start?
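To make these questions concrete, here is a rough sketch of what a more careful restart might look like. It is an untested guess at how the pieces fit, not a known-good procedure. The script name `restart-hadoop.sh`, the `wait_for_exit` helper and the `run` argument are made up for illustration. It assumes the Hadoop 0.20/1.x layout, where `hadoop-daemon.sh` writes PID files to `$HADOOP_PID_DIR` (default `/tmp`) as `hadoop-$USER-namenode.pid`, and it has to run on the NameNode host because it only checks the local NameNode, not DataNodes on other machines. It also stops the mapred daemons first. That does not suspend running jobs, so it assumes the restart happens in a window when no jobs are scheduled.

```shell
#!/usr/bin/env bash
# restart-hadoop.sh (hypothetical name): a sketch of a more careful weekly restart.
# Assumes the Hadoop 0.20/1.x layout, where hadoop-daemon.sh records each daemon's
# PID in $HADOOP_PID_DIR (default /tmp) as hadoop-$USER-<daemon>.pid.

# Wait until the process whose PID is stored in PID_FILE has exited.
# Returns 0 once it is gone (or the file is missing/empty), 1 if it is still
# alive after TIMEOUT seconds (default 120).
wait_for_exit() {
  local pid_file=$1 timeout=${2:-120} waited=0 pid
  while [ -f "$pid_file" ]; do
    pid=$(cat "$pid_file")
    # kill -0 sends no signal; it only checks whether the PID exists.
    kill -0 "$pid" 2>/dev/null || return 0
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for PID $pid ($pid_file) to exit" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  return 0
}

restart_hadoop() {
  : "${HADOOP_HOME:?HADOOP_HOME must be set}"
  local bin="$HADOOP_HOME/bin"
  local pid_dir="${HADOOP_PID_DIR:-/tmp}"
  local ident="${HADOOP_IDENT_STRING:-${USER:-$(id -un)}}"

  # Stop MapReduce first; this does NOT suspend running jobs, so schedule the
  # restart for a window when no jobs are expected.
  "$bin/stop-mapred.sh"
  "$bin/stop-dfs.sh"

  # Verify the NameNode on this host has actually exited before starting again.
  # (DataNodes on other hosts are not checked by this sketch.)
  wait_for_exit "$pid_dir/hadoop-$ident-namenode.pid" || return 1
  "$bin/start-dfs.sh"

  # Block until the NameNode leaves safe mode, then bring MapReduce back.
  "$bin/hadoop" dfsadmin -safemode wait || return 1
  "$bin/start-mapred.sh"
}

# Only restart when invoked as "restart-hadoop.sh run"; sourcing the file just
# defines the functions.
if [ "${1:-}" = "run" ]; then
  restart_hadoop
fi
```

The crontab entry would then become something like `0 0 * * 0 /path/to/restart-hadoop.sh run` (the path is a placeholder). Since cron starts with a minimal environment, HADOOP_HOME would need to be set in the crontab or exported at the top of the script.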
Thanks for any help.