Hello all.

I need to process many gigabytes of new data every 10 minutes. Every 10
minutes, cron launches a bash script, "do.sh", that puts the data into HDFS
and launches the processing. But...

Hadoop isn't military-grade software, so there is some probability of errors
in HDFS, and I need to watch the log files to catch problems. For example,
HDFS may crash, and then I would need to reformat the whole HDFS, delete
/tmp/hadoop*, etc.
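
As a rough illustration of watching the log files from a script, here is a minimal sketch. It assumes the logs are *.log files in one directory and that problem lines carry the level ERROR or FATAL between spaces, as Hadoop's default log4j layout normally does. The function name scan_logs and the directory passed to it are hypothetical.

```shell
#!/usr/bin/env bash
# scan_logs LOG_DIR
# Prints every ERROR/FATAL line found in LOG_DIR/*.log, prefixed with the
# file name. Exit status 0 means at least one such line was found; a
# non-zero status means none was found (or no log file could be read).
scan_logs() {
    local log_dir=$1
    grep -H -E ' (ERROR|FATAL) ' "$log_dir"/*.log 2>/dev/null
}

# Hypothetical usage inside do.sh (the log directory is an assumption):
# if scan_logs /path/to/hadoop/logs; then
#     echo "problems found in the Hadoop logs" >&2
# fi
```

A check like this only catches problems that were logged at those levels, so it complements rather than replaces looking at the cluster itself.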

So I decided to do a full restart of the whole cluster every 10 minutes,
before data processing begins. I erase /tmp/hadoop* on each node over ssh,
start dfs, start mapred, put the binaries and data, and then run the
processing.
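
The restart sequence described above might look roughly like the sketch below. It is not the actual do.sh. The names run and restart_cluster, the DRY_RUN switch, and the host names are hypothetical, and stopping the old daemons first is an assumption the post does not mention. The sketch also assumes the Hadoop scripts (stop-all.sh, start-dfs.sh, start-mapred.sh) and the hadoop command are on the PATH.

```shell
#!/usr/bin/env bash
# With DRY_RUN=1 (the default here) each command is only printed;
# set DRY_RUN=0 to really execute the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restart_cluster NODE...
# Wipes /tmp/hadoop* on every listed node, reformats HDFS and starts
# the DFS and MapReduce daemons again.
restart_cluster() {
    local node
    run stop-all.sh || true   # assumption: stop old daemons, ignore failure
    for node in "$@"; do
        run ssh "$node" 'rm -rf /tmp/hadoop*' || return 1
    done
    run hadoop namenode -format || return 1
    run start-dfs.sh || return 1
    run start-mapred.sh || return 1
}

# Hypothetical usage:
# restart_cluster node1 node2 node3
```

Running it in dry-run mode first shows the exact commands before anything is deleted, which seems prudent for a script that removes data on every node.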

But after formatting and starting DFS, I need to wait some time (sleep 60)
before putting data into HDFS. Otherwise I get a
"NotReplicatedYetException".
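
One possible alternative to a fixed sleep is to retry the put until it succeeds, with an upper bound on the number of attempts. The sketch below is only an illustration: the retry function is hypothetical, and the attempt count, delay and paths in the usage comment are made-up values.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits with status 0 or MAX_ATTEMPTS is reached.
# Returns 0 on success and 1 if every attempt failed.
retry() {
    local max_attempts=$1 delay=$2 attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Hypothetical usage in do.sh (paths are placeholders):
# retry 12 5 hadoop fs -put /local/incoming /input
```

With these example values the script waits at most about a minute, but continues as soon as HDFS accepts the data. A failed put may leave a partial file behind, so the retried command may need to remove the target before trying again.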

What do you think about all this? Thank you :)

Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 3, '09 at 10:14a
active: Jun 3, '09 at 11:06p
posts: 3
users: 3
website: hadoop.apache.org...
irc: #hadoop