FAQ
Hi,
I am just wondering what Facebook/Yahoo do with the data in HDFS after
they finish processing the log files (or whatever else is in HDFS).
Is it simply deleted, or does it get backed up to tape?
What's the typical process?
Also, what is the process for adding a new node to a Hadoop cluster? Do you simply
connect a new computer to the network (and set up the Hadoop conf)?


  • Allen Wittenauer at Jun 15, 2009 at 7:03 pm

    On 6/13/09 9:00 AM, "PORTO aLET" wrote:
    > I am just wondering what Facebook/Yahoo do with the data in HDFS after
    > they finish processing the log files (or whatever else is in HDFS).
    > Is it simply deleted, or does it get backed up to tape?
    > What's the typical process?

    The grid ops team here at Yahoo! has a strict retention policy that
    dictates that data is deleted after X amount of time. We perform no backups
    of the data on the grid. It is also worth mentioning that the data is loaded
    from the primary source, so in the case of data corruption (hai hadoop-0.18)
    or accidental deletion (where are my snapshots, dev people?), we reload the
    data from that primary source. (This depends, of course, on whether the
    source still has it.)

    > Also, what is the process for adding a new node to a Hadoop cluster? Do you simply
    > connect a new computer to the network (and set up the Hadoop conf)?

    http://wiki.apache.org/hadoop/FAQ#17
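
The retention policy described above (delete data after X time) can be sketched as a small filter over a directory listing. This is only an illustration, not Yahoo!'s actual tooling: the function name `expired_paths`, the 30-day window, and the `/logs/...` paths are hypothetical, and the input is assumed to look like `hadoop fs -ls` output (permissions, replication, owner, group, size, date, time, path).

```python
from datetime import datetime, timedelta


def expired_paths(ls_lines, now, max_age_days):
    """Return the paths whose modification time is older than max_age_days.

    ls_lines is assumed to be the text lines printed by `hadoop fs -ls`.
    """
    cutoff = now - timedelta(days=max_age_days)
    expired = []
    for line in ls_lines:
        # Split into at most 8 fields so a path containing spaces stays whole.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # skips the "Found N items" header and malformed lines
        date_str, time_str, path = fields[5], fields[6], fields[7]
        try:
            modified = datetime.strptime(date_str + " " + time_str, "%Y-%m-%d %H:%M")
        except ValueError:
            continue  # not a listing line in the assumed format
        if modified < cutoff:
            expired.append(path)
    return expired


# Hypothetical listing; the retention window of 30 days stands in for "X".
sample = [
    "Found 2 items",
    "drwxr-xr-x   - hadoop supergroup          0 2009-05-01 12:00 /logs/2009-05-01",
    "drwxr-xr-x   - hadoop supergroup          0 2009-06-12 12:00 /logs/2009-06-12",
]
print(expired_paths(sample, datetime(2009, 6, 15), 30))
```

The selected paths could then be handed to a delete command such as `hadoop fs -rmr`. Since nothing on the grid is backed up, a dry run that only prints the list first seems prudent.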

Discussion Overview
group: common-user @
category: hadoop
posted: Jun 13, '09 at 4:01p
active: Jun 15, '09 at 7:03p
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop