FAQ
I'm interested in hearing how you get data into and out of HDFS. Are you
using tools like Flume? Are you using fuse_dfs? Are you putting files on
HDFS with "hadoop dfs -put ..."?
And how does your method scale? Can you move terabytes of data per day? Or
are we talking gigabytes?
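
For readers less familiar with the last option, the plain command-line client copies files in and out of HDFS directly. A minimal sketch, with made-up local and HDFS paths:

    # Copy a local file into HDFS ("hadoop dfs -put" is the older spelling
    # of the same command).
    hadoop fs -put /data/local/events.log /user/eric/events.log

    # Copy a file back out of HDFS onto the local filesystem.
    hadoop fs -get /user/eric/events.log /tmp/events.log

Flume, by contrast, streams records into HDFS continuously, and fuse_dfs exposes HDFS as a mountable filesystem.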


  • Will Maier at Mar 29, 2011 at 1:39 pm
    Hi Eric-
    On Tue, Mar 29, 2011 at 03:20:38PM +0200, Eric wrote:
    I'm interested in hearing how you get data into and out of HDFS. Are you
    using tools like Flume? Are you using fuse_dfs? Are you putting files on
    HDFS with "hadoop dfs -put ..."?
    And how does your method scale? Can you move terabytes of data per day? Or
    are we talking gigabytes?
    I'm currently migrating our ~600TB datastore to HDFS. To transfer the data,
    we iterate through the raw files stored on our legacy data servers and write
    them to HDFS using `hadoop fs -put`. So far, I've limited the number of servers
    participating in the migration, so we've only had on the order of 20 parallel
    writers. This week, I plan to increase that by at least an order of magnitude.
    I expect to be able to scale the migration horizontally without impacting our
    current production system. Then, when the transfers are complete, we can cut our
    protocol endpoints over without significant downtime. At least, that's the plan.
    ;)

    --

    Will Maier - UW High Energy Physics
    cel: 608.438.6162
    tel: 608.263.9692
    web: http://www.hep.wisc.edu/~wcmaier/
  • Eric at Mar 29, 2011 at 2:22 pm
    Hi Will,

    In theory, your only bottleneck is the network and the number of datanodes
    you have running, so it should scale quite well. I'm very interested to hear
    about your experiences after adding more writers.

    Thanks,
    Eric

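A rough sketch of the parallel-put migration Will describes above, assuming the files to move are collected into a plain manifest; the paths, manifest name, and destination directory are illustrative and not details of his actual setup:

    # Walk the legacy data directories and record every file to move.
    find /data/legacy -type f > manifest.txt

    # Push the files into HDFS with a bounded number of concurrent
    # "hadoop fs -put" writers (20 here, echoing the thread). Raise -P,
    # or split the manifest across more hosts, to scale the migration
    # horizontally.
    xargs -P 20 -I {} hadoop fs -put {} /user/hep/raw/ < manifest.txt

A real migration would also map each source path onto a matching destination path rather than dropping everything into one directory. On Eric's point about the bottleneck being the network and the number of datanodes, `hadoop dfsadmin -report` prints the cluster's configured capacity and the list of live datanodes, which is a quick sanity check on how much aggregate write bandwidth is available.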
