Hi Eric-
On Tue, Mar 29, 2011 at 03:20:38PM +0200, Eric wrote:
> I'm interested in hearing how you get data into and out of HDFS. Are you
> using tools like Flume? Are you using fuse_dfs? Are you putting files on
> HDFS with "hadoop dfs -put ..."?
> And how does your method scale? Can you move terabytes of data per day? Or
> are we talking gigabytes?
I'm currently migrating our ~600TB datastore to HDFS. To transfer the data,
we iterate through the raw files stored on our legacy data servers and write
them to HDFS using `hadoop fs -put`. So far, I've limited the number of servers
participating in the migration, so we've only had on the order of 20 parallel
writers. This week, I plan to increase that by at least an order of magnitude.
I expect to be able to scale the migration horizontally without impacting our
current production system. Then, when the transfers are complete, we can cut our
protocol endpoints over without significant downtime. At least, that's the plan.
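For the curious, the per-server loop amounts to walking the legacy file tree and shelling out to `hadoop fs -put` for each file. Here is a minimal sketch of that idea; the paths are hypothetical placeholders (our actual layout differs), and the injectable `run` hook is just a convenience so the loop can be exercised without a Hadoop installation:

```python
import os
import subprocess

def migrate(src_root, dest_root, run=subprocess.check_call):
    """Walk src_root and copy each file into HDFS via `hadoop fs -put`.

    Mirrors the legacy directory layout under dest_root. Because -put
    refuses to overwrite an existing destination, re-running the loop
    after a partial failure only retries the files that didn't land.
    """
    commands = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            # Preserve the path relative to the legacy root.
            rel = os.path.relpath(src, src_root)
            dest = os.path.join(dest_root, rel)
            cmd = ["hadoop", "fs", "-put", src, dest]
            commands.append(cmd)
            run(cmd)
    return commands
```

Scaling out is then just running this loop on more legacy servers at once, each over its own slice of the tree.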


Will Maier - UW High Energy Physics
cel: 608.438.6162
tel: 608.263.9692
web: http://www.hep.wisc.edu/~wcmaier/

Discussion Overview
group: hdfs-user
posted: Mar 29, '11 at 1:21p
active: Mar 29, '11 at 2:22p