Just to toss out some numbers.... (and because our users are making
interesting numbers right now)
Here's our external network router:
http://mrtg.unl.edu/~cricket/?target=%2Frouter-interfaces%2Fborder2%2Ftengigabitethernet2_2;view=Octets
Here's the application-level transfer graph:
http://t2.unl.edu/phedex/graphs/quantity_rates?link=src&no_mss=true&to_node=Nebraska
In a squeeze, we can move 20-50 TB/day to/from other heterogeneous
sites (50 TB/day works out to roughly 4.6 Gbps sustained). Usually,
we run out of free space before we can find the upper limit for a
24-hour period.
We use a protocol called GridFTP to move data back and forth between
external (non-HDFS) clusters. The other sites we transfer with use
niche software you probably haven't heard of (Castor, DPM, and dCache)
because, well, it's niche software. I have no data available on
HDFS<->S3 systems, but I'd again claim throughput is mostly a function
of the amount of hardware you throw at it and the size of your network
pipes.
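If it helps to see the HDFS side concretely, here's a minimal sketch
of the kind of inter-filesystem copy the Java client makes easy. To
be clear: this is not our GridFTP tooling, just the stock Hadoop
FileSystem API, and the namenode URI and paths are made up for
illustration:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Hypothetical example: copy a file from the local filesystem into
// HDFS with the stock Hadoop FileSystem API. URI and paths are made up.
public class CopyToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        FileSystem localFs = FileSystem.getLocal(conf);
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:9000/"), conf);

        // FileUtil.copy streams the file through the client; throughput
        // is bounded by your disks and network pipes, as argued above.
        FileUtil.copy(localFs, new Path("/data/local/file.dat"),
                      hdfs, new Path("/user/transfers/file.dat"),
                      false /* don't delete source */, conf);
    }
}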
There are currently 182 datanodes; 180 are "traditional" ones of <3TB
and 2 are big honking RAID arrays of 40TB. Transfers are
load-balanced amongst ~7 GridFTP servers, each with a 1Gbps
connection, so the idea is roughly what the sketch below shows.
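The balancing isn't fancy; a round-robin pick over the GridFTP doors
captures the idea. A toy sketch (hostnames are hypothetical, and this
is not our actual balancer):

import java.util.concurrent.atomic.AtomicLong;

// Toy round-robin selection over transfer servers. Hostnames are
// made up; this just illustrates spreading transfers across doors.
public class GridFtpDoorPicker {
    private static final String[] DOORS = {
        "gridftp01.example.edu", "gridftp02.example.edu",
        "gridftp03.example.edu", "gridftp04.example.edu",
        "gridftp05.example.edu", "gridftp06.example.edu",
        "gridftp07.example.edu"
    };
    private final AtomicLong counter = new AtomicLong();

    public String nextDoor() {
        // Each door has a 1Gbps link; spreading transfers evenly
        // lets aggregate throughput approach 7Gbps.
        int i = (int) (counter.getAndIncrement() % DOORS.length);
        return DOORS[i];
    }
}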
Does that help?
Brian
On Feb 10, 2009, at 4:46 PM, Brian Bockelman wrote:
On Feb 10, 2009, at 4:10 PM, Wasim Bari wrote:
Hi,
Could someone help me find some real figures (transfer rates) for
Hadoop file transfers from the local filesystem to HDFS, S3, etc.,
and between storage systems (HDFS to S3, etc.)?
Thanks,
Wasim
What are you looking for? Maximum possible transfer rate? Maximum
possible transfer rate per client? Generally, if you're using the
Java client, transfer rate to/from HDFS is limited by the hardware
you have and by your network connection (if you have a 1Gbps link
per client, that's usually the ceiling).
I could give you a graph showing a peak of 9Gbps from our Hadoop
instance to the WAN, but that's not very interesting if you don't
have a 10Gbps pipe...
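If you want numbers for your own setup, the easiest thing is to time
a write from a single client. A rough sketch (the namenode URI, path,
and sizes are placeholders):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Rough single-client HDFS write benchmark. URI, path, and sizes
// are hypothetical; adjust to your cluster.
public class HdfsWriteBench {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000/"), conf);

        byte[] buf = new byte[1 << 20];   // 1 MB buffer
        long totalBytes = 1L << 30;       // write 1 GB total

        long start = System.currentTimeMillis();
        FSDataOutputStream out = fs.create(new Path("/tmp/bench.dat"));
        for (long written = 0; written < totalBytes; written += buf.length) {
            out.write(buf);
        }
        out.close();
        long millis = System.currentTimeMillis() - start;

        double mbps = (totalBytes * 8.0 / 1e6) / (millis / 1000.0);
        System.out.printf("Wrote 1 GB in %d ms (%.1f Mbit/s)%n", millis, mbps);
    }
}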
Brian