On Wed, Jun 10, 2009 at 4:55 AM, Sugandha Naolekar wrote:
If I want to make the data transfer fast, what am I supposed to do?
I want to place the data in HDFS and replicate it in a fraction of a
second.
I want to go to France, but it takes 10+ hours to get there from California
on the fastest plane. How can I get there faster?
Is that possible, and how? Placing a 5 GB file will take at least
half an hour or so, but on a large cluster of, say, 7 nodes, placing
it in HDFS would take around 2-3 hours. So how can that delay be
avoided?
HDFS will only replicate as many times as you want it to. The write is also
pipelined. This means that writing a 5 GB file that is replicated to 3 nodes
is only marginally faster than writing the same file to 10 nodes, if for some
reason you wanted to set your replication count to 10 (unnecessary for
99.99999% of use cases).
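
To illustrate why pipelining makes the replica count matter so little, here
is a back-of-the-envelope model in Python. It is only a sketch: the function
names, the 1 MB forwarding unit, and the 50 MB/sec per-hop rate are
assumptions picked for illustration, not HDFS internals or measurements, and
the model ignores acknowledgements, disk contention, and network topology.

```python
def sequential_copy_seconds(file_mb, replicas, rate_mb_s):
    """Naive scheme: finish one full copy before starting the next."""
    return replicas * file_mb / rate_mb_s


def pipelined_write_seconds(file_mb, replicas, rate_mb_s, chunk_mb=1.0):
    """Pipelined scheme: each node forwards a chunk while receiving the next.

    The last replica lags the first by (replicas - 1) chunk transfers, so
    extra replicas add a small fixed delay instead of multiplying the total.
    chunk_mb is an assumed forwarding unit, not a real HDFS setting.
    """
    stream_time = file_mb / rate_mb_s
    pipeline_fill = (replicas - 1) * chunk_mb / rate_mb_s
    return stream_time + pipeline_fill


if __name__ == "__main__":
    file_mb = 5 * 1024  # the 5 GB file from the question
    rate = 50.0         # assumed MB/sec on every hop
    for replicas in (3, 10):
        naive = sequential_copy_seconds(file_mb, replicas, rate)
        piped = pipelined_write_seconds(file_mb, replicas, rate)
        print(f"replicas={replicas}: sequential {naive:.1f}s, pipelined {piped:.1f}s")
```

Under these assumptions, going from 3 to 10 replicas adds a fraction of a
second to the pipelined write, while copying the replicas one after another
would take several times longer.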
Also, my simple aim is to transfer the data, i.e., dumping the data
into HDFS and getting it back whenever needed. So, for this transfer,
how can speed be achieved?
HDFS isn't magic. You can only write as fast as your disk and network allow.
If your disk has 50 MB/sec of throughput, you'll probably be limited to
50 MB/sec. Expecting much more than this in real-life scenarios is
unrealistic.
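
The arithmetic behind that limit can be sketched in a few lines of Python.
The function names and the 125 MB/sec network figure (roughly the
theoretical rate of a gigabit link) are assumptions for illustration; only
the 5 GB file size and the 50 MB/sec disk rate come from this thread.

```python
def min_transfer_seconds(size_mb, disk_mb_s, network_mb_s):
    """Lower bound on transfer time: the slower of disk and network wins."""
    bottleneck = min(disk_mb_s, network_mb_s)
    return size_mb / bottleneck


def effective_rate_mb_s(size_mb, seconds):
    """Average throughput implied by an observed transfer time."""
    return size_mb / seconds


if __name__ == "__main__":
    size_mb = 5 * 1024  # 5 GB file
    best_case = min_transfer_seconds(size_mb, disk_mb_s=50.0, network_mb_s=125.0)
    print(f"best case at 50 MB/sec: {best_case:.1f} seconds")

    # Half an hour for the same file implies a much lower average rate.
    observed = effective_rate_mb_s(size_mb, seconds=30 * 60)
    print(f"rate implied by 30 minutes: {observed:.2f} MB/sec")
```

At 50 MB/sec, the 5 GB file needs about 102 seconds at best. A half-hour
transfer works out to under 3 MB/sec, which would suggest the bottleneck is
somewhere other than the disk described here.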
-Todd