FAQ
How can I make data transfer into HDFS fast? I want to place the data in
HDFS and have it replicated within a fraction of a second. Is that possible,
and if so, how? Placing a 5 GB file takes at least half an hour or so, but on
a larger cluster, say 7 nodes, placing it in HDFS would take around 2-3
hours. How can that delay be avoided?

Also, my aim is simply to transfer the data, i.e., to dump it into HDFS and
get it back whenever needed. How can I speed up that transfer?
--
Regards!
Sugandha


  • Todd Lipcon at Jun 10, 2009 at 2:46 pm

    On Wed, Jun 10, 2009 at 4:55 AM, Sugandha Naolekar wrote:

    How can I make data transfer into HDFS fast? I want to place the data in
    HDFS and have it replicated within a fraction of a second.

    I want to go to France, but it takes 10+ hours to get there from California
    on the fastest plane. How can I get there faster?

    Is that possible, and if so, how? Placing a 5 GB file takes at least
    half an hour or so, but on a larger cluster, say 7 nodes, placing it in
    HDFS would take around 2-3 hours. How can that delay be avoided?

    HDFS will only replicate as many times as you want it to. The write is also
    pipelined. This means that writing a 5 GB file that is replicated to 3 nodes
    is only marginally faster than the same file on 10 nodes, if for some reason
    you wanted to set your replication count to 10 (unnecessary for 99.99999% of
    use cases).
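
To see why pipelining makes the replica count matter so little, here is a rough back-of-the-envelope model. It is only a sketch under stated assumptions: every link and disk sustains the same throughput, data moves in fixed-size packets (64 KB is assumed here), and each node forwards a packet as soon as it has received it. The function names are made up for illustration and are not part of any Hadoop API.

```python
def pipelined_write_seconds(size_mb, throughput_mb_s, replicas, packet_kb=64):
    """Estimate write time when each packet is forwarded node to node."""
    packet_mb = packet_kb / 1024
    # Time for the whole file to stream through the slowest stage.
    stream_time = size_mb / throughput_mb_s
    # Assumption: the last replica lags the first by one packet per extra hop.
    fill_time = (replicas - 1) * packet_mb / throughput_mb_s
    return stream_time + fill_time


def sequential_write_seconds(size_mb, throughput_mb_s, replicas):
    """Estimate write time if replicas were copied one after another."""
    return replicas * size_mb / throughput_mb_s


size_mb = 5 * 1024  # the 5 GB file from the question
for replicas in (3, 10):
    piped = pipelined_write_seconds(size_mb, 50, replicas)
    serial = sequential_write_seconds(size_mb, 50, replicas)
    print(f"replicas={replicas}: pipelined {piped:.2f}s, sequential {serial:.2f}s")
```

In this model the pipelined time is dominated by streaming the file once, so going from 3 to 10 replicas adds only a few milliseconds, whereas copying replicas one after another would scale linearly with the replica count.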

    Also, my aim is simply to transfer the data, i.e., to dump it into HDFS
    and get it back whenever needed. How can I speed up that transfer?

    HDFS isn't magic. You can only write as fast as your disk and network can.
    If your disk has 50 MB/sec of throughput, you'll probably be limited to
    50 MB/sec. Expecting much more than this in real-life scenarios is
    unrealistic.

    -Todd
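
The throughput ceiling described in the reply can be checked with simple arithmetic. The sketch below uses only the figures from this thread (a 5 GB file, 50 MB/sec, and the reported half hour) and assumes 1 GB = 1024 MB; the helper names are illustrative only.

```python
def transfer_seconds(size_gb, throughput_mb_s):
    """Best-case time to move size_gb at a sustained throughput."""
    return size_gb * 1024 / throughput_mb_s


def effective_throughput_mb_s(size_gb, seconds):
    """Throughput implied by an observed transfer time."""
    return size_gb * 1024 / seconds


# Best case for the 5 GB file at the 50 MB/sec quoted in the reply.
best_case = transfer_seconds(5, 50)

# Throughput implied by the half hour reported in the question.
observed = effective_throughput_mb_s(5, 30 * 60)

print(f"best case at 50 MB/sec: {best_case:.1f} s")
print(f"implied by 30 minutes:  {observed:.2f} MB/sec")
```

At 50 MB/sec the file would take under two minutes, while a half-hour upload works out to under 3 MB/sec, which suggests the bottleneck in the reported setup is the disk or network path rather than HDFS replication.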

Discussion Overview
group: common-user @
category: hadoop
posted: Jun 10, '09 at 11:55a
active: Jun 10, '09 at 2:46p
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop
