Hello!

If I transfer a 5 GB VDI file from a remote host (not part of the Hadoop
cluster) into HDFS and then read it back, how long should that take?

No MapReduce is involved. I am simply writing files into and reading them
out of HDFS with a simple Java program that uses the HDFS APIs.

--
Regards!
Sugandha


  • Kartik saxena at Jun 10, 2009 at 9:17 am
    I would guess about 2-3 hours. It took me about 2 days to load a 160 GB
    file.
    Secura
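
    Kartik's figure can be turned into an average rate and extrapolated to the
    5 GB file with simple arithmetic. This is a minimal sketch. It assumes the
    160 GB load ran at a constant average rate, which real transfers rarely do,
    and it uses binary units (1 GB = 1024 MB). The class and method names are
    illustrative only.

```java
import java.time.Duration;

// Back-of-envelope extrapolation. Assumes the 160 GB load ran at a constant
// average rate, which real transfers rarely do. Binary units: 1 GB = 1024 MB.
final class TransferEstimate {
    static final double MB_PER_GB = 1024.0;

    // Average rate in MB/s for `gb` gigabytes moved in `elapsed`.
    static double rateMBps(double gb, Duration elapsed) {
        return gb * MB_PER_GB / elapsed.getSeconds();
    }

    // Time to move `gb` gigabytes at a steady `rateMBps`.
    static Duration timeFor(double gb, double rateMBps) {
        return Duration.ofSeconds(Math.round(gb * MB_PER_GB / rateMBps));
    }

    public static void main(String[] args) {
        double rate = rateMBps(160, Duration.ofDays(2)); // Kartik's figures
        Duration t = timeFor(5, rate);                    // Sugandha's 5 GB file
        System.out.printf("average rate: %.2f MB/s%n", rate);
        System.out.printf("5 GB one way: %d h %d min%n", t.toHours(), t.toMinutes() % 60);
    }
}
```

    With Kartik's numbers the result is about 0.95 MB/s, or 1.5 hours for 5 GB
    one way. That is simply 5/160 of 48 hours.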

  • Sugandha Naolekar at Jun 10, 2009 at 9:25 am
    But what if I want to make it fast? I want to place the data in HDFS and
    replicate it in a fraction of a second. Is that possible, and if so, how?
  • Brian Bockelman at Jun 10, 2009 at 1:44 pm
    Hey Sugandha,

    Transfer rates depend on the quality/quantity of your hardware and the
    quality of your client disk that is generating the data. I usually
    say that you should expect near-hardware-bottleneck speeds for an
    otherwise idle cluster.
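
    The "near-hardware-bottleneck" rule of thumb amounts to taking the slowest
    stage in the data path. Below is a rough sketch for a single write stream.
    It assumes the stream is limited by the client disk, the client's network
    link, and the datanode disk. The stage names and MB/s figures are
    hypothetical placeholders, not measurements from this thread.

```java
import java.util.Map;

// Rough model: a single stream runs no faster than its slowest stage.
// Ignores replication-pipeline overhead, seeks, and contention.
final class BottleneckEstimate {
    // Returns the name of the slowest stage.
    static String bottleneck(Map<String, Double> stageMBps) {
        return stageMBps.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalArgumentException("no stages given"));
    }

    // Seconds to push `sizeMB` through the slowest stage.
    static double seconds(double sizeMB, Map<String, Double> stageMBps) {
        return sizeMB / stageMBps.get(bottleneck(stageMBps));
    }

    public static void main(String[] args) {
        // Hypothetical figures; substitute numbers measured on your own hardware.
        Map<String, Double> stages = Map.of(
                "client disk read", 60.0,
                "client network link", 110.0,
                "datanode disk write", 70.0);
        System.out.printf("bottleneck: %s, 5 GB write: ~%.0f s%n",
                bottleneck(stages), seconds(5 * 1024, stages));
    }
}
```

    Reading the file back follows the same rule with the roles reversed:
    datanode disk read, then the network, then the client disk write.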

    There should be no "make it fast" step required (though you should review
    the logs for errors if it's going slow). I would expect a 5 GB file to
    take around 3-5 minutes to write on our cluster, but it's a well-tuned
    and operational cluster.

    As Todd (I think) mentioned before, we can't help much when you say "I
    want to make it faster". You need to provide diagnostic information
    (logs, Ganglia plots, stack traces, something) that folks can look at.

    Brian
  • Raghu Angadi at Jun 11, 2009 at 6:31 pm
    Thanks Brian for the good advice.

    Slightly off topic from the original post: there will be occasions where
    it is necessary or better to copy different portions of a file in
    parallel (distcp can benefit a lot). There is a proposal to let HDFS
    'stitch' multiple files into one, something like

    NameNode.stitchFiles(Path to, Path[] files)
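
    The idea behind this proposal is to copy byte ranges of one large file in
    parallel and then join the pieces. That idea can be illustrated on the
    local filesystem with plain java.nio. This is only an analogy.
    `stitchFiles` was a proposal at the time of this thread, so the sketch
    below calls no HDFS API, and `ParallelRangeCopy` and its methods are
    hypothetical names.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Local-filesystem analogy of "copy portions of a file in parallel, then stitch".
// Not HDFS code: the pieces here are temporary local files.
final class ParallelRangeCopy {
    // Copies `src` to `dst` as `parts` byte ranges copied concurrently.
    static void copy(Path src, Path dst, int parts) throws Exception {
        long size = Files.size(src);
        long chunk = (size + parts - 1) / parts; // ceiling division
        List<Path> pieces = new ArrayList<>();
        List<Future<Void>> jobs = new ArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            for (int i = 0; i < parts; i++) {
                long offset = Math.min(i * chunk, size);
                long length = Math.min(chunk, size - offset);
                Path piece = Files.createTempFile("piece" + i + "-", ".part");
                pieces.add(piece);
                Callable<Void> job = () -> copyRange(src, piece, offset, length);
                jobs.add(pool.submit(job));
            }
            for (Future<Void> f : jobs) {
                f.get(); // rethrows any failure from a worker
            }
        } finally {
            pool.shutdown();
        }
        stitch(pieces, dst);
    }

    // Copies bytes [offset, offset + length) of `src` into the empty file `piece`.
    static Void copyRange(Path src, Path piece, long offset, long length) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(piece, StandardOpenOption.WRITE)) {
            long done = 0;
            while (done < length) {
                done += in.transferTo(offset + done, length - done, out);
            }
        }
        return null;
    }

    // Concatenates the pieces, in order, into `dst`, deleting each piece afterwards.
    static void stitch(List<Path> pieces, Path dst) throws IOException {
        try (FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path piece : pieces) {
                try (FileChannel in = FileChannel.open(piece, StandardOpenOption.READ)) {
                    long n = in.size();
                    long done = 0;
                    while (done < n) {
                        done += in.transferTo(done, n - done, out);
                    }
                }
                Files.delete(piece);
            }
        }
    }
}
```

    The local `stitch` has to copy every byte a second time. A NameNode-side
    stitch would presumably be a metadata operation that avoids that second
    copy, and this sketch cannot show that part.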

    This way a very large file can be copied more efficiently (with a
    map/reduce job, e.g.). Another use case is high-latency, high-bandwidth
    connections (like coast-to-coast). High latency can be somewhat worked
    around by using large buffers for TCP connections, but users usually
    don't have that control. It is just simpler to use multiple connections.
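
    The large-buffer workaround follows from the bandwidth-delay product. To
    keep a link full, one TCP connection needs roughly bandwidth × round-trip
    time bytes in flight. Splitting the transfer over N connections divides
    that per-connection requirement by N. The 1 Gbps and 80 ms figures below
    are illustrative assumptions for a long-haul path, not numbers from this
    thread.

```java
// Bandwidth-delay product: bytes that must be "in flight" to saturate a link.
final class BandwidthDelay {
    // bitsPerSecond * rttSeconds / 8 = bytes in flight.
    static long bdpBytes(double bitsPerSecond, double rttSeconds) {
        return Math.round(bitsPerSecond * rttSeconds / 8.0);
    }

    // Window each connection needs when the load is split over `streams` connections.
    static long perStreamBytes(double bitsPerSecond, double rttSeconds, int streams) {
        return (bdpBytes(bitsPerSecond, rttSeconds) + streams - 1) / streams; // round up
    }

    // Best-case throughput of one connection limited by a `windowBytes` window.
    static double maxBitsPerSecond(long windowBytes, double rttSeconds) {
        return windowBytes * 8.0 / rttSeconds;
    }

    public static void main(String[] args) {
        double link = 1e9, rtt = 0.080; // assumed: 1 Gbps link, 80 ms round trip
        System.out.println("BDP bytes: " + bdpBytes(link, rtt));
        System.out.println("per stream, 8 streams: " + perStreamBytes(link, rtt, 8));
        System.out.printf("64 KiB window caps one stream at %.1f Mbit/s%n",
                maxBitsPerSecond(64 * 1024, rtt) / 1e6);
    }
}
```

    The last line shows how a small window throttles a single connection on a
    high-latency path. Several connections, each with its own window, can
    recover the bandwidth without changing socket buffer settings.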

    This will obviously be an HDFS-only interface (i.e. not a FileSystem
    method), at least initially.

    Raghu.

  • Scott at Jun 12, 2009 at 1:48 pm
    So is ~1 GB/minute a reasonable transfer-rate benchmark?
    Our test cluster consists of 4 quad-core Xeon machines with 2 non-RAIDed
    drives each. My initial tests show a transfer rate of around
    1 GB/minute, which was slower than I expected.
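
    Converting 1 GB/minute into other units makes it easier to compare with the
    other figures in this thread. This is a minimal sketch with several
    assumptions:
    - 1 GB/minute is the aggregate rate across the cluster.
    - Units are decimal (1 GB = 1000 MB).
    - Writes spread evenly over all 8 drives.
    - The replication factor is HDFS's default of 3. Scott does not state his.

```java
// Unit conversions for an observed aggregate HDFS write rate.
final class RateConversion {
    static double mbPerSecond(double gbPerMinute) {
        return gbPerMinute * 1000.0 / 60.0; // decimal units: 1 GB = 1000 MB
    }

    static double megabitsPerSecond(double gbPerMinute) {
        return mbPerSecond(gbPerMinute) * 8.0;
    }

    // Disk write rate per drive if every block is stored `replication` times
    // and writes are spread evenly over `drives` drives (an idealization).
    static double perDriveMBps(double gbPerMinute, int replication, int drives) {
        return mbPerSecond(gbPerMinute) * replication / drives;
    }

    public static void main(String[] args) {
        double observed = 1.0; // GB/minute, from the test described above
        int drives = 4 * 2;    // 4 nodes x 2 drives
        System.out.printf("%.1f MB/s = %.0f Mbit/s aggregate%n",
                mbPerSecond(observed), megabitsPerSecond(observed));
        System.out.printf("~%.1f MB/s written per drive at replication 3%n",
                perDriveMBps(observed, 3, drives));
    }
}
```

    Whether a few MB/s per drive is low depends on the drives' sequential write
    speed. That speed is worth measuring directly before tuning HDFS.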

    Thanks,
    Scott
  • Brian Bockelman at Jun 12, 2009 at 3:34 pm
    What'd you do for the tests? Was it a single stream or a multiple
    stream test?

    Brian
  • Scott at Jun 12, 2009 at 4:03 pm
    I ran the put command on 3 of the nodes simultaneously to copy files
    that were local on those machines into HDFS.
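
    With several `put` commands running at once, the meaningful cluster-wide
    figure is total bytes divided by the wall-clock span from the first start
    to the last finish. Summing each stream's own rate overstates that figure
    whenever a stream runs for less than the full span. Below is a sketch with
    hypothetical timings (not Scott's data). `AggregateThroughput` and
    `Transfer` are illustrative names.

```java
import java.util.List;

// Aggregate throughput of concurrent transfers measured over wall-clock time.
final class AggregateThroughput {
    record Transfer(long bytes, double startSec, double endSec) {}

    // Total bytes / (latest end - earliest start), in bytes per second.
    static double aggregate(List<Transfer> transfers) {
        long total = 0;
        double first = Double.POSITIVE_INFINITY, last = Double.NEGATIVE_INFINITY;
        for (Transfer t : transfers) {
            total += t.bytes();
            first = Math.min(first, t.startSec());
            last = Math.max(last, t.endSec());
        }
        return total / (last - first);
    }

    // Naive sum of per-transfer rates, shown for comparison.
    static double sumOfRates(List<Transfer> transfers) {
        return transfers.stream()
                .mapToDouble(t -> t.bytes() / (t.endSec() - t.startSec()))
                .sum();
    }

    public static void main(String[] args) {
        long gb = 1_000_000_000L;
        // Hypothetical: three 1 GB puts started a few seconds apart.
        List<Transfer> runs = List.of(
                new Transfer(gb, 0, 120), new Transfer(gb, 5, 110), new Transfer(gb, 10, 150));
        System.out.printf("aggregate: %.1f MB/s, naive sum: %.1f MB/s%n",
                aggregate(runs) / 1e6, sumOfRates(runs) / 1e6);
    }
}
```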

  • Brian Bockelman at Jun 12, 2009 at 6:41 pm
    What's your replication factor? What aggregate I/O rates do you see
    in Ganglia? Is the I/O spiky, or has it plateaued?

    We can hit close to network rate (1 Gbps) per node locally, and have
    pretty similar hardware.
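
    The replication factor matters because each block is written once per
    replica. Below is a rough upper bound on cluster-wide ingest. It rests on
    these assumptions:
    - Writers run on datanodes, so the first replica is local (HDFS's default
      placement).
    - Every further replica crosses the network once.
    - Load is perfectly balanced.
    - The per-disk and per-NIC figures are hypothetical.

```java
// Rough ceiling on aggregate HDFS ingest for a small cluster.
// Idealized: perfect balance, no CPU limits, full-duplex NICs.
final class IngestCeiling {
    // Disk bound: every one of the `replication` copies is written to some disk.
    static double diskBoundMBps(int nodes, int disksPerNode, double diskMBps, int replication) {
        return nodes * disksPerNode * diskMBps / replication;
    }

    // Network bound: replicas 2..r each arrive over some node's NIC.
    static double networkBoundMBps(int nodes, double nicMBps, int replication) {
        if (replication <= 1) return Double.POSITIVE_INFINITY; // nothing crosses the network
        return nodes * nicMBps / (replication - 1);
    }

    static double ceilingMBps(int nodes, int disksPerNode, double diskMBps,
                              double nicMBps, int replication) {
        return Math.min(diskBoundMBps(nodes, disksPerNode, diskMBps, replication),
                        networkBoundMBps(nodes, nicMBps, replication));
    }

    public static void main(String[] args) {
        // Assumed: 4 nodes, 2 disks each, 60 MB/s per disk, 1 Gbps NIC = 125 MB/s.
        for (int r = 1; r <= 3; r++) {
            System.out.printf("replication %d: ceiling ~%.0f MB/s%n",
                    r, ceilingMBps(4, 2, 60.0, 125.0, r));
        }
    }
}
```

    A measured rate can be compared against such a ceiling, together with the
    Ganglia plots, to judge whether a cluster is near its hardware limit or
    losing time elsewhere.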

    Brian
  • Jason hadoop at Jun 13, 2009 at 1:55 am
    Also check the I/O wait time on your datanodes; if the I/O wait time is
    high, you can't win.
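
    On Linux, one way to check I/O wait on a datanode is to sample the summary
    `cpu` line of `/proc/stat` twice. The iowait counter is the fifth number,
    and its change is compared against the change in the total. Tools such as
    `iostat` and `vmstat` report the same figure. This is a minimal sketch that
    assumes a Linux kernel reporting at least the first five counters. The
    kernel documentation describes iowait as an approximate figure. The
    `IoWait` class is an illustrative name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Percentage of CPU time spent in iowait between two /proc/stat samples (Linux only).
final class IoWait {
    // Parses the "cpu ..." summary line into its numeric counters (in jiffies).
    static long[] parseCpuLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (!parts[0].equals("cpu")) throw new IllegalArgumentException("not the cpu summary line");
        return Arrays.stream(parts, 1, parts.length).mapToLong(Long::parseLong).toArray();
    }

    // Counters: user nice system idle iowait irq softirq steal [guest guest_nice].
    // guest/guest_nice are already included in user/nice, so only the first 8 are summed.
    static double iowaitPercent(long[] before, long[] after) {
        int n = Math.min(8, Math.min(before.length, after.length));
        if (n < 5) throw new IllegalArgumentException("need at least 5 counters");
        long total = 0;
        for (int i = 0; i < n; i++) total += after[i] - before[i];
        long iowait = after[4] - before[4];
        return total == 0 ? 0.0 : 100.0 * iowait / total;
    }

    static long[] sample() throws IOException {
        return parseCpuLine(Files.readAllLines(Path.of("/proc/stat")).get(0));
    }

    public static void main(String[] args) throws Exception {
        long[] a = sample();
        Thread.sleep(1000);
        long[] b = sample();
        System.out.printf("iowait over the last second: %.1f%%%n", iowaitPercent(a, b));
    }
}
```

    Running it on each datanode while a large write is in progress shows
    whether the disks are the limit. A consistently high value points there.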
    --
    Pro Hadoop, a book to guide you from beginner to hadoop mastery,
    http://www.apress.com/book/view/9781430219422
    www.prohadoopbook.com a community for Hadoop Professionals

Discussion Overview
group: common-user@
categories: hadoop
posted: Jun 10, '09 at 6:26a
active: Jun 13, '09 at 1:55a
posts: 10
users: 6
website: hadoop.apache.org...
irc: #hadoop
