How to speed up a Map/Reduce job?
Hello everybody,

I have a problem. I installed Hadoop on a 2-node cluster and ran the
WordCount example. It takes about 20 seconds to process a 1.5MB text
file. We want to use Map/Reduce in real time (interactively, driven by
user requests), and a user can't wait 20 seconds for a response; that is
too long. Is it possible to reduce the running time of a Map/Reduce job?
Or maybe I misunderstand something?

BR,
Igor Babkin, Mifors.com


  • Li ping at Feb 1, 2011 at 2:36 pm
    Hadoop is not designed for real-time applications, but you can tune
    its parameters to reduce job execution time (a small sketch of such
    tuning follows this post).

    I found an article via a Google search; I hope it has some useful
    information for you:
    http://www.slideshare.net/ImpetusInfo/ppt-on-advanced-hadoop-tuning-n-optimisation

    --
    -----李平
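    For illustration, a minimal sketch of this kind of tuning, using
    Hadoop 0.20-era configuration property names (current when this
    thread was written). The property names are real for that release
    line, but the class name and the values are only illustrative
    starting points, not recommendations:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;

        public class TunedJobSetup {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Reuse task JVMs instead of forking a new one per task;
                // -1 means "no limit" in 0.20.x and avoids repeated JVM startup.
                conf.set("mapred.job.reuse.jvm.num.tasks", "-1");
                // A tiny input needs only one reduce task.
                conf.set("mapred.reduce.tasks", "1");
                // Speculative execution only adds overhead on a small, fast job.
                conf.setBoolean("mapred.map.tasks.speculative.execution", false);
                conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);

                Job job = new Job(conf, "tuned wordcount");
                // ... configure mapper, reducer, input and output paths exactly
                // as in the stock WordCount example, then job.waitForCompletion(true).
            }
        }

    Note that none of these knobs removes the fixed scheduling and startup
    latency discussed later in this thread; they only trim per-task overhead.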
  • Praveen Peddi at Feb 1, 2011 at 3:09 pm
    Hi Igor,
    I am not sure that Hadoop is designed for real-time requests; I have a
    feeling you are trying to use it in a way it is not designed for. In my
    experience, a Hadoop cluster is much slower than Hadoop's "local" mode
    when processing small datasets, because cluster mode always pays extra
    overhead for task and job management. A sketch of forcing local mode
    follows this message.

    Praveen
    ________________________________________
    From: ext Igor Bubkin [igba14@gmail.com]
    Sent: Tuesday, February 01, 2011 3:19 AM
    To: common-issues@hadoop.apache.org
    Cc: common-user@hadoop.apache.org
    Subject: How to speed up of Map/Reduce job?

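    To make Praveen's point concrete, here is a hedged sketch of forcing
    "local" mode with the 0.20-era properties (the class name is made up
    for the example; the two property names are the real ones for that
    release line). In local mode the whole job runs in a single JVM with
    no JobTracker, which is usually far faster on data this small:

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.mapreduce.Job;

        public class LocalModeSetup {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Run map and reduce tasks in-process instead of on the cluster.
                conf.set("mapred.job.tracker", "local");
                // Read input from the local filesystem rather than HDFS.
                conf.set("fs.default.name", "file:///");

                Job job = new Job(conf, "local wordcount");
                // ... same mapper, reducer and paths as the stock WordCount example.
            }
        }

    The same two properties can also be passed on the command line with -D
    when launching the example jar, since the example driver understands
    the generic Hadoop options.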
  • Steve Loughran at Feb 1, 2011 at 3:54 pm

    1. I'd expect a minimum of about 30s query time due to the way work
    gets queued and dispatched, JVM startup costs, etc. There is no way to
    eliminate this in Hadoop's current architecture.

    2. 1.5MB is a very small file size; I'm currently recommending a block
    size of 512MB in new clusters for various reasons. This amount of data
    is just too small to be worth distributing: load it into memory and
    analyse it locally (a minimal in-memory sketch follows this post).
    Things like Apache CouchDB also support MapReduce.

    Hadoop is not designed for clusters of fewer than about 10 machines
    (not enough redundancy of storage) or for small datasets. If your
    problems aren't big enough, use different tools, because Hadoop
    embodies design decisions and overheads that only make sense once your
    data is measured in gigabytes and your filesystem in tens to thousands
    of terabytes.
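    As a concrete version of point 2, a minimal single-process word count
    in plain Java, with no Hadoop involved at all (the class name is made
    up for the example); on a 1.5MB file this completes in well under a
    second:

        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.io.IOException;
        import java.util.HashMap;
        import java.util.Map;

        public class InMemoryWordCount {
            public static void main(String[] args) throws IOException {
                Map<String, Integer> counts = new HashMap<String, Integer>();
                BufferedReader in = new BufferedReader(new FileReader(args[0]));
                try {
                    String line;
                    while ((line = in.readLine()) != null) {
                        // Same idea as the WordCount example: split on
                        // whitespace and tally each word.
                        for (String word : line.split("\\s+")) {
                            if (word.length() == 0) continue;
                            Integer c = counts.get(word);
                            counts.put(word, c == null ? 1 : c + 1);
                        }
                    }
                } finally {
                    in.close();
                }
                System.out.println(counts.size() + " distinct words");
            }
        }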
