Single Node with multiple mappers?
Hi everyone,

I was curious if there is any option to use Hadoop in single node mode
in a way that enables the process to use more system resources.
Right now, Hadoop uses one mapper and one reducer, leaving my i7 with
about 20% CPU usage (1 core for Hadoop, .5 cores for my OS) basically
idling.
Raising the number of map tasks doesn't seem to do much, as this
parameter seems to be more of a hint anyway. Still, I have lots of CPU
time and RAM left. Any hints on how to use them?

thanks in advance,
Moritz

  • Michael Segel at Jul 16, 2010 at 9:59 am
    Moritz,

    I'm not sure what you're doing, but raising the number of mappers in your configuration isn't a 'hint'.

    The number of mappers that you can run will depend on your configuration. You mention an i7, which is a quad-core CPU, but you don't mention the amount of memory you have available, or what else you run on the machine. You don't want Hadoop to swap.

    If your initial m/r jobs are taking input from a file, the default behavior is to create one map/reduce task per block. So if your initial input file is < 64MB and you have kept the default block size of 64MB, then you will only have one map/reduce task (see the worked example below).

    I haven't played with Hadoop in a single node / pseudo-distributed environment... just in a distributed environment, but I believe that the functionality is the same.

    HTH

    -Mike
    PS. Please take my advice with a grain of salt. It's 5:00am and I haven't had my first cup of coffee yet. ;-)
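
    A rough worked example of the block-size arithmetic above (a small Python sketch, not Hadoop's exact split logic): the number of map tasks is roughly the input size divided by the block size, so a file smaller than one block yields a single map task.

        # Estimate how many map tasks an input file yields under the
        # "one map task per block" rule of thumb (default block size 64 MB).
        import math

        def estimated_map_tasks(input_size_bytes, block_size_bytes=64 * 1024 * 1024):
            return max(1, math.ceil(input_size_bytes / block_size_bytes))

        print(estimated_map_tasks(10 * 1024 ** 2))   # 10 MB input -> 1 map task
        print(estimated_map_tasks(40 * 1024 ** 3))   # 40 GB input -> 640 map tasks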
  • Moritz Krog at Jul 16, 2010 at 6:36 pm
    Hey :)

    thanks for the quick response. My system runs on an i7 together with
    about 8GB of RAM. The problem with my setup is that I'm using Hadoop
    to pump 40GB of JSON-encoded data hashes into a MySQL database. The
    data is in non-relational form and needs to be normalized before it
    can enter the DB, thus the Hadoop approach. I ran the first batch of
    test data last night, and roughly 10GB took about 12 hours to process
    (a Python mapper writing to MySQL via the oursql package; a sketch of
    this kind of mapper follows below).
    The reason for this is certainly a less-than-perfectly configured mysqld,
    along with the fact that I gave Hadoop too much memory. I allowed 4GB
    for Hadoop but forgot that MySQL was granted about 6GB as
    well, so I was 2-3GB into swap most of the time. (Though I don't
    know how much was Hadoop and how much was MySQL.)
    Anyway, I didn't bother to set up the 'real' Hadoop environment with
    daemons and HDFS but instead just ran the streaming jar directly from
    $hadoop_home. I don't know if this really matters in any way, I just
    thought I'd mention it.

    all the best,
    Moritz
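
    A minimal sketch of the kind of streaming mapper described above: read JSON-encoded records from stdin, flatten them, and insert rows into MySQL through oursql. The table name, column names, record fields, and connection settings here are made up purely for illustration; the real schema and normalization logic would differ.

        # Hypothetical Hadoop streaming mapper: one JSON record per input line,
        # normalized into a flat row and written to MySQL via oursql.
        # Nothing is emitted on stdout; the database writes are the only output,
        # as described in the message above.
        import sys
        import json
        import oursql

        conn = oursql.connect(host='localhost', user='etl', passwd='secret', db='warehouse')
        cursor = conn.cursor()

        for line in sys.stdin:
            record = json.loads(line)
            # hypothetical normalization: pull two fields out of the nested hash
            cursor.execute(
                'INSERT INTO items (item_id, name) VALUES (?, ?)',  # oursql uses ? placeholders
                (record['id'], record['attributes']['name']),
            )

        conn.commit()

    At this data volume, batching the inserts (for example with cursor.executemany) would probably matter at least as much as running more mappers, but that is beyond the sketch.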
  • Asif Jan at Jul 16, 2010 at 10:04 am
    How is your data being split?
    Using the mapred.map.tasks property should let you specify how many
    maps you want to run (provided your input file is big enough to
    be split into multiple chunks; see the example below).

    asif
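
    One way to pass mapred.map.tasks when launching a streaming job from Python (the jar path, input/output paths, and mapper script below are placeholders; adjust them to your install). As noted above, the setting only takes effect if the input is large enough to be split into multiple chunks.

        # Launch a Hadoop streaming job with an explicit map-task count.
        import subprocess

        subprocess.check_call([
            'hadoop', 'jar', '/path/to/hadoop-streaming.jar',  # placeholder jar path
            '-D', 'mapred.map.tasks=8',   # requested number of map tasks
            '-input', 'input/',
            '-output', 'output/',
            '-mapper', 'mapper.py',
            '-file', 'mapper.py',
        ])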
