FAQ
Hi All,

I would like to know whether Hadoop can be used for applications that need more
control over data placement, i.e. where the user can specify which node will do
which part of the processing or the storage. For instance, suppose I have two
data files (datasets 1 and 2) and set up Hadoop with two datanodes (A and B)
and a distributed cache. Can I specify that dataset 1 should be loaded on node A
and dataset 2 on node B? Similarly, given two tasks (a and b), is it possible to
run task a on node A and task b on node B?

In other words, I want to supply Hadoop with a "pattern of operation" file
(specifying storage and processing tasks) to execute. Would that be possible? If
it is, I would appreciate a link to a discussion of this, or sample
code/an application that does it.
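Hadoop ships no such placement file, but the mapping being asked about can be modelled outside Hadoop as a small assignment table. A minimal sketch in plain Java (no Hadoop API involved; the rule syntax and all names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a "pattern of operation" table: which node stores which
// dataset and runs which task. Hadoop itself has no such file; this
// only models the mapping the question describes.
public class OperationPattern {
    // dataset/task name -> node name
    private final Map<String, String> placement = new LinkedHashMap<>();

    // A rule line looks like "dataset1 -> nodeA"; malformed lines are skipped.
    public void addRule(String line) {
        String[] parts = line.split("->");
        if (parts.length == 2) {
            placement.put(parts[0].trim(), parts[1].trim());
        }
    }

    // Returns the assigned node, or null if the item has no rule.
    public String nodeFor(String item) {
        return placement.get(item);
    }

    public static void main(String[] args) {
        OperationPattern p = new OperationPattern();
        p.addRule("dataset1 -> nodeA");
        p.addRule("dataset2 -> nodeB");
        p.addRule("taskA -> nodeA");
        System.out.println(p.nodeFor("dataset1")); // nodeA
    }
}
```

An external driver would still have to enforce these assignments itself; stock HDFS and the stock job scheduler do not consume a table like this.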


Thanks a lot,

--
Ahmad

  • Mike Kendall at Dec 17, 2009 at 12:11 am
    It sounds to me like you might want to split what you want to do
    into two separate jobs entirely... I don't quite understand your use
    case, since the point of Hadoop is to spread your load as widely (and
    haphazardly!) as possible.

    -mike
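The "two separate jobs" idea can be sketched in plain Java: each dataset gets its own independent job, submitted to its own dedicated worker. In real Hadoop these would be two independent MapReduce jobs, each reading only its own input path; the executors and names below are purely illustrative stand-ins.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Models "split it into two separate jobs entirely": one worker per
// "node", one independent job per dataset. Not a Hadoop API.
public class TwoJobs {
    // Stand-in for an entire job run against one dataset.
    static String runJob(String dataset) {
        return "processed:" + dataset;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService nodeA = Executors.newSingleThreadExecutor();
        ExecutorService nodeB = Executors.newSingleThreadExecutor();
        // Each job is submitted only to its designated worker.
        Future<String> a = nodeA.submit(() -> runJob("dataset1"));
        Future<String> b = nodeB.submit(() -> runJob("dataset2"));
        System.out.println(a.get() + " / " + b.get());
        nodeA.shutdown();
        nodeB.shutdown();
    }
}
```

With two real jobs, each job's input path alone determines what it reads; which physical machine runs the tasks is still left to the scheduler.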

  • Ahmad Ali Iqbal at Dec 17, 2009 at 3:40 am
    Hi Mike,

    My understanding is that in Hadoop, job scheduling is done implicitly; as you
    said, it spreads the load as much as possible. However, I want to control task
    assignments to nodes. Let me put it in the context of an ad-hoc networking
    application scenario, where mobile devices broadcast *Hello* packets
    periodically to inform their neighbours of their presence and then, through
    some algorithm, select a cluster head. The cluster head then assigns
    different tasks to different nodes.

    Take another scenario: a distributed database where every node holds part
    of the data (the distribution of data is done explicitly by the user, not by
    Hadoop's implicit scheduling). A query is then applied on these nodes
    independently, and finally the results retrieved from the distributed node
    databases are merged.
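The distributed-database scenario is essentially scatter-gather: each "node" holds its own user-assigned partition, the same query runs on every partition independently, and the per-node results are merged at the end. A plain-Java sketch of that flow (not a Hadoop API; in Hadoop, the reduce phase would play the role of the merge step):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Scatter-gather sketch: explicit per-node partitions, an independent
// per-node query, and a final merge of the partial results.
public class ScatterGather {
    // The query each node runs on its own partition only.
    static List<Integer> query(List<Integer> partition, int threshold) {
        return partition.stream()
                .filter(v -> v > threshold)
                .collect(Collectors.toList());
    }

    // The gather step: combine and order the per-node results.
    static List<Integer> merge(List<List<Integer>> perNode) {
        List<Integer> out = new ArrayList<>();
        perNode.forEach(out::addAll);
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> nodeA = Arrays.asList(1, 7, 3);  // placed by the user
        List<Integer> nodeB = Arrays.asList(9, 2, 5);  // placed by the user
        List<Integer> merged = merge(Arrays.asList(
                query(nodeA, 4), query(nodeB, 4)));
        System.out.println(merged); // [5, 7, 9]
    }
}
```

The part Hadoop does not give you directly is the first line of main: stock HDFS decides block placement itself rather than letting the user pin a partition to a chosen node.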

    Is it possible to perform such tasks using Hadoop?

    Thanks a lot,

    --
    Ahmad

  • Ahmad Ali Iqbal at Dec 21, 2009 at 2:48 am
    Can someone please shed some light on this issue?

    Thanks a lot,

    --
    Ahmad




Discussion Overview

group: common-user
categories: hadoop
posted: Dec 16, '09 at 11:44p
active: Dec 21, '09 at 2:48a
posts: 4
users: 2 (Ahmad Ali Iqbal: 3 posts, Mike Kendall: 1 post)
website: hadoop.apache.org...
irc: #hadoop
