FAQ
Hi,

Can anyone tell me whether Hadoop is appropriate for the following application?

I need to perform optimization using a single, small input data set.
To get a good result I must make many independent runs of the
optimizer, where each run is initiated with a different starting
point. At completion, I just choose the best solution from all of the
runs. So my problem is not that I'm working with big data, I just want
to speed up my run time by linking several Ubuntu desktops that are
available to me. The optimizer is written in ANSI C.

Thanks,

John


  • Miles Osborne at Mar 19, 2009 at 6:55 pm
    yes, this is perfectly fine: have each mapper perform one of your
    runs and simply emit the final result, along with the conditions
    leading to that result.

    you won't need any reducers.

    Miles

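    A minimal Java sketch of the map-only job Miles describes: each input
    line carries one starting point, and the mapper shells out to the C
    optimizer. The "./optimizer <seed>" command line and its
    "<score><TAB><solution>" output are hypothetical stand-ins for however
    the real binary is invoked; the binary itself would need to be present
    on every node (e.g. shipped with the job).

        import java.io.BufferedReader;
        import java.io.IOException;
        import java.io.InputStreamReader;

        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Mapper;

        public class OptimizerMapper extends Mapper<LongWritable, Text, Text, Text> {
          @Override
          protected void map(LongWritable offset, Text seedLine, Context context)
              throws IOException, InterruptedException {
            String seed = seedLine.toString().trim();
            // One optimizer run per input line (hypothetical CLI).
            Process p = new ProcessBuilder("./optimizer", seed).start();
            BufferedReader out =
                new BufferedReader(new InputStreamReader(p.getInputStream()));
            String result = out.readLine();  // "<score>\t<solution>"
            p.waitFor();
            if (result != null) {
              // Emit the result together with the starting point that produced it.
              context.write(new Text(seed), new Text(result));
            }
          }
        }

    With job.setNumReduceTasks(0) in the driver this runs as a map-only
    job, so each mapper's output is written straight to the output
    directory.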
  • Ted Dunning at Mar 20, 2009 at 4:11 am
    You can use a randomized reduce key to parallelize the comparison of
    different runs. Each reduce key would be in a small range of integers (say
    0..100). Each reducer would then be in charge of keeping only the best
    solution. The final output would be 100 values, which could be compared
    conventionally.

    Whether this would help really depends on how many runs you have. If it is
    fewer than millions, this probably doesn't matter and Miles's suggestion is
    fine.
    On Thu, Mar 19, 2009 at 11:54 AM, Miles Osborne wrote:

    you won't need any reducers.



    --
    Ted Dunning, CTO
    DeepDyve
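
    A sketch of Ted's randomized-key variant: each run's result is emitted
    under a random integer key in 0..100, so each reducer keeps only the
    best solution in its bucket. The "<score><TAB><solution>" value format
    is again a hypothetical convention; on the map side each result would
    be emitted with something like
    context.write(new IntWritable(rand.nextInt(100)), resultText).

        import java.io.IOException;

        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Reducer;

        public class BestSolutionReducer
            extends Reducer<IntWritable, Text, IntWritable, Text> {
          @Override
          protected void reduce(IntWritable bucket, Iterable<Text> runs, Context context)
              throws IOException, InterruptedException {
            String best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Text run : runs) {
              // Hadoop reuses the Text instance, so copy before keeping it.
              String candidate = run.toString();
              double score = Double.parseDouble(candidate.split("\t", 2)[0]);
              if (score > bestScore) {
                bestScore = score;
                best = candidate;
              }
            }
            if (best != null) {
              context.write(bucket, new Text(best));  // one survivor per bucket
            }
          }
        }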
  • Mark Kerzner at Mar 19, 2009 at 6:56 pm
    My feeling is that JavaSpaces could be a good choice. Here is my plan:

    - Have one machine running JavaSpaces (using the free GigaSpaces community
    edition), put the data in there, along with a small object to hold the
    starting point;
    - Each worker machine reads the Space (all workers can read at the same
    time, no lock) and also updates the starting point object (take it from the
    Space, update it, put it back) - this is a locking operation, but fast;
    - Results go back into the Space.

    It will be interesting to hear what you end up doing.

    Mark
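
    A rough sketch of the take/update/write cycle Mark describes, assuming
    a JavaSpace reference has already been obtained via Jini lookup
    (omitted here). The StartingPoint entry and its increment-by-one update
    rule are hypothetical; take() removes the entry from the space, so
    concurrent workers block until it is written back, which is the locking
    Mark mentions.

        import net.jini.core.entry.Entry;
        import net.jini.core.lease.Lease;
        import net.jini.space.JavaSpace;

        // JavaSpaces entries need public object-typed fields and a public
        // no-arg constructor.
        class StartingPoint implements Entry {
          public Double value;
          public StartingPoint() {}
        }

        public class Worker {
          static double claimNextSeed(JavaSpace space) throws Exception {
            StartingPoint template = new StartingPoint();  // null fields match anything
            StartingPoint sp = (StartingPoint) space.take(template, null, Long.MAX_VALUE);
            double seed = sp.value;
            sp.value = seed + 1.0;                 // hypothetical "next seed" rule
            space.write(sp, null, Lease.FOREVER);  // release it for the next worker
            return seed;
          }
        }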
  • Tim robertson at Mar 19, 2009 at 7:06 pm
    You might make use of the Hadoop scheduler and task management to
    initiate the jobs and write the results back to the Hadoop filesystem,
    but I would guess there are better ways of doing this than using
    Hadoop just for scheduling (perhaps a simple web service on each
    machine through which you can remotely trigger the processing?).
    I am by no means a Hadoop expert, though.

    Cheers,

    Tim



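
    A minimal sketch of the per-machine web service Tim suggests, using
    the JDK's built-in com.sun.net.httpserver. The /run endpoint, the
    "seed" query parameter, and the "./optimizer" command are all
    hypothetical.

        import java.io.IOException;
        import java.io.OutputStream;
        import java.net.InetSocketAddress;

        import com.sun.net.httpserver.HttpExchange;
        import com.sun.net.httpserver.HttpHandler;
        import com.sun.net.httpserver.HttpServer;

        public class TriggerServer {
          public static void main(String[] args) throws IOException {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/run", new HttpHandler() {
              public void handle(HttpExchange exchange) throws IOException {
                // e.g. GET /run?seed=42
                String query = exchange.getRequestURI().getQuery();
                String seed = (query != null && query.startsWith("seed="))
                    ? query.substring(5) : "0";
                new ProcessBuilder("./optimizer", seed).start();  // fire and forget
                byte[] body = "started\n".getBytes();
                exchange.sendResponseHeaders(200, body.length);
                OutputStream os = exchange.getResponseBody();
                os.write(body);
                os.close();
              }
            });
            server.start();
          }
        }

    A run could then be triggered remotely with, for example,
    curl 'http://desktop1:8080/run?seed=42'.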
  • Stefan Podkowinski at Mar 20, 2009 at 12:17 pm
    Genetic algorithms may also work for your case.

    See http://jgap.sourceforge.net/
    It supports execution in grid environments, too.

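
    A hedged sketch of driving the search with JGAP, loosely following the
    library's tutorial. The score() hook standing in for one run of the C
    optimizer is hypothetical, and JGAP expects non-negative fitness
    values (higher is fitter).

        import org.jgap.Chromosome;
        import org.jgap.Configuration;
        import org.jgap.FitnessFunction;
        import org.jgap.Gene;
        import org.jgap.Genotype;
        import org.jgap.IChromosome;
        import org.jgap.impl.DefaultConfiguration;
        import org.jgap.impl.DoubleGene;

        public class GaSearch {
          public static void main(String[] args) throws Exception {
            Configuration conf = new DefaultConfiguration();
            conf.setFitnessFunction(new FitnessFunction() {
              public double evaluate(IChromosome candidate) {
                double x = (Double) candidate.getGene(0).getAllele();
                return score(x);  // one optimizer run per candidate
              }
            });
            // One gene per optimizer parameter; a single double in [-10, 10]
            // keeps the example small.
            Gene[] sample = { new DoubleGene(conf, -10.0, 10.0) };
            conf.setSampleChromosome(new Chromosome(conf, sample));
            conf.setPopulationSize(100);
            Genotype population = Genotype.randomInitialGenotype(conf);
            for (int i = 0; i < 50; i++) {
              population.evolve();
            }
            IChromosome best = population.getFittestChromosome();
            System.out.println("best parameter: " + best.getGene(0).getAllele());
          }

          // Placeholder for invoking the C optimizer and returning its
          // objective value as a non-negative fitness.
          static double score(double x) {
            return 1.0 / (1.0 + x * x);
          }
        }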
  • John Bergstrom at Mar 20, 2009 at 11:52 pm
    Thanks to you all. You have been very helpful. I've gotten lots of
    good information.

    Regards,

    John

Discussion Overview
group: common-user @
categories: hadoop
posted: Mar 19, '09 at 6:49p
active: Mar 20, '09 at 11:52p
posts: 7
users: 6
website: hadoop.apache.org...
irc: #hadoop
