Hi there. I did a search of the mailing list archives looking for
something similar to this, but I didn't find anything, so apologies if
this has been discussed before.

I'm investigating using Hadoop for distributed rendering. The Mapper
would define the tiles to be rendered and the nodes would render them
using the scene data (which is, for the sake of argument, all wrapped up
in one big binary file on HDFS). The reducer would take the output
tiles and stitch them together to form the final image. I have a system
that does this already, but it doesn't have any of the advantages of a
distributed file system; there are lots of I/O bottlenecks and
communication overheads. All the code is currently in C++.

Does this sound like a good use for Hadoop?
-Rob
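
To make the decomposition above concrete, here is a small, self-contained
sketch of the "Mapper defines the tiles" step: it enumerates tile
specifications for a given output resolution, one record per tile, which is
the sort of thing each map task would hand to the renderer. The "x y w h"
record format and the 256-pixel tile size are invented for illustration.

// Sketch only: enumerate the tile specs that would become the render tasks.
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> makeTileSpecs(int imageW, int imageH, int tile) {
  std::vector<std::string> specs;
  for (int y = 0; y < imageH; y += tile) {
    for (int x = 0; x < imageW; x += tile) {
      std::ostringstream s;
      // Clamp edge tiles so the specs cover the image exactly.
      s << x << " " << y << " "
        << std::min(tile, imageW - x) << " "
        << std::min(tile, imageH - y);
      specs.push_back(s.str());
    }
  }
  return specs;
}

int main() {
  // One line per tile: each line would be the input value of one map record,
  // and the rendered result would be keyed by position for the stitch step.
  for (const std::string& spec : makeTileSpecs(4096, 4096, 256))
    std::cout << spec << "\n";
  return 0;
}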


  • Ted Dunning at Oct 16, 2007 at 3:52 pm
    This is interesting.

    One of the poster children for Google's map-reduce is rendering for Google
    Maps. Each object in the world is keyed according to the tile that it
    affects in the map, and then the reduce renders the tile given all of the
    objects that affect it. Very slick. Very fast.

    The question with 3D rendering is whether you can limit effects in this way,
    especially if you are using things like radiosity, where illumination on
    objects way across the screen can affect the lighting on other objects.

    It may be that multiple map-reduce passes could be used to do this, but I
    don't know.

    If you are only passing the entire scene to independent tile renderers, then
    you really don't have much to do. Just put your scene description into the
    cluster with a high replication factor and run.
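
    As a rough sketch of the keying step described above (with everything
    reduced to 2D bounding boxes, and the Box type, tile size and key format
    invented for the example): each object is emitted once per tile it
    touches, so the reduce for a tile sees exactly the objects that can
    affect it. A radiosity-style global pass would break this locality, as
    noted above.

    // Sketch: key scene objects by the tiles their bounds overlap.
    #include <iostream>
    #include <string>
    #include <vector>

    struct Box { double minX, minY, maxX, maxY; };  // stand-in for an object's bounds

    std::vector<std::string> affectedTiles(const Box& b, double tileSize) {
      std::vector<std::string> keys;
      int x0 = static_cast<int>(b.minX / tileSize), x1 = static_cast<int>(b.maxX / tileSize);
      int y0 = static_cast<int>(b.minY / tileSize), y1 = static_cast<int>(b.maxY / tileSize);
      for (int ty = y0; ty <= y1; ++ty)
        for (int tx = x0; tx <= x1; ++tx)
          keys.push_back(std::to_string(tx) + "," + std::to_string(ty));
      return keys;
    }

    int main() {
      // The map phase would emit (tileKey, serialisedObject) once per affected
      // tile; the reduce phase then renders each tile from the objects it gets.
      Box object{100.0, 200.0, 300.0, 260.0};
      for (const std::string& key : affectedTiles(object, 256.0))
        std::cout << key << "\t<serialised object>\n";
      return 0;
    }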

  • Robert Jessop at Oct 17, 2007 at 8:55 am
    In this particular example, we do two forms of global illumination in
    separate passes. While I think there are some techniques we could apply
    that would discretise the scene and allow us to save I/O and memory
    overhead per tile, the generic nature of the scenes (we have both indoor
    and outdoor) means that we have lights with infinite range and render
    nothing more discrete than a polygon soup. Also, the layout of the tiles
    in most cases is not convenient for discretisation, since the faces are
    laid out to optimise space.

    We use two separate passes already. The tiles are rendered once and the
    results of this are used to compute the second pass. I think this would
    translate very well into Hadoop, as I have seen examples of multi-pass
    jobs on the mailing list. While I do not think there is much scope for
    optimising the discretisation of the scene, I think I could make
    substantial gains in reducing I/O and job distribution overhead by
    using Hadoop.

    My main questions are really to do with the finer points of the C++ vs.
    Java implementations. I've not coded Java in years, so I'm really
    interested in how easy it is to simply wrap the C++ components for the
    map and reduce phases. I notice there is a C++ API that has methods for
    manipulating files in the DFS, which sounds ideal, as the first job I run
    (I think this is equivalent to the Mapper, is that right?) generates the
    scene description and composes the tile-rendering commands.

    -Rob
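
    For reference, the C API that ships with Hadoop for manipulating DFS files
    is, as far as I know, libhdfs (hdfs.h), and it is usable from C++. A
    minimal sketch of writing a file into the DFS with it might look like the
    following; the output path is only a placeholder, and the exact build and
    link steps are in the libhdfs docs.

    // Minimal libhdfs sketch: write a small file into the DFS from C++.
    #include <cstdio>
    #include <cstring>
    #include <fcntl.h>
    #include "hdfs.h"

    int main() {
      // "default" picks up the namenode from the Hadoop configuration.
      hdfsFS fs = hdfsConnect("default", 0);
      if (!fs) { std::fprintf(stderr, "hdfsConnect failed\n"); return 1; }

      const char* path = "/tmp/render/tile_0_0.bin";  // placeholder output path
      hdfsFile out = hdfsOpenFile(fs, path, O_WRONLY | O_CREAT, 0, 0, 0);
      if (!out) { std::fprintf(stderr, "hdfsOpenFile failed\n"); return 1; }

      const char* data = "tile bytes would go here\n";
      hdfsWrite(fs, out, (void*)data, std::strlen(data));
      hdfsFlush(fs, out);
      hdfsCloseFile(fs, out);
      hdfsDisconnect(fs);
      return 0;
    }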

  • Owen O'Malley at Oct 17, 2007 at 10:48 pm

    There is a C++ API called Pipes:

    http://lucene.apache.org/hadoop/api/org/apache/hadoop/mapred/pipes/package-summary.html

    and an example in src/examples/pipes. In particular, the famous word
    count example in C++ looks like:

    #include "hadoop/Pipes.hh"
    #include "hadoop/TemplateFactory.hh"
    #include "hadoop/StringUtils.hh"

    class WordCountMap: public HadoopPipes::Mapper {
    public:
    WordCountMap(HadoopPipes::TaskContext& context){}
    void map(HadoopPipes::MapContext& context) {
    std::vector<std::string> words =
    HadoopUtils::splitString(context.getInputValue(), " ");
    for(unsigned int i=0; i < words.size(); ++i) {
    context.emit(words[i], "1");
    }
    }
    };

    class WordCountReduce: public HadoopPipes::Reducer {
    public:
    WordCountReduce(HadoopPipes::TaskContext& context){}
    void reduce(HadoopPipes::ReduceContext& context) {
    int sum = 0;
    while (context.nextValue()) {
    sum += HadoopUtils::toInt(context.getInputValue());
    }
    context.emit(context.getInputKey(), HadoopUtils::toString(sum));
    }
    };

    int main(int argc, char *argv[]) {
    return HadoopPipes::runTask
    (HadoopPipes::TemplateFactory<WordCountMap,
    WordCountReduce>());
    }
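
    Adapting that skeleton to the rendering case should be mechanical: the map
    method parses the tile command from its input value, calls into the
    existing C++ renderer, and emits the tile id plus wherever the result was
    written. In the sketch below, renderTile() is only a stub standing in for
    Rob's renderer and the record layout is invented; the Pipes calls
    themselves are the same ones used in the word count example above.

    #include <string>

    #include "hadoop/Pipes.hh"
    #include "hadoop/TemplateFactory.hh"
    #include "hadoop/StringUtils.hh"

    // Stub for the existing renderer: takes a tile command, returns where the
    // rendered tile ended up (e.g. a path written out-of-band).
    std::string renderTile(const std::string& tileCommand) {
      return "/renders/" + tileCommand + ".png";
    }

    class TileRenderMap: public HadoopPipes::Mapper {
    public:
      TileRenderMap(HadoopPipes::TaskContext& context) {}
      void map(HadoopPipes::MapContext& context) {
        // Input value = one tile-rendering command from the scene-setup job.
        std::string command = context.getInputValue();
        context.emit(command, renderTile(command));
      }
    };

    class TileCollectReduce: public HadoopPipes::Reducer {
    public:
      TileCollectReduce(HadoopPipes::TaskContext& context) {}
      void reduce(HadoopPipes::ReduceContext& context) {
        // Pass-through: record where each tile landed for a later stitch step.
        while (context.nextValue()) {
          context.emit(context.getInputKey(), context.getInputValue());
        }
      }
    };

    int main(int argc, char *argv[]) {
      return HadoopPipes::runTask(
          HadoopPipes::TemplateFactory<TileRenderMap, TileCollectReduce>());
    }

    If I read the Pipes docs right, the compiled binary is shipped to the
    cluster and named via the -program option when submitting the pipes job,
    but check the package summary linked above for the exact invocation.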
  • John Heidemann at Oct 17, 2007 at 10:01 pm

    We did almost exactly this with our whole Internet map
    (http://www.isi.edu/ant/address/whole_internet/).

    By "almost": we rendered the tiles in the reduce phase, but wrote them
    out-of-band (direct to the file system), rather than as direct reduce
    output. Since we were writing pngs, I didn't want to have to
    decapsulate them from the reducer output stream.
    We were using hadoop streaming; perhaps this goes away if we had
    directly written a Java reducer with custom output, but that seems
    unnecessary.
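
    For anyone curious what that pattern looks like, below is a rough sketch
    (not ISI's actual code): a streaming reduce task reads the tab-separated
    key/value lines Hadoop Streaming provides on stdin, writes each tile
    straight to the file system, and emits only a small marker record through
    the normal output stream. The line format, destination directory and the
    idea of pushing encoded pixel data through the streaming protocol are all
    assumptions made for the example.

    // Sketch of an out-of-band streaming reducer: tile data goes to files,
    // only a tiny record goes back through stdout.
    #include <fstream>
    #include <iostream>
    #include <string>

    int main() {
      std::string line, lastKey;
      while (std::getline(std::cin, line)) {
        // Hadoop Streaming hands reducers "key<TAB>value" lines, sorted by key.
        std::string::size_type tab = line.find('\t');
        if (tab == std::string::npos) continue;
        std::string key = line.substr(0, tab);     // e.g. a tile id
        std::string value = line.substr(tab + 1);  // e.g. a chunk of encoded pixels
                                                   // (real binary data would need an
                                                   // encoding that survives the
                                                   // line-oriented protocol)

        // Append the tile data out-of-band; "/tmp/tiles/" is a placeholder.
        std::ofstream out(("/tmp/tiles/" + key + ".png").c_str(),
                          std::ios::binary | std::ios::app);
        out << value;

        // Emit one marker record per tile through the normal reduce output.
        if (key != lastKey) {
          std::cout << key << "\t/tmp/tiles/" << key << ".png" << "\n";
          lastKey = key;
        }
      }
      return 0;
    }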

    We then did the stitching together on a single workstation after the fact.

    About 2.8 billion records in through a custom map I/O function, with ~43k
    tiles output over ~19 hours on 52 cores. Then maybe 4 hours stitching
    things together on a single box, and ~36 hours printing on a single
    printer.

    (By the way, if anyone knows how to stream huge images at an HP
    Designjet 800PS printer without blowing up memory, please let me know.
    The PostScript Red Book commands to control paper advance and non-cutting
    seem to be ignored.)

    You don't say what kind of rendering you're doing. If it's movie-style
    3-D rendering, I expect you'd need some work to get benefit from HDFS or
    other file systems---the input needed for rendering (compositing,
    textures, models, etc.) is not an obvious fit for map/reduce, at least
    to me.

    -John Heidemann
  • Robert Jessop at Oct 18, 2007 at 10:24 am
    Ted Dunning and I have discussed the particulars of my 3D rendering on
    this mailing list already; it should be in the archive (I hope). In
    summary, you are correct that I can't make the same efficiency gains as
    your project or Google Maps from tile-based associations, due to global
    illumination.

    I will probably be doing something similar to your method of writing out
    the tiles to the filesystem, though I will investigate stitching them in
    a reduce phase. The gains I hope to achieve are in file I/O and job
    system overhead.
    -Rob
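
    As a sketch of the stitch step for equally sized tiles (whether it runs in
    a reduce or on a single box afterwards): copy each tile's scanlines into
    the right offset of the final image buffer. The tile size, grid size, file
    naming and raw RGB8 format below are assumptions for the example, not
    anything from the thread.

    // Sketch: stitch a grid of raw RGB tiles into one image buffer.
    #include <fstream>
    #include <string>
    #include <vector>

    int main() {
      const int tile = 256, cols = 16, rows = 16, bpp = 3;  // assumed layout
      const int imageW = tile * cols;
      std::vector<unsigned char> image(
          static_cast<size_t>(imageW) * tile * rows * bpp);

      for (int ty = 0; ty < rows; ++ty) {
        for (int tx = 0; tx < cols; ++tx) {
          // Placeholder naming scheme for tiles written out by the render jobs.
          std::string name = "tile_" + std::to_string(tx) + "_" +
                             std::to_string(ty) + ".rgb";
          std::ifstream in(name.c_str(), std::ios::binary);
          if (!in) continue;  // missing tile: leave that region black
          for (int y = 0; y < tile; ++y) {
            size_t offset = ((static_cast<size_t>(ty) * tile + y) * imageW +
                             static_cast<size_t>(tx) * tile) * bpp;
            in.read(reinterpret_cast<char*>(&image[offset]), tile * bpp);
          }
        }
      }

      std::ofstream out("final.rgb", std::ios::binary);
      out.write(reinterpret_cast<const char*>(image.data()), image.size());
      return 0;
    }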


Discussion Overview
group: common-user
categories: hadoop
posted: Oct 16, '07 at 3:39p
active: Oct 18, '07 at 10:24a
posts: 6
users: 4
website: hadoop.apache.org...
irc: #hadoop
