Does MapReduce work well with binary file contents? The binary
content is essentially CAD files, and the MapReduce program needs to
read these files with a proprietary tool, extract values, and do some
processing. I'm wondering whether others are doing a similar type of
processing, and what the best practices are.

  • Dieter Plaetinck at Sep 1, 2011 at 8:26 am

    On Wed, 31 Aug 2011 08:44:42 -0700 Mohit Anchlia wrote:

    Does MapReduce work well with binary file contents? The binary
    content is essentially CAD files, and the MapReduce program needs to
    read these files with a proprietary tool, extract values, and do some
    processing. I'm wondering whether others are doing a similar type of
    processing, and what the best practices are.

    Yes, it works. You just need to select the right input format.
    Personally, I store all my binary files in a SequenceFile (because my binary files are small).

    Dieter
  • Mohit Anchlia at Sep 1, 2011 at 3:38 pm
    Thanks! Is there a specific tutorial I can focus on to see how it could be done?
  • Owen O'Malley at Sep 1, 2011 at 3:46 pm
    On Thu, Sep 1, 2011 at 8:37 AM, Mohit Anchlia wrote:

    Thanks! Is there a specific tutorial I can focus on to see how it could be
    done?
    Take the word count example and change its output format to be
    SequenceFileOutputFormat.

    job.setOutputFormatClass(SequenceFileOutputFormat.class);

    and it will generate SequenceFiles instead of text. There is
    SequenceFileInputFormat for reading.

    -- Owen
  • Praveen Sripati at Sep 2, 2011 at 3:14 pm
    Mohit,

    "Hadoop: The Definitive Guide" (Chapter 4, "Hadoop I/O") has a section on
    SequenceFile that is worth reading.

    http://oreilly.com/catalog/9780596521981

    Thanks,
    Praveen
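
Dieter's suggestion of packing many small binary files into one container can be sketched in plain Java. This is only an illustration of the key/value idea behind a SequenceFile (file name as key, raw bytes as value); it does not use Hadoop and does not produce Hadoop's actual SequenceFile format. The class name SmallFilePacker, its pack and unpack methods, and the sample .cad names are hypothetical and chosen for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustration only: packs small binary blobs into one byte stream of
 * (name, length, bytes) records. This is NOT Hadoop's SequenceFile format.
 */
final class SmallFilePacker {

    /** Writes a record count, then each entry as: name, 4-byte length, raw bytes. */
    static byte[] pack(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(files.size());
            for (Map.Entry<String, byte[]> entry : files.entrySet()) {
                out.writeUTF(entry.getKey());          // key: the file name
                out.writeInt(entry.getValue().length); // length prefix for the value
                out.write(entry.getValue());           // value: opaque binary content
            }
        }
        return buffer.toByteArray();
    }

    /** Reads the records back in the order they were written. */
    static Map<String, byte[]> unpack(byte[] packed) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                byte[] content = new byte[in.readInt()];
                in.readFully(content); // binary bytes are read back unchanged
                files.put(name, content);
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("part-a.cad", new byte[] {0x00, (byte) 0xFF, 0x10}); // hypothetical sample data
        files.put("part-b.cad", new byte[] {0x7F, 0x00});
        Map<String, byte[]> restored = unpack(pack(files));
        System.out.println(restored.keySet());
    }
}
```

In a real job, the same name-to-bytes pairing would be written through Hadoop's own SequenceFile classes and read back with SequenceFileInputFormat, as Owen describes; the mapper would then hand each value's bytes to the proprietary CAD tool.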

Discussion Overview
group: common-user @
category: hadoop
posted: Aug 31, '11 at 3:45p
active: Sep 2, '11 at 3:14p
posts: 5
users: 4
website: hadoop.apache.org...
irc: #hadoop
