Hi,

I have 2 questions:

1) Is a SequenceFile more efficient than a text file as input? I think text files are processed by TextInputFormat into SequenceFiles inside Hadoop. So will SequenceFiles (i.e., binary input files) be more efficient?

2) If I decide to use SequenceFiles as input, do I need to stick to the header protocol defined in http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.html ?


Thanks everyone,

Maha


  • Harsh J at Mar 5, 2011 at 3:55 am
    Hi,

    On Sat, Mar 5, 2011 at 9:03 AM, maha wrote:
    > Hi,
    >
    > I have 2 questions:
    >
    > 1) Is a SequenceFile more efficient than a text file as input? I think text files are processed by TextInputFormat into SequenceFiles inside Hadoop. So will SequenceFiles (i.e., binary input files) be more efficient?

    Depends on what your scenario is.

    > 2) If I decide to use SequenceFiles as input, do I need to stick to the header protocol defined in http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.html ?

    No. You would use the SequenceFileInputFormat and SequenceFileOutputFormat classes, which handle the file format for you.

    May I suggest reading a good Hadoop book that covers the little,
    scattered stuff like this, neatly? I like Tom White's Hadoop: The
    Definitive Guide :)

    --
    Harsh J
    www.harshj.com
  • Maha at Mar 5, 2011 at 5:27 am
    Thanks again, Harsh. I actually got the book 2 days ago, but haven't had time to read it yet.
    Maha
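
As a rough illustration of why the answer to question 1 is "depends on what your scenario is", here is a minimal sketch in plain Java that encodes the same key/value pairs once as tab-separated text lines and once as fixed-width binary records. It is not Hadoop's SequenceFile format (which also has a header, sync markers, and optional compression), and it uses no Hadoop classes. The class name, method names, and sample values are made up for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: compares a text encoding with a simple
// fixed-width binary encoding. This is NOT the SequenceFile on-disk format.
public class RecordEncodingSketch {

    // Encode (key, value) pairs as "key<TAB>value" lines, the kind of input
    // a line-oriented reader would hand over one line at a time.
    public static byte[] encodeAsText(int[] keys, long[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < keys.length; i++) {
            sb.append(keys[i]).append('\t').append(values[i]).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Encode the same pairs as binary: a 4-byte int key followed by an
    // 8-byte long value, so every record takes exactly 12 bytes.
    public static byte[] encodeAsBinary(int[] keys, long[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(keys.length * (Integer.BYTES + Long.BYTES));
        for (int i = 0; i < keys.length; i++) {
            buffer.putInt(keys[i]);
            buffer.putLong(values[i]);
        }
        return buffer.array();
    }

    // Text input: every value has to be split out of its line and parsed
    // from a decimal string before it can be used as a number.
    public static long sumValuesFromText(byte[] data) {
        long sum = 0;
        String text = new String(data, StandardCharsets.UTF_8);
        for (String line : text.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t");
            sum += Long.parseLong(fields[1]);
        }
        return sum;
    }

    // Binary input: values are read back as typed numbers, no string parsing.
    public static long sumValuesFromBinary(byte[] data) {
        long sum = 0;
        ByteBuffer buffer = ByteBuffer.wrap(data);
        while (buffer.hasRemaining()) {
            buffer.getInt();            // skip the key
            sum += buffer.getLong();    // accumulate the value
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] keys = {1, 2, 3};
        long[] values = {1000000000000L, 7L, 42L};
        byte[] text = encodeAsText(keys, values);
        byte[] binary = encodeAsBinary(keys, values);
        System.out.println("text bytes:   " + text.length);
        System.out.println("binary bytes: " + binary.length);
        System.out.println("sums match:   "
                + (sumValuesFromText(text) == sumValuesFromBinary(binary)));
    }
}
```

With these sample values the text encoding happens to be smaller (25 bytes versus 36), because most of the numbers are short. The binary form avoids string parsing and has a fixed record size. Which one wins depends on the data and on what the job does with it. In an actual Hadoop job, TextInputFormat hands the mapper a byte offset and a line of text, while SequenceFileInputFormat hands it the typed key and value objects stored in the file.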

Discussion Overview
group: common-user @
categories: hadoop
posted: Mar 5, '11 at 3:34a
active: Mar 5, '11 at 5:27a
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop