Hi,

1 - Are the map output files always of the type SequenceFileFormat?

2 - Does that mean each file contains a header with the following fields?
# version - A byte array: 3 bytes of magic header 'SEQ', followed by 1
byte of actual version no. (e.g. SEQ4 or SEQ6)
# keyClassName - String
# valueClassName - String
# compression - A boolean which specifies if compression is turned on
for keys/values in this file.
# blockCompression - A boolean which specifies if block compression is
turned on for keys/values in this file.
# compressor class - The classname of the CompressionCodec which is
used to compress/decompress keys and/or values in this SequenceFile
(if compression is enabled).
# metadata - SequenceFile.Metadata for this file (key/value pairs)
# sync - A sync marker to denote end of the header.
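
For reference, the header fields listed above can be read back with a few lines of code. This is only a rough sketch, not Hadoop's own reader: it assumes a version 6 layout, assumes every string is shorter than 128 bytes so that its length prefix fits in one byte, and assumes a 16-byte sync marker. The names `parse_header`, `read_string` and `pack_string` are made up for this illustration.

```python
import io
import struct


def read_string(stream):
    """Read a length-prefixed UTF-8 string (single-byte length assumed)."""
    length = stream.read(1)[0]
    if length > 127:
        raise ValueError("multi-byte length prefixes are not handled in this sketch")
    return stream.read(length).decode("utf-8")


def parse_header(data):
    """Parse the header fields in the order listed in the question."""
    stream = io.BytesIO(data)
    if stream.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile: missing SEQ magic")
    header = {"version": stream.read(1)[0]}
    header["keyClassName"] = read_string(stream)
    header["valueClassName"] = read_string(stream)
    header["compression"] = stream.read(1) != b"\x00"
    header["blockCompression"] = stream.read(1) != b"\x00"
    if header["compression"]:
        # The codec class name is only present when compression is on.
        header["compressorClass"] = read_string(stream)
    # Metadata: a 4-byte big-endian pair count, then key/value strings.
    (pair_count,) = struct.unpack(">i", stream.read(4))
    metadata = {}
    for _ in range(pair_count):
        key = read_string(stream)
        metadata[key] = read_string(stream)
    header["metadata"] = metadata
    header["sync"] = stream.read(16)  # assumed 16-byte sync marker
    return header


def pack_string(text):
    """Hypothetical helper that builds a string the way read_string expects."""
    raw = text.encode("utf-8")
    return bytes([len(raw)]) + raw


# Hand-built sample header: uncompressed, no metadata, dummy sync marker.
sample = (
    b"SEQ" + bytes([6])
    + pack_string("org.apache.hadoop.io.Text")
    + pack_string("org.apache.hadoop.io.IntWritable")
    + b"\x00\x00"
    + struct.pack(">i", 0)
    + bytes(range(16))
)
header = parse_header(sample)
```

The sample bytes are built by hand rather than taken from a real file, so the sketch only demonstrates the field order, not compatibility with any particular Hadoop release.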



Thanks,

--
Pedro


  • Harsh J at Feb 14, 2011 at 4:45 pm
    Hello,
    On Mon, Feb 14, 2011 at 8:51 PM, Pedro Costa wrote:
    > 1 - The map output files are always of the type SequenceFileFormat?

    If you mean the map-intermediate files, then no: they're IFiles.
    Otherwise, if your OutputFormat is set to SequenceFileOutputFormat,
    then yes, files of this type would be created.

    Map-Reduce intermediate files are of the IFile format. It's not part
    of the public API, but you may read its implementation in
    src/java/org/apache/hadoop/mapred/IFile.java.

    SequenceFiles are quite similar, but are built for richer K-V file
    operations such as skipping over keys, which are not really needed
    for IFiles, since those hold data that is already partitioned and sorted.

    --
    Harsh J
    www.harshj.com
  • Pedro Costa at Feb 14, 2011 at 6:08 pm
    And when the data of the map-intermediate files is compressed, it's
    still an IFile?


    --
    Pedro
  • Harsh J at Feb 14, 2011 at 6:17 pm
    Hello,
    On Mon, Feb 14, 2011 at 11:37 PM, Pedro Costa wrote:
    > And when the data of the map-intermediate files is compressed, it's
    > still an IFile?

    Yes. From my understanding, if compression is turned ON for IFile, the
    output stream used to write the IFile is itself set up as a compressing
    one, so all data written to the stream is compressed.

    In contrast, SequenceFiles compress either per record (only the values
    are compressed and keys are left uncompressed) or per block (keys and
    values are collected into blocks, of a size set when the Writer is
    created, and each block is compressed).
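
To make the difference concrete, here is a small illustration using Python's `zlib`. It is not the IFile or SequenceFile on-disk format, only a sketch of the two ideas: pushing everything through one compressing stream, versus compressing each value on its own while keys stay readable (as in SequenceFile record compression). The function names and the sample records are invented for this example.

```python
import zlib

# Made-up sample records: short keys, repetitive values.
records = [(b"key-%03d" % i, b"value " * 20) for i in range(50)]


def stream_compressed(records):
    """Whole-stream compression: keys and values share one compressor."""
    compressor = zlib.compressobj()
    out = bytearray()
    for key, value in records:
        out += compressor.compress(key + value)
    out += compressor.flush()
    return bytes(out)


def record_compressed(records):
    """Per-record compression: each value is compressed alone, keys stay raw."""
    out = bytearray()
    for key, value in records:
        out += key + zlib.compress(value)
    return bytes(out)


stream_bytes = stream_compressed(records)
record_bytes = record_compressed(records)
```

In the per-record output the raw keys can still be found by scanning the bytes, while the single stream has to be decompressed from the start before any key can be read, which fits data that is only ever consumed sequentially.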

    --
    Harsh J
    www.harshj.com

Discussion Overview
group: mapreduce-user @
category: hadoop
posted: Feb 14, '11 at 3:22p
active: Feb 14, '11 at 6:17p
posts: 4
users: 2
website: hadoop.apache.org...
irc: #hadoop
