Hi, all.

I want to write lots of little files (32 GB in total) to HDFS as an
org.apache.hadoop.io.SequenceFile.

But it is too slow: it takes us about 8 hours to create this
SequenceFile (6.7 GB).

How can I create this SequenceFile faster?

Thanks for your suggestions.

-Best Wishes,

-Lin
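
For context, here is a minimal sketch of the kind of simple linear program described above, assuming the little files sit in a local directory and are packed into one SequenceFile as (file name, file bytes) records. The class name, paths, and use of block compression are illustrative assumptions, not details from this thread.

  import java.io.File;
  import java.io.FileInputStream;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.IOUtils;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  public class SmallFilesToSequenceFile {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      Path out = new Path(args[1]);              // HDFS output path for the single SequenceFile
      SequenceFile.Writer writer = SequenceFile.createWriter(
          fs, conf, out, Text.class, BytesWritable.class,
          SequenceFile.CompressionType.BLOCK);   // block compression is worth trying here
      try {
        for (File f : new File(args[0]).listFiles()) {  // local directory of little files
          byte[] buf = new byte[(int) f.length()];      // little files, so int-sized buffers are fine
          FileInputStream in = new FileInputStream(f);
          try {
            IOUtils.readFully(in, buf, 0, buf.length);
          } finally {
            in.close();
          }
          writer.append(new Text(f.getName()), new BytesWritable(buf));
        }
      } finally {
        writer.close();
      }
    }
  }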


  • Harsh J at May 12, 2011 at 5:15 am
    Are you doing this as a MapReduce job, or is it a simple linear
    program? MapReduce could be much faster (a combined-files input format,
    with a few reducers for merging if you need that as well). A rough
    sketch of such a job appears after the thread.

    --
    Harsh J
  • 丛林 at May 12, 2011 at 11:06 am
    Dear Harsh,

    Could you please explain how to create a sequence file with MapReduce?

    Suppose that all 32 GB of little files are stored on one PC.

    Thanks for your suggestions.

    BTW: I notice that you have replied to most of the sequence-file topics
    on this mailing list :-)

    Best Wishes,

    -Lin


  • Steve Lewis at May 12, 2011 at 3:56 pm
    Even for a single machine (and there may be reasons to use a single
    machine if the original data is not splittable), our experience suggests
    it should take about an hour to process 32 GB, which leads me to wonder
    whether writing the SequenceFile is really your limiting step. Consider a
    very simple job that writes 32 GB of random data, say a long count and a
    random double, to a SequenceFile, and run it on one box (you might also
    try the same steps without the write) to see whether you are really being
    limited by the write. A sketch of such a benchmark appears after the thread.
    You might also consider compression while writing the SequenceFile.

    --
    Steven M. Lewis PhD
    4221 105th Ave NE
    Kirkland, WA 98033
    206-384-1340 (cell)
    Skype lordjoe_com
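
To make the MapReduce suggestion above concrete, here is a hedged sketch of a map-only job that reads a text file listing the HDFS paths of the little files (one per line) and writes each file as a (name, bytes) record into SequenceFile output. The listing-file approach, class names, and record layout are assumptions; a CombineFileInputFormat over the files themselves, or a few reducers to merge the outputs, are the variations mentioned in the reply.

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

  public class SmallFilesToSequenceFileJob {

    // One input line = the HDFS path of one little file; emit (file name, file bytes).
    public static class FileToRecordMapper
        extends Mapper<LongWritable, Text, Text, BytesWritable> {
      @Override
      protected void map(LongWritable offset, Text line, Context context)
          throws IOException, InterruptedException {
        Path p = new Path(line.toString().trim());
        FileSystem fs = p.getFileSystem(context.getConfiguration());
        byte[] buf = new byte[(int) fs.getFileStatus(p).getLen()]; // little files: int-sized is fine
        FSDataInputStream in = fs.open(p);
        try {
          in.readFully(buf);
        } finally {
          in.close();
        }
        context.write(new Text(p.getName()), new BytesWritable(buf));
      }
    }

    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = new Job(conf, "small files to SequenceFile");
      job.setJarByClass(SmallFilesToSequenceFileJob.class);
      job.setMapperClass(FileToRecordMapper.class);
      job.setNumReduceTasks(0);                       // map-only; add reducers if outputs must be merged
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(BytesWritable.class);
      job.setInputFormatClass(TextInputFormat.class); // NLineInputFormat would spread the listing over more maps
      job.setOutputFormatClass(SequenceFileOutputFormat.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));   // text file listing the little files' paths
      FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory of SequenceFile part files
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

The output is a directory of SequenceFile part files, one per map (or reduce) task; the number of reducers, or a later merge pass, controls how many files you end up with.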

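And here is a hedged sketch of the kind of write-only benchmark suggested in the last reply: write a chosen number of (long count, random double) records to a SequenceFile on one box and time it, comparing compression settings (and, separately, the same loop without the append) to see whether the write itself is the bottleneck. The class name and arguments are assumptions.

  import java.util.Random;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.DoubleWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.SequenceFile;

  public class SequenceFileWriteBenchmark {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      SequenceFile.Writer writer = SequenceFile.createWriter(
          fs, conf, new Path(args[0]), LongWritable.class, DoubleWritable.class,
          SequenceFile.CompressionType.BLOCK);  // compare NONE vs BLOCK as well
      // Roughly 2 billion (long, double) records is on the order of 32 GB of raw payload.
      long records = Long.parseLong(args[1]);
      Random rnd = new Random();
      long start = System.currentTimeMillis();
      try {
        LongWritable key = new LongWritable();
        DoubleWritable val = new DoubleWritable();
        for (long i = 0; i < records; i++) {
          key.set(i);
          val.set(rnd.nextDouble());
          writer.append(key, val);
        }
      } finally {
        writer.close();
      }
      System.out.println("wrote " + records + " records in "
          + (System.currentTimeMillis() - start) + " ms");
    }
  }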