Problem to create sequence file for
Hi,

I have written code to create sequence files from given text files.
The program takes the following input parameters:

1. Local source directory - contains all the input text files
2. Destination HDFS URI - the location on HDFS where the sequence file will be copied

The key for a sequence-record is the file name.
The value for a sequence-record is the content of the text file.

The program runs fine for a large number of input text files. But if the size of a single input text file is > 100 MB, it throws the following exception:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.String.toCharArray(String.java:2726)
at org.apache.hadoop.io.Text.encode(Text.java:388)
at org.apache.hadoop.io.Text.set(Text.java:178)
at org.apache.hadoop.io.Text.<init>(Text.java:81)
at SequenceFileCreator.create(SequenceFileCreator.java:106)
at SequenceFileCreator.processFile(SequenceFileCreator.java:168)

I am using "org.apache.hadoop.io.SequenceFile.Writer" for creating the sequence file. The Text class is used for keyclass and valclass.
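
For reference, a minimal sketch of the setup described above. The class name SequenceFileCreator and the construction of the value via new Text(String) come from the stack trace; everything else is an assumed reconstruction, not the actual code:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Assumed reconstruction of the program described above.
    public class SequenceFileCreator {
        public static void main(String[] args) throws IOException {
            File srcDir = new File(args[0]);   // local source directory
            Path dst = new Path(args[1]);      // destination HDFS URI
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(dst.toUri(), conf);
            SequenceFile.Writer writer = SequenceFile.createWriter(
                    fs, conf, dst, Text.class, Text.class);
            try {
                for (File f : srcDir.listFiles()) {
                    byte[] buf = new byte[(int) f.length()];
                    FileInputStream in = new FileInputStream(f);
                    try {
                        int off = 0;
                        while (off < buf.length) {
                            int n = in.read(buf, off, buf.length - off);
                            if (n < 0) break;
                            off += n;
                        }
                    } finally {
                        in.close();
                    }
                    // Whole file -> String -> Text: for a ~300 MB file this
                    // holds several full copies in memory at once, which is
                    // where the OutOfMemoryError in the trace comes from.
                    String contents = new String(buf, "UTF-8");
                    writer.append(new Text(f.getName()), new Text(contents));
                }
            } finally {
                writer.close();
            }
        }
    }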

I tried increasing the maximum memory for the program, but it throws the same error.

Can you provide your suggestions?

Thanks,
- Bhushan


DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.


  • Jason Venner at Oct 27, 2009 at 1:48 pm
    How large is the string that is being written?
    Does it contain the entire contents of your file?
    You may simply need to increase the heap size of your JVM.
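
    For example (class name and paths hypothetical, classpath omitted), to run
    with a 2 GB heap:

        java -Xmx2g -cp <hadoop-classpath> SequenceFileCreator /local/input hdfs://namenode:9000/out.seq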

    --
    Pro Hadoop, a book to guide you from beginner to hadoop mastery,
    http://www.amazon.com/dp/1430219424?tag=jewlerymall
    www.prohadoopbook.com a community for Hadoop Professionals
  • Bhushan_mahale at Oct 27, 2009 at 2:25 pm
    Hi Jason,

    Thanks for the reply.
    The string is the entire content of the input text file.
    It could be as long as ~300 MB.
    I tried increasing the JVM heap, but unfortunately it gave the same error.

    The other option I am considering is to split the input files first.

    - Bhushan
  • Jason Venner at Oct 27, 2009 at 2:54 pm
    If your string is up to 300 MB, you will probably need 1.3+ GB to write it:

    - 1 copy in the String: ~600 MB if your file is all ASCII (Java strings
      store 16-bit chars)
    - 1 copy in the byte array as UTF-8: 1x to 3x expansion, say 600 MB
    - 1 copy in the on-the-wire format, say 700 MB
    - possibly 1 copy in a transit buffer on the way to the remote file
      system, say 720 MB

    That adds up to roughly 1.9 GB to 2.6 GB.

    Hopefully there are not more copies made ;)

    Try setting your heap to 3 to 5 GB with a 64-bit JVM.
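
    One way to cut the String and char[] copies out entirely, sketched below
    under the assumption that the input files are already plain ASCII or UTF-8
    on disk: hand the raw bytes to Text directly instead of going through
    new Text(String). (The helper class name is hypothetical.)

        import java.io.DataInputStream;
        import java.io.File;
        import java.io.FileInputStream;
        import java.io.IOException;
        import org.apache.hadoop.io.SequenceFile;
        import org.apache.hadoop.io.Text;

        public class RawBytesAppender {
            // Append one file without the String -> char[] -> UTF-8 copies.
            // Assumes the file's bytes are already valid UTF-8 (or ASCII).
            public static void appendFile(SequenceFile.Writer writer, File file)
                    throws IOException {
                byte[] buf = new byte[(int) file.length()];
                DataInputStream in = new DataInputStream(new FileInputStream(file));
                try {
                    in.readFully(buf);
                } finally {
                    in.close();
                }
                Text value = new Text();
                value.set(buf, 0, buf.length);  // single in-memory copy, no re-encode
                writer.append(new Text(file.getName()), value);
            }
        }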
  • Jean-Eric CAZAMEA at Oct 27, 2009 at 3:19 pm
    There are 2 transposed letters in the name.

    Jean-Eric
  • Amogh Vasekar at Oct 27, 2009 at 3:19 pm
    Hi Bhushan,
    If splitting the input files is an option, why not let Hadoop do the splitting for you? If need be, you can use a custom input format and SequenceFileOutputFormat (a sketch follows below).

    Amogh
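
    A sketch of that approach with the old org.apache.hadoop.mapred API, all
    class names hypothetical: a map-only job whose output format is
    SequenceFileOutputFormat. Note that with plain TextInputFormat the keys
    become byte offsets (LongWritable), so keeping the file name as the key
    would indeed need a custom input format.

        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapred.FileInputFormat;
        import org.apache.hadoop.mapred.FileOutputFormat;
        import org.apache.hadoop.mapred.JobClient;
        import org.apache.hadoop.mapred.JobConf;
        import org.apache.hadoop.mapred.SequenceFileOutputFormat;
        import org.apache.hadoop.mapred.TextInputFormat;
        import org.apache.hadoop.mapred.lib.IdentityMapper;

        // Map-only job: Hadoop splits the inputs, and each split is written
        // out as a sequence file by SequenceFileOutputFormat.
        public class TextToSeqFileJob {
            public static void main(String[] args) throws Exception {
                JobConf job = new JobConf(TextToSeqFileJob.class);
                job.setJobName("text-to-seqfile");
                job.setInputFormat(TextInputFormat.class);
                job.setOutputFormat(SequenceFileOutputFormat.class);
                job.setMapperClass(IdentityMapper.class);
                job.setNumReduceTasks(0);                   // map-only
                job.setOutputKeyClass(LongWritable.class);  // offsets, not file names
                job.setOutputValueClass(Text.class);
                FileInputFormat.setInputPaths(job, new Path(args[0]));
                FileOutputFormat.setOutputPath(job, new Path(args[1]));
                JobClient.runJob(job);
            }
        }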


