Grokbase Groups Hive user August 2010
When I run the SQL below: INSERT OVERWRITE TABLE tablename1
select_statement1 FROM from_statement, many zero-size files are
stored in Hadoop.

How can I merge these small files?

Thanks,


LiuLei


  • Namit Jain at Aug 6, 2010 at 3:27 pm
    HIVEMERGEMAPFILES("hive.merge.mapfiles", true),
    HIVEMERGEMAPREDFILES("hive.merge.mapredfiles", false),


    Set the above parameters to true before your query.



  • Lei liu at Aug 9, 2010 at 2:18 am
    Thank you for your reply.

    You mean I should execute the statements below:

    statement.execute("set hive.merge.mapfiles=true");
    statement.execute("set hive.merge.mapredfiles=true");

    Both parameters should be set to true, right?
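
A minimal JDBC sketch of the approach Lei describes. It assumes `statement` comes from an open Hive JDBC connection and that `set` commands stay in effect for later statements on that same connection. The names `HiveMergeSettings`, `mergeProperties`, `setStatements`, and `apply` are hypothetical; only the two property names come from the thread.

```java
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HiveMergeSettings {
    private HiveMergeSettings() {}

    // The two properties named in the thread, both set to true as advised.
    static Map<String, String> mergeProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hive.merge.mapfiles", "true");
        props.put("hive.merge.mapredfiles", "true");
        return props;
    }

    // Renders each property as a "set key=value" command, preserving map order.
    static List<String> setStatements(Map<String, String> props) {
        List<String> commands = new ArrayList<>();
        for (Map.Entry<String, String> entry : props.entrySet()) {
            commands.add("set " + entry.getKey() + "=" + entry.getValue());
        }
        return commands;
    }

    // Executes the set commands on the given statement, before the real query runs.
    static void apply(Statement statement, Map<String, String> props) throws SQLException {
        for (String command : setStatements(props)) {
            statement.execute(command);
        }
    }
}
```

To use it, call `HiveMergeSettings.apply(statement, HiveMergeSettings.mergeProperties())` and then run the INSERT OVERWRITE on the same `statement`.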

  • Namit Jain at Aug 9, 2010 at 3:24 pm
    That's right

  • Lei liu at Aug 9, 2010 at 3:57 pm
    Could you tell me whether the query is slower if both parameters are
    true?


  • Namit Jain at Aug 9, 2010 at 4:32 pm
    Yes, it will try to run another map-reduce job to merge the files, which adds time to the query.
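
To illustrate when the extra job runs, the sketch below models the decision as a comparison between the average output file size and a threshold. This follows how later Hive documentation describes the `hive.merge.smallfiles.avgsize` property. That property is not mentioned in this thread, its availability in the poster's Hive version is an assumption, and the code is a simplified model rather than Hive's implementation. `MergeDecisionModel`, `wouldTriggerMerge`, and the example threshold are hypothetical.

```java
final class MergeDecisionModel {
    private MergeDecisionModel() {}

    // Example threshold only; check the default for your own Hive version.
    static final long EXAMPLE_THRESHOLD_BYTES = 16_000_000L;

    // Returns true when the average file size falls below the threshold,
    // which is the condition under which a merge job would be worthwhile.
    static boolean wouldTriggerMerge(long[] fileSizes, long thresholdBytes) {
        if (fileSizes.length == 0) {
            return false; // nothing was written, so nothing to merge
        }
        long total = 0L;
        for (long size : fileSizes) {
            total += size;
        }
        double average = (double) total / fileSizes.length;
        return average < thresholdBytes;
    }

    public static void main(String[] args) {
        long[] mostlyEmpty = {0L, 0L, 0L, 1024L};      // many zero-size files
        long[] alreadyLarge = {64_000_000L, 32_000_000L};
        System.out.println(wouldTriggerMerge(mostlyEmpty, EXAMPLE_THRESHOLD_BYTES));  // true
        System.out.println(wouldTriggerMerge(alreadyLarge, EXAMPLE_THRESHOLD_BYTES)); // false
    }
}
```

Under this model, output made up mostly of zero-size files always qualifies for merging, so the cost of the second job is paid on every such query.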
  • Bakshi, Ankita at Aug 10, 2010 at 1:08 am
    Hi,

    Sorry to hijack this thread, but I am curious whether there is any other built-in option to merge the files in a directory before loading the data into a table.

    I have a directory in the local file system that contains many small files. I want to load it into a single Hive table. I am wondering what the best approach to this problem would be.

    Thanks,
    Ankita


  • Todd Lee at Aug 10, 2010 at 1:59 am
    As long as the files are inside the same directory, Hive will treat them as a table.


    Todd
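
If fewer, larger files are still wanted before loading, the files can be concatenated locally first. The sketch below is plain Java, not a built-in Hive option. It assumes newline-terminated text files with no header rows, and that `target` lies outside `sourceDir`. `LocalFileConcat` and `concatenate` are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LocalFileConcat {
    private LocalFileConcat() {}

    // Appends every regular file in sourceDir to target, in file-name order.
    // Returns the number of files that were concatenated.
    static int concatenate(Path sourceDir, Path target) throws IOException {
        List<Path> inputs = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(sourceDir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    inputs.add(entry);
                }
            }
        }
        Collections.sort(inputs); // directory listing order is unspecified, so sort

        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path input : inputs) {
                Files.copy(input, out); // streams the file's bytes onto the output
            }
        }
        return inputs.size();
    }
}
```

The merged file can then be loaded with Hive's LOAD DATA LOCAL INPATH statement, although, as Todd notes, Hive can also read the directory of small files as it is.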
  • Edward Capriolo at Aug 10, 2010 at 3:07 am
    Lei,

    Are you still using Hive 0.4.1, or have you upgraded? The merge options
    mentioned above were probably not present until 0.5.0.

    Edward
  • Lei liu at Aug 10, 2010 at 12:14 pm
    Thank you for your reply.

    Could you tell me why it is slower when both parameters are true, and how
    much slower it is?



  • Edward Capriolo at Aug 10, 2010 at 2:53 pm


    How slow it is depends on how much data you have. We cannot
    answer questions like that; try it both ways and find out for
    yourself.

    Edward
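
One way to try it both ways is to time the same INSERT with the merge settings off and then on. The helper below is a generic sketch; `QueryTimer`, `SqlTask`, and `elapsedMillis` are hypothetical names, and a single run on a shared cluster is only a rough indication.

```java
import java.util.concurrent.TimeUnit;

final class QueryTimer {
    private QueryTimer() {}

    // A task that may throw, such as a JDBC call that throws SQLException.
    interface SqlTask {
        void run() throws Exception;
    }

    // Runs the task once and returns the wall-clock time it took, in milliseconds.
    static long elapsedMillis(SqlTask task) throws Exception {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

With a JDBC `statement` as in Lei's example, a measurement could look like `long ms = QueryTimer.elapsedMillis(() -> statement.execute(insertSql));`, where `insertSql` is a placeholder for your own query, run once with each combination of settings.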
