Grokbase Groups Hive user March 2011
Hello,

I have a Hive query that does a simple select and writes the results to the
local file system.


For example, a query like this:

INSERT OVERWRITE LOCAL DIRECTORY
'/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
Select host, identity, user, time, request
from raw_apachelog
where ds = '2011-03-22-001500';

This creates two files under the apachetest folder, even though the table has
only 32 rows. Is there any way to make Hive create a single file?


Appreciate your help :)

Thanks,
Senthil


  • Jov at Mar 30, 2011 at 5:23 am
    Try adding a limit:

    INSERT OVERWRITE LOCAL DIRECTORY
    '/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
    Select host, identity, user, time, request
    from raw_apachelog
    where ds = '2011-03-22-001500' limit 32;


  • V.Senthil Kumar at Mar 30, 2011 at 7:31 pm
    Thanks for the suggestion. The query created just one result file.

    Also, before trying this query, I found another way to make this work. I
    added the following properties to hive-site.xml, and that also produced
    just one result file.


    <property>
    <name>hive.merge.mapredfiles</name>
    <value>true</value>
    <description>Merge small files at the end of a map-reduce job</description>
    </property>

    <property>
    <name>hive.input.format</name>
    <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
    <description>The default input format, if it is not specified, the system
    assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
    whereas it is set to CombineHiveInputFormat for hadoop 20. The user can always
    overwrite it - if there is a bug in CombineHiveInputFormat, it can always be
    manually set to HiveInputFormat. </description>
    </property>
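
    The same two properties can also be set for a single session from the
    Hive CLI instead of in hive-site.xml. This is a minimal sketch, assuming
    the session-level settings behave the same way as the site-wide ones on
    your Hive version; the query is the one from the original post.

    ```sql
    -- Session-level equivalents of the hive-site.xml properties above.
    set hive.merge.mapredfiles=true;
    -- CombineHiveInputFormat can pack several small input files into one
    -- split, which may reduce the number of mappers and so of output files.
    set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

    INSERT OVERWRITE LOCAL DIRECTORY
    '/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
    SELECT host, identity, user, time, request
    FROM raw_apachelog
    WHERE ds = '2011-03-22-001500';
    ```

    Settings made with 'set' last only for the current session, so this is a
    way to try the change before editing hive-site.xml.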



  • Edward Capriolo at Mar 30, 2011 at 8:19 pm

    The number of output files follows from the number of reducers used in
    the job. Adding a limit appends a single-reducer phase to the end of the
    job. You should be able to accomplish the same thing with
    'set mapred.reduce.tasks=1'.
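
    A sketch of that suggestion applied to the query from the original post.
    As far as I can tell, mapred.reduce.tasks only takes effect when the query
    plan contains a reduce stage, and a plain select with a where clause may
    run as a map-only job. The DISTRIBUTE BY clause below is therefore an
    assumption added to force a reduce stage; it is not part of the original
    query.

    ```sql
    -- Ask for a single reducer, as suggested above.
    set mapred.reduce.tasks=1;

    -- DISTRIBUTE BY host is an illustrative addition (not in the original
    -- query) that forces a reduce stage so the setting above applies.
    INSERT OVERWRITE LOCAL DIRECTORY
    '/home/hdp-user/hiveadmin_dirs/outbox/apachetest'
    SELECT host, identity, user, time, request
    FROM raw_apachelog
    WHERE ds = '2011-03-22-001500'
    DISTRIBUTE BY host;
    ```

    With one reducer, all rows pass through a single task, so this is only
    reasonable for small results such as the 32 rows here.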

Discussion Overview
group: user @
categories: hive, hadoop
posted: Mar 30, '11 at 3:53a
active: Mar 30, '11 at 8:19p
posts: 4
users: 3
website: hive.apache.org
