Grokbase Groups Hive user July 2011
Hello,
I'm having an issue running Hive jobs after updating from Hive 0.5 to Hive
0.7 (from CDHb4 to CDHu1).

No matter what query I run, Hive always uses one mapper.
I have tried different queries with various input sizes, both with
many reducers and with no reducers.

For version 0.5 everything worked correctly.
I'm attaching my hive-site.xml: https://gist.github.com/1111531
I have also tested jobs with Pig, and those jobs use multiple mappers,
so I guess this is a Hive issue.

Thank you for all your help.

--
Wojciech Langiewicz


  • Edward Capriolo at Jul 28, 2011 at 2:10 pm

    You should also check that your hive-default.xml and other conf/ files are
    up to date for 0.7.x. Older versions of those files can lead to problems.

    Edward
  • Aggarwal, Vaibhav at Jul 28, 2011 at 6:29 pm
    If you are using CombineHiveInputFormat, it may be that all files are being combined into one large split, so only one mapper is created.

    If that is the case, you can set the max split size in the hive-default.xml config file to create more splits and hence more map tasks:

    <property>
      <name>mapred.max.split.size</name>
      <value>134217728</value>
      <description>The maximum size chunk that map input should be split into.</description>
    </property>
    Thanks
    Vaibhav
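
A rough way to see why the split-size cap matters, sketched in Python. This is a simplification and not Hive's actual split logic: it assumes a combining input format packs input into splits of at most the configured size, and it ignores node and rack locality. The helper name `estimate_map_tasks` is hypothetical.

```python
import math
from typing import Optional


def estimate_map_tasks(total_input_bytes: int, max_split_bytes: Optional[int]) -> int:
    """Rough estimate of map tasks for a combining input format.

    Assumption: with no cap (None or 0), all input may end up in a single
    split; with a cap, input is cut into chunks of at most max_split_bytes.
    Real split computation also takes node and rack locality into account.
    """
    if total_input_bytes <= 0:
        return 0
    if max_split_bytes is None or max_split_bytes <= 0:
        return 1
    return math.ceil(total_input_bytes / max_split_bytes)


ten_gib = 10 * 1024**3
print(estimate_map_tasks(ten_gib, None))       # no cap configured
print(estimate_map_tasks(ten_gib, 134217728))  # 128 MiB cap from the snippet above
```

Under these assumptions, an undefined cap collapses any input into one map task, while a finite cap scales the task count with the input size.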

  • Carl Steinbach at Jul 29, 2011 at 3:44 am
    Hi Wojciech,

    Vaibhav is correct. There's a configuration problem in the copy of
    hive-default.xml that ships with CDH3u1 which sets
    hive.input.format=CombineHiveInputFormat, but leaves mapred.max.split.size
    undefined. You can fix this problem by setting mapred.max.split.size in
    hive-default.xml to some reasonable value (it currently defaults
    to 256000000 on trunk).

    Sorry for the inconvenience.

    Carl
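
The misconfiguration described above can be checked for mechanically. The following is a minimal sketch, assuming a Hadoop-style `<configuration>` XML file; the function names are hypothetical, and a real check would have to merge hive-default.xml, hive-site.xml, and the Hadoop configuration files in the order Hive loads them.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text: str) -> dict:
    """Parse Hadoop-style <configuration> XML into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def combine_without_split_cap(props: dict) -> bool:
    """True when CombineHiveInputFormat is set but no max split size is."""
    input_format = props.get("hive.input.format", "")
    has_cap = bool(props.get("mapred.max.split.size"))
    return "CombineHiveInputFormat" in input_format and not has_cap


# Illustrative sample only, not the contents of any shipped file.
sample = """
<configuration>
  <property>
    <name>hive.input.format</name>
    <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  </property>
</configuration>
"""
print(combine_without_split_cap(read_hadoop_properties(sample)))
```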
  • Wojciech Langiewicz at Jul 29, 2011 at 10:21 am
    Hello,
    Thank you for your answers; this solves the issue.
    I have set mapred.max.split.size to 1024000000 in hive-site.xml, and jobs
    are now using an appropriate number of mappers.

    I have played a little with different configurations, and
    CombineHiveInputFormat gives better performance than HiveInputFormat in
    my case.

    Thanks again.
    --
    Wojciech Langiewicz

Discussion Overview
group: user @
categories: hive, hadoop
posted: Jul 28, '11 at 1:23p
active: Jul 29, '11 at 10:21a
posts: 5
users: 4
website: hive.apache.org
