Issue on using hive Dynamic Partitions on larger tables
(Hive user mailing list, June 2011)

  • Bejoy Ks at Jun 16, 2011 at 4:35 pm
Hi Hive Experts,
I'm facing an issue while using Hive dynamic partitions on larger tables. I
tried out dynamic partitions on smaller tables and they worked fine, but
unfortunately when I tried the same on a larger table the MapReduce job
terminated with the following error:

2011-06-16 12:14:28,592 Stage-1 map = 74%, reduce = 0%
[Fatal Error] total number of created files exceeds 100000. Killing the job.
Ended Job = job_201106061630_0536 with errors
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.MapRedTask

I tried setting the parameter hive.max.created.files to a larger value, but I
still get the same error:
hive> set hive.max.created.files=500000;
The same error, 'total number of created files exceeds 100000', was thrown even
after I changed the value to 500000. I suspect that either the value I set for
the config parameter is not taking effect, or I am setting the wrong parameter
for this issue. Please advise.
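
(As a sanity check — assuming standard Hive CLI behavior rather than anything
specific to this thread — issuing set with just a parameter name echoes its
current value, and set -v dumps all Hadoop/Hive settings, so you can confirm
what the session actually holds:)

hive> set hive.max.created.files;
hive> set -v;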

The other parameters I set on the Hive CLI for dynamic partitions are:
set hive.exec.dynamic.partition.mode=nonstrict;   -- allow all partition columns to be dynamic
set hive.exec.dynamic.partition=true;             -- enable dynamic partitioning
set hive.exec.max.dynamic.partitions.pernode=300; -- cap on dynamic partitions per mapper/reducer

The HiveQL query I used for the dynamic partition insert is:
INSERT OVERWRITE TABLE parameter_part PARTITION(location)
SELECT p.seq_id,p.lead_id,p.arr_datetime,p.computed_value,
p.del_date,p.location FROM parameter_def p;

Please help me out in resolving this.

Thank You.

Regards
Bejoy.K.S


  • Steven Wong at Jun 18, 2011 at 1:25 am
    The name of the parameter is actually hive.exec.max.created.files. The wiki has a typo, which I'll fix.
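
    For reference, a minimal sketch of the corrected session, reusing the 500000 value from the original mail:

    hive> set hive.exec.max.created.files=500000;
    hive> set hive.exec.max.created.files;

    The second command should echo hive.exec.max.created.files=500000, confirming the value took effect.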


  • Bejoy Ks at Jun 20, 2011 at 2:57 pm
    Thanks Steven. That got me past that bug, but another one pops up when I try
    dynamic partitions on larger tables. I have implemented the same approach
    (described below) on smaller tables, but somehow it fails for larger tables.

    My larger source table (parameter_def) contains 5 billion rows, which I
    Sqooped into Hive from a DWH. When I try implementing the dynamic partition
    on it with the query:
    INSERT OVERWRITE TABLE parameter_part PARTITION(location)
    SELECT p.seq_id,p.lead_id,p.arr_datetime,p.computed_value,
    p.del_date,p.location FROM parameter_def p;
    Two MapReduce jobs are triggered, and the first one now runs to completion
    after setting

    hive.exec.max.created.files=150000;

    But the second job fails outright, without even running. The error log is
    given below.
    From the PuTTY console:
    2011-06-20 10:40:13,348 Stage-1 map = 100%, reduce = 100%
    Ended Job = job_201106061630_0937
    Ended Job = 1659539584, job is filtered out (removed at runtime).
    Launching Job 2 out of 2
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_201106061630_0938, Tracking URL =
    http://********.com:50030/jobdetails.jsp?jobid=job_201106061630_0938
    Kill Command = /usr/lib/hadoop/bin/hadoop job
    -Dmapred.job.tracker=********.com:8021 -kill job_201106061630_0938
    2011-06-20 10:42:51,914 Stage-3 map = 100%, reduce = 100%
    Ended Job = job_201106061630_0938 with errors
    FAILED: Execution Error, return code 2 from
    org.apache.hadoop.hive.ql.exec.MapRedTask

    From the Hive log file:
    2011-06-20 10:41:02,293 WARN mapred.JobClient
    (JobClient.java:copyAndConfigureFiles(649)) - Use GenericOptionsParser for
    parsing the arguments. Applications should implement Tool for the same.
    2011-06-20 10:42:51,917 ERROR exec.MapRedTask
    (SessionState.java:printError(343)) - Ended Job = job_201106061630_0938 with
    errors
    2011-06-20 10:42:51,938 ERROR ql.Driver (SessionState.java:printError(343)) -
    FAILED: Execution Error, return code 2 from
    org.apache.hadoop.hive.ql.exec.MapRedTask


    The Hadoop and Hive versions I'm using are as follows:
    Hadoop version - Hadoop 0.20.2-cdh3u0
    Hive version - Hive 0.7 (lib/hive-hwi-0.7.0-cdh3u0.war)

    Please help me figure out what is going wrong with my implementation.

    Thank You

    Regards
    Bejoy.K.S





  • Bejoy Ks at Jun 21, 2011 at 12:27 pm
    Hey guys,
    I was able to resolve this by grouping and distributing records to reducers
    using DISTRIBUTE BY. My modified query is as follows:

    FROM parameter_def p
    INSERT OVERWRITE TABLE parameter_part PARTITION(location)
    SELECT p.seq_id,p.lead_id,p.arr_datetime,p.computed_value,p.del_date,p.location
    DISTRIBUTE BY location;

    With this query the entire job worked like a charm. If there are any better
    implementations for similar scenarios, please do share.
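
    For anyone hitting the same wall, here is a consolidated sketch of the full session, using the settings and table names from this thread (my reconstruction, not a verbatim transcript):

    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.exec.max.dynamic.partitions.pernode=300;
    set hive.exec.max.created.files=150000;

    FROM parameter_def p
    INSERT OVERWRITE TABLE parameter_part PARTITION(location)
    SELECT p.seq_id, p.lead_id, p.arr_datetime, p.computed_value, p.del_date, p.location
    DISTRIBUTE BY location;

    As I understand it, this works because DISTRIBUTE BY location routes all rows with the same location value to the same reducer, so each dynamic partition is written by a single reducer instead of every mapper opening a file for every partition it encounters, which keeps the created-file count within the limit.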

    Thank You

    Regards
    Bejoy.KS

