Hi,
I have a relatively complicated Hive query using CombineHiveInputFormat:
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=300;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
INSERT OVERWRITE TABLE keyword_serp_results_no_dups PARTITION(week)
SELECT DISTINCT
    keywords.keyword, keywords.domain, keywords.url, keywords.rank,
    keywords.universal_rank, keywords.serp_type, keywords.date_indexed,
    keywords.search_engine_type, keywords.week
FROM keyword_serp_results keywords
JOIN (
    SELECT domain, keyword, search_engine_type, week, max_date_indexed,
           MIN(rank) AS best_rank
    FROM (
        SELECT keywords1.domain, keywords1.keyword,
               keywords1.search_engine_type, keywords1.week, keywords1.rank,
               dupkeywords1.max_date_indexed
        FROM keyword_serp_results keywords1
        JOIN (
            SELECT domain, keyword, search_engine_type, week,
                   MAX(date_indexed) AS max_date_indexed
            FROM keyword_serp_results
            GROUP BY domain, keyword, search_engine_type, week
        ) dupkeywords1
          ON keywords1.keyword = dupkeywords1.keyword
         AND keywords1.domain = dupkeywords1.domain
         AND keywords1.search_engine_type = dupkeywords1.search_engine_type
         AND keywords1.week = dupkeywords1.week
         AND keywords1.date_indexed = dupkeywords1.max_date_indexed
    ) dupkeywords2
    GROUP BY domain, keyword, search_engine_type, week, max_date_indexed
) dupkeywords3
  ON keywords.keyword = dupkeywords3.keyword
 AND keywords.domain = dupkeywords3.domain
 AND keywords.search_engine_type = dupkeywords3.search_engine_type
 AND keywords.week = dupkeywords3.week
 AND keywords.date_indexed = dupkeywords3.max_date_indexed
 AND keywords.rank = dupkeywords3.best_rank;
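
In essence, the query keeps, for each (domain, keyword, search_engine_type,
week) group, only the rows with the latest date_indexed and, among those, the
lowest rank. As a minimal sketch of the core join-against-aggregate pattern,
on a hypothetical table t(grp, d, r) (the real query above nests this pattern
twice):

    -- Hypothetical sketch of the dedup pattern: per grp, keep only the
    -- rows whose d equals that group's MAX(d).
    SELECT t.grp, t.d, t.r
    FROM t
    JOIN (SELECT grp, MAX(d) AS max_d FROM t GROUP BY grp) m
      ON t.grp = m.grp AND t.d = m.max_d;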

This query used to work fine until I updated to r991183 on trunk, at which
point I started getting this error:

java.io.IOException: cannot find dir = hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002/000000_0 in
partToPartitionInfo: [hdfs://ec2-75-101-174-245.compute-1.amazonaws.com:8020/tmp/hive-root/hive_2010-09-01_10-57-41_396_1409145025949924904/-mr-10002,
hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=417/week=201035/day=20100829,
hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=418/week=201035/day=20100829,
hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=419/week=201035/day=20100829,
hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100829,
hdfs://ec2-75-101-174-245.compute-1.amazonaws.com/user/root/domain_keywords/account=422/week=201035/day=20100831]
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:277)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:100)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:312)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:610)
    at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:120)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:108)

The query works if I don't change hive.input.format, i.e. if I leave out this
line:
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
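
As a hedged workaround sketch until this is fixed, the input format can also
be set back explicitly (assuming org.apache.hadoop.hive.ql.io.HiveInputFormat
is the default input format):

    -- Assumed workaround: revert to the default, non-combining input format
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;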

I've narrowed down this issue to the commit for HIVE-1510. If I take out
the changeset from r987746, everything works as before.

Thanks,
Sammy

  • Ning Zhang at Sep 2, 2010 at 12:14 am
    This may be a bug in HIVE-1510. Can you file a JIRA and post the message there?

  • Sammy Yu at Sep 2, 2010 at 12:40 am
    Hi Ning,
    I've filed it as HIVE-1610 <https://issues.apache.org/jira/browse/HIVE-1610>.

    Thanks,
    Sammy
    --
    Chief Architect, BrightEdge
    email: syu@brightedge.com | mobile: 650.539.4867 | fax: 650.521.9678
    address: 1850 Gateway Dr Suite 400, San Mateo, CA 94404
