Hello,

I have the following table definition (simplified to help in debugging):

create external table pvs (
time INT,
server STRING,
thread_id STRING
)
partitioned by (
dt string
)
row format delimited fields terminated by '\t'
stored as textfile
location 's3://dev-elastic/logz/';

I have another table, raw_pvs, from which I want to import data into pvs
using the following statement:

INSERT OVERWRITE TABLE pvs PARTITION (dt)
SELECT s.time, s.server, s.thread_id, s.dt
FROM (
FROM raw_pvs SELECT raw_pvs.time, raw_pvs.server,
raw_pvs.thread_id, raw_pvs.dt where dt>'2011_01_00' and
dt<'2011_01_02' limit 100
) s;

I keep getting the following error:

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201102111900_0003, Tracking URL =
http://ip-10-204-190-203.ec2.internal:9100/jobdetails.jsp?jobid=job_201102111900_0003
Kill Command = /home/hadoop/.versions/0.20/bin/../bin/hadoop job
-Dmapred.job.tracker=ip-10-204-190-203.ec2.internal:9001 -kill
job_201102111900_0003
2011-02-12 01:11:07,649 Stage-1 map = 0%, reduce = 0%
2011-02-12 01:11:09,733 Stage-1 map = 20%, reduce = 0%
2011-02-12 01:11:12,785 Stage-1 map = 100%, reduce = 0%
2011-02-12 01:11:18,868 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201102111900_0003
Loading data to table pvs partition (dt=null)
Failed with exception dt not found in table's partition spec: {dt=null}
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MoveTask

When I run the subquery directly (FROM raw_pvs SELECT raw_pvs.time,
raw_pvs.server, raw_pvs.thread_id, raw_pvs.dt where dt>'2011_01_00' and
dt<'2011_01_02' limit 100) I get 100 rows with no nulls in them, so this
doesn't seem like a data issue.
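
For reference, a direct way to rule out NULL or empty partition values in the source table (just a sketch using the same table and column names as above; any non-zero count would point to a data problem):

-- count rows whose partition value is missing or empty
SELECT count(*)
FROM raw_pvs
WHERE dt IS NULL OR dt = '';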

Does anyone know what I'm doing wrong? I've been stuck on this for a few days!

thank you
Khaled


  • Christopher, Pat at Feb 13, 2011 at 7:31 am
    Check your hive.exec.dynamic.partition.mode setting; it needs to be nonstrict for a fully dynamic insert. That, or have a static partition column first in your partitioning clause (a quick sketch of both options is below).

    Pat



    -- Sent from my Palm Pre
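
    A minimal sketch of the two options Pat mentions, using the table names from the original post; the two set values shown are the usual ones for dynamic partition inserts and are an assumption here, not something stated in Pat's message:

    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;  -- allow every partition column to be dynamic

    -- Option 1: fully dynamic partition insert; dt must be the last column in the SELECT
    INSERT OVERWRITE TABLE pvs PARTITION (dt)
    SELECT time, server, thread_id, dt
    FROM raw_pvs
    WHERE dt > '2011_01_00' AND dt < '2011_01_02';

    -- Option 2: make the (only) partition column static instead, one dt value per statement
    INSERT OVERWRITE TABLE pvs PARTITION (dt='2011_01_01')
    SELECT time, server, thread_id
    FROM raw_pvs
    WHERE dt = '2011_01_01';

    Since pvs has a single partition column, "a static partition column first" here simply means making dt static, which sidesteps dynamic partitioning entirely.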

  • Khassounah at Feb 13, 2011 at 7:39 am
    Hi Pat. I have hive.exec.dynamic.partition.mode=nonstrict and
    hive.exec.dynamic.partition=true. That lets Hive accept the dynamic
    partitions, but the insert still fails.

    Khaled
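
    For completeness, the effective values can be confirmed from the same Hive session before running the insert (a sketch; "set <property>;" with no value just prints the current setting):

    set hive.exec.dynamic.partition;
    set hive.exec.dynamic.partition.mode;
    -- the limits below only matter if the insert produces many partitions
    set hive.exec.max.dynamic.partitions;
    set hive.exec.max.dynamic.partitions.pernode;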
  • Ning Zhang at Feb 13, 2011 at 4:16 pm
    Khaled, which version of Hive are you running? I tried a similar query in trunk (0.7.0-SNAPSHOT) and it worked.

    The error doesn't mean the data is wrong (dt=null). It means either the compiled query plan doesn't indicate it is a dynamic partition insert (which is very unlikely for this simple query), or the MoveTask didn't find the location of the intermediate results and so cannot come up with the list of dynamic partitions that were generated. I think it is very likely the latter case, since the external table is stored in s3, and we never tried that in our development; we assume data are in HDFS directories. So you might want to first do an 'explain extended <your_insert_query>' and check whether the intermediate results are written to an s3 location first. If so, that might be where the bug is. You might also want to contact the Amazon EMR developers to see if there are any issues with dynamic partitions for data loaded in s3.

    Ning
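
    A sketch of the check Ning suggests, using the insert from the original post; in the extended plan, the file sink and move/load sections list the temporary and final directories, which is where an s3:// intermediate location would show up:

    EXPLAIN EXTENDED
    INSERT OVERWRITE TABLE pvs PARTITION (dt)
    SELECT s.time, s.server, s.thread_id, s.dt
    FROM (
      FROM raw_pvs SELECT raw_pvs.time, raw_pvs.server, raw_pvs.thread_id, raw_pvs.dt
      WHERE dt > '2011_01_00' AND dt < '2011_01_02' LIMIT 100
    ) s;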

  • Ryan Huebsch at Feb 13, 2011 at 4:20 pm
    You are likely encountering a bug w/ Amazon's S3 code:
    https://forums.aws.amazon.com/thread.jspa?threadID=56358&tstart=25

    Try inserting into a non-S3 backed table to see if this is indeed your
    problem.

    Based on the Amazon forums, a fix is expected this week:
    https://forums.aws.amazon.com/thread.jspa?threadID=60149&tstart=0

    Ryan
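
    A sketch of the test Ryan describes; pvs_local is a hypothetical HDFS-backed copy of pvs (no EXTERNAL keyword and no LOCATION clause, so it is created under the warehouse directory), and the dynamic partition settings from earlier in the thread are assumed to be enabled:

    CREATE TABLE pvs_local (
    time INT,
    server STRING,
    thread_id STRING
    )
    PARTITIONED BY (dt STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    -- same dynamic partition insert, but into the HDFS-backed table
    INSERT OVERWRITE TABLE pvs_local PARTITION (dt)
    SELECT time, server, thread_id, dt
    FROM raw_pvs
    WHERE dt > '2011_01_00' AND dt < '2011_01_02';

    If this succeeds, the dynamic partition logic itself is fine and the failure is specific to the s3-backed destination.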
  • Khassounah at Feb 13, 2011 at 5:12 pm
    Thanks Ryan... that does seem to be my issue.

    I found the first thread after I sent this email, but not the second
    thread saying it will be fixed next week.

    thanks
    Khaled
