FAQ
Hello,

I'm relatively new to Hive and Hadoop, and I have written a custom
InputFormat to be able to read our logfiles. I think I got everything
right, but when I try to execute a query on an Amazon EMR cluster it fails
with error messages that don't tell me what exactly is wrong.

So this is the query I execute:

add jar s3://amg.hadoop/hiveLib/hive-json-serde-0.1.jar;
add jar s3://amg.hadoop/hiveLib/hadoop-jar-with-dependencies.jar;

DROP TABLE event_log;

CREATE EXTERNAL TABLE IF NOT EXISTS event_log (
EVENT_SUBTYPE STRING,
EVENT_TYPE STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
STORED AS
INPUTFORMAT 'com.adconion.hadoop.hive.DataLogInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://amg-events/2010/07/01/01';

SELECT event_type FROM event_log WHERE event_type = 'pp' LIMIT 10;
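As a quick sanity check that the table actually registered the custom classes (the added jars must have resolved for this to show them):

```sql
-- If the jars resolved, the "Detailed Table Information" section should name
-- com.adconion.hadoop.hive.DataLogInputFormat as the input format and
-- org.apache.hadoop.hive.contrib.serde2.JsonSerde as the serde.
DESCRIBE EXTENDED event_log;
```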

Running this script produces the following output:

hadoop@domU-12-31-39-0F-45-B3:~$ hive -f test.ql
Hive history
file=/mnt/var/lib/hive/tmp/history/hive_job_log_hadoop_201009011303_427866099.txt
Testing s3://amg.hadoop/hiveLib/hive-json-serde-0.1.jar
converting to local s3://amg.hadoop/hiveLib/hive-json-serde-0.1.jar
Added
/mnt/var/lib/hive/downloaded_resources/s3_amg.hadoop_hiveLib_hive-json-serde-0.1.jar
to class path
Testing s3://amg.hadoop/hiveLib/hadoop-jar-with-dependencies.jar
converting to local s3://amg.hadoop/hiveLib/hadoop-jar-with-dependencies.jar
Added
/mnt/var/lib/hive/downloaded_resources/s3_amg.hadoop_hiveLib_hadoop-jar-with-dependencies.jar
to class path
OK
Time taken: 2.426 seconds
Found class for org.apache.hadoop.hive.contrib.serde2.JsonSerde
OK
Time taken: 0.332 seconds
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201008311250_0006, Tracking URL =
http://domU-12-31-39-0F-45-B3.compute-1.internal:9100/jobdetails.jsp?jobid=job_201008311250_0006
Kill Command = /home/hadoop/.versions/0.20/bin/../bin/hadoop job
-Dmapred.job.tracker=domU-12-31-39-0F-45-B3.compute-1.internal:9001
-kill job_201008311250_0006
2010-09-01 13:04:04,376 Stage-1 map = 0%, reduce = 0%
2010-09-01 13:04:34,681 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201008311250_0006 with errors

Failed tasks with most(4) failures :
Task URL:
http://domU-12-31-39-0F-45-B3.compute-1.internal:9100/taskdetails.jsp?jobid=job_201008311250_0006&tipid=task_201008311250_0006_m_000013

FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.ExecDriver

The only errors I can find under /mnt/var/log/apps/hive.log are multiple
copies like this one:

2010-09-01 13:03:36,586 DEBUG org.apache.hadoop.conf.Configuration
(Configuration.java:<init>(216)) - java.io.IOException: config()
at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:216)
at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:203)
at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:316)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:232)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

And these errors:

2010-09-01 13:03:40,228 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.resources" but it cannot be resolved.
2010-09-01 13:03:40,228 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.resources" but it cannot be resolved.
2010-09-01 13:03:40,229 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.runtime" but it cannot be resolved.
2010-09-01 13:03:40,229 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.core.runtime" but it cannot be resolved.
2010-09-01 13:03:40,229 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.text" but it cannot be resolved.
2010-09-01 13:03:40,229 ERROR DataNucleus.Plugin
(Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
"org.eclipse.text" but it cannot be resolved.

Does anyone have an idea what went wrong?

Thanks!

Robert


  • Shrijeet Paliwal at Sep 1, 2010 at 4:35 pm
    Ended Job = job_201008311250_0006 with errors
    Check your hadoop task logs, you will find more detailed information there.

    -Shrijeet
    On Wed, Sep 1, 2010 at 6:13 AM, Robert Hennig wrote:
    [quoted original message snipped]
  • Robert Hennig at Sep 2, 2010 at 10:59 am
    Hello,

    Thanks, Shrijeet, for your answer. I found an exception in a task log
    which results from a casting error:

    Caused by: java.lang.ClassCastException:
    org.apache.hadoop.mapred.FileSplit cannot be cast to
    com.adconion.hadoop.hive.DataLogSplit
    at
    com.adconion.hadoop.hive.DataLogInputFormat.getRecordReader(DataLogInputFormat.java:112)
    at
    org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:61)
    ... 11 more

    The error happens because I expected my custom getSplits() method to be
    used, which delivers an array of DataLogSplit objects, and I expected
    that my custom getRecordReader() method would receive one of these
    splits, which could then be cast to a DataLogSplit.

    So it looks like my getSplits() method is not being used. Or does
    Hadoop transform the splits somehow?
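To make the mismatch concrete, here is a reduced, self-contained sketch. The interface and classes below are simplified stand-ins for the org.apache.hadoop.mapred types and for my actual DataLogInputFormat, and the S3 paths are placeholders:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Stand-ins for org.apache.hadoop.mapred.InputSplit / FileSplit.
interface InputSplit {}
class FileSplit implements InputSplit {}

// A custom split type carrying extra per-log information.
class DataLogSplit extends FileSplit {
    final String logPath;
    DataLogSplit(String logPath) { this.logPath = logPath; }
}

class DataLogInputFormat {
    // getSplits() hands back custom split objects...
    List<InputSplit> getSplits() {
        return Arrays.<InputSplit>asList(
                new DataLogSplit("s3://bucket/a.log"),
                new DataLogSplit("s3://bucket/b.log"));
    }

    // ...and getRecordReader() downcasts them. If the framework substitutes
    // a plain FileSplit instead, the downcast is exactly the kind of failure
    // seen in the task log above (here made explicit with a check).
    String getRecordReader(InputSplit split) throws IOException {
        if (!(split instanceof DataLogSplit)) {
            throw new IOException("expected DataLogSplit, got "
                    + split.getClass().getSimpleName());
        }
        return ((DataLogSplit) split).logPath;
    }
}
```

Given the CombineHiveRecordReader frame in the trace, my suspicion is that Hive's CombineHiveInputFormat is computing the splits itself and handing my getRecordReader() a plain FileSplit, which would explain why my own getSplits() never runs.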

    Thanks,

    Robert

    On 01.09.10 at 18:34, Shrijeet Paliwal wrote:
    Ended Job = job_201008311250_0006 with errors

    Check your hadoop task logs, you will find more detailed information
    there.

    -Shrijeet

    On Wed, Sep 1, 2010 at 6:13 AM, Robert Hennig wrote:
    [quoted original message snipped]

Discussion Overview
group: user
categories: hive, hadoop
posted: Sep 1, 2010 at 1:14 PM
active: Sep 2, 2010 at 10:59 AM
posts: 3
users: 2 (Robert Hennig: 2 posts, Shrijeet Paliwal: 1 post)
website: hive.apache.org
