Hi(ve),

I created a table like this:

create table testtable (veld1 STRING, veld2 STRING, veld3 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" =
  "([a-z]{4}[0-9])þ([a-z]{4}[0-9])þ([a-z]{4}[0-9])")
STORED AS TEXTFILE;


The table is OK; select * from testtable shows the contents of the
underlying HDFS file.

However, when I invoke an MR job with select veld2 from testtable, the
job starts but the mappers fail with:

"Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.ClassNotFoundException:
org.apache.hadoop.hive.contrib.serde2.RegexSerDe"

I already copied the Hive SerDe jar (in my case hive-serde-0.7.0-CDH3B4.jar)
to $HADOOP_HOME/lib and restarted the jobtracker/tasktrackers, but that
doesn't help.

Cheers Jasper

--
Kind Regards \ Met Vriendelijke Groet,

Jasper Knulst
BI Consultant

VLC Den Haag
Gildeweg 5B
2632 BD Nootdorp

M: +31 (0)6 19 66 75 11
T: +31 (0)15 764 07 50
Skype: jasper_knulst_vlc


  • Loren Siebert at Apr 5, 2011 at 10:58 pm
    You need to tell Hive about the JAR. This is how I do it in hive-site.xml:

    <property>
    <name>hive.aux.jars.path</name>
    <value>file:///usr/lib/hive/lib/hive-contrib-0.7.0-CDH3B4.jar</value>
    <description>These JAR files are available to all users for all jobs</description>
    </property>
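
    Alternatively, if you only need the SerDe for the current session, you
    can register the jar from the Hive CLI instead (the path below is just
    an example for a CDH3-style layout; point it at wherever your
    hive-contrib jar actually lives):

    ADD JAR /usr/lib/hive/lib/hive-contrib-0.7.0-CDH3B4.jar;

    Hive then ships the jar to the cluster along with each job it submits,
    so no jobtracker/tasktracker restart is needed.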

  • Jasper Knulst at Apr 5, 2011 at 11:52 pm
    Thanks Loren,

    That worked!

    Jasper

  • Hadoopman at Apr 6, 2011 at 11:20 pm
    I have a process that loads data into Hive hourly. Loading data hourly
    isn't a problem; however, when I load historical data, say 24-48 hours'
    worth, I get the error message below. In googling I've come across
    suggestions that the JVM memory needs to be increased. Are there any
    other options, or is that pretty much it?

    I'd appreciate any help with this one. To get around the problem I've
    loaded fewer historical logs at a time, which works great but isn't
    what I had in mind :-)
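
    For reference, the kind of change those suggestions describe (a sketch;
    mapred.child.java.opts is the classic-MapReduce property for the task
    JVM heap, and -Xmx1024m is just an example value) would be something
    like this per Hive session:

    SET mapred.child.java.opts=-Xmx1024m;

    or the same property set cluster-wide in mapred-site.xml on the
    tasktrackers.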

    Below is the log from one such run.

    Thanks!


    2011-04-05 15:33:38,400 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 7 finished. closing...
    2011-04-05 15:33:38,400 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 7 forwarded 0 rows
    2011-04-05 15:33:38,400 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 8 finished. closing...
    2011-04-05 15:33:38,400 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 8 forwarded 0 rows
    2011-04-05 15:33:38,401 WARN org.apache.hadoop.hive.ql.exec.GroupByOperator: Begin Hash Table flush at close: size = 0
    2011-04-05 15:33:38,401 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 9 finished. closing...
    2011-04-05 15:33:38,401 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 9 forwarded 0 rows
    2011-04-05 15:33:38,401 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://namenoden1:9000/tmp/hive-etl/hive_2011-04-05_15-25-32_126_7118636463039801851/_tmp.-mr-10004/000049_0
    2011-04-05 15:33:38,401 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://namenoden1:9000/tmp/hive-etl/hive_2011-04-05_15-25-32_126_7118636463039801851/_tmp.-mr-10004/_tmp.000049_0
    2011-04-05 15:33:38,401 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://namenoden1:9000/tmp/hive-etl/hive_2011-04-05_15-25-32_126_7118636463039801851/_tmp.-mr-10004/000049_0
    2011-04-05 15:33:38,448 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
    2011-04-05 15:33:38,495 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 8 Close done
    2011-04-05 15:33:38,495 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 7 Close done
    2011-04-05 15:33:38,495 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 6 Close done
    2011-04-05 15:33:38,495 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 Close done
    2011-04-05 15:33:38,495 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
    2011-04-05 15:33:38,495 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 13 Close done
    2011-04-05 15:33:38,495 INFO ExecMapper: ExecMapper: processed 188614 rows: used memory = 720968432
    2011-04-05 15:33:38,501 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.io.Text.setCapacity(Text.java:240)
        at org.apache.hadoop.io.Text.append(Text.java:216)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:159)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:136)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:40)
        at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:66)
        at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:32)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:67)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:202)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:186)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:383)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:317)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
        at org.apache.hadoop.mapred.Child.main(Child.java:211)
