Hello,
[I posted the question below to Cloudera's getsatisfaction site but am cross-posting here in case hive-users folks have debugging suggestions. I'm really stuck on this one.]
I recently upgraded to CDH3 Beta. I had some Hive code working well on an earlier Hadoop 0.20 release that created a table, then loaded data into it using LOAD DATA LOCAL INPATH. In CDH3, I now get a semantic analysis error when I run the same LOAD command.
The table is created by
CREATE TABLE TOMCAT(identifier STRING, datestamp STRING, time_stamp STRING, seq STRING, server STRING, logline STRING) PARTITIONED BY(filedate STRING, app STRING, filename STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' STORED AS TEXTFILE;
and the load command used is:
LOAD DATA LOCAL INPATH '/var/www/petrify/mw.log.trustejb1' INTO TABLE TOMCAT PARTITION (filedate='2010-06-25', app='trustdomain', filename='mw.log.trustejb1');
The file is simple tab-delimited log data.
If I leave out the PARTITIONED BY clause when I create the table, the data loads fine. But when the table is partitioned, I get the stack trace below during the load.
I tried copying the data into HDFS and using LOAD DATA INPATH instead, but got the same error:
FAILED: Error in semantic analysis: line 1:110 Partition not found 'mw.log.trustejb1'
where 110 is the character position just after the word PARTITION in the query.
It seems as though Hive doesn't think the table is partitioned, though I can see the partition keys listed when I do DESCRIBE EXTENDED on my table. (Output from that is below the error.) There were no errors in the logs or at the Thrift server console when I created the table.
Strangely, when I run SHOW PARTITIONS TOMCAT, it doesn't list anything.
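One possible way to narrow this down, offered as an untested sketch rather than a known fix, is to register the partition explicitly before loading. It assumes the same table and partition values as in the LOAD command above. If the ADD PARTITION statement fails too, the problem is more likely in the metastore than in LOAD itself.

```sql
-- Assumption: explicitly creating the partition first may show whether
-- the metastore can create and look up partitions for this table at all.
ALTER TABLE TOMCAT ADD PARTITION (filedate='2010-06-25', app='trustdomain', filename='mw.log.trustejb1');

-- If the partition was registered, it should now be listed here.
SHOW PARTITIONS TOMCAT;

-- Retry the original load against the now-existing partition.
LOAD DATA LOCAL INPATH '/var/www/petrify/mw.log.trustejb1' INTO TABLE TOMCAT PARTITION (filedate='2010-06-25', app='trustdomain', filename='mw.log.trustejb1');
```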
Any help with this would be most welcome.
Thanks
Ken
10/08/12 15:11:40 INFO service.HiveServer: Running the query: LOAD DATA LOCAL INPATH '/var/www/petrify/trustdomain-rewritten/mw.log.trustejb1' INTO TABLE TOMCAT PARTITION (filedate='2010-06-25', app='trustdomain', filename='mw.log.trustejb1')
10/08/12 15:11:40 INFO parse.ParseDriver: Parsing command: LOAD DATA LOCAL INPATH '/var/www/petrify/trustdomain-rewritten/mw.log.trustejb1' INTO TABLE TOMCAT PARTITION (filedate='2010-06-25', app='trustdomain', filename='mw.log.trustejb1')
10/08/12 15:11:40 INFO parse.ParseDriver: Parse Completed
10/08/12 15:11:40 INFO hive.log: DDL: struct tomcat { string identifier, string datestamp, string time_stamp, string seq, string server, string logline}
10/08/12 15:11:40 ERROR metadata.Hive: org.apache.thrift.TApplicationException: get_partition failed: unknown result
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_partition(ThriftHiveMetastore.java:831)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partition(ThriftHiveMetastore.java:799)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartition(HiveMetaStoreClient.java:418)
at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:620)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:397)
at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:178)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:120)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:378)
at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:366)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
FAILED: Error in semantic analysis: line 1:110 Partition not found 'mw.log.trustejb1'
10/08/12 15:11:40 ERROR ql.Driver: FAILED: Error in semantic analysis: line 1:110 Partition not found 'mw.log.trustejb1'
org.apache.hadoop.hive.ql.parse.SemanticException: line 1:110 Partition not found 'mw.log.trustejb1'
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer$tableSpec.<init>(BaseSemanticAnalyzer.java:403)
at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:178)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:105)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:275)
at org.apache.hadoop.hive.ql.Driver.runCommand(Driver.java:320)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:120)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.process(ThriftHive.java:378)
at org.apache.hadoop.hive.service.ThriftHive$Processor.process(ThriftHive.java:366)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
DESCRIBE TABLE EXTENDED TOMCAT;
identifier string
datestamp string
time_stamp string
seq string
server string
logline string
filedate string
app string
filename string
Detailed Table Information Table(tableName:tomcat, dbName:default, owner:root, createTime:1281661047, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:identifier, type:string, comment:null), FieldSchema(name:datestamp, type:string, comment:null), FieldSchema(name:time_stamp, type:string, comment:null), FieldSchema(name:seq, type:string, comment:null), FieldSchema(name:server, type:string, comment:null), FieldSchema(name:logline, type:string, comment:null)], location:hdfs://hadoop-vm1/user/hive/warehouse/tomcat, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format=9}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[FieldSchema(name:filedate, type:string, comment:null), FieldSchema(name:app, type:string, comment:null), FieldSchema(name:filename, type:string, comment:null)], parameters:{transient_lastDdlTime=1281661047})
Time taken: 0.086 seconds
10/08/12 18:53:08 INFO CliDriver: Time taken: 0.086 seconds
Ken Barclay
Integration Engineer
Wells Fargo Bank - ISD | 45 Fremont Street, 10th Floor | San Francisco, CA 94105
MAC A0194-100
Tel 415-222-6491
ken.barclay@wellsfargo.com