FAQ
Hi Mark,

You could try RegexSerDe to deserialize using regular expression. Here is a good example:

http://books.google.com/books?id=Nff49D7vnJcC&lpg=PA391&ots=IicwYn7zOq&dq=ROW%20FORMAT%20SERDE%20input.regex&pg=PA391#v=onepage&q=ROW%20FORMAT%20SERDE%20input.regex&f=false

Good luck,
Vince


From: Mark Kerzner <mark.kerzner@shmsoft.com
Reply-To: <user@hive.apache.org
Date: Sat, 24 Sep 2011 22:43:23 -0500
To: <user@hive.apache.org
Subject: Re: How to load quote-separated fields?

That is what I ended up doing - since I could not change the format of the existing logs, I wrote a utility<https://github.com/markkerzner/WebLogAnalyzer/blob/master/src/main/java/com/shmsoft/webloganalyzer/ApacheWebLog.java> to convert them to something more standard that Hive can easily accept.

Thank you,
Mark

2011/9/24 longmans163 <longmans163@163.com
hi, Mark, I saw ""GET /dynLink/?" contains a space, since hive should recognize this as a FIELDS TERMINATED which you have defined before. I think you should encode the spaces to other non-terminate char.


At 2011-09-23 04:58:59,"Mark Kerzner" wrote:
Hi,

I have an apache web log (sample below), and want to LOAD DATA INPATH.

My fields are separated by a space, and those that contains spaces are enclosed in quotes.

I tried this,

ROW FORMAT DELIMITED
FIELDS TERMINATED BY " "
COLLECTION ITEMS TERMINATED BY '"'
MAP KEYS TERMINATED BY ","

but it did not work, and thought that GET is a separate field. What should I change?

Thank you,
Mark


[01/May/2011:00:00:00 +0000] 68.115.109.118 TLSv1 RC4-MD5 "GET /dynLink/?PCD=CHICHHH&EBC=3425154412&RCC=D2RVX&GAD=20110426&NMN=2&NOA=1& amp;NOC=0&LNG=en&TBP=325.43&GEM=STEPHENCLAUDENELSON%40GMAIL.COM<http://40GMAIL.COM>&GEN=&GSL=&GLN=NELSON&GFN=STEPHEN&GCC=&GST=&GCT=&GPC=&GAR=&GPN=&PRT=0&PLC=&PCC=brandwebsite&PSC=&SRP=CIBMS0&PID=HIL&PET=WEB&GNR=1&CRP=0901452 HTTP/1.1" 200 95 0 99885 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152<tel:3.0.4506.2152>; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C; .NET4.0E; MS-RTC LM 8)" "https://secure.hilton.com/en/hi/res/retrieved_reservation.jhtml;jsessionid=UIBJ2MH0JDJPOCSGBIYMVCQ?_requestid=153483" "t=1304208000431979" "D=99766"





The contents of this message, together with any attachments, are intended only for the use of the individual or entity to which they are addressed and may contain information that is confidential and exempt from disclosure. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this message, or any attachment, is strictly prohibited. If you have received this message in error, please notify the original sender immediately by telephone or by return E-mail and delete this message, along with any attachments, from your computer. Thank you.

Search Discussions

Discussion Posts

Previous

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 4 of 4 | next ›
Discussion Overview
groupuser @
categorieshive, hadoop
postedSep 22, '11 at 8:59p
activeSep 25, '11 at 3:59a
posts4
users3
websitehive.apache.org

People

Translate

site design / logo © 2022 Grokbase