That is what I ended up doing - since I could not change the format of the
existing logs, I wrote a utility
<https://github.com/markkerzner/WebLogAnalyzer/blob/master/src/main/java/com/shmsoft/webloganalyzer/ApacheWebLog.java>
to convert them to something more standard that Hive can easily accept.
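
For readers who want to try the same approach, here is a minimal sketch of such a conversion. It is not the code from the linked Java utility; the names TOKEN and to_tsv are made up for illustration, and it assumes every field is either a [bracketed] span, a "quoted" span, or a run of non-space characters, as in the sample further down the thread. The output is tab-delimited, which a table declared with FIELDS TERMINATED BY '\t' should be able to read.

```python
import re

# A token is a [bracketed] span, a "quoted" span, or a run of non-space characters.
# Assumption: quotes and brackets are never nested or escaped inside a field.
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')


def to_tsv(line):
    """Convert one space-delimited log line into a tab-delimited line."""
    fields = []
    for match in TOKEN.finditer(line):
        # Exactly one of the three groups is set for each match.
        value = next(g for g in match.groups() if g is not None)
        # Collapse inner whitespace so a tab can only ever be a delimiter.
        fields.append(" ".join(value.split()))
    return "\t".join(fields)
```

Each input line yields one output line, so the function can be applied line by line to a log file before running LOAD DATA INPATH on the result.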

Thank you,
Mark

2011/9/24 longmans163 <longmans163@163.com>
Hi Mark, I saw that "GET /dynLink/?" contains a space, so Hive will treat
that space as the FIELDS TERMINATED BY delimiter you defined earlier. I
think you should encode those spaces as some other, non-terminator character.
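
A rough sketch of that encoding step, assuming the quoted fields never contain an escaped quote. The function name encode_quoted_spaces and the choice of "%20" as the replacement are illustrative assumptions, not something from the original logs or from Hive.

```python
import re

# Matches a double-quoted span; assumes no escaped quotes inside it.
QUOTED = re.compile(r'"([^"]*)"')


def encode_quoted_spaces(line, replacement="%20"):
    """Replace spaces inside quoted fields so they no longer look like delimiters."""
    def encode(match):
        return '"' + match.group(1).replace(" ", replacement) + '"'
    return QUOTED.sub(encode, line)
```

Note that this leaves the surrounding quotes in the field values, and that the bracketed timestamp at the start of each line still contains a space, so it would need the same treatment.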


At 2011-09-23 04:58:59, "Mark Kerzner" wrote:

Hi,

I have an Apache web log (sample below) and want to load it into Hive
with LOAD DATA INPATH.

My fields are separated by a space, and those that contain spaces are
enclosed in quotes.

I tried this,

ROW FORMAT DELIMITED
FIELDS TERMINATED BY " "
COLLECTION ITEMS TERMINATED BY '"'
MAP KEYS TERMINATED BY ","

but it did not work: Hive treated GET as a separate field. What should
I change?
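
To illustrate the symptom: as far as I know, Hive's delimited row format splits on every occurrence of the field terminator and has no notion of quoted fields, and COLLECTION ITEMS TERMINATED BY only separates elements of array or struct columns. The sketch below mimics that behaviour with a plain split on a shortened, made-up version of the sample line; naive_split is a hypothetical helper, not a Hive API.

```python
def naive_split(line):
    """Split on every single space, ignoring quotes (roughly what a
    space-delimited row format does with this kind of line)."""
    return line.split(" ")


fields = naive_split('RC4-MD5 "GET /dynLink/?PCD=CHICHHH HTTP/1.1" 200')
# fields is ['RC4-MD5', '"GET', '/dynLink/?PCD=CHICHHH', 'HTTP/1.1"', '200']
```

The quoted request ends up spread over three fields, with the quote characters still attached to the first and last pieces.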

Thank you,
Mark


[01/May/2011:00:00:00 +0000] 68.115.109.118 TLSv1 RC4-MD5 "GET
/dynLink/?PCD=CHICHHH&EBC=3425154412&RCC=D2RVX&GAD=20110426&NMN=2&NOA=1&
NOC=0&LNG=en&TBP=325.43&GEM=STEPHENCLAUDENELSON%40GMAIL.COM&GEN=&GSL=&GLN=NELSON&GFN=STEPHEN&GCC=&GST=&GCT=&GPC=&GAR=&GPN=&PRT=0&PLC=&PCC=brandwebsite&PSC=&SRP=CIBMS0&PID=HIL&PET=WEB&GNR=1&CRP=0901452
HTTP/1.1" 200 95 0 99885 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1;
.NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729;
InfoPath.2; .NET4.0C; .NET4.0E; MS-RTC LM 8)" "
https://secure.hilton.com/en/hi/res/retrieved_reservation.jhtml;jsessionid=UIBJ2MH0JDJPOCSGBIYMVCQ?_requestid=153483"
"t=1304208000431979" "D=99766"


Discussion Overview
group: user @
categories: hive, hadoop
posted: Sep 22, '11 at 8:59p
active: Sep 25, '11 at 3:59a
posts: 4
users: 3
website: hive.apache.org
