I am trying to load a file into a Hive table, but its records are not terminated by newlines. Is this possible directly with Hive's built-in commands, or does it need a custom SerDe? The record delimiter appears to be the date string that starts each entry.
My file looks like this:
20 May 2010 16:59:52,329 [http-7280-1] >
[Query:( Big query string )]]
20 May 2010 17:02:14,447 [http-7280-1] > [Query: test][Time(ms):109870]
20 May 2010 17:02:17,275 [http-7280-2] >
20 May 2010 17:31:04,905 [http-7280-1] > [Query: test][Time(ms):14015]
I am trying to load it into this table:
CREATE TABLE searchLog (dateStr STRING, threadName STRING, message STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]* [^ ]* [^ ]* [^ ]*) \\[([^\\]]*)\\] > ([^$]*)",
  "output.format.string" = "%1$s%2$s%3$s"
)
STORED AS TEXTFILE;
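This works fine when the message is on the same line as the timestamp. For messages that span multiple lines, however, Hive cannot match the pattern, presumably because the text input format hands each physical line to the SerDe as a separate row.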
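To make the record boundary concrete, here is a minimal Python sketch of the grouping I have in mind, written as a preprocessing step outside Hive. It assumes every record begins with a timestamp of the form "20 May 2010 16:59:52,329" and that any other non-empty line continues the previous record. The names RECORD_START and join_continuation_lines are my own for illustration and are not part of Hive.

```python
import re

# Assumption: a new record always starts with a timestamp such as
# "20 May 2010 16:59:52,329 " (day, month abbreviation, year, time with millis).
RECORD_START = re.compile(
    r"^\d{1,2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2},\d{3} "
)


def join_continuation_lines(lines):
    """Yield one string per log record.

    A line that does not start with a timestamp is treated as a
    continuation and appended to the current record, separated by a space.
    Lines that appear before the first timestamp are dropped.
    """
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if RECORD_START.match(line):
            # A new record begins, so emit the one collected so far.
            if current is not None:
                yield current
            current = line.rstrip()
        elif current is not None and line.strip():
            # Fold a non-empty continuation line into the current record.
            current = current + " " + line.strip()
    if current is not None:
        yield current
```

Running the log file through a function like this and writing each yielded record on its own line would give a file in which every record is newline-terminated, which the table definition could then read. I would still prefer something that works inside Hive itself.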
Please let me know if there is a way to achieve this in Hive.
Appreciate your help,