Grokbase Groups Hive user July 2009
FAQ
I think you can do that with regex replace and GROUP BY. Something
like this- select replaced_col, count(1) from (select
regex_replace(original_col, '.*(text1|text2|text3).*', '$1') as
replaced_col from table)
On 7/28/09, Andraz Tori wrote:
Hi,
I am a beginner at Hive's SQL so I am sorry if this question is answered
somewhere else. I tried to find the answer in Wiki, but no luck.

I have a dataset in which one of the columns is text. I need to count
number of records that match certain regex on that column. There are a
number of different regexes that I need to count records for and it
seems that there should be a way to do it in one pass through the data
with hive.

So how do I do it?

This doesn't seem to work (on Hive 0.3):
select sum(textcolumn LIKE "%%sometext%%") as CountA, sum(textcolumn
LIKE "%%othertext%%") as CountB from sometable;


Thank you for your help in advance!


[unrelated: We've written a simple AWS S3 log format deserializer for
Hive. It needs some polishing - if anyone needs it or wants to polish it
up for inclusion, let me know]

--
Andraz Tori, CTO
Zemanta Ltd, New York, London, Ljubljana
www.zemanta.com
mail: andraz@zemanta.com
tel: +386 41 515 767
twitter: andraz, skype: minmax_test



--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Search Discussions

Discussion Posts

Previous

Follow ups

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 2 of 6 | next ›
Discussion Overview
groupuser @
categorieshive, hadoop
postedJul 27, '09 at 6:34p
activeJul 28, '09 at 9:57p
posts6
users4
websitehive.apache.org

People

Translate

site design / logo © 2021 Grokbase