Pig user mailing list, June 2012
Hi Norbert,
thanks for the tip.
I think the MATCHES operator won't work for me because it tries to match
the whole string.
In my case I'm interested in detecting the sequence anywhere in the string.

e.g.
abccccdef - filter out
abcdeeeef - filter out
aabcdeef - leave
111111abcd - leave

I want to filter out every string that contains a run of at least 4 identical
characters, unless the repeated character is a digit.

The regexp for detecting those is: ([^0-9])\1{3,}
but it won't work with MATCHES, because the run only covers part of the string.

I have a trivial working UDF that just calls pattern().matcher().find(),
but maybe there is something built in that I could use instead?
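The core of such a find()-based filter can be sketched in plain Java. This is a minimal sketch of the regex logic only, not Jakub's actual UDF. The class and method names (RepeatedRunFilter, hasRepeatedNonDigitRun) are hypothetical, and wrapping the check in a Pig FilterFunc is left out.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the regex check a find()-based filter UDF would wrap.
class RepeatedRunFilter {
    // A non-digit character followed by at least three more copies of itself,
    // i.e. a run of 4 or more identical non-digit characters.
    private static final Pattern REPEATED_NON_DIGIT = Pattern.compile("([^0-9])\\1{3,}");

    // find() succeeds if the run occurs anywhere in the input;
    // matches() would instead require the whole input to be such a run.
    static boolean hasRepeatedNonDigitRun(String s) {
        return REPEATED_NON_DIGIT.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"abccccdef", "abcdeeeef", "aabcdeef", "111111abcd"};
        for (String s : samples) {
            // Strings with such a run are the ones to filter out.
            System.out.println(s + " -> " + (hasRepeatedNonDigitRun(s) ? "filter out" : "leave"));
        }
    }
}
```

Running main prints "filter out" for the first two samples and "leave" for the last two. This matches the expected results listed above. "111111abcd" is kept because [^0-9] never lets a digit start the run.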


--
regards,
Jakub Glapa

On Mon, Jun 18, 2012 at 3:49 PM, Norbert Burger wrote:

Jakub -- The MATCHES operator accepts regexes as input. You can add a NOT
to invert the logic.

http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html

Norbert
On Mon, Jun 18, 2012 at 7:14 AM, Jakub Glapa wrote:

Hi all,
I found a 'matches' operator for pattern matching in Pig Latin.
I didn't find anything else in the documentation, but is there something
similar that searches instead of matching?
Basically, in Java terms I want the result of the
Matcher.find() method, not Matcher.matches().
Will I have to end up writing my own UDF for that?
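In Java terms, the difference can be illustrated with a minimal sketch. The helper names below are hypothetical. The sketch also shows a common workaround: padding the regex with .* on both sides makes a full-string match behave like a search, as long as the input contains no line terminators. Whether this carries over to Pig's MATCHES operator is an assumption, based on the description above of MATCHES having Matcher.matches() semantics.

```java
import java.util.regex.Pattern;

// Contrasts Matcher.matches() (whole input must match) with
// Matcher.find() (a match anywhere in the input is enough).
class FindVsMatches {
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    static boolean findsAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Emulates find() with matches() by allowing arbitrary text around the regex.
    // The (?:...) group is non-capturing, so back references like \1 keep their numbering.
    // Caveat: '.' does not match line terminators unless Pattern.DOTALL is set,
    // so this is only equivalent to find() for single-line input.
    static boolean matchesPadded(String regex, String input) {
        return matchesWhole(".*(?:" + regex + ").*", input);
    }

    public static void main(String[] args) {
        String regex = "c{4,}";
        String input = "abccccdef";
        System.out.println("matches(): " + matchesWhole(regex, input));
        System.out.println("find():    " + findsAnywhere(regex, input));
        System.out.println("padded matches(): " + matchesPadded(regex, input));
    }
}
```

Here main prints false for matches(), because "cccc" is only part of the input. It prints true for find() and true for the padded matches().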

Thanks for the help.

PS.
I'm trying to filter out strings with consecutive repeated characters. I've
constructed a regexp that detects them.
Now I just have to apply it somehow.


--
regards,
Jakub Glapa

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jun 18, '12 at 11:15a
active: Jun 19, '12 at 10:57a
posts: 5
users: 2
website: pig.apache.org

2 users in discussion: Jakub Glapa (3 posts), Norbert Burger (2 posts)
