Grokbase Groups Pig user June 2012
Hi Norbert,
thanks for the tip.
I think the MATCHES operator won't work for me because it tries to match
the whole string.
In my case I'm interested in detecting the sequence anywhere in the string.

abccccdef - filter out
abcdeeeef - filter out
aabcdeef - leave
111111abcd - leave

I want to filter out every string in which a character that is not a digit
is repeated at least 4 times in a row.

regexp for detecting those is: ([^0-9])\1{3,}
but it won't work with MATCHES
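
For reference, a minimal Java sketch of the difference described here, assuming MATCHES follows the same whole-string semantics as Java's Pattern.matches(). The class and method names are hypothetical. Padding the expression with .* on both sides lets a whole-string match behave like a search, so the same padded pattern may also be usable with MATCHES plus NOT; how the backslash has to be escaped inside a Pig string literal is not checked here.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Pig.
class MatchesVsFind {
    // Pattern from the message: a non-digit character repeated 4+ times in a row.
    static final String CORE = "([^0-9])\\1{3,}";

    // Whole-string match: true only if the entire input is one repeated run.
    static boolean wholeMatch(String s) {
        return Pattern.matches(CORE, s);
    }

    // Whole-string match with .* padding: true if the run occurs anywhere
    // (on a single line, since . does not match line terminators by default).
    static boolean paddedMatch(String s) {
        return Pattern.matches(".*" + CORE + ".*", s);
    }

    public static void main(String[] args) {
        String[] samples = {"abccccdef", "abcdeeeef", "aabcdeef", "111111abcd"};
        for (String s : samples) {
            System.out.println(s + " whole=" + wholeMatch(s) + " padded=" + paddedMatch(s));
        }
    }
}
```

With the four sample strings above, the unpadded pattern matches none of them, while the padded one matches only the two marked "filter out".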

I have a trivial working UDF that just calls Pattern.matcher().find(),
but maybe there is something built in that I could use instead?
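
The UDF itself is not shown in this thread. As a rough sketch only, its core logic might look like the following, with the pattern compiled once and reused. The class and method names are hypothetical, and the Pig wrapper is left out to keep the example self-contained; in a real UDF this check would presumably sit inside the exec method of a class extending Pig's FilterFunc.

```java
import java.util.regex.Pattern;

// Hypothetical helper holding the logic such a UDF would delegate to.
class RepeatedCharFilter {
    // Compiled once: a non-digit character repeated 4+ times in a row.
    private static final Pattern REPEATED = Pattern.compile("([^0-9])\\1{3,}");

    // Search semantics: true if the run occurs anywhere in the input.
    static boolean hasRepeat(String s) {
        return s != null && REPEATED.matcher(s).find();
    }

    // True for strings that should be kept, false for those to filter out.
    // Null input is kept here; that is an arbitrary choice for the sketch.
    static boolean keep(String s) {
        return !hasRepeat(s);
    }
}
```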

Jakub Glapa

On Mon, Jun 18, 2012 at 3:49 PM, Norbert Burger wrote:

Jakub -- The MATCHES operator accepts regexes as input. You can add a NOT
to invert the logic.

On Mon, Jun 18, 2012 at 7:14 AM, Jakub Glapa wrote:

Hi all,
I found a 'matches' operator for pattern matching in Pig Latin.
I didn't find it in the documentation, but maybe something similar exists
for searching?
Basically, in the Java world I would want the result of
Matcher.find(), not Matcher.matches().
Will I have to end up writing my own UDF for that?

Thanks for the help.

I'm trying to filter out strings with consecutive repeated characters. I've
constructed a regexp that detects them.
Now I just have to apply it somehow.

Jakub Glapa

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jun 18, '12 at 11:15a
active: Jun 19, '12 at 10:57a

2 users in discussion

Jakub Glapa: 3 posts Norbert Burger: 2 posts


