Grokbase Groups Pig user June 2012
Cool, that sounds like a good idea.

Thanks Norbert!

--
regards,
Jakub Glapa

On Tue, Jun 19, 2012 at 1:22 AM, Norbert Burger wrote:

Any reason you can't wrap this regex with wildcards aligned with start-line
and end-line anchors, i.e.:

^.*([^0-9])\1{3,}.*$

Agreed that it would be nice if MATCHES was less greedy here, but perhaps
this'll avoid you having to write your own UDF.

Norbert
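
The wrapped pattern can be checked outside Pig with plain Java. This is a minimal sketch, assuming (as the thread implies) that MATCHES requires the whole string to match, the way Java's Matcher.matches() does. The class and method names are hypothetical, and the four samples are the ones from the quoted message below.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: checks the wrapped pattern with whole-string matching.
class WrappedPatternCheck {
    // Anything, then a non-digit character repeated at least 4 times in a row, then anything.
    // The backslash is doubled because this is a Java string literal.
    static final Pattern WRAPPED = Pattern.compile("^.*([^0-9])\\1{3,}.*$");

    // True when the entire string matches the wrapped pattern.
    static boolean matchesWrapped(String s) {
        return WRAPPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abccccdef", "abcdeeeef", "aabcdeef", "111111abcd"};
        for (String s : samples) {
            System.out.println(s + " -> " + (matchesWrapped(s) ? "filter out" : "leave"));
        }
    }
}
```

Because matches() already anchors at both ends, the explicit ^ and $ are harmless but optional; the leading and trailing .* are what let the repeated run sit anywhere in the string.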
On Mon, Jun 18, 2012 at 3:31 PM, Jakub Glapa wrote:

Hi Norbert,
thanks for the tip.
I think the MATCHES operator won't work for me because it tries to match
the entire string.
In my case I'm interested in detecting the sequence anywhere in the
string.

e.g.
abccccdef - filter out
abcdeeeef - filter out
aabcdeef - leave
111111abcd - leave

I want to filter out all strings in which the same character is repeated
at least 4 times in a row, ignoring digits.

regexp for detecting those is: ([^0-9])\1{3,}
but it won't work with MATCHES

I have a trivial working UDF that just calls
pattern().matcher().find(),
but maybe there is something built in that I could use instead?
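
The difference between searching and whole-string matching can be illustrated in plain Java. This is only a sketch of the logic such a UDF might wrap, not the actual UDF from this message; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of find() versus matches() with the unwrapped pattern.
class RepeatedRunFinder {
    // A non-digit character followed by at least 3 more copies of itself.
    static final Pattern RUN = Pattern.compile("([^0-9])\\1{3,}");

    // find() searches for the pattern anywhere in the input.
    static boolean containsRun(String s) {
        return RUN.matcher(s).find();
    }

    // matches() succeeds only if the entire input is one such run.
    static boolean isOnlyRun(String s) {
        return RUN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"abccccdef", "abcdeeeef", "aabcdeef", "111111abcd"};
        for (String s : samples) {
            System.out.println(s + ": find=" + containsRun(s) + ", matches=" + isOnlyRun(s));
        }
    }
}
```

With the unwrapped pattern, matches() returns false for all four samples because none of them consists solely of a repeated run, while find() separates the first two from the last two as intended.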


--
regards,
Jakub Glapa


On Mon, Jun 18, 2012 at 3:49 PM, Norbert Burger <norbert.burger@gmail.com> wrote:
Jakub -- The MATCHES operator accepts regexes as input. You can add a NOT
to invert the logic.

http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html

Norbert

On Mon, Jun 18, 2012 at 7:14 AM, Jakub Glapa <jakub.glapa@gmail.com> wrote:
Hi all,
I found a 'matches' operator for pattern matching in Pig Latin.
I didn't find it in the documentation, but maybe something similar exists
for searching?
Basically, in the Java world I would want the result of the
Matcher.find() method, not Matcher.matches().
Will I have to end up writing my own UDF for that?

Thanks for the help.

PS.
I'm trying to filter out strings with consecutive repeated characters.
I've constructed a regexp that detects them.
Now I just have to apply it somehow.


--
regards,
Jakub Glapa

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jun 18, '12 at 11:15a
active: Jun 19, '12 at 10:57a
posts: 5
users: 2
website: pig.apache.org

2 users in discussion

Jakub Glapa: 3 posts
Norbert Burger: 2 posts
