Grokbase Groups Pig user April 2011
Hello,

I am having a problem escaping a ":" and a "." in a regular expression passed to the REGEX_EXTRACT() function documented at http://pig.apache.org/docs/r0.8.0/piglatin_ref2.html#REGEX_EXTRACT. Here's a simplified example, though the example in the docs gives me the same problem. I've also tried it without the "\" in front of the ":", but that doesn't work either (it returns the whole line). How do I escape the ":"? I also need to escape a "." in my actual script.

------INPUT FILE------
hi:1 num1 num2 num3
hi:20 num1 blah boo
ho:30 num1 blah foo
bar:30 foo foo foo
bar:40 foo far away
bar:40 far far far

------PIG SCRIPT------
a = LOAD 'fromabs-colons' USING PigStorage AS (f1,f2,f3,f4);
b = FILTER a BY REGEX_EXTRACT(f1,'(.*)\:(.*)',1) == 'hi';
DUMP b;

------WHAT I EXPECT---
(hi:1,num1,num2,num3)
(hi:20,num1,blah,boo)

------ERROR I GET-----
2011-04-19 22:55:43,844 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 1, column 40. Encountered: ":" (58), after : "\'(.*)\\"

------PIG VERSION-----
Apache Pig version 0.8.0..1103222002 (r1084466)


  • Sven Krasser at Apr 20, 2011 at 12:03 am
    Hey Jonathan,

    You need to escape the backslash as well (the backslash is itself an
    escape character in Pig string literals):

    b = FILTER a BY REGEX_EXTRACT(f1,'(.*)\\:(.*)',1) == 'hi';

    If you wanted to match a literal backslash in the regex, you'd write '\\\\'.

    Best,
    -Sven
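
The two layers of escaping described above (string literal first, regular expression second) can be sketched in plain Java. This is only an illustration under the assumption that REGEX_EXTRACT follows java.util.regex semantics and that a Pig quoted string treats "\" the way a Java string literal does; the class EscapeDemo and the helper extract() are hypothetical names, not part of Pig. By the same reasoning, a literal "." would presumably be written '\\.' in the Pig script, though that has not been checked here against Pig 0.8.0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDemo {
    // Hypothetical helper: returns the requested capture group,
    // or null when the pattern is not found in the input.
    static String extract(String input, String regex, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // The literal "(.*)\\:(.*)" reaches the regex engine as (.*)\:(.*)
        System.out.println(extract("hi:1", "(.*)\\:(.*)", 1));

        // ":" is not a regex metacharacter, so the unescaped form matches the same way.
        System.out.println(extract("hi:1", "(.*):(.*)", 1));

        // "." is a metacharacter; "\\." in the literal matches only a real dot.
        System.out.println(extract("a.b", "(.*)\\.(.*)", 1));
        System.out.println(extract("axb", "(.*)\\.(.*)", 1));

        // A literal backslash needs four backslashes in the source literal.
        System.out.println(extract("a\\b", "(.*)\\\\(.*)", 1));
    }
}
```

The point of the sketch is that each backslash meant for the regex engine has to be doubled once to survive the string literal, which is why a single "\:" in the original script was rejected by the parser before the regex was ever compiled.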


Posted Apr 19, 2011 at 11:01 pm; last active Apr 20, 2011 at 12:03 am. 2 posts by 2 users: Jonathan Hoover (1 post), Sven Krasser (1 post).