Hello Everybody,
I'm looking for a way to run REPLACE on multiple columns in a dataset to
escape some characters that would confuse loading after processing in
pig.
Is there an easy way to do that without having to do

FOREACH x GENERATE REPLACE(a,"char","\\char"), REPLACE(b,"char","\\char"), REPLACE(c... etc

?
Johannes


  • Dmitriy Ryaboy at Jun 16, 2010 at 7:09 pm
    We really need an APPLY function (really, it's map, but don't want to
    overload the terms), which would take a funcspec and apply it to every
    element in a tuple or bag. Then you could say
    FOREACH x GENERATE APPLY REPLACE(*, "char", "\\char") TO (a, b, c);

    That would be rad. Especially useful when dealing with the bags produced by
    grouping things and projections on such bags.

    -D
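To make the proposed semantics concrete, here is a minimal sketch in plain Python rather than Pig Latin, since APPLY is only a suggestion in this thread and not an existing Pig operator. The names `apply_to_fields` and `escape_char`, the sample row, and the `|` character are all hypothetical, chosen only for illustration.

```python
def apply_to_fields(func, row, fields):
    """Return a copy of row with func applied to each field named in fields.

    Hypothetical helper: models "APPLY f(*) TO (a, b, c)" on a single row.
    """
    return {name: (func(value) if name in fields else value)
            for name, value in row.items()}


def escape_char(value, char="|"):
    # Literal (non-regex) replacement: prefix each occurrence with a backslash.
    return value.replace(char, "\\" + char)


# Sample row (assumed data): escape columns a, b and c, leave d untouched.
row = {"a": "x|y", "b": "p|q", "c": "no pipes", "d": "left|alone"}
escaped = apply_to_fields(escape_char, row, {"a", "b", "c"})
```

The point of the design is that the function is named once and the column list is data, so adding a fourth column does not mean writing a fourth REPLACE call.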
  • Hc busy at Jun 18, 2010 at 6:21 pm
    Yeah, that'd be really cool. Another way to express this (and to make a
    map/reduce-style interface available in Pig) is to allow FOREACH to be nested:


    TRIMED_TABLE = FOREACH TABLE {
        stripped = FOREACH TABLE.SOME_BAG GENERATE String.Trim(value);
        GENERATE k1, k2, k3, stripped;
    }
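The nested form above can be read as an outer loop over rows with an inner map over each row's bag. A rough Python sketch of that reading follows. It is an illustration of the idea only, not Pig behavior, and the function name `trim_bag` and the sample table are assumptions.

```python
def trim_bag(rows, bag_field="some_bag"):
    """For each row, strip whitespace from every value in the row's bag.

    Hypothetical helper: the outer loop plays the role of the outer FOREACH,
    the list comprehension plays the role of the inner FOREACH over the bag.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are not modified
        new_row[bag_field] = [value.strip() for value in row[bag_field]]
        result.append(new_row)
    return result


# Sample table (assumed data): keys k1..k3 pass through unchanged.
table = [{"k1": 1, "k2": 2, "k3": 3, "some_bag": ["  a ", "b  "]}]
trimmed = trim_bag(table)
```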

Discussion Overview
group: user @
categories: pig, hadoop
posted: Jun 16, '10 at 1:11p
active: Jun 18, '10 at 6:21p
posts: 3
users: 3
website: pig.apache.org
