Parameter substitution using -param option runs into problems when substituting entire pig statements in a shell script (maybe this is a bash problem)
----------------------------------------------------------------------------------------------------------------------------------------------------

Key: PIG-1586
URL: https://issues.apache.org/jira/browse/PIG-1586
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Viraj Bhat


I have a Pig script as a template:

{code}
register Countwords.jar;
A = $INPUT;
B = FOREACH A GENERATE
examples.udf.SubString($0,0,1),
$1 as num;
C = GROUP B BY $0;
D = FOREACH C GENERATE group, SUM(B.num);
STORE D INTO $OUTPUT;
{code}


I attempt to do Parameter substitutions using the following:

Using Shell script:

{code}
#!/bin/bash
java -cp ~/pig-svn/trunk/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main -r -file sub.pig \
-param INPUT="(foreach (COGROUP(load '/user/viraj/dataset1' USING PigStorage() AS (word:chararray,num:int)) by (word),(load '/user/viraj/dataset2' USING PigStorage() AS (word:chararray,num:int)) by (word)) generate flatten(examples.udf.CountWords(\\$0,\\$1,\\$2)))" \
-param OUTPUT="\'/user/viraj/output\' USING PigStorage()"
{code}

The resulting script after substitution is:

{code}
register Countwords.jar;

A = (foreach (COGROUP(load '/user/viraj/dataset1' USING PigStorage() AS (word:chararray,num:int)) by (word),(load '/user/viraj/dataset2' USING PigStorage() AS (word:chararray,num:int)) by (word)) generate flatten(examples.udf.CountWords(runsub.sh,,)));
B = FOREACH A GENERATE
examples.udf.SubString($0,0,1),
$1 as num;
C = GROUP B BY $0;
D = FOREACH C GENERATE group, SUM(B.num);

STORE D INTO /user/viraj/output;
{code}

The shell substitutes the $0 before passing it to java.
a) Is there a workaround for this?
b) Is this a Pig param problem?


Viraj
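
On question (a), the shell half of the behaviour can be reproduced in isolation. The sketch below is only an illustration of bash quoting rules, not a confirmed fix for Pig: the `show` helper is hypothetical, and the positional parameters are set to the placeholder values `alpha` and `beta` so the result does not depend on the script name the way `$0` does. It shows that inside double quotes `\\$1` becomes a backslash followed by the expanded value, while `\$1` or single quotes keep the literal `$1`.

```shell
#!/bin/bash
# Fix the positional parameters so the demonstration is deterministic.
# "alpha" and "beta" are placeholder values, not from the original report.
set -- alpha beta

# Hypothetical helper: prints its first argument exactly as a child
# process (such as java) would receive it.
show() { printf '%s\n' "$1"; }

# Double quotes, two backslashes: \\ collapses to \ and $1 is still expanded.
double_two="$(show "CountWords(\\$1)")"

# Double quotes, one backslash: \$ produces a literal dollar sign.
double_one="$(show "CountWords(\$1)")"

# Single quotes: the shell expands nothing inside them.
single="$(show 'CountWords($1)')"

printf '%s\n' "$double_two" "$double_one" "$single"
# Prints:
#   CountWords(\alpha)
#   CountWords($1)
#   CountWords($1)
```

The single-backslash form keeps the outer double quotes, so the single-quoted paths inside the INPUT value would not need to change. Whether Pig then leaves the literal `$0` alone during its own parameter substitution is a separate question that this sketch does not answer.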



--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


  • Viraj Bhat (JIRA) at Sep 1, 2010 at 12:35 am
    [ https://issues.apache.org/jira/browse/PIG-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Viraj Bhat updated PIG-1586:
    ----------------------------

    Description:
    I have a Pig script as a template:

    {code}
    register Countwords.jar;
    A = $INPUT;
    B = FOREACH A GENERATE
    examples.udf.SubString($0,0,1),
    $1 as num;
    C = GROUP B BY $0;
    D = FOREACH C GENERATE group, SUM(B.num);
    STORE D INTO $OUTPUT;
    {code}


    I attempt to do Parameter substitutions using the following:

    Using Shell script:

    {code}
    #!/bin/bash
    java -cp ~/pig-svn/trunk/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main -r -file sub.pig \
    -param INPUT="(foreach (COGROUP(load '/user/viraj/dataset1' USING PigStorage() AS (word:chararray,num:int)) by (word),(load '/user/viraj/dataset2' USING PigStorage() AS (word:chararray,num:int)) by (word)) generate flatten(examples.udf.CountWords(\\$0,\\$1,\\$2)))" \
    -param OUTPUT="\'/user/viraj/output\' USING PigStorage()"
    {code}

    {code}
    register Countwords.jar;

    A = (foreach (COGROUP(load '/user/viraj/dataset1' USING PigStorage() AS (word:chararray,num:int)) by (word),(load '/user/viraj/dataset2' USING PigStorage() AS (word:chararray,num:int)) by (word)) generate flatten(examples.udf.CountWords(runsub.sh,,)));
    B = FOREACH A GENERATE
    examples.udf.SubString($0,0,1),
    $1 as num;
    C = GROUP B BY $0;
    D = FOREACH C GENERATE group, SUM(B.num);

    STORE D INTO /user/viraj/output;
    {code}

    The shell substitutes the $0 before passing it to java.
    a) Is there a workaround for this?
    b) Is this a Pig param problem?


    Viraj

  • Olga Natkovich (JIRA) at Sep 1, 2010 at 12:50 am
    [ https://issues.apache.org/jira/browse/PIG-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

    Olga Natkovich reassigned PIG-1586:
    -----------------------------------

    Assignee: Viraj Bhat

    Viraj volunteered to print the line that Pig gets as part of parameter substitution to see if the escapes and quotes are eaten by the shell. Thanks, Viraj.
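
One way to carry out the check described above, without involving Pig at all, is to swap the java command for something that echoes its arguments. This is a minimal sketch under that assumption: `print_args` is a hypothetical stand-in for `java ... org.apache.pig.Main`, and only the OUTPUT parameter from the report is used.

```shell
#!/bin/bash
# Hypothetical stand-in for the java command: prints every argument it
# receives, one per line, wrapped in brackets so whitespace is visible.
print_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Same quoting as the OUTPUT parameter in the report.
received="$(print_args -param OUTPUT="\'/user/viraj/output\' USING PigStorage()")"
printf '%s\n' "$received"
# Prints:
#   [-param]
#   [OUTPUT=\'/user/viraj/output\' USING PigStorage()]
```

Inside double quotes bash leaves `\'` untouched, so both the backslashes and the single quotes reach the child process. Any quotes missing from the final script would therefore have been dropped after the shell handed the argument over, although this sketch cannot show where.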
