Grokbase Groups Pig user October 2010
Hi again! :)

I am trying to run Pig on a local machine, but I want it to connect to a
remote cluster. I can't make it use my settings - whatever I do, I get this:
-----
$ pig -x mapreduce
10/10/16 22:17:43 INFO pig.Main: Logging error messages to:
/home/pigtest/conf/pig_1287260263699.log
2010-10-16 22:17:43,896 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to
hadoop file system at: file:///
grunt>
-----

I have copied the hadoop settings files (/etc/hadoop/conf/*) from the remote
cluster's namenode to /home/pigtest/conf/ and exported PIG_CLASSPATH, PIGDIR,
HADOOP_CLASSPATH,... I have also tried changing
/etc/pig/conf/pig.configuration (I even wrote some free text there so it would
at least give me an error message) - nothing. It still connects to file:///
and still doesn't display a message about a jobtracker:
-----
$ export HADOOPDIR=/etc/hadoop/conf
$ export PIG_PATH=/etc/pig/conf
$ export PIG_CLASSPATH=$HADOOPDIR
$ export PIG_HADOOP_VERSION=0.20.2
$ export PIG_HOME="/usr/lib/pig"
$ export PIG_CONF_DIR="/etc/pig/"
$ export PIG_LOG_DIR="/var/log/pig"
$ pig -x mapreduce
10/10/16 22:32:34 INFO pig.Main: Logging error messages to:
/home/pigtest/conf/pig_1287261154272.log
2010-10-16 22:32:34,471 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to
hadoop file system at: file:///
grunt>
-----

I am guessing I am doing something fundamentally wrong. How do I change
Pig's settings?

More info: I am using the Cloudera hadoop-pig package from CDH3b3
(0.7.0+16-1~lenny-cdh3b3). I would appreciate some pointers.

Kind regards,

Anze


  • Gerrit Jansen van Vuuren at Oct 17, 2010 at 2:03 am
    Hi,

    Pig configuration is in the file: $PIG_HOME/conf/pig.properties

    The two parameters that tell pig where to find the namenode and job tracker
    are:

    e.g. (assuming you're using the default ports):

    ----[ $PIG_HOME/conf/pig.properties ]---------------

    fs.default.name=hdfs://<namenode url>:8020/
    mapred.job.tracker=<jobtracker url>:8021

    --------------

    With these properties set you don't need to specify pig -x mapreduce;
    just pig is enough.
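    A minimal sketch of the setup Gerrit describes, using hypothetical host
    names (namenode.example.com, jobtracker.example.com) and a throwaway
    directory in place of $PIG_HOME/conf:

    ```shell
    # Write the two connection properties into a conf dir (hypothetical paths).
    mkdir -p /tmp/pig-demo/conf
    cat > /tmp/pig-demo/conf/pig.properties <<'EOF'
    fs.default.name=hdfs://namenode.example.com:8020/
    mapred.job.tracker=jobtracker.example.com:8021
    EOF
    # On startup, Pig should then log "Connecting to hadoop file system at:
    # hdfs://..." instead of "file:///".
    cat /tmp/pig-demo/conf/pig.properties
    ```

    If the startup log still shows file:///, Pig is reading a different
    pig.properties than the one you edited (see below in this thread).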


    Cheers,
    Gerrit

  • Anze at Oct 17, 2010 at 6:50 am
    Gerrit, thank you for your answer! It has pointed me in the right direction.

    It looks like Pig (at least mine) ignores PIG_HOME. But with your help I was
    able to debug a bit further:
    -----
    $ find / -name 'pig.properties'
    /etc/pig/conf.dist/pig.properties
    /etc/pig/conf/pig.properties
    /usr/lib/pig/example-confs/conf.default/pig.properties
    /usr/lib/pig/conf/pig.properties
    -----

    I have changed /usr/lib/pig/conf/pig.properties and bingo - this is what my
    Pig uses.

    So while the Cloudera packaging creates /etc/pig/conf/pig.properties (the
    "Debian way"), it is not used at all. And it probably ignores the
    environment vars too.

    Thanks again! :)

    Anze


  • Gerrit Jansen van Vuuren at Oct 17, 2010 at 1:17 pm
    Glad it worked for you :)

    I use the standard Apache Pig distribution.
    There are several places where environment variables can be set, and I have
    no idea which one Cloudera uses, but here is a list:

    /etc/profile.d/<any file> (we have hadoop.sh, pig.sh and java.sh here that
    set the home variables; managed by puppet)
    /etc/bash.bashrc (not a good idea to set it here)
    $HOME/.bashrc (quick for users that don't have root permission, but not
    for production)
    $PIG_HOME/conf/pig-env.sh (standard in all hadoop-related projects; gets
    sourced by $PIG_HOME/bin/pig)

    To see which variables your Pig is picking up, you can manually insert the
    line
    echo "home:$PIG_HOME conf:$PIG_CONF_DIR" into the $PIG_HOME/bin/pig file
    just before it calls java.
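    The debug line above, demonstrated standalone with hypothetical values (in
    the real bin/pig script, the variables are whatever the wrapper has
    resolved by that point):

    ```shell
    # Hypothetical values standing in for what bin/pig would have set.
    PIG_HOME=/usr/lib/pig
    PIG_CONF_DIR=/etc/pig/conf
    # The one-line probe to paste just before the java invocation.
    echo "home:$PIG_HOME conf:$PIG_CONF_DIR"
    ```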

    Cheers,
    Gerrit

  • Anze at Oct 18, 2010 at 7:27 am
    Good idea. :)

    Here is the output for the Cloudera CDH3b3 distribution, in case someone
    else needs it:
    home:/usr/lib/pig/bin/.. conf:/usr/lib/pig/bin/../conf

    Thanks for helping me out!

    Anze

  • Kaluskar, Sanjay at Oct 21, 2010 at 10:17 am
    I am trying to do the same (submitting a Pig script to a remote cluster
    from a Windows machine), and the job gets submitted after setting the
    following in pig.properties:

    fs.default.name=hdfs://<node>:54310
    mapred.job.tracker=hdfs://<node>:54510

    However, my script fails because it looks for inputs under /user/DrWho.
    Is it possible to specify the hadoop cluster user in pig.properties? How
    does one control it? Where is DrWho coming from?

    Thanks,
    -sanjay

  • 김영우 at Oct 21, 2010 at 2:41 pm
    Hi Sanjay,

    You can specify a 'hadoop.job.ugi' property for your mapreduce job.

    e.g.,
    hadoop.job.ugi=username,groupname
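    For example, with a hypothetical user "hadoopuser" in group "hadoop", the
    property could be appended to the pig.properties file that Pig actually
    reads (a throwaway path is used here for illustration):

    ```shell
    # Append the identity property to a (hypothetical) pig.properties copy.
    # On pre-security 0.20-era clusters, hadoop.job.ugi sets the user and
    # group the job is submitted as.
    cat >> /tmp/pig-demo.properties <<'EOF'
    hadoop.job.ugi=hadoopuser,hadoop
    EOF
    grep '^hadoop.job.ugi' /tmp/pig-demo.properties
    ```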

    Hope this helps.

    Regards,

    - Youngwoo

  • Kaluskar, Sanjay at Oct 25, 2010 at 5:57 am
    Thanks, the job gets submitted as the right user with this property, so it works. But my job setup fails, and I can't see any logs to figure out what went wrong. (The same Pig script runs successfully when submitted from a Linux machine that is part of the hadoop cluster.) Do I need to set some options to see a log of the job setup?

    Thanks,
    -sanjay

  • Santhosh Srinivasan at Oct 21, 2010 at 4:56 pm
    http://blog.rapleaf.com/dev/2010/01/05/the-wrath-of-drwho-or-unpredictable-hadoop-memory-usage/

    Check the load on your client. Sometimes, if the client cannot determine your user name (`whoami` fails), the namenode will receive "DrWho", which is the default user name used when no user name is specified (read: null).

    I have seen this behaviour when I use boxes with low memory especially on VMs.
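    [A quick client-side sanity check along those lines — a sketch, assuming a
    Unix client where Hadoop derives the user from the local `whoami`:]

    ```shell
    # If this prints an empty user, Hadoop clients of that era fall back
    # to the default user name "DrWho" when submitting jobs.
    user=$(whoami)
    echo "client user: $user"
    # Low free memory can make the whoami fork fail on small VMs;
    # `free` is Linux-specific, hence the silenced error.
    free -m 2>/dev/null | head -2
    ```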

    Santhosh

    -----Original Message-----
    From: Kaluskar, Sanjay
    Sent: Thursday, October 21, 2010 3:17 AM
    To: user@pig.apache.org
    Subject: RE: accessing remote cluster with Pig

    I am trying to do the same (submitting a PIG script to a remote cluster from a Windows m/c) and the job gets submitted after setting the following in pig.properties:

    fs.default.name=hdfs://<node>:54310
    mapred.job.tracker=hdfs://<node>:54510

    However, my script fails because it looks for inputs under /user/DrWho.
    Is it possible to specify the hadoop cluster user in pig.properties? How does one control it? Where is DrWho coming from?

    Thanks,
    -sanjay

    -----Original Message-----
    From: Gerrit Jansen van Vuuren
    Sent: Sunday, October 17, 2010 6:47 PM
    To: user@pig.apache.org
    Subject: RE: accessing remote cluster with Pig

    Glad it worked for you :)

    I use the standard apache pig distributions.
    There are several places where environment variables can be set, and I have no idea which one Cloudera uses, but here is a list:

    /etc/profile.d/<any file> (we have hadoop.sh, pig.sh and java.sh here that set the home variables; managed by puppet)
    /etc/bash.bashrc (not a good idea to set it here)
    $HOME/.bashrc (quick for users that don't have root permission, but not for production)
    $PIG_HOME/conf/pig-env.sh (standard in all hadoop related projects, gets
    sourced by $PIG_HOME/bin/pig )

    To see what variables your pig is picking up, you can manually insert the line echo "home:$PIG_HOME conf:$PIG_CONF_DIR" into the $PIG_HOME/bin/pig file just before it calls java.
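    [For illustration, the debug line by itself; the paths here are example
    values only, not necessarily what your install actually uses:]

    ```shell
    # Example values; in practice this echo goes into $PIG_HOME/bin/pig
    # right before the java invocation, using whatever that script has set.
    PIG_HOME=/usr/lib/pig
    PIG_CONF_DIR=/etc/pig/conf
    echo "home:$PIG_HOME conf:$PIG_CONF_DIR"
    ```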

    Cheers,
    Gerrit

    -----Original Message-----
    From: Anze
    Sent: Sunday, October 17, 2010 7:49 AM
    To: user@pig.apache.org
    Subject: Re: accessing remote cluster with Pig


    Gerrit, thank you for your answer! It has pointed me in the right direction.

    It looks like Pig (at least mine) ignores PIG_HOME. But with your help I was
    able to debug a bit further:
    -----
    $ find / -name 'pig.properties'
    /etc/pig/conf.dist/pig.properties
    /etc/pig/conf/pig.properties
    /usr/lib/pig/example-confs/conf.default/pig.properties
    /usr/lib/pig/conf/pig.properties
    -----

    I have changed /usr/lib/pig/conf/pig.properties and bingo - this is what my Pig uses.

    So while the Cloudera packaging provides /etc/pig/conf/pig.properties (the "Debian way"), it is not used at all. And it probably ignores the environment vars too.

    Thanks again! :)

    Anze



Discussion Overview
group: user @ pig
categories: pig, hadoop
posted: Oct 16, '10 at 8:53p
active: Oct 25, '10 at 5:57a
posts: 9
users: 5
website: pig.apache.org
