I am trying to do the same (submitting a Pig script to a remote cluster
from a Windows machine), and the job gets submitted after setting the
following in pig.properties:
fs.default.name=hdfs://<node>:54310
mapred.job.tracker=hdfs://<node>:54510
However, my script fails because it looks for its inputs under
/user/DrWho. Is it possible to specify the Hadoop cluster user in
pig.properties? How does one control it? Where is DrWho coming from?
Thanks,
-sanjay
-----Original Message-----
From: Gerrit Jansen van Vuuren
Sent: Sunday, October 17, 2010 6:47 PM
To: user@pig.apache.org
Subject: RE: accessing remote cluster with Pig
Glad it worked for you :)
I use the standard apache pig distributions.
There are several places where environment variables can be set and
changed. I have no idea which one Cloudera uses, but here is a list:
- /etc/profile.d/<any file> (we have hadoop.sh, pig.sh and java.sh here,
  which set the home variables and are managed by puppet)
- /etc/bash.bashrc (not a good idea to set them here)
- $HOME/.bashrc (quick for users who don't have root permissions, but
  not for production)
- $PIG_HOME/conf/pig-env.sh (standard in all Hadoop-related projects,
  gets sourced by $PIG_HOME/bin/pig; a sketch follows below)
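For illustration, a minimal pig-env.sh might look like this (a sketch
only - the JVM and Hadoop config paths below are assumptions for a
typical install, not something Cloudera ships):
----[ $PIG_HOME/conf/pig-env.sh ]---------------
# Sourced by $PIG_HOME/bin/pig at startup.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
# Put the cluster's Hadoop config on Pig's classpath so it can find
# the namenode and jobtracker settings.
export PIG_CLASSPATH=/etc/hadoop/conf
--------------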
To see which variables your Pig is picking up, you can manually insert
the line echo "home:$PIG_HOME conf:$PIG_CONF_DIR" into the
$PIG_HOME/bin/pig file just before it calls java.
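For example (the exact line where bin/pig invokes java differs between
versions, so put the echo just above that call in your copy):
-----
# near the end of $PIG_HOME/bin/pig, immediately before the java call:
echo "home:$PIG_HOME conf:$PIG_CONF_DIR"
-----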
Cheers,
Gerrit
-----Original Message-----
From: Anze
Sent: Sunday, October 17, 2010 7:49 AM
To: user@pig.apache.org
Subject: Re: accessing remote cluster with Pig
Gerrit, thank you for your answer! It has pointed me in the right
direction.
It looks like Pig (at least mine) ignores PIG_HOME. But with your help
I was able to debug a bit further:
-----
$ find / -name 'pig.properties'
/etc/pig/conf.dist/pig.properties
/etc/pig/conf/pig.properties
/usr/lib/pig/example-confs/conf.default/pig.properties
/usr/lib/pig/conf/pig.properties
-----
I have changed /usr/lib/pig/conf/pig.properties and bingo - this is
what my Pig uses.
So while the Cloudera packaging installs /etc/pig/conf/pig.properties
(the "Debian way"), that file is not used at all. And Pig probably
ignores the environment vars too.
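One possible workaround (untested, and assuming the same layout as in
the find output above) would be to replace the conf directory Pig
actually reads with a symlink to the Debian-style one, so the two stay
in sync:
-----
$ sudo mv /usr/lib/pig/conf /usr/lib/pig/conf.orig
$ sudo ln -s /etc/pig/conf /usr/lib/pig/conf
-----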
Thanks again! :)
Anze
On Sunday 17 October 2010, Gerrit Jansen van Vuuren wrote:
Hi,
Pig configuration is in the file: $PIG_HOME/conf/pig.properties
The two parameters that tell Pig where to find the namenode and the
jobtracker are fs.default.name and mapred.job.tracker, e.g. (assuming
you're using the default ports):
----[ $PIG_HOME/conf/pig.properties ]---------------
fs.default.name=hdfs://<namenode url>:8020/
mapred.job.tracker=<jobtracker url>:8021
--------------
With these properties set you don't need to specify pig -x mapreduce;
just pig is enough.
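If the properties are picked up, the startup banner should report the
cluster instead of the local file system - something along these lines,
with the URL matching your fs.default.name:
-----
$ pig
... INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: hdfs://<namenode url>:8020/
grunt>
-----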
Cheers,
Gerrit
-----Original Message-----
From: Anze
Sent: Saturday, October 16, 2010 9:53 PM
To: user@pig.apache.org
Subject: accessing remote cluster with Pig
Hi again! :)
I am trying to run Pig on a local machine, but I want it to connect to
a remote cluster. I can't make it use my settings - whatever I do, I
get this:
-----
$ pig -x mapreduce
10/10/16 22:17:43 INFO pig.Main: Logging error messages to:
/home/pigtest/conf/pig_1287260263699.log
2010-10-16 22:17:43,896 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: file:///
grunt>
-----
I have copied the Hadoop settings files (/etc/hadoop/conf/*) from the
remote cluster's namenode to /home/pigtest/conf/ and exported
PIG_CLASSPATH, PIGDIR, HADOOP_CLASSPATH,... I have also tried changing
/etc/pig/conf/pig.configuration (I even wrote some free text there so
it would at least give me an error message) - nothing. It still
connects to file:/// and still doesn't display a message about a
jobtracker:
-----
$ export HADOOPDIR=/etc/hadoop/conf
$ export PIG_PATH=/etc/pig/conf
$ export PIG_CLASSPATH=$HADOOPDIR
$ export PIG_HADOOP_VERSION=0.20.2
$ export PIG_HOME="/usr/lib/pig"
$ export PIG_CONF_DIR="/etc/pig/"
$ export PIG_LOG_DIR="/var/log/pig"
$ pig -x mapreduce
10/10/16 22:32:34 INFO pig.Main: Logging error messages to:
/home/pigtest/conf/pig_1287261154272.log
2010-10-16 22:32:34,471 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: file:///
grunt>
-----
I am guessing I am doing something fundamentally wrong. How do I change
Pig's settings?
More info: I am using the Cloudera package hadoop-pig from CDH3b3
(0.7.0+16-1~lenny-cdh3b3). I would appreciate some pointers.
Kind regards,
Anze