Another approach I've been using successfully for a while:
We have some auxiliary boxes next to the cluster that users can SSH into
and run Hadoop jobs from (Pig scripts, etc.).
I use one of these boxes to:
* copy the application JAR into my home folder there
* run "hadoop jar <app.jar> clojure.main -i swank-server.clj" in a
screen session
* SSH port forward a local port to port 4005 on the remote aux box
* M-x slime-connect to the local port
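The four steps above could be scripted roughly like this. It is a minimal sketch under assumptions: the host name aux1.example.com, the JAR name app.jar, the screen session name "swank" and the DRY_RUN switch are hypothetical; it also assumes swank-server.clj sits next to the JAR locally and is copied along with it, and that SWANK_HOST matches whatever address swank-server.clj binds to (localhost here). By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the remote SLIME REPL setup. All names are hypothetical
# placeholders; override them via environment variables.
AUX_HOST="${AUX_HOST:-aux1.example.com}"   # hypothetical aux box next to the cluster
APP_JAR="${APP_JAR:-app.jar}"              # hypothetical application JAR
SWANK_HOST="${SWANK_HOST:-localhost}"      # assumption: must match :host in swank-server.clj
SWANK_PORT=4005                            # port the swank server listens on
LOCAL_PORT="${LOCAL_PORT:-4005}"           # local end of the SSH tunnel
DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print commands (default)

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy the JAR (and, by assumption, the swank init script) into the
# home folder on the aux box.
copy_jar() {
  run scp "$APP_JAR" swank-server.clj "$AUX_HOST:"
}

# Start the swank server inside a detached screen session named "swank".
start_swank() {
  run ssh "$AUX_HOST" screen -dmS swank \
    hadoop jar "$APP_JAR" clojure.main -i swank-server.clj
}

# Forward LOCAL_PORT to the swank port; SWANK_HOST is resolved on the aux box.
# -f backgrounds ssh after login, -N runs no remote command.
forward_port() {
  run ssh -f -N -L "$LOCAL_PORT:$SWANK_HOST:$SWANK_PORT" "$AUX_HOST"
}

copy_jar && start_swank && forward_port
echo "Then in Emacs: M-x slime-connect to 127.0.0.1, port $LOCAL_PORT"
```

Running it with DRY_RUN=0 and AUX_HOST set to a real box executes the commands instead of printing them. Using "screen -dmS" keeps the JVM alive after the SSH session that started it closes, which is what makes a REPL that stays up for days possible.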
The application JAR holds lots of predefined functions, Cascalog
operators and queries that I can then use to query data on the
cluster. As long as no new subqueries or operations are needed, the
application doesn't have to be repackaged and redeployed. When they
are, I write the additional operations, repackage and redeploy with a
one-liner, then restart the app on the remote box and M-x
slime-connect to it again.
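A redeploy cycle along these lines might look like the following sketch. The one-liner itself isn't shown in the post, so the Leiningen build command ("lein uberjar"), the host and JAR names and the "swank" screen session name are all assumptions; APP_JAR should point at whatever JAR the build actually produces. Like the sketch above, it only prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of the redeploy cycle: rebuild, copy, restart the swank REPL.
# All names are hypothetical placeholders.
AUX_HOST="${AUX_HOST:-aux1.example.com}"   # hypothetical aux box
APP_JAR="${APP_JAR:-app.jar}"              # hypothetical path of the built JAR
BUILD_CMD="${BUILD_CMD:-lein uberjar}"     # assumption: a Leiningen project
DRY_RUN="${DRY_RUN:-1}"                    # 1 = only print commands (default)

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

redeploy() {
  # $BUILD_CMD is deliberately unquoted so "lein uberjar" splits into words.
  run $BUILD_CMD || return 1
  run scp "$APP_JAR" "$AUX_HOST:" || return 1
  # Stop the old REPL's screen session; ignore the error if none is running.
  run ssh "$AUX_HOST" screen -S swank -X quit || true
  # Start a fresh swank server in a new detached screen session.
  run ssh "$AUX_HOST" screen -dmS swank \
    hadoop jar "$APP_JAR" clojure.main -i swank-server.clj
}

redeploy
```

If the SSH port forward from the initial setup is still running, it can be reused, since the new server listens on the same port; only the M-x slime-connect has to be repeated.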
swank-server.clj looks like this ($HOST stands for the address the
server should listen on):
(require 'swank.swank)
(swank.swank/start-server :host $HOST :port 4005)
(swank-clojure needs to be packaged in the app JAR)
With this setup in place, I keep a SLIME REPL open for days and query
various datasets interactively, without restarting.
Maybe this is useful to some.
Stefan Hübner
David McNeil <mcneil.david@gmail.com> writes:
On Mar 8, 3:17 am, Sam Ritchie wrote:
What do you do with the local process? Leave it running in a screen or something?
Our application runs as a server process that accepts user requests
and dynamically turns them into Cascalog queries which are executed on
our Hadoop cluster. The setup we currently have works as follows:
* install our application jars on the Hadoop classpath via hadoop-env.sh
* put the Hadoop config files pointing to our cluster on our
application classpath (i.e. core-site.xml, hdfs-site.xml,
mapred-site.xml)
* currently we are running our application as a standalone process on
the same machine that serves as the Hadoop master. I don't believe
this is strictly necessary, but rather than keep fighting networking
and security issues I opted to run our app on the master node for now.
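The first two bullets might translate into something like the sketch below. It is an illustration under assumptions, not David's actual configuration: the directories /opt/our-app/lib and /etc/hadoop/conf, the APP_LIB_DIR and APP_CLASSPATH variables and the add_jars_to_classpath helper are hypothetical. HADOOP_CLASSPATH and HADOOP_CONF_DIR are the usual Hadoop environment variables. Putting the config directory on the application classpath works because Hadoop's configuration classes load core-site.xml, hdfs-site.xml and mapred-site.xml as classpath resources.

```shell
# --- fragment for hadoop-env.sh (hypothetical paths) ---
# Append every jar in a directory to HADOOP_CLASSPATH, keeping existing entries.
add_jars_to_classpath() {
  for jar in "$1"/*.jar; do
    [ -e "$jar" ] || continue   # the glob matched nothing
    HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$jar"
  done
  export HADOOP_CLASSPATH
}
add_jars_to_classpath "${APP_LIB_DIR:-/opt/our-app/lib}"

# --- application launcher side (hypothetical paths) ---
# A directory on the classpath exposes the *-site.xml files as resources.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
APP_CLASSPATH="${APP_JAR:-/opt/our-app/lib/our-app.jar}:$HADOOP_CONF_DIR"
echo "launch with: java -cp $APP_CLASSPATH <main class>"
```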
-David