I've run into a similar issue: child JVM options specified in with-job-conf
don't "stick". Last week, for the first time, I hit GC problems in a reducer
of one of my Cascalog jobs. I found the with-job-conf macro and wrapped the
query execution form in it, to no avail:

(let [snk-qry-by-chan (for [chan channels]
                        (channel-query chan))
      all-snk-qry-seq (apply concat snk-qry-by-chan)]
  ;; configure the MapReduce child JVM options to avoid "GC overhead limit exceeded" errors
  (with-job-conf {"mapred.child.java.opts" "-XX:-UseGCOverheadLimit -Xmx4g"}
    ;; execute all of the queries in parallel
    (apply ?- all-snk-qry-seq)))

The relevant parts of my project.clj:

   :dependencies [[org.clojure/clojure "1.5.1"]
                  [cascalog "1.10.1"]
                  [incanter "1.4.1"]]
   :repositories {"cloudera" "https://repository.cloudera.com/artifactory/cloudera-repos"}
   :profiles {:provided {:dependencies [[org.apache.hadoop/hadoop-core "0.20.2-cdh3u5"]]}}

But in the logging output of the reducer in question, no matter what I
specified in with-job-conf, I always saw this:

2013-07-12 17:25:55,216 INFO cascading.flow.hadoop.FlowMapper: child jvm opts: -Xmx1073741824
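The logged heap size is given in bytes, which makes it easy to compare with what was requested. The helper below, `xmx->bytes`, is hypothetical (it is not part of Cascalog or Hadoop); it is a minimal sketch that converts an -Xmx option to bytes using the JVM's binary k/m/g suffixes. It shows that -Xmx1073741824 is exactly 1 GiB, so the -Xmx4g passed through with-job-conf was not applied.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: convert the size in an -Xmx JVM option to bytes.
;; JVM size suffixes are binary: k = 2^10, m = 2^20, g = 2^30 (case-insensitive).
(defn xmx->bytes
  "Returns the heap size in bytes for an option such as \"-Xmx4g\",
  or nil if the string is not an -Xmx option."
  [opt]
  (when-let [[_ digits suffix] (re-matches #"-Xmx(\d+)([kKmMgG]?)" opt)]
    (* (Long/parseLong digits)
       (case (str/lower-case suffix)
         ""  1
         "k" 1024
         "m" (* 1024 1024)
         "g" (* 1024 1024 1024)))))

;; (xmx->bytes "-Xmx1073741824") ;; => 1073741824, i.e. 1 GiB
;; (xmx->bytes "-Xmx4g")         ;; => 4294967296, what with-job-conf asked for
```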


Further details:

* We're running Cloudera's Hadoop distribution (CDH 4.1.4), which is based
on Hadoop 2.0.0.
* I'm running Cascalog in cluster mode (I build an uberjar of the code
whenever I deploy).
* The error thrown by the JVM is "java.lang.OutOfMemoryError: GC overhead
limit exceeded" (as opposed to "OutOfMemoryError: Java heap space").
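Because the cluster runs CDH 4 (Hadoop 2.0.0), which property controls the task heap may depend on whether it uses MRv1 or YARN. One hedged approach is to set the same options under both the older and the newer property names in the map passed to with-job-conf. `heap-conf` below is a hypothetical helper, not part of Cascalog; the property names are standard Hadoop keys, but whether any of them takes effect still depends on the cluster, for example if an administrator has marked the property final in mapred-site.xml.

```clojure
;; Hypothetical helper (not part of Cascalog): build a job-conf map that sets the
;; same task JVM options under both the older MRv1 keys and the Hadoop 2 keys.
(defn heap-conf
  "Returns a map suitable for with-job-conf. `opts` is a JVM options string,
  e.g. \"-Xmx4g -XX:-UseGCOverheadLimit\"."
  [opts]
  (zipmap ["mapred.child.java.opts"          ; MRv1: all task JVMs
           "mapred.map.child.java.opts"      ; MRv1: map tasks only
           "mapred.reduce.child.java.opts"   ; MRv1: reduce tasks only
           "mapreduce.map.java.opts"         ; Hadoop 2 (MRv2/YARN): map tasks
           "mapreduce.reduce.java.opts"]     ; Hadoop 2 (MRv2/YARN): reduce tasks
          (repeat opts)))

;; Usage inside a Cascalog job (requires cascalog.api):
;; (with-job-conf (heap-conf "-XX:-UseGCOverheadLimit -Xmx4g")
;;   (apply ?- all-snk-qry-seq))
```

On YARN, the container size (mapreduce.map.memory.mb / mapreduce.reduce.memory.mb) would also need to be large enough to hold a 4 GB heap.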


I saw Robin's workaround, which amounts to modifying hadoop-site.xml. It
would be better if the with-job-conf settings "stuck", so that per-job needs
don't require tweaking site-wide settings (especially since I don't manage
the Hadoop cluster).

-- Elango

On Mon, Jul 15, 2013 at 3:54 PM, Robin Kraft wrote:

And if for some reason the job-specific memory settings don't "take", add or
modify that setting in hadoop-site.xml. I ran into this with EMR and
documented it here:

https://github.com/reddmetrics/forma-clj/issues/70

Of course that issue may be specific to our cluster config.


On Jul 15, 2013, at 3:08 PM, Sam Ritchie wrote:

Well, Cascalog doesn't actually kick off any Hadoop instances. That said,
Hadoop does have a "mapred.child.java.opts" setting, which you can pass
via with-job-conf:

(with-job-conf {"mapred.child.java.opts" "-Xmx200m"}
.......)

Diego Gilscarbo <dgilscar@gmail.com>
July 15, 2013 3:07 PM
Hello,

I am wondering if there is a known way to configure the Hadoop instance
that is kicked off by Cascalog. I am seeing the following log line, and I
would like to increase the amount of heap space. How do I do that? This is
probably documented somewhere, but I was unable to find it. Any help would
be appreciated!

13/07/15 15:05:38 INFO hadoop.FlowReducer: child jvm opts: -Xmx200m


Relevant parts of project.clj:

:jvm-opts ["-Xmx1024m" "-server"]
:dependencies [[org.clojure/clojure "1.5.1"]
               [cascalog "1.10.1"]]
:profiles {:dev {:dependencies [[org.apache.hadoop/hadoop-core "0.20.2-dev"]]}}
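If these queries run in local mode (for example from the REPL, with hadoop-core only in the :dev profile), Hadoop's local job runner executes map and reduce tasks inside the same JVM, so the heap that matters is the one set by :jvm-opts; the logged -Xmx200m is most likely just the default value of mapred.child.java.opts. A minimal sketch, assuming local mode; the project name and the 2 GB figure are placeholders:

```clojure
;; Sketch of a project.clj for local-mode runs; project name and heap size are placeholders.
(defproject my-cascalog-project "0.1.0-SNAPSHOT"
  ;; In local mode the map/reduce tasks share this JVM, so raise its heap here.
  :jvm-opts ["-Xmx2g" "-server"]
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [cascalog "1.10.1"]]
  :profiles {:dev {:dependencies [[org.apache.hadoop/hadoop-core "0.20.2-dev"]]}})
```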


--
You received this message because you are subscribed to the Google Groups
"cascalog-user" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to cascalog-user+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.




--
Sam Ritchie, Twitter Inc
703.662.1337
@sritchie




Discussion Overview
group: cascalog-user
categories: clojure, hadoop
posted: Jul 15, '13 at 10:07p
active: Jul 18, '13 at 3:09p
posts: 7
users: 5
website: clojure.org
irc: #clojure
