About 10 days ago I wrote something similar on this forum, but here
is a quick overview. Let me know if you need more details.

Anyway, I too noticed that most of the examples out there use the shell
approach. I feel most in control when I do things programmatically, so I can
decide when and how to submit jobs. Moreover, this lets me submit jobs
from my IDE, which I find the best way to develop. I actually
develop on my Windows machine, so I had even more problems to overcome.

Anyway, here is a quick recap of what I'm doing:

1. During my app's boot time, I programmatically copy all MR-required JARs
from my local dir to a 3rd-party lib dir on HDFS. Even the app JAR is inside
this dir. I prepare this dir using the Gradle task described below.
2. Before submitting any job (or before calling any lib that I know submits
a job internally), I construct my Configuration and use it to add all JARs
in the mentioned 3rd-party lib dir to the distributed cache. The
Configuration is now populated with the paths to the 3rd-party lib JARs.
3. Since Cascalog/Cascading require Configuration props to be passed as
Map/Properties, I convert the Configuration to these objects and pass them
along.

This way you don't even need to specify your main app JAR to the Hadoop job
submitter, because it is also provided as a 3rd-party JAR in the distributed
cache. If you don't want the warning that Hadoop logs when no job JAR is set
appearing in your logs, you can easily specify its path, since you know what
it is called inside your local 3rd-party lib dir. I never resolve the JAR
from an example class, since that is not safe in various environments (such
as an IDE).
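
Step 3 above can be sketched roughly as follows. This is only a minimal
illustration, not the code from my app: the class name ConfToProperties, the
helper toProperties, and the sample keys and values are made up for the
example. It relies on the fact that Hadoop's Configuration implements
Iterable<Map.Entry<String, String>>, so the helper is written against that
interface and the sketch compiles without Hadoop on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class ConfToProperties {

    // Copies every key/value pair into a Properties object.
    // Hadoop's Configuration implements Iterable<Map.Entry<String, String>>,
    // so a Configuration instance can be passed here as-is.
    public static Properties toProperties(Iterable<Map.Entry<String, String>> conf) {
        Properties props = new Properties();
        for (Map.Entry<String, String> entry : conf) {
            props.setProperty(entry.getKey(), entry.getValue());
        }
        return props;
    }

    public static void main(String[] args) {
        // Stand-in for a populated Configuration; keys and values are placeholders.
        Map<String, String> sample = new LinkedHashMap<>();
        sample.put("fs.default.name", "hdfs://namenode.example:8020");
        sample.put("example.thirdparty.lib.dir", "/apps/myapp/lib");

        Properties props = toProperties(sample.entrySet());
        System.out.println(props.getProperty("fs.default.name"));
    }
}
```

Since Properties is itself a Map, the same object can be handed to anything
that expects either a Map or a Properties instance.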

### Preparing local dir with MR-required JARs via Gradle:

1. Create a special Gradle configuration "mapreduce" specifying all deps that
are needed for MR jobs (such as Cascalog etc.), but excluding hadoop-core
(since it is provided by the Hadoop environment). The default "compile"
configuration extends the "mapreduce" configuration and adds the hadoop-core
dep (needed for compiling, of course).

configurations {
     mapreduce {
     }
     compile {
         extendsFrom mapreduce
     }
}

2. Create a special Gradle task to copy all JARs from the "mapreduce"
configuration to the desired local lib dir, together with the output of the
jar task (so you have your app's JAR inside it too).

task prepareMapReduceLibs(type: Sync, dependsOn: jar) {
     from jar.outputs.files
     from configurations.mapreduce.files
     into 'mapreducelib'
}

Cheers,
Vjeran
On Tuesday, May 21, 2013 2:58:25 AM UTC+2, David Kincaid wrote:

We've been having some problems with the way that Spring is manipulating
the class path, so right now I'm leaning in that direction.

If anyone is doing anything other than using the hadoop shell script to
run jobs, I'd love to hear how you're doing it. I want to submit the jobs
programmatically from Java instead of shelling out to a script. We've been
trying to get this working for about 3 weeks. We're closer than ever, but
it's just one thing after another.
On Monday, May 20, 2013 6:16:40 PM UTC-5, Mason wrote:

I can't remember the exact cause, but I've gotten an error like that
which had nothing to do with Kryo. I think it's something going on in all
the metaprogramming.

On 5/20/13 14:28 PM, David Kincaid wrote:

Thanks. I don't think so. Here's the catch. The query runs fine if I run
it by itself from a repl. The stack trace I posted above comes when we try
to use Spring-Hadoop to run the query on a remote Hadoop cluster. I'll see
if I can put together a small failing query. We're reading Thrift objects
out of a dfs-datastores Pail that uses the SplitDataStructure from Nathan's
Big Data book. I'll see if I can put something together tonight.

Thanks,

Dave
On Monday, May 20, 2013 4:24:13 PM UTC-5, Sam Ritchie wrote:

Great, a small failing test case would be good. Note that currently any
operation or function must be attached to a var, so functions created in a
closure are not allowed:

(let [op (comp :keyword first)]
  (<- [?x ?y]
      (src ?x)
      (op ?x :> ?y)))

Could this be what's happening?

David Kincaid
May 20, 2013 2:17 PM
I think it's too big and in a lot of pieces to post on here. I'll see
if I can pull something together in a Gist or something.

On Monday, May 20, 2013 3:57:59 PM UTC-5, Sam Ritchie wrote: --
You received this message because you are subscribed to the Google
Groups "cascalog-user" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to cascalog-use...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Sam Ritchie
May 20, 2013 1:57 PM
Can you post your query?


David Kincaid
May 20, 2013 1:55 PM
Has anyone seen something like this before when trying to run a
Cascalog query? It seems to be a problem somewhere in the Kryo
serialization, but I'm not sure where to start troubleshooting this one.
Any ideas?



java.lang.ClassCastException: clojure.core$eval1 cannot be cast to clojure.lang.IFn
at clojure.lang.Compiler.eval(Compiler.java:6618)
at clojure.lang.Compiler.eval(Compiler.java:6608)
at clojure.lang.Compiler.load(Compiler.java:7064)
at clojure.lang.RT.loadResourceScript(RT.java:370)
at clojure.lang.RT.loadResourceScript(RT.java:361)
at clojure.lang.RT.load(RT.java:440)
at clojure.lang.RT.load(RT.java:411)
at clojure.core$load$fn__5018.invoke(core.clj:5530)
at clojure.core$load.doInvoke(core.clj:5529)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.core$load_one.invoke(core.clj:5336)
at clojure.core$load_lib$fn__4967.invoke(core.clj:5375)
at clojure.core$load_lib.doInvoke(core.clj:5374)
at clojure.lang.RestFn.applyTo(RestFn.java:142)
at clojure.core$apply.invoke(core.clj:619)
at clojure.core$load_libs.doInvoke(core.clj:5413)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at clojure.core$apply.invoke(core.clj:619)
at clojure.core$require.doInvoke(core.clj:5496)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.lang.Var.invoke(Var.java:415)
at carbonite.JavaBridge.initialize(Unknown Source)
at carbonite.JavaBridge.registerPrimitives(Unknown Source)
at carbonite.JavaBridge.enhanceRegistry(Unknown Source)
at cascalog.hadoop.ClojureKryoSerialization.decorateKryo(ClojureKryoSerialization.java:26)
at cascading.kryo.KryoSerialization.populatedKryo(KryoSerialization.java:50)
at cascalog.KryoService.freshKryo(KryoService.java:41)
at cascalog.KryoService.serialize(KryoService.java:49)
at cascalog.ClojureCascadingBase.initialize(ClojureCascadingBase.java:37)
at cascalog.ClojureCascadingBase.<init>(ClojureCascadingBase.java:42)
at cascalog.ClojureFilter.<init>(ClojureFilter.java:28)
at cascalog.workflow$filter$fn__1088.invoke(workflow.clj:130)
at cascalog.predicate$fn__1463.invoke(predicate.clj:525)
at clojure.lang.MultiFn.invoke(MultiFn.java:249)
at cascalog.predicate$build_predicate.invoke(predicate.clj:562)
at clojure.lang.AFn.applyToHelper(AFn.java:178)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.core$apply.invoke(core.clj:619)
at clojure.lang.AFn.applyToHelper(AFn.java:167)
at clojure.lang.RestFn.applyTo(RestFn.java:132)
at clojure.core$apply.invoke(core.clj:621)
at clojure.core$partial$fn__4192.doInvoke(core.clj:2398)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.core$map$fn__4207.invoke(core.clj:2485)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:60)
at clojure.lang.RT.seq(RT.java:484)
at clojure.core$seq.invoke(core.clj:133)
at clojure.core.protocols$seq_reduce.invoke(protocols.clj:30)
at clojure.core.protocols$fn__6026.invoke(protocols.clj:54)
at clojure.core.protocols$fn__5979$G__5974__5992.invoke(protocols.clj:13)
at clojure.core$reduce.invoke(core.clj:6177)
at clojure.core$group_by.invoke(core.clj:6492)
at cascalog.rules$split_predicates.invoke(rules.clj:47)
at cascalog.rules$build_query.invoke(rules.clj:552)
at cascalog.rules$build_rule.invoke(rules.clj:653)
at com.idexx.lambda.hadoop.jobs.patientvisits.generators$transaction_source.invoke(generators.clj:67)
at com.idexx.lambda.hadoop.jobs.patientvisits.summary$launch_workflow$fn__586.invoke(summary.clj:49)
at cascalog.checkpoint$mk_runner$fn__2399.invoke(checkpoint.clj:60)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:722)

--
Sam Ritchie, Twitter Inc
703.662.1337
@sritchie

Discussion Overview
group: cascalog-user
categories: clojure, hadoop
posted: May 20, '13 at 8:55p
active: May 21, '13 at 12:19p
posts: 9
users: 4
website: clojure.org
irc: #clojure