Hi, I have a Cascalog job that uses the MaxMind GeoIP data.

I keep my GeoLiteCity.dat file in the resources/ folder and then run the
uberjar on EMR.

I construct the lookup service like this:

(import 'com.maxmind.geoip.LookupService)
(def lookup (new LookupService (.getPath (clojure.java.io/resource "GeoLiteCity.dat"))
                 LookupService/GEOIP_MEMORY_CACHE))


The odd thing is that this works fine as long as the GeoIP lookup is
done in the reduce stage.

However, whenever a job performs the GeoIP lookup in the map stage, it
fails because

(.getPath (clojure.java.io/resource "GeoLiteCity.dat"))

returns an odd-looking path taken from a jar: URL, which the lookup service
can't use (it needs a real file on disk in order to create a RandomAccessFile).

I added some print statements to see which path is returned.

When used in the *reduce stage*, the returned path looks like this:

/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201303202141_0001/jars/GeoLiteCity.dat


When used in the *map stage*, the returned path looks like this:


file:/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201303202135_0001/jars/job.jar!/GeoLiteCity.dat
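
A minimal sketch, in Clojure like the rest of the post, of why the second path is unusable. It rebuilds the two printed paths as java.net.URL values and looks at what .getPath returns for each. It assumes the map-stage path came from a jar: URL, which the "job.jar!/" separator strongly suggests; the names describe and resource-file are hypothetical helpers, not part of any library.

```clojure
(require '[clojure.java.io :as io])

;; The two paths printed above, rebuilt as URLs. The "jar:" prefix on
;; map-url is an assumption inferred from the "job.jar!/" in the path.
(def reduce-url
  (java.net.URL.
   "file:/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201303202141_0001/jars/GeoLiteCity.dat"))

(def map-url
  (java.net.URL.
   "jar:file:/mnt/var/lib/hadoop/mapred/taskTracker/hadoop/jobcache/job_201303202135_0001/jars/job.jar!/GeoLiteCity.dat"))

(defn describe
  "Returns the protocol of a URL and the value of .getPath for it."
  [^java.net.URL url]
  {:protocol (.getProtocol url)
   :path     (.getPath url)})

;; For reduce-url, :protocol is "file" and :path is a plain filesystem path.
;; For map-url, :protocol is "jar" and :path still starts with "file:" and
;; contains "job.jar!/", so it is not a usable filesystem path.

(defn resource-file
  "Hypothetical helper: returns a java.io.File for a file: URL, nil otherwise.
  clojure.java.io/as-file itself throws IllegalArgumentException for jar: URLs."
  [^java.net.URL url]
  (when (= "file" (.getProtocol url))
    (io/as-file url)))
```

With these definitions, (resource-file reduce-url) returns a File for the extracted GeoLiteCity.dat, while (resource-file map-url) returns nil, because the data only exists as an entry inside job.jar.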


My question is: is this an artifact of Hadoop (does it simply not extract the jar for the map stage, so the file isn't there at all?), or is there something odd about the map stage that only affects clojure.java.io/resource?


Thanks!
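
One workaround that does not depend on how Hadoop lays out the task classpath is to stream the resource to a local temporary file and give that file's path to LookupService. The sketch below assumes this approach; resource->local-file is a hypothetical helper, and the commented usage mirrors the String-path LookupService constructor already used in the post.

```clojure
(require '[clojure.java.io :as io])

(defn resource->local-file
  "Hypothetical helper: copies the resource at url (a file: or jar: URL)
  into a temporary file and returns that java.io.File. The temp file is
  scheduled for deletion when the JVM exits."
  [^java.net.URL url ^String suffix]
  (let [tmp (java.io.File/createTempFile "resource-" suffix)]
    (.deleteOnExit tmp)
    ;; io/input-stream opens the URL through its own protocol handler,
    ;; so this also works when the data is an entry inside a jar.
    (with-open [in (io/input-stream url)]
      (io/copy in tmp))
    tmp))

;; Possible usage, assuming the same legacy com.maxmind.geoip API as above:
;;
;; (def lookup
;;   (new LookupService
;;        (.getPath (resource->local-file (io/resource "GeoLiteCity.dat") ".dat"))
;;        LookupService/GEOIP_MEMORY_CACHE))
```

The trade-off is one extra copy of the database each time the helper runs (for a top-level def, once per JVM that loads the namespace), in exchange for identical behavior in the map and reduce stages.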



Posted to cascalog-user (categories: clojure, hadoop) by Andy Xue, Mar 20, '13 at 11:38p. 1 post, 1 user in discussion.
