Gerrard McNulty at Jan 4, 2012 at 9:48 pm
I'm using MaxMind GeoIP to do different kinds of lookups on IP addresses.
They export their databases as CSV or .dat files (with a library to access
the .dat). I'd like to make Cascalog queries based on lookup information
without performing an expensive join or adding the lookups to my data in
advance.
It seems pushing the .dat file out to the distributed cache is the quickest
way to do this, but of course I'm open to suggestions :)
On Jan 4, 6:24 pm, Sam Ritchie wrote:
There isn't any way of doing this currently. What are you trying to share
in the distributed cache? One other option might be to distribute the value
to each operation as a parametrized argument, as with
(defmapop [add-n [n]]
[x]
(+ x n))
but this gets a little flaky with large data structures as arguments.
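For reference, a minimal sketch of how a parameterized operation like add-n might be called from a query. It assumes Cascalog 1.x on the classpath; the nums source and the value 10 are made up for illustration and are not from the thread.

```clojure
;; Sketch only: assumes Cascalog 1.x is available.
(use 'cascalog.api)

;; The parameterized op from the message above: n is the parameter
;; supplied at the call site, x is the field value from each tuple.
(defmapop [add-n [n]]
  [x]
  (+ x n))

;; Hypothetical in-memory source of 1-tuples, used here as a generator.
(def nums [[1] [2] [3]])

;; Parameters go in a vector directly after the op name.
(?<- (stdout)
     [?x ?y]
     (nums ?x)
     (add-n [10] ?x :> ?y))
```

The parameter is serialized along with the operation, which is why large structures passed this way can become a problem.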
On Wed, Jan 4, 2012 at 5:49 AM, Gerrard McNulty wrote:
Hi guys,
If I want to share a file in the distributed cache from a Cascalog
query, do I have to fall back to Hadoop's Java APIs? Or is there a
Cascalog way of doing it?
--
Sam Ritchie, Twitter Inc
703.662.1337
@sritchie09
(Too brief? Here's why! http://emailcharter.org)