Hi! I've been playing around with wrapping Pig UDFs as Cascalog
operators. The following works for UDFs subclassed from EvalFunc (based
in this case on LinkedIn's just-published
https://github.com/linkedin/datafu library):

(let [tf (org.apache.pig.data.TupleFactory/getInstance)
      bf (org.apache.pig.data.BagFactory/getInstance)
      m  (datafu.pig.stats.Median.)
      s  (datafu.pig.stats.StreamingMedian.)]

  (defmapop median [x]
    (into [] (.getAll (.call m (.newDefaultBag bf (map (fn [y] (.newTuple tf y)) x))))))

  (defmapop streaming-median [x]
    (into [] (.getAll (.call s (.newDefaultBag bf (map (fn [y] (.newTuple tf y)) x)))))))

and I can then call, e.g., ... (median ?x :> ?y)

But it seems very compelling to be able to make this generic, via a macro,
so I could call any (in this case EvalFunc-based) UDF using something like:

... (pig-eval-func datafu.pig.stats.Median ?x :> ?y)

However I can't figure out how/whether it's possible to wrap a defmapop
macro in a macro of my own. Is this feasible? (And/or trivial, or
horribly complicated?)

Thanks!!

Mike

  • Nathan Marz at Feb 23, 2012 at 12:26 am
    You probably want to make a higher order mapop for this, like:

    (defmapop [pig-eval-func [pig-func]] [& args]
    ;; use func on the args...
    )


    ;;Usage:
    (pig-eval-func [datafu.pig.stats.Median] ?x ?y :> ?z)

    --
    Twitter: @nathanmarz
    http://nathanmarz.com
  • R Daneel at Feb 23, 2012 at 12:42 am
    Thanks! Um... for me, the devil is still in the details :) I'm not sure
    how to use func on the args, i.e. how do I create the necessary instance
    of Median?
  • R Daneel at Feb 23, 2012 at 12:48 am
    That is, how do I make this work:

    (def instance (datafu.pig.stats.Median.)) ;; works fine, but not parameterized

    (def foo datafu.pig.stats.Median)
    (def instance (foo.)) ;; WRONG!!
  • Nathan Marz at Feb 23, 2012 at 1:18 am
    You can try parameterizing the mapop with an instance of a class,
    rather than the class itself:

    (pig-eval-func [(datafu.pig.stats.Median.)] ?x ?y :> ?z)

    This will only work if Median is serializable. Alternatively, you can use
    Java reflection like this:

    (.newInstance foo)
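
A minimal sketch of why `(foo.)` fails while `(.newInstance foo)` works, assuming only stock Clojure on the JVM. `java.util.ArrayList` is a stand-in chosen so the snippet runs without Pig or DataFu on the classpath; with those available, `datafu.pig.stats.Median` should behave the same way as long as it has a public no-argument constructor.

```clojure
;; java.util.ArrayList stands in for a UDF class such as
;; datafu.pig.stats.Median (an assumption, so no Pig jars are needed).
(def foo java.util.ArrayList)

;; (foo.) does not compile: the `Classname.` form is resolved at compile
;; time and needs a literal class name, not a var holding a Class object.

;; Reflective no-argument construction, as suggested above.
(def instance-a (.newInstance foo))

;; An alternative route through Clojure's own reflector, which also
;; accepts constructor arguments (none are passed here).
(def instance-b
  (clojure.lang.Reflector/invokeConstructor foo (object-array 0)))

(println (class instance-a) (class instance-b))
```

Both calls look up the constructor at run time, so they are slower than the literal `(java.util.ArrayList.)` form; for a UDF that is instantiated once per operator that cost is probably negligible.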

  • R Daneel at Feb 23, 2012 at 1:29 am
    Awesome; works like a charm!! :) Thanks very much: this was very
    educational (and should probably have been in the clojure-user group in the
    first place!)
  • R Daneel at Feb 23, 2012 at 2:40 am
    Oops: one more thing! Here's my (working, generic) operator (thanks
    again!):

    (defmapop [pig-eval-func [pig-func]] [& x]
      (into [] (.getAll (.call (.newInstance pig-func)
                               (.newDefaultBag bf (apply map (fn [y] (.newTuple tf y)) x))))))

    and here's how I call it:

    (pig-eval-func [datafu.pig.stats.Median] ?x :> ?fx)

    But in this form I can't figure out how to wrap it with c/each, i.e. so
    that I can call it like this:

    (pig-eval-func [datafu.pig.stats.Median] ?x ?y :> ?fx ?fy)

    Is that possible without drastic surgery?
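
A plain-Clojure sketch of what `(apply map f x)` does with one versus several arguments, which may help explain why the operator above does not extend to `?x ?y` on its own. `wrap` is a stand-in for `(fn [y] (.newTuple tf y))`, and `each-arg` is a hypothetical helper, not part of Cascalog and not the `c/each` implementation.

```clojure
;; Stand-in for (fn [y] (.newTuple tf y)): wraps one value in a vector.
(defn wrap [y] [y])

;; One collection argument: (apply map wrap [coll]) is just (map wrap coll).
(def one-arg (apply map wrap [[1 2 3]]))

;; Two collection arguments: map would call wrap with two values at once,
;; which a one-argument function rejects. doall forces the lazy seq so the
;; exception is raised inside the try.
(def two-args
  (try
    (doall (apply map wrap [[1 2 3] [4 5 6]]))
    (catch clojure.lang.ArityException e :arity-error)))

;; Hypothetical helper: build a function that runs f on each argument
;; separately and returns one result per argument.
(defn each-arg [f]
  (fn [& args] (mapv f args)))

;; A simple sum plays the role of the per-field UDF call.
(def per-arg ((each-arg (fn [coll] (reduce + coll))) [1 2 3] [4 5 6]))

(println one-arg two-args per-arg)
```

Here `one-arg` is `([1] [2] [3])`, `two-args` is `:arity-error`, and `per-arg` is `[6 15]`. Applying the function once per argument is the behaviour the `?x ?y :> ?fx ?fy` call would need.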
  • Nathan Marz at Feb 23, 2012 at 4:30 am
    Not currently, unfortunately. But we have a redesign of the ops planned
    that will make that possible.

  • R Daneel at Feb 23, 2012 at 4:35 am
    Naturally :) (Might that be part of 1.9.0 by any chance?) Thanks again :)
  • Nathan Marz at Feb 23, 2012 at 4:57 am
    It's planned for 2.0, since it won't be backwards compatible.



Discussion Overview
group: cascalog-user
categories: clojure, hadoop
posted: Feb 22, '12 at 10:58p
active: Feb 23, '12 at 4:57a
posts: 10
users: 2 (R Daneel: 6 posts, Nathan Marz: 4 posts)
website: clojure.org
irc: #clojure
