FAQ
I'm trying to create a custom ParallelAgg class in JCascalog. There isn't
much documentation on how that is supposed to work. Could someone help me
understand when each of the methods is called and what the arguments to
them are? The inputs to the method are going to be date strings.

Here's the shell of my class:

    private static class MaxTransaction implements ParallelAgg {
        public void prepare(FlowProcess flowProcess, OperationCall operationCall) {

        }

        @Override
        public List<Object> init(List<Object> input) {

        }

        @Override
        public List<Object> combine(List<Object> val1, List<Object> val2) {

        }
    }

--
You received this message because you are subscribed to the Google Groups "cascalog-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascalog-user+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

  • Sam Ritchie at Jun 12, 2013 at 5:42 pm
    I don't use JCascalog, really, but here's my understanding:

    "prepare" is the same as in Cascading's operations.

    http://docs.cascading.org/cascading/1.2/javadoc/cascading/operation/Operation.html

    This method is called once, when the operation is instantiated. This is
    where you should set up any state you need.

    "init" is called once for every tuple that you're aggregating. Why would
    you need this?

    Say you're trying to parallel-aggregate an Average. Because this happens
    in parallel, you need to know the weighting on the items you're
    combining... so you might emit a list with [item-value, 1].

    Your combine function is going to combine whatever you emit from init,
    two at a time. In this example, you could perform a weighted average of
    two of the pairs you emitted above.
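
The average example above can be sketched roughly as follows. This is a standalone illustration, not JCascalog code: the class name AverageSketch is hypothetical, and the two static methods only mirror the init/combine signatures from the question so the sketch runs without Cascading or JCascalog on the classpath. It assumes the value to average is the first element of the input list.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone class; it does not implement the real ParallelAgg interface.
class AverageSketch {
    // Called once per input tuple: pair the value with a weight of 1.
    static List<Object> init(List<Object> input) {
        double value = ((Number) input.get(0)).doubleValue();
        return Arrays.<Object>asList(value, 1L);
    }

    // Called on two partial results at a time: weighted average plus summed weight.
    static List<Object> combine(List<Object> val1, List<Object> val2) {
        double a = (Double) val1.get(0);
        long n = (Long) val1.get(1);
        double b = (Double) val2.get(0);
        long m = (Long) val2.get(1);
        return Arrays.<Object>asList((a * n + b * m) / (n + m), n + m);
    }

    public static void main(String[] args) {
        // Fold three inputs together, two at a time.
        List<Object> acc = init(Arrays.<Object>asList(2));
        acc = combine(acc, init(Arrays.<Object>asList(4)));
        acc = combine(acc, init(Arrays.<Object>asList(9)));
        System.out.println(acc); // [5.0, 3]
    }
}
```

Carrying the count alongside the value is what lets partial results be merged in any grouping and still produce the same average.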

    (ParallelAgg really should have a "present" method or something that
    allows you to undo the "init".)
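
For the date-string case in the question, the same pattern might look like the sketch below. It is a standalone illustration under several assumptions: the class name MaxDateSketch is hypothetical, the dates are assumed to be ISO-8601 strings such as 2013-06-12 in the first position of the input list, and the present method is not part of ParallelAgg (as noted above); it only shows what "undoing the init" could mean as a separate final step.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone class; it does not implement the real ParallelAgg interface.
class MaxDateSketch {
    // init: parse the incoming date string once per tuple (assumes yyyy-MM-dd).
    static List<Object> init(List<Object> input) {
        return Arrays.<Object>asList(LocalDate.parse((String) input.get(0)));
    }

    // combine: keep the later of two partial results.
    static List<Object> combine(List<Object> val1, List<Object> val2) {
        LocalDate a = (LocalDate) val1.get(0);
        LocalDate b = (LocalDate) val2.get(0);
        return Arrays.<Object>asList(a.isAfter(b) ? a : b);
    }

    // Hypothetical "present" step: turn the parsed date back into a string.
    static String present(List<Object> result) {
        return ((LocalDate) result.get(0)).toString();
    }

    public static void main(String[] args) {
        List<Object> acc = init(Arrays.<Object>asList("2013-06-11"));
        acc = combine(acc, init(Arrays.<Object>asList("2013-06-12")));
        acc = combine(acc, init(Arrays.<Object>asList("2013-01-30")));
        System.out.println(present(acc)); // 2013-06-12
    }
}
```

Because taking a maximum is associative and commutative, the order in which partial results are combined does not change the answer, which is the property a parallel aggregator relies on.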

    If this makes sense and helps out, I'd love if you would add it to the
    Cascalog wiki!
    (In reply to David Kincaid's message of June 11, 2013 8:21 PM, shown at
    the top of this thread.)
    --
    Sam Ritchie, Twitter Inc
    703.662.1337
    @sritchie
  • David Kincaid at Jun 12, 2013 at 6:05 pm
    Thanks, Sam! That is exactly what I was looking for. Very helpful. I'll see
    about getting this onto the Wiki.

Discussion Overview
group: cascalog-user
categories: clojure, hadoop
posted: Jun 12, '13 at 3:21a
active: Jun 12, '13 at 6:05p
posts: 3
users: 2
website: clojure.org
irc: #clojure

2 users in discussion

David Kincaid: 2 posts Sam Ritchie: 1 post

site design / logo © 2022 Grokbase