I've written a (trivial) aggregator for mean, and another for variance:

(defaggregateop mean
  "Aggregates the arithmetic mean of its input."
  ([] [0 0])
  ([[sum count] val] [(+ sum val) (inc count)])
  ([[sum count]] [(float (/ sum count))]))

(defaggregateop var
  "Aggregates the variance of its input."
  ([] [0 0 0])
  ([[sumsq sum count] val]
   [(+ sumsq (* val val)) (+ sum val) (inc count)])
  ([[sumsq sum count]]
   (let [mean (float (/ sum count))]
     [(- (float (/ sumsq count)) (* mean mean))])))

I'd like to do two things:

1) combine them into one aggregator that returns both (and thus
accumulates sum and count once only);
2) implement them with a combiner.

I've seen the definition of sum (each sum-parallel), where sum-parallel
is a defparallelop, but I don't see how that applies here, since
defparallelop doesn't allow maintaining state, and I don't see any other
way to use a combiner. Are one or both of these problems (easily)
solvable?

Many thanks (still learning!),

Mike

• Nov 9, 2011 at 5:41 am

Hey Mike,

Here's how to do mean and variance using combiners:

https://gist.github.com/1350522

Note that the implementation of "avg" is from cascalog.ops. This
implementation makes use of predicate macros, which allow for the
arbitrary composition of predicates. So even though "avg" itself can't
be defined as a parallel aggregator, its components "count" and "sum"
can, and they can be composed with "div" to produce an optimized
version of the aggregator. A similar approach is taken for variance.
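
To make the idea concrete outside Cascalog: a combiner needs partial results that can be merged in any order. Sum, count, and sum of squares all qualify, and the divisions happen once at the end. The following is a minimal plain-Clojure sketch of that decomposition. It is not the code from the gist and does not use Cascalog's API; `init`, `combine`, `finish`, and `mean-and-variance` are hypothetical names, and the input is assumed to be non-empty.

```clojure
;; Sketch only: plain Clojure, no Cascalog. All names are hypothetical.

(defn init
  "Turns one input value into a partial state [sum-of-squares sum count]."
  [x]
  [(* x x) x 1])

(defn combine
  "Merges two partial states element-wise. Addition is associative and
  commutative, so states can be merged in any order and grouping."
  [a b]
  (mapv + a b))

(defn finish
  "Derives the mean and the (population) variance from a merged state."
  [[sumsq sum n]]
  (let [mean (double (/ sum n))]
    {:mean mean
     :variance (- (double (/ sumsq n)) (* mean mean))}))

(defn mean-and-variance
  "Assumes xs is non-empty."
  [xs]
  (finish (reduce combine (map init xs))))

;; Merging per-chunk partial states matches a single pass over the data.
(let [xs      [2 4 4 4 5 5 7 9]
      whole   (mean-and-variance xs)
      chunked (finish (reduce combine
                              (map #(reduce combine (map init %))
                                   (partition-all 3 xs))))]
  (println whole)              ; prints {:mean 5.0, :variance 4.0}
  (println (= whole chunked))) ; prints true
```

Because a single merged state feeds both results, the sum and the count are each accumulated only once.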

Predicate macros, as you can see, are really powerful.

-Nathan

On Nov 8, 9:07 pm, R Daneel wrote:
I've written a (trivial) aggregator for mean, and another for variance: [...]
• Nov 9, 2011 at 5:48 am

Thanks very much!!! I'll be digesting this for a while :)

Mike
• Nov 9, 2011 at 7:19 am

I'm just playing around and have added another predicate that's composed
with the variance:

(def unbiased-variance
  (<- [!val :> !var]
      (variance !val :> !j)
      (c/count !count)
      (* !j !count :> !k)
      (- !count 1 :> !l)
      (/ !k !l :> !var)))

but I had to re-calculate the !count variable. Is Cascalog smart enough to
avoid actually accumulating !count twice?
• Nov 9, 2011 at 7:34 am

Not yet. I opened up a ticket for Cascalog to detect duplicate operations
and rewrite the query to avoid wasted work:

https://www.assembla.com/spaces/cascalog/tickets/31-cascalog-should-detect-duplicate-operations-and-rewrite-the-query-to-avoid-wasted-work

On Tue, Nov 8, 2011 at 10:46 PM, R Daneel wrote:
I'm just playing around and have added another predicate that's composed
with the variance: [...]

--
http://nathanmarz.com
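
For reference, the arithmetic in the unbiased-variance predicate (multiply the variance by the count, then divide by the count minus one) needs only one count if it is applied in the final step to the same sum of squares, sum, and count. Below is a minimal plain-Clojure sketch of that final step. It is not Cascalog code and makes no claim about how Cascalog plans the query; the function name and the `[sumsq sum n]` state layout are assumptions carried over from the aggregators in the original post, and `n` is assumed to be greater than 1.

```clojure
;; Sketch only: plain Clojure, no Cascalog. The name is hypothetical.

(defn unbiased-variance-from-state
  "Takes an accumulated state [sum-of-squares sum count] and returns the
  unbiased (sample) variance. Assumes count > 1."
  [[sumsq sum n]]
  (let [mean   (double (/ sum n))
        ;; population variance: E[x^2] - mean^2
        biased (- (double (/ sumsq n)) (* mean mean))]
    ;; rescale by n / (n - 1), reusing the same count
    (/ (* biased n) (dec n))))

;; State for the values [2 4 4 4 5 5 7 9]: sumsq = 232, sum = 40, count = 8.
(println (unbiased-variance-from-state [232 40 8]))
```

For this state the population variance is 4.0, so the result is 32/7, roughly 4.57.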
• Nov 9, 2011 at 6:14 pm

Great! Thanks again: predicate macros are awesome :)

Discussion Overview
group: cascalog-user
categories: clojure, hadoop
posted: Nov 9, '11 at 5:08a
active: Nov 9, '11 at 6:14p
posts: 6
users: 2
website: clojure.org
irc: #clojure
