Thomas, the most important information that I read about Cascalog was the
section in the wiki called "How cascalog executes a query" (https://github.com/nathanmarz/cascalog/wiki/How-cascalog-executes-a-query).
Once you read that and apply the steps to your queries you'll better
understand what Cascalog is doing.
In your example, it is indeed grouping on the ?v var when the min and max
aggregations are done: going into the aggregation step, the ?v output var
has already been satisfied, so it is used as a grouping field.
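To make the effect concrete, here is a plain-Clojure analogy (not Cascalog code, just an illustration). The sample values 1, 2, 3 are assumed from the output in the quoted message, and `grouped-min-max` is a made-up helper name. Grouping on the value before aggregating leaves one element per group, so min and max both collapse to that element.

```clojure
;; Plain-Clojure analogy of the grouping behaviour, not Cascalog code.
;; The sample values are assumed from the output in the quoted message.
(def sample-values [1 2 3])

(defn grouped-min-max
  "Groups vs by value, then takes min and max within each group.
   Hypothetical helper, for illustration only."
  [vs]
  (for [[v group] (sort-by key (group-by identity vs))]
    [v (apply min group) (apply max group)]))

(grouped-min-max sample-values)
;; => ([1 1 1] [2 2 2] [3 3 3])
```

Each group holds a single value, which is why every row repeats it three times.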
I'm not sure what the most efficient way to do what you're trying to do is
going to be. It sounds to me like you're going to need two subqueries: one
to compute the min and max, and another that uses that min/max subquery to
do the normalization. Maybe something like this (untested, so I don't know
if this will even run):
;; `values` is assumed to be the generator from your message.
(def min-max
  (<- [?min-v ?max-v]
      (values ?v)
      (c/min ?v :> ?min-v)
      (c/max ?v :> ?max-v)))

(<- [?v ?norm-v]
    (values ?v)
    (min-max ?min-v ?max-v)
    (normalize ?v ?min-v ?max-v :> ?norm-v))
Of course, that's going to have to run over the values twice: once to get
the min and max, and then a second time to do the normalization. Unless you
can hold all the values in memory inside a buffer or aggregator, I'm not
sure how you can get away with one pass. Maybe others who are better at
Cascalog/Cascading/MapReduce know a better way.
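The in-memory idea can be sketched in plain Clojure (an illustration outside Cascalog, not a buffer or aggregator implementation). `normalize-all` is a hypothetical helper, and the parameter names `min-v` and `max-v` are chosen here to avoid shadowing the core functions. A single reduce buffers every value while tracking the running min and max, and the normalization then walks the buffer instead of the source. This only helps when the whole data set fits in memory.

```clojure
(defn normalize [n min-v max-v]
  (/ (- n min-v) (- max-v min-v)))

(defn normalize-all
  "Reads vs once, buffering values and tracking min/max along the way,
   then normalizes the buffered values. Hypothetical helper; assumes vs
   contains at least two distinct values (otherwise max equals min)."
  [vs]
  (let [{:keys [buf lo hi]}
        (reduce (fn [{:keys [buf lo hi]} v]
                  {:buf (conj buf v)
                   :lo  (if lo (min lo v) v)
                   :hi  (if hi (max hi v) v)})
                {:buf [] :lo nil :hi nil}
                vs)]
    ;; Second walk is over the in-memory buffer, not the original source.
    (mapv (fn [v] [v (normalize v lo hi)]) buf)))

(normalize-all [1 2 3])
;; => [[1 0] [2 1/2] [3 1]]
```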
On Monday, July 8, 2013 6:21:59 AM UTC-5, Thomas Norden wrote:
What I would like to do is normalize a set of values from a subquery.
(defn normalize [n min max]
  (/ (- n min) (- max min)))
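As a quick worked example of that formula (a sketch, with the inputs 1, 2, 3 assumed from the output listed further down, and the parameters renamed to `min-v` and `max-v` so they don't shadow the core functions): with integer inputs Clojure returns exact ratios, and the formula has no defined result when max equals min.

```clojure
(defn normalize [n min-v max-v]
  (/ (- n min-v) (- max-v min-v)))

(normalize 1 1 3) ;; => 0
(normalize 2 1 3) ;; => 1/2 (a ratio; call (double ...) on it for 0.5)
(normalize 3 1 3) ;; => 1

;; If every value is identical, max equals min and integer division throws.
(def degenerate-result
  (try
    (normalize 5 5 5)
    (catch ArithmeticException e
      (.getMessage e))))
;; degenerate-result => "Divide by zero"
```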
Say that my subquery produces values like:
(def values [  ])
I would like to feed the values through my normalize function with the
min/max of values across the entire data set. However, when I use
cascalog.ops/min and cascalog.ops/max the values are grouped together.
(?<- (stdout) [?v ?min-v ?max-v]
(c/min ?v :> ?min-v)
(c/max ?v :> ?max-v))
1 1 1
2 2 2
3 3 3
What I would like to see is:
1 1 3
2 1 3
3 1 3
At this point the dataset is small enough that I could do it post-cascalog
with some other code but I would like to see if I can keep this inside of
You received this message because you are subscribed to the Google Groups "cascalog-user" group.