Thomas, the most important information that I read about Cascalog was the

section in the wiki called "How cascalog executes a query" (

https://github.com/nathanmarz/cascalog/wiki/How-cascalog-executes-a-query). Once you read that and apply the steps to your queries you'll better

understand what Cascalog is doing.

In your example, it is indeed grouping on the ?v var when the min and max

aggregations are done, because ?v is an output var that has already been

satisfied going into the aggregation step.
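
To see why every row comes back as v v v, here is a plain-Clojure sketch (no Cascalog involved) that mimics grouping on ?v before aggregating. The name grouped-min-max is made up for illustration, and this is only an analogy for the grouping behavior, not what Cascalog actually runs:

```clojure
(def values [[1] [2] [3]])

(defn grouped-min-max
  "Groups the tuples by their value, then takes min and max within each group.
  Hypothetical helper, only meant to imitate grouping on ?v."
  [tuples]
  (for [[v group] (sort-by key (group-by first tuples))]
    [v (apply min (map first group)) (apply max (map first group))]))

(grouped-min-max values)
;; => ([1 1 1] [2 2 2] [3 3 3])
```

Each group holds a single distinct value, so its min and max are that same value.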

I'm not sure what the most efficient way to go about what you're trying to

do is going to be. It sounds to me like you're going to need two

subqueries. One to compute the min and max and the other to use that

min/max subquery to do the normalization. Maybe something like this (untested,

so I don't know if this will even run):

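;; Subquery: min and max over all values (no other output vars, so no grouping)
(def min-max
  (<- [?min-v ?max-v]
      (values ?v)
      (c/min ?v :> ?min-v)
      (c/max ?v :> ?max-v)))

;; Main query: pair each value with the min/max and normalize it
(<- [?v ?norm-v]
    (values ?v)
    (min-max ?min-v ?max-v)
    (normalize ?v ?min-v ?max-v :> ?norm-v))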

Of course, that's going to have to run over the values twice. Once to get

the min and max and then a second time to do the normalization. Unless you

can hold all the values in memory inside a buffer or aggregator I'm not

sure how you can get away with one pass. Maybe others that are better at

Cascalog/Cascading/MapReduce know a better way.
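
For comparison, when the data fits in memory (as in the three-value example), the two passes are easy to see in plain Clojure. This sketch is outside Cascalog; normalize-all is a hypothetical helper name, and normalize is the function from the original question:

```clojure
(defn normalize
  [n min max]
  (/ (- n min) (- max min)))

(defn normalize-all
  "Pass 1 finds the global min and max; pass 2 normalizes each value.
  Assumes vs is non-empty and not all equal (otherwise division by zero)."
  [vs]
  (let [lo (apply min vs)
        hi (apply max vs)]
    (map (fn [v] [v (normalize v lo hi)]) vs)))

(normalize-all [1 2 3])
;; => ([1 0] [2 1/2] [3 1])
```

With integer inputs the results are exact ratios; convert the inputs to doubles if decimal output is wanted.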

On Monday, July 8, 2013 6:21:59 AM UTC-5, Thomas Norden wrote:

What I would like to do is normalize a set of values from a subquery.

(defn normalize
  [n min max]
  (/ (- n min) (- max min)))

Say that my subquery produces values like:

(def values [[1] [2] [3]])

I would like to feed the values through my normalize function with the

min/max of values across the entire data set. However, when I use

cascalog.ops/min and cascalog.ops/max the values are grouped together.

(?<- (stdout) [?v ?min-v ?max-v]
     (values ?v)
     (c/min ?v :> ?min-v)
     (c/max ?v :> ?max-v))

RESULTS

-----------------------

1 1 1

2 2 2

3 3 3

-----------------------

What I would like to see is:

RESULTS

-----------------------

1 1 3

2 1 3

3 1 3

-----------------------

At this point the dataset is small enough that I could do it post-cascalog

with some other code but I would like to see if I can keep this inside of

cascalog.

Thanks,

Tom
