I am starting with Cascalog, and though it is fun and really concise, I
am still struggling with some queries.
Let's say that I have users. These users are selling things (let's call
them orders), and each order is tied to a particular account. The data
could look like this:
I would like to write a query that tells me the number of accounts
for each user. If I try something like this:
(defn account-query 
(let [opptys (lfs-textline "/Users/fluke/sandbox/living_social/
accounts (lfs-textline "/Users/fluke/sandbox/living_social/
(?<- (stdout) [?user ?count]
(opptys ?o-line) (my-csv-parser ?o-line :> ?oppty-id ?acc-id ?user)
(accounts ?a-line) (my-csv-parser ?a-line :> ?
Cascalog joins all the lines, so it counts the orders for each user
rather than the accounts. How can I give it a hint to count at a
different level of aggregation? I tried playing with count and distinct,
but I must be missing something. Or maybe I am looking at it from too
much of a SQL angle.
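To make the two levels of aggregation concrete, here is a minimal sketch in plain Clojure (no Cascalog involved). The rows, ids, user names, and function names are all made up for illustration and are not my real data; the only assumption is that each joined row has the shape [oppty-id acc-id user].

```clojure
;; Hypothetical joined rows of the shape [oppty-id acc-id user].
;; These values are invented for illustration only.
(def joined-rows
  [["o1" "a1" "alice"]
   ["o2" "a1" "alice"]
   ["o3" "a2" "alice"]
   ["o4" "a3" "bob"]])

(defn orders-per-user
  "Counts every joined row per user (the result I currently get)."
  [rows]
  (into {}
        (for [[user group] (group-by #(nth % 2) rows)]
          [user (count group)])))

(defn accounts-per-user
  "Counts the distinct account ids per user (the result I want)."
  [rows]
  (into {}
        (for [[user group] (group-by #(nth % 2) rows)]
          [user (count (distinct (map second group)))])))

(orders-per-user joined-rows)   ;; => {"alice" 3, "bob" 1}
(accounts-per-user joined-rows) ;; => {"alice" 2, "bob" 1}
```

In this sketch alice has three orders spread over two accounts, so the row count and the distinct account count differ. The second number is the one I am after.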
Thanks everybody in advance and thanks for this wonderful tool.