Hey Manish, this is almost there! Your call to (hfs-textline ...) returns a
Cascalog generator; to use it in a query, you'll have to define the output
variables. The following code binds the result of (hfs-textline /your/path)
to the local variable "src", then adds it to the query as a predicate with
one output variable:
(let [src (hfs-textline "/user/manish/trouper/archives")]
  (?<- (stdout) [?count] (src ?lines) (c/count ?count)))
It would also be correct to keep the hfs-textline call inline, though
arguably it clutters things:
((hfs-textline "/user/manish/trouper/archives") ?lines)
The only other thing I changed was the name of the output variable: ?count,
not ?c. The output variables in that return vector need to appear somewhere
in the predicates below.
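For reference, here's a rough sketch of what the whole REPL session might look like. It assumes cascalog.ops is required under the alias c, which the c/count call relies on but which isn't shown in your session; the variable name ?line is only illustrative.

```clojure
(use 'cascalog.api)
(require '[cascalog.ops :as c])   ; assumed alias; c/count lives in cascalog.ops

(let [src (hfs-textline "/user/manish/trouper/archives")]
  (?<- (stdout)            ; sink: print the result tuples to stdout
       [?count]            ; output variables; each must appear in a predicate below
       (src ?line)         ; generator: one tuple per line of text, bound to ?line
       (c/count ?count)))  ; aggregator: counts the tuples and binds the total to ?count
```

Note that ?count shows up both in the output vector and in the c/count predicate, which is what lets the query resolve it.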
You've probably already tracked it down, but the Cascalog wiki hosts a
great collection of articles from around the web, plus some more detailed
documentation.
Best of luck, and looking forward to seeing you around the mailing list.
On Tue, Nov 8, 2011 at 4:22 PM, Manish Shah wrote:
I'm fairly new to Cascalog and I'm trying to get things set up
locally. I have a Hadoop instance installed and running. First, I'm
trying to get Cascalog to read from my local Hadoop files. For
example, I'm trying to count the lines in a file. This is probably
incorrect syntax, so I was hoping to get some help:
user=> (use 'cascalog.api)
user=> (?<- (stdout) [?c] (hfs-textline "/user/manish/trouper/archives") (c/count ?count))
My understanding is that this should output using the "stdout output
tap". I assume that hfs-textline dir would create a series of tuples
(one for each line) and count would count them and store the value in
?c, which would then get printed to stdout.
Is this close? When I run it locally using lein repl, I get:
IllegalArgumentException Unable to join predicates together
Thanks for the help, really looking forward to playing with Cascalog.
Sam Ritchie, Twitter Inc
(Too brief? Here's why! http://emailcharter.org)