I'm fairly new to Cascalog and I'm trying to get things set up
locally. I have a Hadoop instance installed and running. I'm trying
first to get Cascalog to read from my local Hadoop files. For
example, I'm trying to count lines in a file. This is probably
incorrect syntax, so I was hoping to get some help:

user=> (use 'cascalog.api)
nil
user=> (?<- (stdout) [?c] (hfs-textline "/user/manish/trouper/archives") (c/count ?count))

My understanding is that this should output using the "stdout output
tap". I assume that hfs-textline on a directory would create a series
of tuples (one for each line), and count would count them and store the
value in ?c, which would then get printed to stdout.

Is this close? When I run it locally using lein repl, I get:

IllegalArgumentException Unable to join predicates together
sun.reflect.NativeConstructorAccessorImpl.newInstance0
(NativeConstructorAccessorImpl.java:-2)

Thanks for the help, really looking forward to playing with cascalog
some more.

- manish


  • Sam Ritchie at Nov 9, 2011 at 1:43 am
    Hey Manish, this is almost there! Your call to (hfs-textline ...) returns a
    cascalog generator; to use it in a query, you'll have to define the output
    variables. The following code binds the result of (hfs-textline /your/path)
    to the local variable "src", then adds it to the query as a predicate with
    one output variable:

    (let [src (hfs-textline "/user/manish/trouper/archives")]
      (?<- (stdout)
           [?count]
           (src ?lines)
           (c/count ?count)))

    It would also be correct to keep the hfs-textline call inline, though
    arguably it clutters things:

    (?<- (stdout)
         [?count]
         ((hfs-textline "/user/manish/trouper/archives") ?lines)
         (c/count ?count))

    The only other thing I changed was the name of the output variable: ?count,
    not ?c. The output variables in that return vector need to appear somewhere
    in the predicates below.
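
    To try the same rule without touching HDFS, here is a minimal sketch. It
    assumes Cascalog 1.x, where a plain Clojure vector of tuples can stand in
    as a generator, and it assumes cascalog.ops is aliased as c (the queries
    above rely on that alias already). The name "lines" and its contents are
    made up for illustration:

    ```clojure
    (use 'cascalog.api)
    (require '[cascalog.ops :as c])   ; assumed alias behind c/count

    ;; Hypothetical in-memory source: one 1-tuple per "line".
    (def lines [["first line"] ["second line"] ["third line"]])

    ;; The generator is applied to an output variable (?line), and the
    ;; variable in the result vector (?count) is bound by c/count.
    (?<- (stdout)
         [?count]
         (lines ?line)
         (c/count ?count))
    ```

    If this runs but the hfs-textline version does not, the problem is more
    likely the path or the Hadoop configuration than the query itself.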

    You've probably already tracked it down, but the Cascalog wiki hosts a
    great collection of articles from around the web, plus some more detailed
    explanations:

    http://www.assembla.com/wiki/show/d9Z8_q-Omr35zteJe5cbLr

    Best of luck, and looking forward to seeing you around the mailing list.

    Sam

    --
    Sam Ritchie, Twitter Inc
    703.662.1337
    @sritchie09

    (Too brief? Here's why! http://emailcharter.org)
  • Manish Shah at Nov 10, 2011 at 3:32 am
    Thanks, Sam. This is really helpful.

    - manish

Discussion Overview
group: cascalog-user
categories: clojure, hadoop
posted: Nov 9, '11 at 12:35a
active: Nov 10, '11 at 3:32a
posts: 3
users: 2
website: clojure.org
irc: #clojure
