Having trouble using `defmapcatop` successfully

Given an input file with two fields, ?id and ?pts:

foo a,b,c,d
bar x,y,z

I'm trying to generate bigrams, which should produce this output:

foo a,b
foo b,c
foo c,d
bar x,y
bar y,z

I'm having trouble using `defmapcatop` correctly in this case.

I've been able to use `defmapcatop` to make a generator, as in the "foo"
definition below, and I can use tail recursion to build a list of bigrams,
as in the "bar" definition below:

(ns copa.core
  (:use [cascalog.api]
        [cascalog.more-taps :only (hfs-delimited)])
  (:require [clojure.string :as s]
            [cascalog [ops :as c] [vars :as v]])
  (:gen-class))

(defmapcatop foo [pts]
  (s/split pts #","))

(defn bar [pts]
  (loop [lst (s/split pts #",") coll []]
    (if (> (count lst) 1)
      (let [x [(nth lst 0) (nth lst 1)]]
        (recur (rest lst) (conj coll x)))
      [coll])))

(defn -main [in out & args]
  (?<- (hfs-delimited out)
       [?id ?x]
       ((hfs-delimited in) ?id ?pts)
       (bar ?pts :> ?x)))

This produces the output:

foo [["a" "b"] ["b" "c"] ["c" "d"]]
bar [["x" "y"] ["y" "z"]]


However, my attempts to compose "foo" and "bar" into one `defmapcatop` are
causing exceptions.

Something like the following in "baz" would seem to be the intuitive
approach (but it fails):

(defmapcatop baz [lst]
  lst)

Is there a better way to structure the collector in the tail-recursive
definition?

Thanks,
Paco


  • Jeroen van Dijk at Jan 18, 2013 at 2:41 pm

    On Thu, Jan 17, 2013 at 11:16 PM, Paco Nathan wrote:

    (defn bar [pts]
      (loop [lst (s/split pts #",") coll []]
        (if (> (count lst) 1)
          (let [x [(nth lst 0) (nth lst 1)]]
            (recur (rest lst) (conj coll x)))
          [coll])))

    The output of a defmapcatop function needs to be a list of tuples. I don't
    see how your bar output would not work with defmapcatop. How does "baz"
    fail?

    Here is a function I use often, which might give you an idea:

    (defmapcatop explode
      "Explodes the given list into single items"
      [list]
      (map vector list))

    Here is another source of information about defmapcatop
    http://jimdrannbauer.com/2011/02/04/cascalog-made-easier/

    HTH,
    Jeroen
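
The "list of tuples" shape can be checked at a plain Clojure REPL without Cascalog. The sketch below is only an illustration: `explode-body` is a hypothetical stand-in for the body of `explode`, and the two literal values mirror what `bar` returns with and without its outer vector.

```clojure
;; Hypothetical stand-in for the body of `explode`; plain Clojure, no Cascalog.
(defn explode-body [coll]
  (map vector coll))

;; One single-element vector (a 1-field tuple) per input item.
(def exploded (explode-body ["a" "b" "c"]))

;; `bar` as posted returns [coll]: a single element holding the whole list.
(def wrapped [[["a" "b"] ["b" "c"]]])

;; Without the outer vector there is one 2-element vector per bigram.
(def unwrapped [["a" "b"] ["b" "c"]])

(prn exploded)
(prn (count wrapped) (count unwrapped))
```

Seen this way, the choice is between emitting one value that contains every bigram and emitting one tuple per bigram.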
  • Igor Postelnik at Jan 18, 2013 at 3:16 pm
    When producing collections as outputs of cascalog operations you need an
    extra level of wrapping, otherwise cascalog assumes each element is a
    separate output. The fact that you can return a scalar from an op is a
    convenience for the common case of one output. Contrast:

    (identity [1 2] :> ?x ?y) => ?x is 1, ?y is 2
    (identity [[1 2] 3] :> ?x ?y) => ?x is [1 2], ?y is 3
    (identity 1 :> ?x) => ?x is 1
    (identity [[1 2]] :> ?x) => ?x is [1 2]

    For your problem, you want something like this:

    (defn split-str [pts]
      ;; note the extra vector, so the output unifies with a single variable
      [(s/split pts #",")])

    (defmapcatop partition-list [n]
      [lst]
      (partition n lst))

    (defn -main [in out & args]
      (?<- (hfs-delimited out)
           [?id ?x]
           ((hfs-delimited in) ?id ?pts)
           (split-str ?pts :> ?parts)
           (partition-list [2] ?parts :> ?x)))

    ?x will be a Clojure list, not a string; you can use `s/join`
    (clojure.string/join) to turn it into a comma-separated string if you want.

    -Igor
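
The wrapping rule in the `identity` examples above can be modeled in plain Clojure. This is a sketch for intuition only, not Cascalog's implementation: `output-fields` is a hypothetical helper that treats a sequential return value as one field per element and anything else as a single field.

```clojure
;; Hypothetical model of the rule, NOT Cascalog's actual code.
(defn output-fields [ret]
  (if (sequential? ret)
    (vec ret)   ; a collection: each element becomes its own output field
    [ret]))     ; a scalar: convenience case, a single output field

(prn (output-fields [1 2]))      ; two fields: 1 and 2
(prn (output-fields [[1 2] 3]))  ; two fields: [1 2] and 3
(prn (output-fields 1))          ; one field: 1
(prn (output-fields [[1 2]]))    ; one field: [1 2]
```

Under this model, returning a collection as a single value always takes one extra level of vector around it.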
  • Paco Nathan at Jan 18, 2013 at 8:16 pm
    Many thanks, Jeroen and Igor -
    Your examples helped out lots.

    In that case, a much simpler solution for a `defmapcatop` to get ngrams
    would be:

    (defmapcatop bigram [pts]
      (partition 2 1 (s/split pts #",")))

    (defn -main [in out & args]
      (?<- (hfs-delimited out)
           [?id ?pt0 ?pt1]
           ((hfs-delimited in) ?id ?pts)
           (bigram ?pts :> ?pt0 ?pt1)))

    Had to re-read the `partition` docs about the `step` param, but eventually that
    sank in :)
    Cascalog provides such nice code elision!

    Paco
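
The effect of the `step` argument is easy to confirm at a plain Clojure REPL. In this sketch, `bigrams` is a hypothetical ordinary function mirroring the body of the `bigram` op, and the last line joins each pair back into the comma-separated form from the original question.

```clojure
(require '[clojure.string :as s])

;; Hypothetical plain-function mirror of the `bigram` op body.
(defn bigrams [pts]
  (partition 2 1 (s/split pts #",")))

;; With step 1 the windows overlap, giving every adjacent pair.
(def overlapping (bigrams "a,b,c,d"))

;; Without a step, partition advances by n, so pairs do not overlap.
(def non-overlapping (partition 2 (s/split "a,b,c,d" #",")))

;; Joining each pair recovers strings such as "x,y".
(def joined (map #(s/join "," %) (bigrams "x,y,z")))

(prn overlapping)
(prn non-overlapping)
(prn joined)
```

Because each element of the result has exactly two items, it lines up with the two output variables `?pt0` and `?pt1` in the query.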


Discussion Overview
group: cascalog-user @
categories: clojure, hadoop
posted: Jan 17, '13 at 10:16p
active: Jan 18, '13 at 8:16p
posts: 4
users: 3
website: clojure.org
irc: #clojure
