I am also looking for something like this in JCascalog. For example: I have
one data set that I need to parse. If the foo condition is satisfied, the
data should be pushed to a foo variable (some intermediate data store instead
of a tap); if the bar condition is satisfied, it should be pushed to a bar
variable.

This is similar to a split sub-assembly in Cascading.

Please suggest how I can do that in JCascalog.


Thanks
Sourabh
On Saturday, June 25, 2011 8:49:50 AM UTC+5:30, Evan Gamble wrote:

Is there a way in Cascalog to output to multiple output taps within
the same job? For example, tuples for which predicate foo matches
would go to output tap foo-tap, and tuples for which predicate bar
matches go to bar-tap.

I can do it by first writing to an intermediate tap, then reading from
it in multiple jobs, but that seems unnecessarily complex. Here's some
code I wrote that takes the intermediate tap/multiple jobs approach,
but I'm hoping there's a better way.

The intermediate tap in the code below is 'extr-tap'.

(defn extract-from-urls
  "Takes a directory of tabbed files where URLs are the second field
  (after UUID), fetches xhtml either from dcache or the local cache
  (depending on doc/*use-local-cache*), runs all extractors on the xhtml,
  and writes JSON strings with extractor name/values and URL to json-dir.
  URLs with parse errors are written to parse-error-dir.
  URLs not in dcache are written to cache-miss-dir.
  Other errors are written to trap-dir.
  If out-prefix is present it is prepended to the output paths."
  [url-dir json-dir parse-error-dir cache-miss-dir trap-dir & [out-prefix]]
  (cascalog.io/with-fs-tmp [_ tmp-dir]
    (let [extr-tap        (hfs-seqfile tmp-dir)
          json-tap        (hfs-textline (str out-prefix json-dir))
          parse-error-tap (hfs-textline (str out-prefix parse-error-dir))
          cache-miss-tap  (hfs-textline (str out-prefix cache-miss-dir))]
      (let [extr-query (make-extractor-query url-dir (str out-prefix trap-dir))]
        (?<- extr-tap [?uuid ?url !json !parse-error !cache-miss]
             (extr-query ?uuid ?url !json !parse-error !cache-miss)))
      (?- json-tap
          (<- [?uuid ?url ?json] (extr-tap ?uuid ?url ?json _ _))
          parse-error-tap
          (<- [?uuid ?url ?parse-error] (extr-tap ?uuid ?url _ ?parse-error _))
          cache-miss-tap
          (<- [?uuid ?url] (extr-tap ?uuid ?url _ _ ?cache-miss))))))

Discussion Overview
group: cascalog-user
categories: clojure, hadoop
posted: May 29, '13 at 6:40p
active: Nov 17, '13 at 4:23a
posts: 6
users: 5
website: clojure.org
irc: #clojure
