Hi,

I am new to Cascalog. The code below extracts numbers from event logs
and then joins them with an ads table. It works perfectly on my local
machine (using lfs-textline) but fails on the Hadoop cluster.

(use 'cascalog.api)
(use 'cascalog.more-taps)

(defmapcatop event-ad [line]
  (let [event-csv-parser (fn [line]
                           (mapv peek (re-seq #"\w+#(.+?)[,\]][a-z]"
                                              (str line "a"))))
        ad-split (fn [ad-col]
                   (let [ad-line (subs ad-col 1 (dec (count ad-col)))
                         ad-list (clojure.string/split ad-line #",")]
                     (if (= "" (peek ad-list))
                       []
                       ad-list)))
        cols (event-csv-parser line)
        cur-ad-col (get cols 1)
        new-ad-col (peek cols)
        timestamp (get cols 0)
        device-type (get cols 2)
        dsn (get cols 3)
        cur-ad (ad-split cur-ad-col)
        new-ad (ad-split new-ad-col)
        cur-ad-rec (map #(vector timestamp device-type dsn "presentAds" %) cur-ad)
        new-ad-rec (map #(vector timestamp device-type dsn "newAds" %) new-ad)
        ad-rec (vec (concat cur-ad-rec new-ad-rec))]
    ad-rec))

(let [event-dir "/home/kangtu/DPETS/event.100"
      ad-detail-dir "/home/kangtu/DPETS/addetail.txt"
      event-tap (hfs-textline event-dir)
      ad-detail-tap (hfs-delimited ad-detail-dir :skip-header? true)
      output-tap (stdout)]
  (?<- output-tap [?ad-cfid ?ad-name]
       (ad-detail-tap ?ad-cfid _ ?ad-name _ _ _ _ _ _ _ _ _ _ _ _ _)
       (event-tap ?line)
       (event-ad ?line :> ?timestamp ?device-type ?dsn ?ad-status ?ad-cfid)
       (:distinct true)))
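
The ad-split helper inside event-ad can be checked on its own in a plain Clojure REPL, without Cascalog or Hadoop. This is only a sketch: the sample strings are made up for illustration, and they assume the ad column is a comma-separated list wrapped in one leading and one trailing character (brackets here).

```clojure
(require '[clojure.string :as str])

;; Same logic as the ad-split helper inside event-ad:
;; drop the first and last character, then split on commas.
(defn ad-split [ad-col]
  (let [ad-line (subs ad-col 1 (dec (count ad-col)))
        ad-list (str/split ad-line #",")]
    ;; Splitting an empty string yields [""], so treat that as "no ads".
    (if (= "" (peek ad-list))
      []
      ad-list)))

;; Hypothetical sample values, not taken from the real event logs.
(ad-split "[101,102,103]") ; => ["101" "102" "103"]
(ad-split "[]")            ; => []
```

Each element of the returned vector becomes one output tuple of event-ad, so an empty ad column contributes no rows.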

To run it, I compile the jar and start a REPL on the cluster with the
following project.clj:

(defproject cascalog-repl "1.0.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.4.0"]
                 [cascalog "1.10.1-SNAPSHOT"]
                 [cascalog-more-taps "0.3.0"]]
  :dev-dependencies [[swank-clojure "1.4.2"]
                     [lein-swank "1.4.4"]
                     [org.apache.hadoop/hadoop-core "0.20.2-dev"]]
  :jvm-opts ["-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8030"])

The error message looks like this:

13/01/03 22:07:04 WARN flow.FlowStep: [] event = Task Id : attempt_201210262237_370013_m_000003_0, Status : SUCCEEDED
13/01/03 22:07:04 WARN flow.FlowStep: [] event = Task Id : attempt_201210262237_370013_m_000000_0, Status : SUCCEEDED
13/01/03 22:07:04 WARN flow.FlowStep: [] event = Task Id : attempt_201210262237_370013_m_000001_0, Status : FAILED
13/01/03 22:07:04 WARN flow.FlowStep: [] event = Task Id : attempt_201210262237_370013_m_000001_1, Status : FAILED
13/01/03 22:07:04 WARN flow.FlowStep: [] event = Task Id : attempt_201210262237_370013_m_000001_2, Status : FAILED
13/01/03 22:07:04 WARN flow.FlowStep: [] event = Task Id : attempt_201210262237_370013_m_000001_3, Status : TIPFAILED
13/01/03 22:07:04 WARN flow.FlowStep: [] event = Task Id : attempt_201210262237_370013_m_000001_4, Status : TIPFAILED
13/01/03 22:07:04 WARN flow.FlowStep: [] event = Task Id : attempt_201210262237_370013_m_000002_0, Status : SUCCEEDED
13/01/03 22:07:04 WARN flow.FlowStep: [] abandoning step: (2/2) ...7159113412495167695578140, predecessor failed: (1/2)
13/01/03 22:07:04 INFO flow.FlowStep: [] stopping: (2/2) ...7159113412495167695578140
13/01/03 22:07:04 INFO flow.Flow: [] stopping all jobs
13/01/03 22:07:04 INFO flow.FlowStep: [] stopping: (2/2) ...7159113412495167695578140
13/01/03 22:07:04 INFO flow.FlowStep: [] stopping: (1/2)
13/01/03 22:07:04 INFO flow.Flow: [] stopped all jobs
FlowException step failed: (1/2), with job id: job_201210262237_370013, please see cluster logs for failure messages
cascading.flow.planner.FlowStepJob.blockOnJob (FlowStepJob.java:193)

and the Hadoop error log looks like this:

cascading.pipe.OperatorException: [b08cc45a-1f69-441c-bcd...][sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)] operator Each failed executing operation

...

Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Attempting to call unbound fn: #'user/event-ad__

...

I could really use some help. Any suggestions are appreciated.

Thanks

Kang

Discussion Overview
group: cascalog-user @
categories: clojure, hadoop
posted: Jan 4, '13 at 4:42a
active: Jan 5, '13 at 8:18a
posts: 3
users: 3
website: clojure.org
irc: #clojure
