In my context, various CLI tools have been built on Cascading to provide an
easy way to parametrize custom jobs.
I would like to compare this approach with using Cascalog purely as a query
layer.
The way I see it, all implementations (taps, operations, ...) would be done
in Java, while the structure of the query would be expressed in
Cascalog/Clojure.
I would like to evaluate the power this would give to technical end users
who are not necessarily devs (more likely ops).
To that end, I have a few questions:
* Is there any explicit documentation on how to reuse Cascading filters,
operations, and taps from Cascalog?
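To make the question concrete, here is a minimal sketch of what I have in mind. The Java classes (`CustomLogTap`, `Normalizer`) are hypothetical placeholders for the Java-side implementations; the Cascalog side relies on the fact that a query can take a Cascading Tap instance directly as a generator or sink, and that Java logic can be wrapped in an op through plain interop:

```clojure
(ns example.query
  (:use cascalog.api)
  (:import [com.example.taps CustomLogTap]   ; hypothetical Java Tap
           [com.example.ops Normalizer]))    ; hypothetical Java helper

;; A Cascading Tap built in Java, used directly as a Cascalog generator.
(def in-tap (CustomLogTap. "/data/logs"))

;; Wrap the Java implementation in a Cascalog op via interop.
(defmapop normalize [s] (Normalizer/normalize s))

(?<- (stdout)                 ; sink
     [?norm]                  ; output fields
     (in-tap ?line)           ; source: the Java tap
     (normalize ?line :> ?norm))
```

Is this roughly the intended way to mix the two layers, or is there a more idiomatic bridge for raw Cascading operations?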
* How would someone provide a .clj file to Hadoop? The file should live
outside the archive containing the custom code, and a REPL should not be a
necessity. I mean that a non-dev should be able to change the query (e.g.
the output fields) and run it again without knowing how to compile or
package, or what Clojure or a REPL is.
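One pattern I am considering (sketched below with an assumed project name; I have not validated it end to end): the developer builds a standalone uberjar once, and the ops person then runs an external query script through clojure.main, which evaluates the .clj file at launch, so the script stays a plain-text file outside the jar and never needs compiling:

```shell
# Developer side, done once: build a standalone jar with all deps.
lein uberjar

# Ops side: edit /home/ops/query.clj in any text editor, then rerun.
# clojure.main evaluates the external script; nothing is recompiled.
hadoop jar myjobs-standalone.jar clojure.main /home/ops/query.clj
```

Does this match how people actually deploy Cascalog queries, or is there a better-supported mechanism?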
The use case I am evaluating is one where Hive or Pig could also be used:
there is a separation between the person specifying the query and the person
responsible for implementing custom operations. But Cascalog could allow
faster prototyping, especially when one person holds both roles (which is
NOT always the case).
I understand these questions might be trivial, but I would love any feedback
on this subject.