Sam Ritchie, Dec 15, 2011 at 9:01 am
Andy,
You can configure your tap to limit the number of reducers it uses on the
final write like this:
(hfs-textline "path/to/output" :sinkparts 5) ;; this tap only uses 5 reducers!
See the docs
<http://nathanmarz.github.com/cascalog/cascalog.api-api.html#cascalog.api/hfs-tap>
for some other helpful options. This behavior works for all of these taps:
(for [prefix ["hfs" "lfs"], suffix ["textline" "seqfile"]]
(str prefix "-" suffix))
:)
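As a rough sketch of how this could replace the with-job-conf wrapper from the original question: the names `src`, `out`, and `query` below are hypothetical stand-ins, the in-memory data is only for illustration, and the example assumes the standard `cascalog.api` and `cascalog.ops` namespaces.

```clojure
(use 'cascalog.api)
(require '[cascalog.ops :as c])

;; Hypothetical in-memory source standing in for the real subqueries.
(def src [["a"] ["b"] ["a"]])

;; Output tap that asks for 5 parts on the final write only.
(def out (hfs-textline "path/to/output" :sinkparts 5))

;; Hypothetical "master" query; the aggregation gives it a reduce step.
(def query
  (<- [?word ?count]
      (src ?word)
      (c/count ?count)))

;; No with-job-conf wrapper here, so "mapred.reduce.tasks" is not forced
;; on the upstream jobs; only the tap carries the :sinkparts setting.
(?- out query)
```

The idea is that the reducer limit travels with the tap rather than with the job conf for the whole flow, so the earlier subqueries should keep their default parallelism.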
Cheers,
Sam
On Thu, Dec 15, 2011 at 12:50 AM, Andrew Xue wrote:
Hi,
I have a flow which uses a number of subqueries. All the subqueries
eventually flow into a final "master" query, and when I run it I wrap the
execution of this "master" query in a
(with-job-conf {"mapred.reduce.tasks" N} (?- out query)) statement.
I am using mapred.reduce.tasks to control the output file size and to
avoid small files. The issue is that this limits the number of reduce
tasks in the entire flow, including all the previous subqueries.
Performance really suffers as a result.
Is there some way to specify a job-conf only for the last part of the
flow? Thanks
Andy
--
Sam Ritchie, Twitter Inc
703.662.1337
@sritchie09
(Too brief? Here's why! http://emailcharter.org)