May 1, 2013 at 5:17 pm
How many output files are you getting? You can put SET default_parallel 1;
at the top of the script so you don't have to specify PARALLEL on each
reduce phase.
In general though, I wouldn't recommend forcing your output into one file
(parallelism is good). Just write a shell/python/ruby/perl script that
concatenates the part files after the full job finishes.
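
A minimal sketch of such a script in Python, assuming the job's output
directory has already been copied to the local filesystem (for example with
hadoop fs -get) and that the output files follow the usual part-* naming.
The function name merge_part_files and both paths are hypothetical, not
anything from the original script:

```python
import glob
import os
import shutil


def merge_part_files(output_dir, merged_path):
    """Concatenate the part-* files in a local copy of the job output.

    Assumes output_dir is a local directory, not an HDFS path, and that
    merged_path lies outside the part-* pattern. Returns the number of
    part files that were merged.
    """
    # Sort by name so part-r-00000 comes before part-r-00001, and so on.
    # Marker files such as _SUCCESS and hidden .crc files do not match
    # the part-* pattern and are skipped.
    part_files = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    with open(merged_path, "wb") as merged:
        for part in part_files:
            with open(part, "rb") as src:
                # Stream the bytes across without loading a whole part
                # file into memory.
                shutil.copyfileobj(src, merged)
    return len(part_files)
```

Calling merge_part_files("output", "merged.txt") would write every part file
from ./output into merged.txt in name order. If you don't need any custom
logic, hadoop fs -getmerge does the same job straight from HDFS.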
On Wed, May 1, 2013 at 12:51 PM, Mark wrote:
Thought I understood how to output to a single file, but it doesn't seem to
be working. Anything I'm missing here?
-- Dedupe and store
rows = LOAD '$input';
unique = DISTINCT rows PARALLEL 1;
STORE unique INTO '$output';