Mar 13, 2011 at 5:12 am
On Sat, Mar 12, 2011 at 3:51 PM, Jun Young Kim wrote:
> Is a single thread allocated to each output file when a job is trying to
> write multiple output files?
At the lower levels, a data streaming thread is indeed run for every
OutputStream created for writing on the DFS.
The map task is generally single-threaded unless you multi-thread the
calls (in which case the record writers are still obtained in a
> If the number of output files is 10,000, does Hadoop try to create a
> thread for each output file?
Yes, there should be 10,000 threads 'started' for streaming writes
(but not all really working at the same time, as per the record writer
access methods in tasks).
Please correct me if I'm wrong.
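
To make the "one streaming thread per OutputStream" point concrete, here
is a small toy model in plain Java. It is only a sketch and not the HDFS
client code: ToyDfsStream, StreamerDemo and the queue-based "streamer"
are hypothetical names invented for illustration. It shows that every
open stream owns a started background thread, even though a
single-threaded task only writes to one stream at a time.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class StreamerDemo {

    /** Hypothetical stand-in for a DFS output stream: one streamer thread per instance. */
    static final class ToyDfsStream {
        private static final byte[] EOF = new byte[0];  // sentinel that stops the streamer
        private final BlockingQueue<byte[]> packets = new LinkedBlockingQueue<>();
        private final AtomicLong bytesStreamed = new AtomicLong();
        private final Thread streamer;

        ToyDfsStream(String name) {
            // The thread is started as soon as the stream is created.
            streamer = new Thread(this::drain, "toy-streamer-" + name);
            streamer.setDaemon(true);
            streamer.start();
        }

        /** Streamer loop: blocks (idle) until the writer hands over a packet. */
        private void drain() {
            try {
                while (true) {
                    byte[] packet = packets.take();
                    if (packet == EOF) {
                        return;
                    }
                    bytesStreamed.addAndGet(packet.length);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        void write(byte[] data) {
            packets.add(data);
        }

        boolean streamerAlive() {
            return streamer.isAlive();
        }

        long bytesStreamed() {
            return bytesStreamed.get();
        }

        /** Signals end of stream and waits for the streamer thread to exit. */
        void close() {
            packets.add(EOF);
            try {
                streamer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Opens n toy streams, which starts n streamer threads. */
    static List<ToyDfsStream> openStreams(int n) {
        List<ToyDfsStream> streams = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            streams.add(new ToyDfsStream("part-" + i));
        }
        return streams;
    }

    /** Counts streams whose streamer thread is still running. */
    static int countAlive(List<ToyDfsStream> streams) {
        int alive = 0;
        for (ToyDfsStream s : streams) {
            if (s.streamerAlive()) {
                alive++;
            }
        }
        return alive;
    }

    public static void main(String[] args) {
        List<ToyDfsStream> streams = openStreams(100);
        System.out.println("streamer threads alive: " + countAlive(streams));
        // A single-threaded writer touches one stream at a time;
        // the other streamer threads just sit blocked on their queues.
        for (ToyDfsStream s : streams) {
            s.write("record\n".getBytes(StandardCharsets.UTF_8));
        }
        for (ToyDfsStream s : streams) {
            s.close();
        }
        System.out.println("alive after close: " + countAlive(streams));
    }
}
```

In this model the idle threads cost memory (a stack each) rather than
CPU, which matches the "started but not all really working" description
above. Closing streams you no longer need releases their threads.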