On Wed, Jul 6, 2011 at 6:11 PM, Sean Owen wrote:
A block is a piece of a file. It does not (necessarily) have a meaning, or a
"file format", by itself. You would not address HDFS blocks individually
at this level. So I suppose the first answer is no, they do not have
different formats, though the question is not quite well-formed.
You can put whatever you like in whatever HDFS file you want. Your
application (be it Mahout or any other MapReduce application) just needs to be
prepared to read it. If your input is a CSV file with a header line, exactly one
mapper will read the chunk containing that header line, and you don't know
which mapper that will be. So no, you would not write a MapReduce app that
depends on all mappers seeing some header line, because they don't.
So yes, that is why you will not see any Mahout job relying on a header line:
it doesn't work.
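
For completeness: if you do have to consume a CSV file that carries a header,
the usual workaround is to drop the header inside the mapper rather than
expect every mapper to see it. Below is a minimal sketch, assuming
TextInputFormat, where the input key is the byte offset of the line within the
file, so the header is the single line at offset 0 and is seen by exactly one
mapper (whichever gets the first split). The class name and the "id," column
prefix are hypothetical, not anything Mahout ships.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that tolerates a CSV header line. With TextInputFormat
// the key is the byte offset of the line, so the header (if present) is the
// single line at offset 0, read by exactly one mapper.
public class HeaderSkippingCsvMapper
    extends Mapper<LongWritable, Text, Text, Text> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Drop the header line; only the mapper for the first split sees offset 0.
    if (key.get() == 0 && value.toString().startsWith("id,")) {
      return;
    }
    // Illustrative output: first CSV field as key, whole line as value.
    String[] fields = value.toString().split(",", -1);
    context.write(new Text(fields[0]), value);
  }
}

This works only because exactly one map task processes the split that starts
at the beginning of the file, which is the point made above.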
On Wed, Jul 6, 2011 at 11:03 AM, Xiaobo Gu wrote:
Hi,
Does every block of a file in HDFS have to be in the same file format when
writing MapReduce applications? A more specific question: when dealing with
CSV files, can we have a header line in the file? I have seen Mahout
applications using the UCI repository file format, which is similar to CSV
without a header. Is that because all map tasks must behave identically, and
having a header would make one map task different from the others?
Regards,
Xiaobo Gu