Hi, all

I have a large CSV file (larger than 10 GB), and I'd like to use an
InputFormat that splits it into smaller parts so that each Mapper can
process one piece of the file. However, as far as I know, FileInputFormat
only considers the byte size of the file. That is, the class may divide
the CSV file into many parts, and some of those parts may not be
well-formed CSV. For example, a line in one part may not be terminated
with CRLF, or some text may be truncated.

How can I ensure that each FileSplit is a smaller, valid CSV file by
using a proper InputFormat?
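
To make the concern concrete, here is a minimal sketch of the convention
that line-oriented readers commonly use to cope with byte-based splits:
a reader whose split does not start at offset 0 discards everything up
to the first line terminator, and every reader keeps reading past the
end of its split until the line it has started is complete. This is
plain Python over an in-memory byte string, not Hadoop code; the names
read_split and split_ranges are hypothetical, and it assumes that no
quoted CSV field contains an embedded newline.

```python
import io


def split_ranges(total: int, split_size: int) -> list[tuple[int, int]]:
    """Cut [0, total) into (start, length) byte ranges, ignoring line boundaries."""
    return [(s, min(split_size, total - s)) for s in range(0, total, split_size)]


def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the complete lines owned by the byte range [start, start + length)."""
    end = start + length
    stream = io.BytesIO(data)
    stream.seek(start)
    pos = start
    if start != 0:
        # Discard the first (possibly partial) line: the reader of the
        # previous split is responsible for it.
        pos += len(stream.readline())
    lines = []
    # Read every line that begins at or before `end`, running past the
    # split boundary if that is needed to finish the last line.
    while pos <= end:
        line = stream.readline()
        if not line:  # end of data
            break
        lines.append(line)
        pos += len(line)
    return lines


data = b"a,1\r\nb,2\r\nc,3\r\n"
for start, length in split_ranges(len(data), 7):
    print(start, length, read_split(data, start, length))
```

With this rule every line is returned by exactly one split, even though
the byte ranges themselves cut through the middle of lines.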

BR/anderson

Discussion Overview
group: common-user @
categories: hadoop
posted: Jun 10, '09 at 12:07p
active: Jun 12, '09 at 7:21a
posts: 10
users: 6
website: hadoop.apache.org...
irc: #hadoop
