I have a large CSV file (larger than 10 GB), and I'd like to use a suitable
InputFormat to split it into smaller parts so that each Mapper can process
one piece of the file. However, as far as I know, FileInputFormat only
cares about the byte size of the file; that is, the class may divide the CSV
file into many parts, and some of those parts may not be well-formed CSV.
For example, a line in one part may not be terminated with CRLF, or
some text may be truncated.
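
To illustrate the concern, here is a minimal sketch in plain Java. It is not Hadoop code and does not use FileInputFormat; the class name `NaiveByteSplitDemo`, the helper methods, the sample data, and the 10-byte chunk size are all hypothetical. It only shows what happens if a file is cut purely by byte offset, which is the behavior I am assuming.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class NaiveByteSplitDemo {

    // Cut the data into consecutive chunks of at most chunkSize bytes,
    // ignoring line boundaries entirely (hypothetical helper, not a Hadoop API).
    public static List<String> splitByBytes(byte[] data, int chunkSize) {
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < data.length; start += chunkSize) {
            int length = Math.min(chunkSize, data.length - start);
            chunks.add(new String(data, start, length, StandardCharsets.US_ASCII));
        }
        return chunks;
    }

    // Necessary (but not sufficient) check for a standalone CSV piece:
    // the chunk must at least end with a line terminator.
    public static boolean endsWithNewline(String chunk) {
        return chunk.endsWith("\n");
    }

    public static void main(String[] args) {
        // Three CRLF-terminated rows, 25 bytes in total.
        String csv = "id,name\r\n1,alice\r\n2,bob\r\n";
        byte[] data = csv.getBytes(StandardCharsets.US_ASCII);
        for (String chunk : splitByBytes(data, 10)) {
            // Make the line terminators visible when printing.
            String visible = chunk.replace("\r", "\\r").replace("\n", "\\n");
            System.out.println(endsWithNewline(chunk) + " : " + visible);
        }
    }
}
```

With this sample input, the first two chunks end in the middle of a row, so neither would parse as a complete CSV file on its own.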
How can I ensure each FileSplit is a smaller valid CSV file using a proper