Data block splitting should be record-oriented, or offer an option to supply split locations (offsets) as an input file
-----------------------------------------------------------------------------------------------------------------------

Key: HADOOP-7404
URL: https://issues.apache.org/jira/browse/HADOOP-7404
Project: Hadoop Common
Issue Type: Improvement
Reporter: Sunil Goyal


Old bug: https://issues.apache.org/jira/browse/HADOOP-106

It is difficult to pad the existing records, for the following reasons:

1. Records in the same file have very different sizes (some may be a few bytes, others several GB).
2. Padding causes compatibility issues with other standard tools.
3. Padding increases the file size for no benefit to the other tools (those not running on Hadoop).

I think there should be an option for the splitting process along these lines:

1. A file lists the offsets at which splits should be made (e.g. 10, 100, 120, and so on).
2. Hadoop performs the splitting according to that file (split lengths 10 - 0 = 10, 100 - 10 = 90, etc.).
3. This offsets file can easily be generated by other tools.
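The proposed offset-to-split computation can be sketched in a few lines of Java. This is a minimal, self-contained illustration, not Hadoop code: the class `OffsetSplits`, the method `computeSplits`, and the offsets-file format (numbers separated by commas and/or whitespace, as in "10,100,120") are all hypothetical. It assumes the offsets are strictly increasing and lie inside the file, and that the last split runs from the final offset to the end of the file. In a real job, a custom InputFormat could override `getSplits` and emit one `FileSplit` per range; that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: turn a list of split offsets into (start, length) byte ranges. */
public class OffsetSplits {

    /** One split: the byte range [start, start + length) of the input file. */
    public record Range(long start, long length) {}

    /**
     * Parses offsets such as "10,100,120" (commas and/or whitespace as separators)
     * and returns the ranges [0,10), [10,100), [100,120), [120,fileLength).
     */
    public static List<Range> computeSplits(String offsetsText, long fileLength) {
        List<Range> ranges = new ArrayList<>();
        long prev = 0;
        for (String token : offsetsText.trim().split("[,\\s]+")) {
            if (token.isEmpty()) {
                continue; // empty offsets file: the whole file becomes one split
            }
            long offset = Long.parseLong(token);
            // Assumption: offsets must be strictly increasing and strictly inside the file.
            if (offset <= prev || offset >= fileLength) {
                throw new IllegalArgumentException("bad split offset: " + offset);
            }
            ranges.add(new Range(prev, offset - prev)); // e.g. 10 - 0 = 10, 100 - 10 = 90
            prev = offset;
        }
        ranges.add(new Range(prev, fileLength - prev)); // final split runs to end of file
        return ranges;
    }

    public static void main(String[] args) {
        for (Range r : computeSplits("10,100,120", 200)) {
            System.out.println(r.start() + " +" + r.length());
        }
    }
}
```

For a hypothetical 200-byte file, `main` prints `0 +10`, `10 +90`, `100 +20`, and `120 +80`, matching the subtractions in step 2. The sketch uses a `record`, so it needs Java 16 or later.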



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Discussion Overview
group: common-dev @
categories: hadoop
posted: Jun 19, '11 at 8:21a
active: Jun 19, '11 at 8:21a
posts: 1
users: 1
website: hadoop.apache.org...
irc: #hadoop

1 user in discussion

Sunil Goyal (JIRA): 1 post
