For security reasons I am required to use a different S3 library that I have
been provided for accessing S3 data. If I write an adapter against the native
file system store class that accesses S3 through my own library, do I still
get the same benefits I would get from the default file system store, i.e. the
JetS3t native file system store? My motivation is to exploit Hadoop's ability
to compute and generate file splits, so that I can parallelize the work on a
single S3 file across several mappers.

I believe this is quite different from the norm. Splits are generally used
with HDFS, which supports larger files (here the maximum object size is 5 GB),
and most approaches I've heard of require copying the data from S3 to HDFS
before processing. I am currently reading and writing directly to S3, similar
to EMR.

What I have just described may be completely infeasible. I have looked through
parts of the Hadoop library but haven't fully grasped how a file split would
interact with the S3 input stream. These are two questions that may be totally
unrelated, but thanks for
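As far as I understand Hadoop's FileInputFormat, splits are not tied to HDFS: they are computed from the file length and the block size that the FileSystem reports for the file, and each mapper's record reader then opens the file and seeks to its split's start offset. The minimal sketch below, assuming the provided library can do byte-range reads (S3 GET supports a Range header), illustrates both halves. `computeSplitSize` and the slop loop mirror the logic of the new-API `FileInputFormat`; `RangedObjectReader`, `InMemoryReader`, `RangedSeekableStream` and `SplitSketch` are hypothetical names standing in for the mandated S3 client and an adapter. This is not Hadoop's or JetS3t's actual API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the mandated S3 client; assumes it supports ranged GETs. */
interface RangedObjectReader {
    long length(String key) throws IOException;
    /** Returns a stream over the object's bytes starting at byteRangeStart. */
    InputStream openAt(String key, long byteRangeStart) throws IOException;
}

/** In-memory fake used only to exercise the sketch (not a real S3 client). */
class InMemoryReader implements RangedObjectReader {
    private final byte[] data;
    InMemoryReader(byte[] data) { this.data = data; }
    public long length(String key) { return data.length; }
    public InputStream openAt(String key, long start) {
        int s = (int) start;
        return new ByteArrayInputStream(data, s, data.length - s);
    }
}

/** Seekable stream: seek() drops the current connection and reopens at the new offset. */
class RangedSeekableStream extends InputStream {
    private final RangedObjectReader reader;
    private final String key;
    private InputStream in;
    private long pos;

    RangedSeekableStream(RangedObjectReader reader, String key) throws IOException {
        this.reader = reader;
        this.key = key;
        this.in = reader.openAt(key, 0);
        this.pos = 0;
    }

    void seek(long newPos) throws IOException {
        in.close();
        in = reader.openAt(key, newPos); // ranged read starting at the split offset
        pos = newPos;
    }

    long getPos() { return pos; }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) pos++; // track position so a record reader knows when its split ends
        return b;
    }

    @Override
    public void close() throws IOException { in.close(); }
}

class SplitSketch {
    /** Slack factor: the last split may be up to 10% larger than splitSize. */
    static final double SPLIT_SLOP = 1.1;

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Returns {start, length} pairs covering a file of the given length. */
    static List<long[]> computeSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(new long[] {fileLength - remaining, remaining});
        }
        return splits;
    }
}
```

With a reported block size of 64 MB, a 5 GB object yields 80 splits, and each mapper only fetches its own byte range. The design point is that the split logic never talks to S3 directly; the adapter only has to report a sensible block size and provide a stream whose `seek` is cheap and correct.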


Group: common-user @
Posted: Nov 25, '09 at 6:09p
Active: Nov 25, '09 at 6:09p
Author: Neovazjr (1 post)