I have an urgent question about processing binary (image) data with
Hadoop streaming.
I am looking for the simplest solution, preferably one that requires no
changes to Hadoop and/or the streaming package.
I got some hints from this mailing list, including using a customized
InputFormat or SequenceFileInputFormat, but nothing has really helped so
far. Here is my problem:
1. A lot of binary (image) files are stored on HDFS.
2. A standalone executable takes a binary (e.g., image) filename as input
(key) and emits a small piece of metadata as the value (e.g., the size of
the image).
How can we pass this standalone program to streaming as a mapper so that
it processes the images across all nodes, given that streaming currently
feeds the mapper only through stdin by default?
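
To make the setup concrete, one possible workaround can be sketched as
follows. It is only a sketch under stated assumptions, not a tested
solution: the job input is assumed to be a plain text file listing one HDFS
image path per line (so stdin carries filenames rather than image bytes),
and a small wrapper script copies each file to local disk and calls the
executable on it. The names image_tool, fetch_from_hdfs, run_tool and
map_paths are hypothetical; only the `hadoop fs -get` command is an
existing Hadoop command.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def fetch_from_hdfs(hdfs_path, local_path):
    # `hadoop fs -get <src> <localdst>` copies one HDFS file to local disk.
    subprocess.check_call(["hadoop", "fs", "-get", hdfs_path, local_path])


def run_tool(local_path, tool="./image_tool"):
    # Hypothetical executable: takes a local filename as its argument and
    # prints a small piece of metadata (e.g., the image size) on stdout.
    out = subprocess.check_output([tool, local_path])
    return out.decode("utf-8", "replace").strip()


def map_paths(lines, out, fetch=fetch_from_hdfs, process=run_tool):
    # Each input line is assumed to hold one HDFS path. If a key is
    # prepended (key<TAB>path), only the last tab-separated field is used.
    for line in lines:
        hdfs_path = line.rstrip("\n").split("\t")[-1].strip()
        if not hdfs_path:
            continue
        workdir = tempfile.mkdtemp()
        try:
            local_path = os.path.join(workdir, os.path.basename(hdfs_path))
            fetch(hdfs_path, local_path)
            value = process(local_path)
            # Streaming expects key<TAB>value lines on stdout.
            out.write("%s\t%s\n" % (hdfs_path, value))
        finally:
            # Remove the local copy so task nodes do not fill up.
            shutil.rmtree(workdir, ignore_errors=True)


# When used as a streaming mapper, the script would end with:
#     map_paths(sys.stdin, sys.stdout)
```

Under these assumptions the wrapper script and the executable would be
shipped to the nodes with the streaming -file option, the wrapper would be
named in -mapper, and -input would point at the path list, with the number
of reduce tasks set to zero. The obvious cost of this design is that each
image is copied out of HDFS by whichever node receives its path, so data
locality is lost.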
Thanks.
-Qiming
--
View this message in context: http://www.nabble.com/streaming-a-binary-processing-file-tp23859645p23859645.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.