Dear Shwitzu

The steps are listed below:

Kindly go through the word count and multi-file word count examples for your project.

Modify the program to list the files containing the keywords, along with their file names. Use file names as keys.

Store the files in 4 different input directories, one for each file type, if needed. Otherwise you can keep them all in a single input directory.

Use the word count example, with the extensions suggested above, to retrieve the names of the files containing the keywords, and store the result in the output directory or display the links.

Map: parallelized reading of multiple files
Input key-value pair: filename, file contents
Output key-value pair: filename, (keyword, count)

Reduce: combining the key-value pairs output by the map function
Input key-value pair: filename, (keyword, count)
Output key-value pair: keyword, names of the files containing that keyword
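
The map and reduce data flow described above can be sketched in plain Python. This is only an illustration of the key-value pairs involved, not Hadoop code: it uses no Hadoop API, and the function names (map_file, reduce_pairs, keyword_index) and the sample file names and contents are hypothetical, chosen just for this sketch.

```python
from collections import Counter, defaultdict
import re


def map_file(filename, contents, keywords):
    """Map step: emit (filename, (keyword, count)) for each keyword found in one file."""
    words = Counter(re.findall(r"\w+", contents.lower()))
    for kw in keywords:
        count = words[kw.lower()]
        if count > 0:
            yield filename, (kw, count)


def reduce_pairs(pairs):
    """Reduce step: regroup the map output into keyword -> sorted list of file names."""
    index = defaultdict(set)
    for filename, (keyword, _count) in pairs:
        index[keyword].add(filename)
    return {kw: sorted(names) for kw, names in index.items()}


def keyword_index(files, keywords):
    """Run the map step over every file, then the reduce step over all emitted pairs."""
    pairs = []
    for filename, contents in files.items():
        pairs.extend(map_file(filename, contents, keywords))
    return reduce_pairs(pairs)


# Hypothetical sample input: file name -> file contents.
files = {
    "a.txt": "hadoop stores files in hdfs",
    "b.txt": "a web service returns files",
    "c.txt": "nothing relevant here",
}
index = keyword_index(files, ["hadoop", "files"])
print(index)
```

In a real Hadoop job the map calls would run in parallel over the files in the HDFS input directory, and the framework would do the grouping between map and reduce; the sketch only mirrors the input and output pairs listed above.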

The answers to your questions are:
1) How should I start with the design?
Identify the files to be saved in the HDFS input directory.
Go through the word count example.
2) Should I upload all the files and create Map, Reduce and Driver code and
once I run my application will it automatically go to the file system and get
back the results to me?
Move all the files from the local file system to HDFS, or save them directly to HDFS, using a suitable DFS command such as copyFromLocal (hadoop fs -copyFromLocal <localsrc> <dst>). Go through the DFS commands.
3) How do I handle the binary data? I want to store binary format data using
MTOM in my database.
It can be handled in the same way as a conventional file.

G Sudha Sadasivam


shwitzu <[email protected]> wrote:


From: shwitzu <[email protected]>
Subject: Need Info
To: [email protected]
Date: Thursday, October 15, 2009, 7:19 AM



Hello Sir!

I am new to Hadoop. I have a project based on web services. I have my
information in 4 databases, with different files in each one of them: say,
images in one, videos in another, documents in another, etc. My task is to
develop a web service which accepts a keyword from the client, processes the
request, and sends the actual requested file back to the user. Now I have to
use the Hadoop Distributed File System in this project.

I have the following questions:

1) How should I start with the design?
2) Should I upload all the files and create Map, Reduce and Driver code and
once I run my application will it automatically go to the file system and get
back the results to me?
3) How do I handle the binary data? I want to store binary format data using
MTOM in my database.

Please let me know how I should proceed. I don't know much about Hadoop
and I am searching for some help. It would be great if you could assist me.
Thanks again

--
View this message in context: http://www.nabble.com/Need-Info-tp25901902p25901902.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

Discussion Overview
group: common-user @
categories: hadoop
posted: Oct 15, '09 at 1:50a
active: Oct 29, '09 at 11:42a
posts: 6
users: 4
website: hadoop.apache.org...
irc: #hadoop
