at Feb 5, 2013 at 9:39 pm
If you are collecting logs from multiple servers, then Flume is a good fit for that.
If the logs differ in format, you can use TextInputFormat to read them as plain
text and write them out in whatever common format you want for the later
processing stages of your project.
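To make that concrete, here is a rough sketch of the normalization step: a small Python function (suitable for a Hadoop Streaming mapper) that maps a couple of hypothetical input layouts onto one common tab-separated form. The field layouts and format names here are invented examples, not something from your actual logs:

```python
def normalize(line, source_format):
    """Reduce a raw log line to a common tab-separated form:
    timestamp <TAB> level <TAB> message.
    The source formats and field layouts are hypothetical examples."""
    line = line.rstrip("\n")
    if source_format == "log":
        # e.g. "2013-02-05 21:39:00 ERROR disk full"
        date, time, level, msg = line.split(" ", 3)
        return f"{date}T{time}\t{level}\t{msg}"
    if source_format == "txt":
        # e.g. "ERROR|2013-02-05T21:39:00|disk full"
        level, stamp, msg = line.split("|", 2)
        return f"{stamp}\t{level}\t{msg}"
    # Unknown format: pass the line through untouched.
    return line

# Both inputs come out in the same shape:
print(normalize("2013-02-05 21:39:00 ERROR disk full", "log"))
print(normalize("ERROR|2013-02-05T21:39:00|disk full", "txt"))
```

Once everything is in one shape like this, every downstream MapReduce job can parse its input the same way regardless of where the log came from.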
The first thing you need to learn is how to set up Hadoop.
Then you can try writing sample Hadoop MapReduce jobs that read from a text
file, process the records, and write the results to another file.
Then you can integrate Flume as your log-collection mechanism.
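For the Flume piece, a single agent that tails a log file and delivers the events into HDFS is described by a properties file roughly like the following. The agent, source, channel, sink, host, and path names are all placeholders; check the Flume user guide for the exact options your version supports:

```properties
# agent1.conf -- one agent: exec source -> memory channel -> HDFS sink
agent1.sources = tail1
agent1.channels = mem1
agent1.sinks = hdfs1

# Source: follow a local log file (path is a placeholder)
agent1.sources.tail1.type = exec
agent1.sources.tail1.command = tail -F /var/log/app.log
agent1.sources.tail1.channels = mem1

# Channel: buffer events in memory between source and sink
agent1.channels.mem1.type = memory

# Sink: write events into HDFS (namenode host/port are placeholders)
agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.channel = mem1
agent1.sinks.hdfs1.hdfs.path = hdfs://namenode:8020/flume/logs
agent1.sinks.hdfs1.hdfs.fileType = DataStream
```

You would run one such agent per machine you collect from, started with something like `flume-ng agent --conf-file agent1.conf --name agent1`, and point them all at the same HDFS directory.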
Once you have a good hold on the system, you can decide which paths you want
to follow based on your requirements for storage, compute time, compute
capacity, compression, etc.
On Wed, Feb 6, 2013 at 3:01 AM, Mayur Patil wrote:
I am new to Hadoop. I am doing a project in the cloud in which I
have to use Hadoop for MapReduce. I am going to collect logs
from 2-3 machines in different locations.
The logs are also in different formats, such as .rtf, .log, and .txt.
Later, I have to convert them to one format and collect them
in one location.
So which module of Hadoop do I need to study for this
implementation? Or do I need to study the whole framework?
Seeking guidance,
Thank you!!