and Alexander... you're right, I was just looking for something else aside
from logs, but it could be a good start, you're right :)
Thanks again!
On Wed, Mar 7, 2012 at 6:52 PM, Gauthier, Alexander wrote:
Sounds like you're looking for a "problem" to solve. You mentioned being a
"web developer", so how about loading some web logs and trying some
sessionization analysis? There are plenty of map-reduce functions out
there doing just that (with minor modifications to conform to your log
format). That would be a good place to start thinking in terms of "big
data" :)
HTH.
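For illustration, the sessionization idea suggested above could be sketched as a Hadoop Streaming reducer in Python. This is only a rough sketch under stated assumptions, not code from this thread: it assumes a mapper has already reduced each log line to `visitor_id<TAB>epoch_seconds`, and the 30-minute inactivity gap and the names `parse_line`, `sessionize`, `reduce_stream` and `main` are all hypothetical choices.

```python
import sys
from itertools import groupby

# Assumption: 30 minutes of inactivity ends a session (a common convention,
# not something specified in this thread).
SESSION_GAP_SECONDS = 30 * 60


def parse_line(line):
    """Parse 'visitor_id<TAB>epoch_seconds' into (visitor_id, int timestamp)."""
    visitor_id, ts = line.rstrip("\n").split("\t", 1)
    return visitor_id, int(ts)


def sessionize(timestamps, gap=SESSION_GAP_SECONDS):
    """Group timestamps into sessions; a gap larger than `gap` starts a new one."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def reduce_stream(lines, gap=SESSION_GAP_SECONDS):
    """Yield (visitor_id, session_start, session_end, hit_count) per session.

    Relies on all lines for one visitor arriving consecutively, which is how
    Hadoop Streaming hands sorted keys to a reducer.
    """
    records = (parse_line(line) for line in lines if line.strip())
    for visitor_id, group in groupby(records, key=lambda rec: rec[0]):
        for session in sessionize([ts for _, ts in group], gap):
            yield visitor_id, session[0], session[-1], len(session)


def main():
    """Entry point for use as a streaming reducer: reads stdin, writes TSV."""
    for row in reduce_stream(sys.stdin):
        print("\t".join(str(field) for field in row))
```

To use it as a reducer script, call `main()` at the bottom of the file. The mapper that turns a raw log line into the `visitor_id<TAB>epoch_seconds` form depends on the log format and is left out here.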
-----Original Message-----
From: Fernando Doglio
Sent: Tuesday, March 06, 2012 5:21 AM
To: common-dev@hadoop.apache.org
Subject: Looking for a place to start
Hello everyone, this is my first mail to the list.
My question has probably been answered before, but I couldn't find a way
to search through the archives, so here it goes:
I've been toying around with Hadoop for a few weeks now. I've installed
Cloudera's VM, tried some of the examples, and written the classic word count
example (seems like it's the "hello world" of Hadoop :P) using streaming,
and now I'm looking for a bigger challenge.
My main purpose of these tests is to train myself to think in "big data"
terms, instead of the classic approach a web developer takes when dealing
with information.
So, taking all this into account, what would you recommend I try next?
I've been looking for a big source of data to work with, something to get
information out of. I know I could generate it myself, but I was hoping
that something like that would already exist somewhere.
What were your next steps when starting out with this tech?
Thanks in advance!
Fernando