Hello everyone, this is my first mail to the list.
My question has probably been answered before, but I couldn't find a way to
search through the archives, so here it goes:
I've been toying around with Hadoop for a few weeks now. I've installed
Cloudera's VM, tried some of the examples, and written the classic word count
example (seems like it's the "hello world" of Hadoop :P) using streaming,
and now I'm looking for a bigger challenge.
My main purpose with these tests is to train myself to think in "big data"
terms, instead of the classic approach a web developer takes when dealing
So, taking all this into account, what would you recommend I try next? I've
been looking for a big source of data to work with, something to get
information out of. I know I could generate it myself, but I was hoping
that something like that would already exist somewhere.
What were your next steps when starting out with this tech?
Thanks in advance!