Hello everyone, this is my first mail to the list.

My question has probably been answered before, but I couldn't find a way to
search through the archives, so here goes:

I've been toying around with Hadoop for a few weeks now. I've installed
Cloudera's VM, tried some of the examples, and written the classic word count
example (seems like it's the "hello world" of Hadoop :P) using streaming,
and now I'm looking for a bigger challenge.

My main purpose with these tests is to train myself to think in "big data"
terms, instead of the classic approach a web developer takes when dealing
with information.

So, taking all this into account, what would you recommend I try next? I've
been looking for a big source of data to work with, something to get
information out of. I know I could generate it myself, but I was hoping
that something like that would already exist somewhere.

What were your next steps when starting out with this tech?

Thanks in advance!

Fernando


  • Dave Shine at Mar 6, 2012 at 1:37 pm
    I suggest you get a copy of Hadoop: The Definitive Guide by Tom White. I found it very informative when I was first starting out.

    As for sample "big data", the book uses weather data from the NCDC. You can download it from https://github.com/tomwhite/hadoop-book/tree/master/input/ncdc/all

    Dave

    -----Original Message-----
    From: Fernando Doglio
    Sent: Tuesday, March 06, 2012 8:21 AM
    To: common-dev@hadoop.apache.org
    Subject: Looking for a place to start

  • Fernando Doglio at Mar 9, 2012 at 12:15 pm
    Thank you, everyone. I'll try to take a look at the book suggested by Dave
    and Alexander. I was just looking for something else aside from logs, but
    you're right, it could be a good start :)

    Thanks again!
    On Wed, Mar 7, 2012 at 6:52 PM, Gauthier, Alexander wrote:

    Sounds like you're looking for a "problem" to solve. You mentioned being a
    "web developer", so how about loading some web logs and trying to do some
    sessionization analysis? There are plenty of map-reduce functions out
    there doing just that (with minor modifications to conform to your log
    format). That would be a good place to start thinking in terms of "big
    data" :)

    HTH.

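For anyone wanting to try the sessionization idea Alexander mentions, here is a rough sketch of how it might look as a map step and a reduce step in Python, the kind of thing you could adapt for Hadoop streaming. None of this comes from the thread itself: the Common Log Format regex, the 30-minute inactivity timeout, and the function names (`map_line`, `reduce_visitor`, `sessionize`) are all assumptions made for illustration, and `sessionize` only imitates the shuffle locally so the logic can be tried without a cluster.

```python
import re
from datetime import datetime
from itertools import groupby

# Assumption: Apache Common Log Format, for example
# 1.2.3.4 - - [06/Mar/2012:13:21:00 +0000] "GET / HTTP/1.1" 200 512
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')

# Assumption: 30 minutes of inactivity ends a session.
SESSION_GAP = 30 * 60  # seconds


def map_line(line):
    """Map step: yield (visitor, epoch_seconds) for a parseable log line."""
    m = LOG_RE.match(line)
    if m is None:
        return  # skip lines that do not look like log entries
    # %b expects English month names, so this assumes a C/English locale.
    ts = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
    yield m.group(1), int(ts.timestamp())


def reduce_visitor(visitor, timestamps):
    """Reduce step: split one visitor's hits into sessions.

    Yields (visitor, session_start, duration_seconds, hit_count).
    """
    start = prev = None
    hits = 0
    for ts in sorted(timestamps):
        if prev is not None and ts - prev > SESSION_GAP:
            # Gap too long: close the current session, open a new one.
            yield visitor, start, prev - start, hits
            start, hits = ts, 0
        if start is None:
            start = ts
        prev = ts
        hits += 1
    if start is not None:
        yield visitor, start, prev - start, hits


def sessionize(lines):
    """Local stand-in for the shuffle: sort by key, then reduce each group."""
    pairs = sorted(kv for line in lines for kv in map_line(line))
    for visitor, group in groupby(pairs, key=lambda kv: kv[0]):
        yield from reduce_visitor(visitor, (ts for _, ts in group))
```

In an actual streaming job the two steps would live in separate scripts that read stdin and write tab-separated key/value lines to stdout, relying on the framework to sort by key between them. Keying on IP address alone is a simplification; real logs usually call for a cookie or user ID where one is available.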

Discussion Overview
group: common-dev
category: hadoop
posted: Mar 6, '12 at 1:21p
active: Mar 9, '12 at 12:15p
posts: 3
users: 2
website: hadoop.apache.org...
irc: #hadoop
