Hello,

Yes, another person looking to contribute to and develop Hadoop. I'm looking
to start off small, fixing a few bugs before moving into larger stuff.

First, a bit of background:
Years ago I had the idea of creating a semi-decentralized distributed file
system. The idea came when I was working for a small-to-medium-sized company
that was looking for a simple backup solution for its workstations. PCs
back then came with 100+ GB hard drives but, because the machines served as
simple workstations, employees were using less than half that space. Why not
have each workstation back up to a few other workstations, duplicating files
across multiple machines for redundancy? RAID for the network. I started coming up
with design and architecture specs, protocol examples and even started
writing a bit of the system (in Java). I tried to find a few interested
developers but everyone seemed to think the task was much too large to be
accomplished as a side project (and I didn't think, given the IT industry of
the time, that anyone would fund it). Later, I realized such a distributed
system could be much more than a simple file backup solution.

It looks like Hadoop and HDFS are creating a lot of what I had wanted to
create; in most ways they have already surpassed what I had in mind.

So, where should I start? Just start fixing bugs listed in JIRA?

Geoff


  • Jakob Homan at Sep 30, 2009 at 12:46 am
    Thanks for your interest, Geoff. Yes, finding open JIRAs and
    contributing patches is very helpful. We also maintain a wishlist of
    projects that one could work on:
    http://wiki.apache.org/hadoop/ProjectSuggestions. In addition, please
    do consider documentation and example work as well, as this is very
    helpful both to new users and developers starting on the project.

    Thanks,
    Jakob
    Hadoop at Yahoo!

Discussion Overview
group: common-dev @
categories: hadoop
posted: Sep 30, '09 at 12:40a
active: Sep 30, '09 at 12:46a
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

Geoffrey Gallaway: 1 post
Jakob Homan: 1 post