Yes, another person looking to contribute to and develop Hadoop. I'm looking
to start off small, fixing a few bugs before moving into larger stuff.
First, a bit of background:
Years ago I had the idea of creating a semi-decentralized distributed file
system. The idea came when I was working for a small-to-medium-sized company
that was looking for a simple backup solution for its workstations. PCs
back then came with 100+ GB hard drives but, since they were simple
workstations, employees were using less than half that space. Why not have
each workstation back up to a few other workstations, duplicating files across
multiple machines for redundancy? RAID for the network. I started coming up
with design and architecture specs, protocol examples and even started
writing a bit of the system (in Java). I tried to find a few interested
developers but everyone seemed to think the task was much too large to be
accomplished as a side project (and I didn't think, given the IT industry of
the time, that anyone would fund it). Later, I realized such a distributed
system could be much more than a simple file backup solution.
It looks like Hadoop and HDFS are creating a lot of what I had wanted to
create; they have already surpassed what I had in mind in most ways.
So, where should I start? Just start fixing bugs listed in JIRA?