Bruce,

I helped design and teach an undergrad course based on Hadoop last year.
Together with some folks at Google, we then made the resources available
to other universities and the public at large (under a Creative Commons
license, actually).

All the materials are available online here:
http://code.google.com/edu/content/parallel.html
(lecture notes, labs, and even video lectures.)

The materials include suggested lab activities. Good free data sets you
can download include the Netflix Prize data and a copy of the Wikipedia
corpus. Of course, you can set up Nutch and do your own web crawl too.

We also highly endorse the Amazon EC2 idea for doing your own labs :)

Best of luck,
- Aaron



Edward Bruce Williams wrote:
Hello,

I am a student doing an independent study project investigating the
possibility of teaching large scale computing on a small scale budget. Th

My thought is to use available Open Source (Hadoop) and Creative Commons
and other materials as the text. A student could then do significant
computing on Amazon for the cost of what they would usually pay for a
textbook. I have convinced an agency of the state of California that paying
for computer time for a CS student is "like buying a textbook or calculator
for a math student", so "so far so good."

If anyone has some largish data sets, preferably on Amazon, that we could
use for class projects, please contact me off list.

Thanks,

Bruce Williams


Discussion Overview
group: common-user @
category: hadoop
posted: Nov 16, '07 at 2:34p
active: Apr 2, '08 at 10:20p
posts: 6
users: 4
website: hadoop.apache.org...
irc: #hadoop
