Smith Stan wrote:
Hey Cloudera genius guys .
Sorry, not Cloudera. I speak for myself.
I read this
Via Cloudera, Hadoop is currently used by most of the giants in the
space including Google, Yahoo, Facebook (we wrote about Facebook’s use
of Cloudera here), Amazon, AOL, Baidu and more.
I would be doubtful that any on that list use the Cloudera distro,
because once you manage a cluster to the extent that you create your own
RPMs for PXE preboot and kickstart installs, you know what you are doing
and will be worrying more about the power budget of your datacentre (as
measured in megawatts), and whether your off-site replication plan is
copying data to other facilities on different earthquake fault lines,
than about how hadoop-site.xml works.
This is not much different from saying these companies all use TCP/IP,
HTTP, MySQL and Linux, therefore a Linux server running Apache and
mysqld will help you to beat them.
Hadoop is a tool for very large datasets; it works best when you can
group the data and scan the groups independently.
* If you do not know what you are doing, it will not help
* If you do not have a sufficiently large dataset, it is not worth the
effort
* If you haven't outgrown an RDBMS, stick with the database
* Cloudera are offering to help with running/using Hadoop, but they
aren't going to code your datamining algorithms for you.
see also:
http://teddziuba.com/2008/04/im-going-to-scale-my-foot-up-y.html

-Steve