I am starting an install of Hadoop on a new cluster. However, I am a little
confused about which set of instructions I should follow, having only
installed and played around with Hadoop on a single-node Ubuntu box with 2
cores (on a single board) and 2 GB of RAM.
The new machine has two internal processors, each with 4 cores, and I would
like Hadoop to run in a distributed context over these 8 cores. One of my
biggest issues is the definition of the word "node". From the Hadoop wiki and
documentation, it seems that "node" means "machine", not a board or a
processor. So, by this definition, our cluster is really one "node". Is this
correct?
If this is the case, then I shouldn't be using the "cluster setup"
instructions but rather the single-node setup instructions, which is what
I've been doing. But which mode of operation should it be? I don't think it
should be standalone. Should it be pseudo-distributed? If so, how can I
guarantee that work will be spread over all 8 processors? And what is
necessary in the hadoop-site.xml file?
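
For reference, here is the sort of hadoop-site.xml I've been experimenting
with, based on the pseudo-distributed example in the docs. The localhost
ports are the ones from the quickstart; the two task-slot maximums at the end
are just my guess at how to keep all 8 cores busy, so please correct me if
that's not the right knob:

<?xml version="1.0"?>
<configuration>
  <!-- Point the default filesystem at a local HDFS instance -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <!-- Run the JobTracker locally as well -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <!-- Single datanode, so no replication -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Guessing these should match the core count so all 8 cores get used -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>8</value>
  </property>
</configuration>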
Here are the specs of the machine:
-Mac Pro RAID Card 065-7214
-Two 3.0GHz Quad-Core Intel Xeon (8-core) 065-7534
-16GB RAM (4 x 4GB) 065-7179
-1TB 7200-rpm Serial ATA 3Gb/s 065-7544
-1TB 7200-rpm Serial ATA 3Gb/s 065-7546
-1TB 7200-rpm Serial ATA 3Gb/s 065-7193
-1TB 7200-rpm Serial ATA 3Gb/s 065-7548
Could someone please point me to the correct mode of operation and the
instructions for installing things correctly on this machine? I found some
information in the archives on how to install on an OS X machine, but it is
a touch outdated and seems to be missing some things.
Thank you very much for your time.