Things to consider are cost, reliability, scalability, and what equipment you might already own.
- SAN/NAS: generally less reliable than HDFS in terms of "how much data do you lose if lightning strikes a box?" Many SAN/NAS solutions start from the assumption that a given piece of hardware will never fail; I have found this to be a lousy assumption at our site.
- At today's disk failure rates, you can expect 2 dead disks a day for a petabyte-scale solution. Keep this in mind for your plans. An HDFS-based solution will recover nicely from disk deaths, because it re-replicates the lost blocks from the surviving copies.
- Local DAS can be more scalable, depending on your application.
- If you already own a SAN/NAS and it is sufficient for your install, don't throw out the equipment. Use it.
- Local DAS comes in cheaper *if* you need to buy the computational power anyway.
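
If you want to sanity-check the disk-failure point for your own planned cluster, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration (the function names, the 1 TB disk size, 3x replication, and the 4% annual failure rate); plug in the numbers for your own hardware, since the result scales directly with disk count and failure rate.

```python
import math


def disk_count(usable_tb, replication, disk_tb):
    """Number of disks needed to hold usable_tb of data at a given replication."""
    return math.ceil(usable_tb * replication / disk_tb)


def expected_failures_per_day(disks, annual_failure_rate):
    """Average disk failures per day, assuming failures are spread evenly over the year."""
    return disks * annual_failure_rate / 365.0


if __name__ == "__main__":
    # Illustrative assumptions only: 1 PB usable, 3x replication, 1 TB disks, 4% AFR.
    disks = disk_count(usable_tb=1000, replication=3, disk_tb=1.0)
    per_day = expected_failures_per_day(disks, annual_failure_rate=0.04)
    print(f"{disks} disks, about {per_day:.2f} expected failures per day")
```

Whatever number you get, the operational question is the same: does the storage layer keep serving data and repair itself when a disk dies, or does someone have to intervene each time?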
A lot of this comes down to what your operations staff is used to.
- If you have deep experience with a vendor-supported file system (e.g., GPFS), I'd recommend continuing to use it.
- If you have no background in this area, you would probably benefit from Hadoop support from a company like Cloudera.
Hope this helps. You didn't give much background on your specific situation, so I can only answer in very general terms.
On Dec 22, 2009, at 10:24 AM, Doopah Shaf wrote:
Does anyone have any recommendations for / against using a NAS / SAN system
as the underlying physical storage for a hadoop cluster, instead of local
data node DAS?