We have 600 TB of EMC SAN storage, which is currently used by Oracle RAC. We are replacing Oracle RAC with Hadoop (YARN, Spark, Hive, Shark) for scalability reasons, accepting a small compromise on performance.
For Hadoop, local storage is recommended over SAN storage. But our management is not willing to let the SAN storage go to waste; they want to protect that investment.
How can we best use the SAN for Hadoop? Will an Ethernet upgrade help? What are the options for making maximum use of the SAN as Hadoop storage?
You can use SAN for Hadoop, but it is not advisable: contention in the SAN controllers will degrade performance.
If you do use SAN for Hadoop, the best approach is:
1. Create each LUN with RAID-0.
2. Do not share LUNs. Each LUN must be dedicated to a single DataNode server.
3. If a DataNode needs 10GB, create 2 LUNs (or another even number) and balance them across the SAN's two controllers.
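
As a rough illustration of the three points above, here is a minimal Python sketch that plans the LUNs for one DataNode. It only produces a layout on paper and does not talk to any SAN API. The function name `plan_luns`, the controller labels "A" and "B", and the LUN naming scheme are all hypothetical, chosen just for this example.

```python
def plan_luns(datanode, capacity_gb, lun_count=2, controllers=("A", "B")):
    """Split one DataNode's capacity into dedicated RAID-0 LUNs,
    alternating them between the SAN controllers."""
    # An even spread needs the LUN count to be a multiple of the
    # controller count (an even number for a two-controller SAN).
    if lun_count <= 0 or lun_count % len(controllers) != 0:
        raise ValueError(
            "lun_count must be a positive multiple of the controller count"
        )
    size_gb = capacity_gb / lun_count
    return [
        {
            "datanode": datanode,              # LUN is dedicated to this node only
            "lun": f"{datanode}-lun{i}",       # hypothetical naming scheme
            "size_gb": size_gb,
            "raid": "RAID-0",
            "controller": controllers[i % len(controllers)],
        }
        for i in range(lun_count)
    ]


if __name__ == "__main__":
    # One DataNode needing 10 GB, split into 2 LUNs as in point 3.
    for lun in plan_luns("dn01", 10):
        print(lun)
```

Each planned LUN would then be mounted on its own DataNode as a separate data directory, so that no two DataNodes ever share a LUN.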
You can also use SAN for the NameNode, with a RAID level that provides redundancy (that is, not RAID-0).