Pull out your SAP HANA Technical Operations Manual for the latest SPS09 release and you will find section 2.4 about Monitoring the SAP HANA Database, and section 2.9, which discusses High Availability for SAP HANA at a very shallow depth. After reading them, are you confident that your HANA setup is monitored well enough and highly available enough to go into production? If the answer is no, then you've got some work to do before that in-memory investment can go live.
Let's start with Monitoring. Here are your choices from SAP:
- SAP HANA Studio
- SAP HANA Cockpit from a Web browser (essentially SAP DBA Cockpit)
- SAP Management Console
- SAP Solution Manager technical monitoring
- SAP Landscape and Virtualization Manager (LVM)
Basic monitoring should provide information on your landscape (hosts, services, detail/coordinator type, status), performance utilization, and alerts. I have set up and played with all of the above. Some, like HANA Studio or the Management Console, work right out of the box but are not really enterprise solutions. On the other hand, Solution Manager and LVM can be time-consuming to set up and maintain (see my other article, Thoughts on SAP LVM). Most IT operations teams are not going to use Solution Manager for enterprise-wide monitoring; it's just too much to manage!
If you can get the above combinations to play together for your organization, you have a head start. Otherwise, look for a third-party solution and worry about running your HANA environment instead of the monitoring solution.
Once you have monitoring taken care of, how do you test your HANA scenario, whether it's a scale-out or scale-up deployment, for operational robustness during failure? Sure, you have the best hardware redundancy and operating system setup, but that doesn't mean HANA services or servers won't fail. Below is a brief set of HA scenarios you need to test, documenting expected versus actual results. Where the results do not meet your service level or operational objectives, correct the underlying issue.
- Shut off a single instance of a system (whether you have one or multiple HANA instances on a cluster)
- Shut off one node of a system (if you have multiple nodes in a scale-out cluster)
- Restart one of the physical servers of a system
- Failover a HANA Master node
- Failover a HANA Slave node
- Kill the STATISTICS service on a system
- Kill the INDEXSERVER service on a system
- Kill the NAMESERVER service on a system
- Kill the DAEMON service on a system
- Fill up the memory on one node by creating a rogue memory hog
- Fill up one of the HANA data volumes by populating a table with dummy data
In testing these (and I'm sure you can come up with many other variations for your needs), you must ensure your monitoring solution detects and handles the resulting alerts and notifications appropriately.
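One simple way to keep track of the scenarios above is a small test log that records expected versus actual results and whether monitoring raised an alert. The sketch below is only an illustration: the `HATest` record, the `needs_correction` helper, and the outcome labels ("recovered within SLA", "exceeded SLA") are hypothetical placeholders, not results from a real system.

```python
from dataclasses import dataclass


@dataclass
class HATest:
    """One HA scenario with its expected and observed outcome."""
    scenario: str
    expected: str
    actual: str = ""
    alert_raised: bool = False   # did the monitoring solution notice?

    def passed(self):
        # A test passes only if the outcome matched AND monitoring caught it.
        return self.actual == self.expected and self.alert_raised


def needs_correction(tests):
    """Return the scenarios that missed expectations or went undetected."""
    return [t.scenario for t in tests if not t.passed()]


# Placeholder results for illustration only.
tests = [
    HATest("Kill the INDEXSERVER service",
           "recovered within SLA", "recovered within SLA", True),
    HATest("Failover a HANA Master node",
           "recovered within SLA", "recovered within SLA", False),
    HATest("Fill up one of the HANA data volumes",
           "recovered within SLA", "exceeded SLA", True),
]

for scenario in needs_correction(tests):
    print("Needs follow-up:", scenario)
```

Note that the second scenario is flagged even though the system recovered as expected: a failure that your monitoring never saw is still a failed test.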
Please plan your operational readiness for SAP HANA adequately!
What's been your experience with the effort, time, and cost associated with getting SAP HANA production ready?