Monitor SAP to See How Meltdown and Spectre May Affect SAP Performance

  

If you work in IT, you have no doubt heard in the last week about Meltdown and Spectre, the flaws affecting modern processors. As I prepare this article, I have an urgent System Update popup that I cannot dismiss, and I'm sure it's related to the chip flaw. According to the article Kernel-memory-leaking Intel processor design flaw forces Linux and Windows redesign, performance hits are expected. The effects are still being benchmarked, but some of our SAP monitoring suggests the patches are already having an impact. We know from past work that Patches can Fix or Break SAP Performance. Let's dig deeper.

Expected Performance Hit

As the article cited above puts it, "a fundamental design flaw in Intel's processor chips has forced a significant redesign of the Linux and Windows kernels to defang the chip-level security bug ... these updates to both Linux and Windows will incur a performance hit on Intel products. The effects are still being benchmarked, however, we're looking at a ballpark figure of five to 30 percent slow down, depending on the task and the processor model."

How SAP Performance May Show Up

Below is a series of charts from a sample Production system showing CPU utilization, user counts, and average dialog response times. Each compares recent days against a similar period last month, when IT-Conductor was already monitoring this SAP system.

CPU Utilization

The CPU utilization of this set of SAP app servers initially caught our eye because it is higher than in any period last month. The increase could be attributed to new-year processing, but for now we'll assume the workload is the same as last month and address that point later.

IT-Conductor Performance Overview CPU Daily Compared to Last Month

The composite chart above shows average CPU utilization across all app servers running more than 10% higher than last month.

The charts below break the data down by individual app server per day. Each server is incrementally higher, not by much, but in line with the 5-30% impact cited by the early benchmarks.

IT-Conductor KPI CPU Utilization Daily for the Last Month - App Servers 1-2

IT-Conductor KPI CPU Utilization Daily for the Last Month - App Servers 3-4

IT-Conductor KPI CPU Utilization Daily for the Last Month - App Servers 5-7
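To make the comparison concrete, here is a minimal sketch of how the per-server change could be calculated from daily averages. The server names and all numbers are hypothetical placeholders, not values read from the charts, and the 5-30% bounds are simply the ballpark figure quoted earlier.

```python
from statistics import mean

def percent_change(baseline, recent):
    """Relative change of the recent mean versus the baseline mean, in percent."""
    base_avg = mean(baseline)
    return (mean(recent) - base_avg) / base_avg * 100.0

def within_cited_range(change_pct, low=5.0, high=30.0):
    """True if the change falls inside the ballpark range quoted above."""
    return low <= change_pct <= high

# Hypothetical daily average CPU utilization (%) per app server.
baseline_cpu = {
    "app1": [20.0, 22.0, 21.0, 21.0],   # same days last month
    "app2": [30.0, 30.0, 30.0, 30.0],
}
recent_cpu = {
    "app1": [23.0, 24.0, 23.0, 22.0],   # recent days
    "app2": [33.0, 33.0, 33.0, 33.0],
}

changes = {
    server: percent_change(baseline_cpu[server], recent_cpu[server])
    for server in baseline_cpu
}

for server, change in changes.items():
    note = "in cited range" if within_cited_range(change) else "outside cited range"
    print(f"{server}: {change:+.1f}% vs last month ({note})")
```

Note that this measures the relative change in utilization, not a difference in percentage points, so a server moving from 30% to 33% CPU counts as a 10% increase.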

 

Average Dialog Response Times

On transactional systems, SAP performance is typically baselined by average dialog response time. For the same system as above, the chart below shows the average running at least 10% higher than in the same period last month. We also used the tabular data to examine each day more closely.

IT-Conductor Performance Overview Dialog Response Times Daily vs Last Month

What If It's Just More Workload?

The earlier question was, "What if the higher CPU and response times are due to a higher workload for the new year?" Two simple workload measures are the number of active users and the number of SAP dialog steps. Here we show only the user count by day. With the exception of the last couple of days, the user count was lower than in the same period last month, yet the performance impact had already started several days earlier. Ideally, we would compare against the same period last year, but this system has only been monitored for the last few months.

IT-Conductor Performance Overview Dialog User Count Daily vs Last Month
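The reasoning above can be sketched as a simple check: flag the days on which response time rose noticeably while the user count did not. The record layout, the 10% threshold, and every figure below are illustrative assumptions, not data from the monitored system.

```python
from dataclasses import dataclass

@dataclass
class DayPair:
    """One recent day paired with the matching day from the baseline period."""
    label: str
    users_base: int
    users_recent: int
    resp_base_ms: float
    resp_recent_ms: float

def unexplained_slowdowns(days, threshold_pct=10.0):
    """Return labels of days where response time rose by at least
    threshold_pct while the user count did not increase."""
    flagged = []
    for d in days:
        resp_change = (d.resp_recent_ms - d.resp_base_ms) / d.resp_base_ms * 100.0
        if resp_change >= threshold_pct and d.users_recent <= d.users_base:
            flagged.append(d.label)
    return flagged

# Hypothetical values for illustration only.
days = [
    DayPair("day 1", 400, 380, 500.0, 560.0),  # slower, fewer users
    DayPair("day 2", 400, 390, 500.0, 520.0),  # only slightly slower
    DayPair("day 3", 400, 450, 500.0, 600.0),  # slower, but more users
]

flagged = unexplained_slowdowns(days)
print(flagged)
```

User count is a coarse proxy for workload. Where dialog step counts are available, the same check could use them instead.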

Cloud-based Systems and Databases

AWS and Azure, two of the most dominant public cloud vendors for SAP, have issued accelerated patch schedules to address the issues. According to the AWS Processor Speculative Execution Research Disclosure, they "have not observed meaningful performance impact for the overwhelming majority of EC2 workloads". Yet our monitoring clearly shows CPU utilization increasing on our EC2 and database instances after we patched, as seen below.

App Server

AWS Performance Overview EC2 CPU Comparison Hourly for Last 2 Weeks

Database Cluster

AWS Performance Overview DB CPU Comparison Hourly for Last 2 Weeks
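A before-and-after comparison like the one in these charts can be approximated by splitting an hourly series at the patch time and comparing the two means. This is only a sketch: the timestamps, the patch time, and the CPU values are made up, and a real comparison would also need to account for daily and weekly load patterns.

```python
from datetime import datetime, timedelta
from statistics import mean

def split_at(samples, patch_time):
    """Split (timestamp, value) samples into values before and after patch_time."""
    before = [value for ts, value in samples if ts < patch_time]
    after = [value for ts, value in samples if ts >= patch_time]
    return before, after

def patch_delta(samples, patch_time):
    """Mean CPU after the patch minus mean CPU before it, in percentage points."""
    before, after = split_at(samples, patch_time)
    if not before or not after:
        raise ValueError("need samples on both sides of the patch time")
    return mean(after) - mean(before)

# Hypothetical hourly CPU utilization (%): four hours before the patch,
# four hours after. The dates are placeholders.
start = datetime(2018, 1, 1, 0, 0)
values = [10.0, 10.0, 10.0, 10.0, 13.0, 13.0, 13.0, 13.0]
samples = [(start + timedelta(hours=i), v) for i, v in enumerate(values)]
patch_time = start + timedelta(hours=4)

delta = patch_delta(samples, patch_time)
print(f"CPU change after patch: {delta:+.1f} percentage points")
```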

Summary

It's still early, and some industry analysts have said this deep chip-level flaw will have long-lasting repercussions. Our advice is to stay on top of patches as required to prevent security exploits of these issues. Equally important is to stay vigilant in monitoring and managing the performance impact on critical enterprise applications by following the SAP Performance Best Practices.

 
