SAP Performance Service Level Management

For as long as I can remember, and that's more than 20 years of working in SAP Basis going back to the days of R/3, the benchmark for acceptable performance of an SAP OLTP system has been a 2-second average response time per dialog step (or simply "2-second response"). Some customers tweak that number to fit their business, and I have seen it range from 0.5 seconds to as high as 5 seconds. Honestly, it's a misleading indicator of performance. On any given mid-size SAP ECC production system with about 500 concurrent users, the number quickly becomes useless as the transaction mix of SAP standard and custom code grows. To put it into perspective, a system may show a 0.5-second average response time while users constantly complain about bad response times. Why? It's the law of averages. The more small transactions there are (those that run relatively quickly), the more they pull the system average down to an attractive number, even though some transactions run forever and frustrate the hell out of the users who need to run them as part of their daily business process. Dialog steps are not created equal.
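To make the averaging effect described above concrete, here is a minimal sketch in Python. The workload numbers are made up for illustration (they are not measurements from any real system): 950 quick dialog steps at 0.2 seconds and 50 slow ones at 6 seconds. The system-wide average comes out at roughly half a second, yet one in twenty steps is far above the 2-second benchmark.

```python
from statistics import mean

# Hypothetical workload (assumption, not real measurements):
# response time in seconds for each dialog step.
fast_steps = [0.2] * 950   # many quick steps, e.g. simple display transactions
slow_steps = [6.0] * 50    # a few long-running steps, e.g. custom reports

all_steps = fast_steps + slow_steps

# The single number that usually ends up in the service level report.
system_average = mean(all_steps)

# What the average hides: the share of steps above the 2-second benchmark
# and how long those steps take on their own.
BENCHMARK_SECONDS = 2.0
over_benchmark = [s for s in all_steps if s > BENCHMARK_SECONDS]
share_over = len(over_benchmark) / len(all_steps)
slow_average = mean(over_benchmark)

print(f"System average:        {system_average:.2f} s")
print(f"Steps over benchmark:  {share_over:.0%}")
print(f"Average of slow steps: {slow_average:.2f} s")
```

The report would show a healthy average, while the users behind those slow steps wait about 6 seconds every time they run them.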

I've seen many customers define their SAP service levels against such an average dialog response time. Some even dedicate a resource to pulling statistics from CCMS and downloading them to spreadsheets so they can be reported to management every week, month, quarter, etc. This practice yields more mess and guessing than actual management, because the numbers are reported long after the fact and are grossly averaged. Sure, some will elaborate with more tables and charts to show the day of the week or the hours of the day, and maybe even the top consumers for those periods. If it's bad enough, some poor performance or Basis resource gets assigned the task of finding the root cause and fixing it. By then, that person is unlikely to find the detailed statistics still in the system, or to figure out what else was running at the time that may have impacted, or been impacted by, the bad transaction.

Does this sound familiar? It's OK to admit it; we've probably all been guilty of this at some point in our line of work. SAP does not make it easy to monitor and manage these kinds of performance service levels. Sure, there are ways to configure CCMS to monitor single transaction response times, but even Solution Manager's E2E monitoring requires configuring robots to play synthetic transactions and record their performance. Those recordings don't represent the actual transaction response times in the system.

The better way would be to create groups of transactions by any combination of attributes such as TCODE, USER, TASK TYPE, APP SERVER, USER TERMINAL, etc. A group may represent a business process such as Sales-Order-to-Cash, with service level objectives set for average response times across Orders (VA01, VA02, VA03), Delivery (VL01N, VL02N, VL03N), Shipping (VT01N, VT02N, VT03N), and then Billing (VF01, VF02, VF03). Next, aggregate the key performance indicator for each group. The default would be dialog response time, but it could be database response time, network GUI time, etc. Then monitor and manage the groups automatically against their service level objectives, and trigger actions such as alerts and notifications on policy-based exceptions. All of that should be collected along with other application, database, and system metrics and events, so that correlation can be used if needed during performance analysis. That is what I would consider proactive and effective application performance service level management.
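The grouping idea above can be sketched in a few lines of Python. This is only an illustration, not how any particular monitoring product implements it. The transaction groups come from the order-to-cash example, while the objective values, the sample records, and the function names (group_of, evaluate) are assumptions made up for the sketch.

```python
from collections import defaultdict
from statistics import mean

# Transaction groups from the order-to-cash example in the text.
GROUPS = {
    "Orders": {"VA01", "VA02", "VA03"},
    "Delivery": {"VL01N", "VL02N", "VL03N"},
    "Shipping": {"VT01N", "VT02N", "VT03N"},
    "Billing": {"VF01", "VF02", "VF03"},
}

# Hypothetical service level objectives (assumption):
# maximum average dialog response time per group, in seconds.
SLO_SECONDS = {"Orders": 1.0, "Delivery": 1.5, "Shipping": 1.5, "Billing": 2.0}


def group_of(tcode):
    """Return the name of the group a transaction code belongs to, or None."""
    for name, tcodes in GROUPS.items():
        if tcode in tcodes:
            return name
    return None


def evaluate(records):
    """Aggregate (tcode, response_seconds) records per group.

    Returns {group: (average_seconds, breached)} for groups with data.
    """
    by_group = defaultdict(list)
    for tcode, seconds in records:
        name = group_of(tcode)
        if name is not None:  # ignore transactions outside every group
            by_group[name].append(seconds)
    result = {}
    for name, times in by_group.items():
        average = mean(times)
        result[name] = (average, average > SLO_SECONDS[name])
    return result


# Made-up dialog step records for illustration only.
sample = [("VA01", 0.8), ("VA02", 1.6), ("VL01N", 0.9), ("VF01", 1.2), ("SM50", 0.1)]

result = evaluate(sample)
for name, (average, breached) in result.items():
    status = "BREACH" if breached else "ok"
    print(f"{name}: average {average:.2f} s ({status})")
```

Grouping here is by TCODE only. The same structure would extend to the other attributes mentioned (user, task type, app server) by matching on more fields of each record, and a breach would be the point at which an alert or notification is triggered.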

If I have piqued your curiosity, watch a short video comparing standard SAP transaction monitoring with IT-Conductor. It shows how we approach this one performance aspect of service level management and puts the above into context.

If you think this solves a major Basis performance need, I'd be interested to hear your feedback. Feel free to take a test drive.