How Do You Measure Cloud Migration Success?


Defining cloud migration success starts with setting business objectives and establishing key performance indicators (KPIs) that reflect what the business aims to achieve. In agile terms, think of business objectives as the Definition of Done, where deliverables are discussed with the teams involved to build a common understanding of what completed work looks like. Think of KPIs, in turn, as the acceptance criteria that must be satisfied before a particular deliverable can be considered complete.

At the start of this year, we recommended looking at cloud transformation with an agile mindset. But regardless of what methodology your organization intends to implement, business objectives and KPIs will serve as your baseline to measure the success of your cloud migration efforts.

Questions to Ask When Setting Business Objectives

Organizations do not always share the same objectives when migrating to the cloud. Most focus on the total cost of ownership (TCO) savings they can gain from the pay-as-you-go options that cloud solutions offer. Others are considering the move because of the agility and scalability of cloud environments.

Several drivers behind cloud transformation initiatives push organizations to move infrastructure resources from on-prem to a cloud environment. But before actually taking the plunge, you should consider the following questions:

  1. What problems do you want to solve by moving your resources to a cloud environment?

  2. What benefits do you expect to reap from the migration?

  3. What cloud benefits spark interest among end-users in your organization?

  4. What cloud computing model would most likely satisfy your business needs?

  5. Which specific applications will you migrate, and how will you move them?

  6. What business and operational processes would be affected?

  7. How will you prioritize the workloads as you move to the cloud?

  8. How will you address unexpected issues that may arise during the migration activity?

  9. How will the network support the migration?

  10. How will you describe the digital experience after migrating to the cloud?

These questions should be able to guide you in setting your business objectives such that operations, end-users, and business impact are all taken into consideration.

Establishing Cloud Migration KPIs

For an organization to say that its cloud migration was successful, the business objectives it defined should be met. The question is how you will know whether or not the objectives are satisfied. This is where KPIs come in.

Establishing KPIs lays the groundwork for measuring the success of your cloud migration journey. Just like acceptance criteria, KPIs should be quantifiable as they will serve as your success metrics.
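
To make the acceptance-criteria analogy concrete, here is a minimal sketch of checking measured values against KPI targets. The KPI names, targets, and measurements are hypothetical and chosen only for illustration; your own KPIs should come from the business objectives you defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kpi:
    name: str
    target: float
    higher_is_better: bool

    def is_met(self, measured: float) -> bool:
        """Return True when the measured value satisfies the target."""
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


def evaluate(kpis, measurements):
    """Map each KPI name to True/False depending on whether it was met."""
    return {kpi.name: kpi.is_met(measurements[kpi.name]) for kpi in kpis}


# Hypothetical targets and post-migration measurements, for illustration only.
kpis = [
    Kpi("uptime_percent", 99.9, higher_is_better=True),
    Kpi("avg_response_ms", 800.0, higher_is_better=False),
    Kpi("error_rate_percent", 1.0, higher_is_better=False),
]
measured = {
    "uptime_percent": 99.95,
    "avg_response_ms": 950.0,
    "error_rate_percent": 0.4,
}

results = evaluate(kpis, measured)
for name, met in results.items():
    print(f"{name}: {'met' if met else 'NOT met'}")
```

In this made-up run, uptime and error rate meet their targets while average response time does not, so the deliverable would not yet count as complete.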

1. Application Performance

Application performance metrics are the data collected from applications that tell you about the overall health of a system. When migrating systems to the cloud, you should be able to tell whether your systems are healthy, because they need to perform at least as well as they did before the move. Any degradation in performance is a sign that the migration was not fully successful, and it requires you to troubleshoot or revert as necessary.

The following are the application performance metrics that you should take into consideration:

Application Availability

Needless to say, your applications should be accessible and functional. But here comes the tricky part: application availability is more than just knowing the UP/DOWN state. Measuring application availability should be based on the amount of time the application is available and operational, which brings us to the two metrics that you need to measure: uptime and downtime.

Uptime is a metric that represents the percentage of time the application is available and running without any issues. On the other hand, downtime represents the percentage of time the application is unavailable and not functional.

Figure 1: SAP System Availability Chart in IT-Conductor

As an organization, you should agree on the percentage of uptime that you need to meet in a year and define it in your SLAs. Compute this figure with the business impact of downtime in mind. For instance, 99.99% uptime means that the application can be unavailable for about 52.6 minutes per year. How would that affect your business? Is that an acceptable percentage from a financial standpoint?
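
The arithmetic behind an uptime target can be sketched as follows. This is a minimal example that assumes a 365-day year; the function names are illustrative and not part of any monitoring tool.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Yearly downtime budget implied by an uptime target."""
    return MINUTES_PER_YEAR * (100.0 - uptime_percent) / 100.0


def measured_uptime_percent(downtime_minutes: float, period_minutes: float) -> float:
    """Uptime actually achieved over an observation period."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


for target in (99.0, 99.9, 99.99):
    print(f"{target}% uptime allows {allowed_downtime_minutes(target):.1f} minutes of downtime per year")
```

Comparing `measured_uptime_percent` for a given month or quarter against the SLA target shows whether the migrated system is staying within its downtime budget.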

Response Time

When evaluating application performance at a more granular level, start by measuring the amount of time it takes to complete an individual transaction, which we call the response time. When a user request is sent, it takes a certain amount of time for the application, database, or server to respond.

Response time is influenced by different factors such as the number of requests being processed, the number of users currently using the application, and even the network bandwidth. So, it is recommended that you measure the following to get a more realistic estimate of response time:

  • Average Response Time indicates the average response time over a specified period of time.

  • Peak Response Time indicates the longest recorded response time within a specified period of time.
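
As a rough illustration of these two measurements, the sketch below summarizes a list of response times collected over one period. The sample values are made up; in practice they would come from your monitoring tool.

```python
from statistics import mean


def summarize_response_times(samples_ms):
    """Return the average and peak response time for one measurement period."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    return {"average_ms": mean(samples_ms), "peak_ms": max(samples_ms)}


# Hypothetical response times in milliseconds for five requests.
samples = [120, 135, 110, 980, 125]
summary = summarize_response_times(samples)
print(summary)
```

A single slow request pulls the peak far above the average here, which is why it helps to track both values rather than the average alone.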

Figure 2: Workload Monitoring Charts in IT-Conductor

Learn more about SAP Basis Monitoring for Workload Performance.

Error Rate

Error rate is simply the percentage of requests that result in an error over a specified period of time. It is one of the clearest indicators of whether a system or application has been migrated successfully. Your application may show as available, but if the error rate is too high, something is clearly wrong.
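
The calculation itself is straightforward. The sketch below assumes the requests are HTTP calls and treats any 5xx status code as an error; both are assumptions for illustration, and you can pass a different `is_error` rule for other kinds of systems.

```python
def error_rate_percent(status_codes, is_error=lambda code: code >= 500):
    """Percentage of requests in the period that ended in an error."""
    if not status_codes:
        raise ValueError("no requests recorded for this period")
    errors = sum(1 for code in status_codes if is_error(code))
    return 100.0 * errors / len(status_codes)


# Hypothetical period: 95 successful requests and 5 server-side errors.
codes = [200] * 95 + [500] * 3 + [503] * 2
rate = error_rate_percent(codes)
print(f"error rate: {rate:.1f}%")
```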

Figure 3: SAP CCMS Alerts Pie Chart in IT-Conductor


Figure 4: SAP ABAP Dump Frequency Chart in IT-Conductor

Errors do not necessarily point to issues within the application. They could also indicate issues in the cloud environment where the application is hosted. When investigating application error rates, consider factors in both the application and infrastructure domains.

Timeouts, Retries, Backoff, and Jitter

Now let's move on to the nitty-gritty part: setting timeouts, retries, backoff, and jitter. System failures and latency are inevitable. Any application is susceptible to issues, regardless of whether it has been migrated successfully to the cloud or not. These four mechanisms are essential to building resilient systems. Especially for distributed systems, measuring only application availability, response time, and error rate shows just the tip of the iceberg.

  • A timeout is the maximum amount of time a client waits for a request to complete before giving up. According to AWS, a best practice is to set a timeout on any remote call, including both connection and request timeouts.

  • Retries allow applications to handle transient failures. However, retries can make the problem worse, especially when the issue is caused by an overload, because constantly reconnecting to the service or application adds to the load. The solution preferred by AWS is backoff. Instead of retrying immediately when the service or application is unavailable, the client waits for some time before trying again. According to AWS, the most common pattern is exponential backoff, where the wait time increases exponentially after every attempt. In addition, the wait time is typically capped at a maximum value and the number of retries is limited, to avoid retrying for too long. This works hand in hand with timeouts, allowing the client to give up on its own once the configured timeout is reached. Finding the right trade-offs is up to you.

  • Jitter can help when a system or an application is experiencing an overload and timeouts, retries, and backoff are not enough. It adds random variation to the time at which a request or any remote call is initiated. As Marc Brooker, AWS Senior Principal Engineer, explains in his blog post Exponential Backoff and Jitter, adding jitter may seem counter-intuitive, since it tries to improve the performance of a system by adding randomness, but it makes a strong case when you want to spread out spikes to an approximately constant rate.
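
The way retries, capped exponential backoff, and jitter fit together can be sketched as follows. This is a minimal illustration and not AWS code: the function names, default values, and the choice of "full jitter" (a random wait between zero and the backoff ceiling) are assumptions, and `operation` stands for any remote call that enforces its own timeout.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=5.0, rng=random.random):
    """Capped exponential backoff with full jitter, in seconds.

    attempt is 0 for the first retry, 1 for the second, and so on.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling  # random wait between 0 and the ceiling


def call_with_retries(operation, max_attempts=4, base=0.1, cap=5.0,
                      sleep=time.sleep, rng=random.random):
    """Run operation(), retrying transient failures with backoff and jitter.

    operation is expected to enforce its own connection/request timeout
    and to raise TimeoutError or ConnectionError on transient failures.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the last error
            sleep(backoff_delay(attempt, base, cap, rng))
```

The `sleep` and `rng` parameters exist only so the behavior can be tested without real waiting. In normal use you would call `call_with_retries(my_remote_call)` and tune `max_attempts`, `base`, and `cap` to the trade-offs that suit your service.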

2. Infrastructure Performance

When migrating systems, infrastructure performance should also be taken into consideration. Regardless of which cloud model your organization decides to adopt, whether Infrastructure as a Service (IaaS), Platform as a Service (PaaS), or Software as a Service (SaaS), knowing what lies behind your application is crucial to understanding your systems from a more holistic view.

Network Performance Metrics

  • Network Availability is the first key metric that you should understand when evaluating network performance. Just like application availability, network availability measures network uptime over a specified period of time.

  • Network Connectivity should also be understood from the administrator's standpoint: you need to know which virtual network (VNet) or virtual private cloud (VPC) resources were provisioned and which subnets or network security groups (NSGs) they belong to. Also, make sure that the previous security rules and permissions are properly carried over, so that only the right people can access the right resources and anyone outside the security perimeter has no entry point to resources they are not allowed to access.

  • Network Bandwidth is also one of the key metrics that you should take into consideration because it determines how much data can be transmitted in a given amount of time. Knowing it helps you interpret response time when measuring application performance.

  • Throughput, on the other hand, is the number of transactions processed over a specified period of time. When performing load testing during the validation stage, you vary the load conditions to test the endurance of your system, analyze its behavior, and measure the throughput. In general, the higher the throughput, the better the performance of an application or system.

  • Latency is a metric that is often disregarded because it is seen as separate from the application. While it's true that latency is network-related, it is a big factor in understanding the performance of an application, as it affects how long a client request takes to reach the application and how long the response takes to come back.
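
To illustrate the throughput measurement described above, the sketch below computes transactions per second for a few load-test runs. The run data (concurrent users, completed transactions, duration) is hypothetical.

```python
def throughput_per_second(completed_transactions, duration_seconds):
    """Transactions completed per second over one load-test run."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return completed_transactions / duration_seconds


# Hypothetical load-test runs: (concurrent users, transactions, seconds).
runs = [(10, 1200, 60), (50, 5400, 60), (100, 6000, 60)]
results = {users: throughput_per_second(tx, secs) for users, tx, secs in runs}

for users, tps in results.items():
    print(f"{users} users: {tps:.0f} transactions/second")
```

In this made-up data, throughput barely grows between 50 and 100 users, which is the kind of flattening that suggests the system is approaching its capacity under load.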

Server Performance Metrics

  • CPU Utilization refers to how much of the CPU resource capacity is being used by all the running services and applications on a server. It is often represented as a percentage. While CPUs can run at 100% capacity, fully utilizing a server's CPU is not common practice in production, because applications need headroom. As CPU utilization increases, the application tends to respond more slowly, which may degrade performance.

  • Memory Utilization refers to how much of the memory resource capacity is being used by all the running services and applications on a server. It is also represented as a percentage. Like the CPU, memory can run at full capacity. However, application performance also degrades when memory is fully utilized.

  • Disk Utilization simply refers to how much of the disk resource capacity is being used. Like CPU and memory, physical disks are built to store a specific amount of data and designed to handle that much. However, filling disks to 100% of their capacity is not common practice in production, as it may also degrade application performance. This degradation is not directly related to disk capacity, but more to the I/O performance of an application when processing requests.
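
One simple way to act on these three metrics is to compare utilization readings against alert thresholds that leave some headroom. The sketch below is illustrative only: the threshold values are hypothetical, the CPU and memory readings are assumed to come from your monitoring tool, and the disk figure covers capacity only, not I/O performance.

```python
import shutil

# Hypothetical alert thresholds in percent; tune them to your own workloads.
THRESHOLDS = {"cpu": 80.0, "memory": 85.0, "disk": 90.0}


def disk_utilization_percent(path="/"):
    """Percentage of disk capacity in use for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_threshold(readings, thresholds=THRESHOLDS):
    """Return the names of resources whose utilization reached its threshold."""
    return sorted(name for name, value in readings.items()
                  if value >= thresholds[name])


# CPU and memory values are made-up readings; disk is measured locally.
readings = {"cpu": 92.0, "memory": 60.0, "disk": disk_utilization_percent(".")}
print(over_threshold(readings))
```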

3. User Experience Metrics

User experience comes third in this list, but that does not mean it is less important than application and infrastructure performance metrics. In fact, if you go back and read the questions to ask when setting business objectives, most of them pertain to user experience. This is why it's important to include metrics measured from the standpoint of end-users.

Unlike application and infrastructure performance, it's quite challenging to quantify user experience because it is predominantly dependent on what the application or system is used for. But we're not saying it's not possible.

The first step in measuring user experience is to look for patterns in how end-users use the application or system. From there, list the factors related to the business objectives you've initially set. Then determine how they relate to the application and infrastructure performance metrics discussed so far.

Below are some common user experience metrics:

  • System Usability indicates whether or not the application or a system is available and functioning as expected.

  • HTTP Response Time refers to the time it takes to complete an HTTP request.

  • Page Load Time refers to the time it takes for a web page to display all of its content.

Learn more about SAP End-User Experience Monitoring.

Next Steps

Measuring cloud migration success is not a one-time effort that you need to perform right after the actual migration. As a matter of fact, you need to begin your cloud migration journey with the end in mind. Starting from setting your objectives up until putting figures on your established KPIs, cloud migration success is a continuous effort to gauge whether or not your systems are working as expected, with added benefits.

Ideally, you should have a good command of the technical know-how behind everything we have discussed here. But if all of this is new to your organization, it may be good to involve a credible and certified third-party subject matter expert, so that another party can look at your migration efforts from the outside, see what else is missing, and help you with whatever you need.

Also, consider the amount of time it takes to migrate to the cloud. This will help you determine the deployment cost you will incur, including labor and capital investment. Then, after the migration activity, you will incur the operational costs that come with moving your resources to the cloud. It might seem as though you are investing a great deal of money here, but over time, it will be more beneficial for you as a business because cloud computing lets you pay only for what you use, compared to investing in physical hardware.

Not all solutions are created equal. This is exactly why you need to research, plan, and strategize carefully on how you should approach your cloud migration efforts. To learn more about how IT-Conductor can help you with migrating your systems to the cloud, you can read What is Cloud Migration? Strategy, Process, and Implementation.