Complex Problem
In today's enterprises, very complex business processes are often managed by the SAP ecosystem, involving transactions, batch jobs, process chains, messages, documents (IDocs, BDocs, etc.) and a myriad of interdependencies. Regardless of how they are designed, failure of these activities is not a question of IF but of WHEN, because they depend on systems, data, and business logic, and sometimes they fail through simple human error.
How does IT-Conductor help customers manage their business-critical processes so that, when failures occur, errors are detected, the responsible party is notified, and root-cause analysis and resolution are supported? We will describe our approach to SAP job monitoring and management as an example scenario. Customers currently use it to remotely manage large numbers of jobs, process chains, messages, and many more critical services 24x7.
A Simple Yet Powerful Solution
1) Job Scheduling
Automation means we can schedule single or multi-step SAP jobs, but we don't stop there. We can also schedule native jobs against databases and operating systems.
2) Process Automation
Orchestrate a set of activities on one or more applications, databases or systems. Periodic IT Processes can be digitized and automated for on-demand launch, recovery action trigger, or scheduled action.
3) Failed Job Alert
The most basic requirement: when a failure occurs, detect the issue and raise an alert.
4) Delayed Job Alert
The job is released and scheduled to run, but it just sits there, and it hasn't failed either. This situation can occur due to various system or scheduling issues. It is a rare occurrence, but when it happens, the resulting mess in batch schedules can impact the business.
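As a rough illustration of the idea, a delayed-job check can be sketched in a few lines of Python. This is not IT-Conductor's implementation; the `Job` record, the `is_delayed` function, and the 10-minute grace period are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Job:
    """Hypothetical job record, used only for this sketch."""
    name: str
    scheduled_start: datetime
    actual_start: Optional[datetime] = None  # None means the job has not started yet


def is_delayed(job: Job, now: datetime,
               grace: timedelta = timedelta(minutes=10)) -> bool:
    """Return True when a released job is past its scheduled start
    plus the grace period and still has not started."""
    return job.actual_start is None and now > job.scheduled_start + grace
```

A job that has already started is never reported as delayed by this check; long-running jobs are a separate concern.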
5) Job In-flight Time
It's running and it's taking too long! Thresholds set per job, or for any job exceeding a given processing time, can raise a warning or an alert, and can even trigger automated actions such as activating a performance trace. This in-flight tracker tells a story over the lifetime of a job and provides valuable runtime performance information, especially when combined with a time-synchronized view of other system resource utilization, such as locks and CPU/memory/IO consumption. Even if the job aborts and SAP fails to capture complete job statistics, we have a historical footprint of the job.
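The threshold idea can be sketched as a small function that maps elapsed run time to a severity level. This is an illustrative sketch only; the function name `inflight_severity`, its parameters, and the severity labels are assumptions made for the example, not part of any product API.

```python
from datetime import datetime, timedelta


def inflight_severity(started: datetime, now: datetime,
                      warn_after: timedelta, alert_after: timedelta) -> str:
    """Classify a running job by how long it has been in flight.

    Returns "ok", "warning", or "alert" (hypothetical labels).
    """
    elapsed = now - started
    if elapsed >= alert_after:
        return "alert"      # could also trigger an automated action here
    if elapsed >= warn_after:
        return "warning"
    return "ok"
```

Calling this on every polling cycle, and storing the result with a timestamp, would build up the kind of historical footprint described above even if the job later aborts.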
6) Job Completed But It's Too Fast
Yes, this requirement sounded strange when we first heard it, but it's a real-world problem: a job that is supposed to run for a while and process lots of data finishes too quickly. This could indicate a problem with the input data, such as data that is missing or incomplete.
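One simple way to express "too fast" is to compare a run against the job's own history. The sketch below flags a run whose duration is below a fraction of the median of past runs. The function name, the 25% fraction, and the minimum sample count are assumptions for illustration, not a description of how IT-Conductor decides.

```python
from statistics import median
from typing import Sequence


def finished_too_fast(runtime_s: float, history_s: Sequence[float],
                      min_fraction: float = 0.25, min_samples: int = 5) -> bool:
    """Return True when a completed run is suspiciously short compared
    with the median of previous runtimes (all values in seconds)."""
    if len(history_s) < min_samples:
        return False  # not enough history to judge
    return runtime_s < min_fraction * median(history_s)
```

The median is used rather than the mean so that one unusually long or short past run does not distort the baseline.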
7) Targeted Notification
When time is of the essence in resolving and restoring service, bypass all the middlemen and notify the rightful owner of the business process of the issue. Be specific with the subscription using subscribers and selective criteria.
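To make "subscribers and selective criteria" concrete, here is a minimal sketch of matching a job event against subscriptions. The `Subscription` structure, the wildcard job patterns, the status names, and the example addresses are all hypothetical; they only illustrate the idea of routing an alert directly to the owner.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import FrozenSet, List


@dataclass(frozen=True)
class Subscription:
    """Hypothetical subscription: who is notified, and for which jobs and statuses."""
    subscriber: str
    job_pattern: str           # shell-style wildcard, e.g. "FI_*"
    statuses: FrozenSet[str]   # e.g. {"failed", "delayed"}


def recipients(job_name: str, status: str,
               subscriptions: List[Subscription]) -> List[str]:
    """Return the subscribers whose criteria match this job event."""
    return [s.subscriber for s in subscriptions
            if status in s.statuses and fnmatchcase(job_name, s.job_pattern)]


subs = [
    Subscription("finance-oncall@example.com", "FI_*", frozenset({"failed", "delayed"})),
    Subscription("basis-team@example.com", "*", frozenset({"failed"})),
]
```

With these example subscriptions, a delayed `FI_PAYMENT_RUN` job would notify only the finance contact, while a failed one would notify both.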
8) Workload Monitoring
Monitor all the workload together and gather performance intelligence to properly diagnose and manage shared resources. Drill down on the job and get more details on internal stats such as DB time, CPU time, memory usage, IO metrics, etc.
9) Related Events
Some failures occur by themselves, but often they are related to other events. Context is important, and IT-Conductor provides both event and service management to help correlate a failure with related events such as availability changes, failed process chains, and Syslog alerts.
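A very basic form of correlation is to look for other events close in time to the failure. The sketch below does exactly that and nothing more; the function name, the tuple layout of an event, and the 15-minute window are assumptions for illustration, and a real correlation would likely also consider which system or service each event belongs to.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, str]  # (timestamp, description), hypothetical layout


def related_events(failure_time: datetime, events: List[Event],
                   window: timedelta = timedelta(minutes=15)) -> List[Event]:
    """Return events that occurred within `window` before or after the failure."""
    return [e for e in events if abs(e[0] - failure_time) <= window]
```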
10) Bringing Together on a Dashboard
Events, charts, KPIs and analytics focusing on IT processes. Zoom in on clickable charts that are time-synchronized with customizable KPIs.
Summary
I hope this blog post gave you some background on job monitoring, the options available to you, and the features and benefits of each. Please feel free to comment and share your thoughts, challenges and concerns - we created IT-Conductor to make IT folks' lives easier! Remote management of complex enterprises from afar!