Automation in IT Operations by Extending the ITIL Processes
Hans-Christian Boos 2008
As the first and most promising approach to a general standardisation of IT operations, ITIL is on everyone’s lips today. However, describing ITIL as the greatest innovation in IT maintenance is misleading. ITIL is only a standard for the better management of conventional IT operations environments. Considered in terms of the difference between leadership and management, or between innovation and optimisation, ITIL clearly falls into the second category. ITIL processes make it possible to raise IT operations to the next level. And, to borrow the language of an industrialisation long past, it is possible to go beyond that and turn the manufactory into an (IT) factory.
Those who provide support for applications distinguish between two basic options for dealing with malfunctions. On the one hand, incidents can be handled reactively. This requires that the administrators first learn of the incident, for example through a call to the help desk or an alarm from a monitoring system. On the other hand, IT operations can try to avoid incidents proactively, that is, before they occur. The ITIL processes for the reactive handling of incidents are Incident Management and Problem Management. Their proactive counterparts are Availability Management and Capacity Management. Change Management is left out here because changes to the environment can result from both reactive and proactive measures and are implemented through the ITIL Change Management process in either case.
ITIL as the Foundation of Automation
All tasks performed manually within ITIL processes can be carried out in every kind of IT operations environment in place today. The ITIL standard merely provides a framework of processes that enables these tasks to be carried out as efficiently and effectively as possible. In doing so, it also makes it possible to view the entire IT operations environment as a connected, coherently integrated unit. This is essential for the further development of the operating concept described in this article. We will first examine the ITIL processes for Incident, Problem, Capacity, and Availability Management.
Reactive and Proactive ITIL Processes
The ITIL process that deals with faults limiting functionality for users is Incident Management. The process is triggered when the help desk receives notification of a malfunction or when a monitoring system generates a corresponding alert.
((IMG Incident Management))
The information about the fault is passed to the appropriate IT experts, the Incident Managers, who then initiate a short-term solution. In contrast to Problem Management, which carries out a comprehensive investigation of a failure’s root cause, Incident Management deliberately avoids such an investigation. This process concentrates on finding the fastest possible solution, a so-called workaround. Apart from finding workarounds, Incident Management is mainly a communication process: for every fault documented through it, the reporting or affected parties (e.g. customer, service manager, …) are kept up to date on the current state of operations. This communication is maintained for each individual fault even when many faults share a common cause that is being dealt with in Problem Management.
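The communication duty described above, informing every reporter about every fault even when several faults share one cause, can be sketched as follows. This is a minimal illustration under assumptions of our own: the `Incident` record, its `problem_id` link to a Problem Management record and the message format are hypothetical and not part of ITIL.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Incident:
    """A fault documented by Incident Management (hypothetical fields)."""
    ticket_id: str
    reporter: str                      # e.g. customer or service manager
    problem_id: Optional[str] = None   # link to a Problem Management record


def status_notifications(incidents: List[Incident], problem_id: str,
                         status: str) -> List[str]:
    """Build one status message per incident linked to the given problem.

    Even when many faults share one root cause, every reporter
    still receives an update on their own ticket.
    """
    return [f"To {i.reporter}: ticket {i.ticket_id}, {status}"
            for i in incidents if i.problem_id == problem_id]
```

A status change on one problem thus fans out into one message per linked incident, while incidents not yet linked to that problem receive nothing.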
The process behind Incident Management is Problem Management. In this process, incoming alarms from the monitoring system and faults documented by Incident Management are classified, and their root cause is investigated in detail throughout the IT environment. Once the root cause has been determined, the Problem Manager eliminates it through a change to the environment, made via the Change Management process, and thus permanently solves the problem and, with it, all fault notifications resulting from this cause.
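A first, very simple step in classifying documented faults is to group them by a common signature. A signature that recurs is a hint, not proof, of a shared root cause worth investigating. The sketch below assumes hypothetical `Fault` records carrying an affected configuration item and a normalised symptom, and an arbitrary threshold `min_count`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    """An alarm or documented fault (hypothetical fields)."""
    ticket_id: str
    ci: str        # affected configuration item, e.g. a server name
    symptom: str   # normalised symptom text, e.g. "disk full"


def problem_candidates(faults, min_count=3):
    """Return (ci, symptom) signatures seen at least min_count times, sorted."""
    counts = Counter((f.ci, f.symptom) for f in faults)
    return sorted(sig for sig, n in counts.items() if n >= min_count)
```

Three faults reporting "disk full" on the same database server would, for example, surface as one candidate for a detailed root-cause investigation.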
The ITIL processes dealing with the proactive prevention of bottlenecks are Capacity and Availability Management. In Capacity Management, the business need is analysed with respect to availability and IT resources. The results of this analysis are compared with the actual data generated by the monitoring system and used to forecast trends in resource capacity usage. Should this forecast suggest that resources will soon run low, the experts either find a way to reduce resource usage or proactively initiate a process to integrate additional resources into the IT landscape.
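The forecast mentioned above can be illustrated with the simplest possible model, a straight-line least-squares fit through equally spaced usage samples. This model is an assumption for illustration only; real capacity planning will usually also have to account for seasonal load patterns. The function names and sample figures are hypothetical.

```python
def linear_trend(samples):
    """Least-squares slope and intercept; the sample index is the time step."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def steps_until_exhausted(samples, capacity):
    """Forecast how many steps after the last sample usage reaches capacity.

    Returns None if usage is flat or falling, i.e. no exhaustion is forecast.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - (len(samples) - 1))
```

For daily disk usage of 500, 510, 520, 530 and 540 GB and a capacity of 600 GB, `steps_until_exhausted` returns 6.0: if the trend holds, the disk is full in about six days, which is the point at which the experts would either cut usage or add resources.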
Availability Management shows the proactive nature of these processes even more clearly. Here, the actual availability of an application is compared with the users’ quality requirements. If this comparison indicates that the service actually delivered is inadequate, the IT experts are informed. This gives them the opportunity to take technical countermeasures before serious quality problems develop.
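The comparison of actual and required availability can be sketched as the share of successful monitoring probes measured against a target value. The optional `warning_margin` is our own assumption: it lets the experts be informed before the requirement is actually breached, in keeping with the proactive intent of the process.

```python
def availability(checks):
    """Share of successful monitoring probes, as a fraction between 0 and 1."""
    if not checks:
        raise ValueError("no monitoring checks")
    return sum(1 for ok in checks if ok) / len(checks)


def needs_attention(checks, required, warning_margin=0.0):
    """True if measured availability falls below the requirement plus a margin."""
    return availability(checks) < required + warning_margin
```

With 997 successful probes out of 1,000 and a required availability of 0.995, no alert is raised; with a margin of 0.004 the experts would already be informed, although the requirement itself is still met.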
The Limits of ITIL
To many of us working in IT, it seems like a huge step forward when the processes described here (and other ITIL processes) have been introduced successfully and are running in day-to-day operations. And indeed, implementing ITIL processes enables an organisation to manage previously uncontrolled growth, avoids single points of knowledge, brings transparency to operating services and, above all, ensures high-quality documentation of operating processes, including the re-use of solutions to a problem once they have been found.
The next logical step goes beyond optimising the administrative side of IT operations through ITIL processes. At some point, those responsible for IT operations will have exhausted what the improvement of known procedure models can do to squeeze the last drop of efficiency out of the system. The question remains: how can the handling of the IT environment be changed so fundamentally that the amount of manual labour required to keep it running actually decreases? IT can learn from classical industry when answering this question. In industry, manual labour has been replaced by machine labour since the Industrial Revolution. Automation, the replacement of manual labour by machine labour, is therefore the next logical step in the development of the IT industry.
Data Collection and Monitoring as the Basis for Automation
Finally, the catchphrase “automation” is out in the open. But what does automation mean in an environment that is normally used to automate work in other areas? Above all, it means checking, wherever an IT expert is involved in keeping IT up and running, whether a “machine” could take over the task. This does not mean that a separate consultancy project has to be launched for each group of IT experts whose work could be automated. Instead, the processes followed by the administrators are examined for overlapping or at least similar actions, which can then serve as a starting point for automation. Such overlaps are quickly located through the monitoring process. Since each of the ITIL processes depends heavily on the monitoring system, this central system is the obvious place to start looking for automation opportunities. A monitoring system is therefore the driving force behind setting up automation in IT operations.
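The search for overlapping actions can be pictured as counting which manual steps are performed by several teams. The sketch assumes a hypothetical log of (team, action) pairs, for instance extracted from ticket work notes; the threshold `min_teams` is likewise illustrative.

```python
from collections import defaultdict


def automation_candidates(action_log, min_teams=2):
    """Find manual actions performed by at least min_teams distinct teams.

    action_log: iterable of (team, action) pairs (hypothetical format).
    Returns (action, team_count) pairs, most widely shared first.
    """
    teams_per_action = defaultdict(set)
    for team, action in action_log:
        teams_per_action[action].add(team)
    shared = [(action, len(teams)) for action, teams in teams_per_action.items()
              if len(teams) >= min_teams]
    # Sort by descending team count, then alphabetically for stable output.
    return sorted(shared, key=lambda item: (-item[1], item[0]))
```

An action such as restarting a service that recurs across the database, web and network teams would head the list as a first automation candidate.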
It is not only the process architecture that points to the monitoring system as the logical starting point for automation; the nature of automation itself demands that decisions be made on the basis of incoming (monitoring) data by means of a “ruleset”. These decisions then trigger actions that would otherwise have been carried out manually. Ultimately, the source of the data on which these decisions are based can only be a monitoring system. It delivers clear and traceable data, measured against the same references over a given period of time across the whole system environment. These are the prerequisites for data to serve as input parameters for a ruleset on which automated actions are to be based.
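A ruleset in the sense described here pairs a decision on monitoring data with the action an administrator would otherwise have performed. The following minimal sketch assumes that one monitoring sample is a dictionary of named metrics; the metric names, thresholds and actions are hypothetical, and the actions merely return a description instead of changing anything.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Rule:
    name: str
    condition: Callable[[Metrics], bool]  # decision based on monitoring data
    action: Callable[[Metrics], str]      # step an administrator would otherwise do


def evaluate(rules: List[Rule], metrics: Metrics) -> List[str]:
    """Apply every rule to one monitoring sample; return the triggered actions' results."""
    return [rule.action(metrics) for rule in rules if rule.condition(metrics)]


# Hypothetical example rules; thresholds and actions are illustrative only.
RULES = [
    Rule("disk", lambda m: m.get("disk_used_pct", 0) > 90,
         lambda m: "clean up temporary files"),
    Rule("queue", lambda m: m.get("mail_queue", 0) > 1000,
         lambda m: "restart mail transfer agent"),
]
```

Keeping conditions and actions as separate parts of a rule means the same decision logic can be tested against recorded monitoring data before any action is allowed to run.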
The next generation of IT operations will rely on the monitoring system as a central data source alongside the CMDB. The actual innovation to IT operating processes is a library of rules, built on these data sources, together with the actions those rules trigger. This library sits between the stream of monitoring data and the manual ITIL processes. The manual parts of the ITIL processes are thus either taken over completely by the ruleset, or the ruleset prepares and consolidates the input to the manual tasks within the ITIL processes. This inevitably results in an extension of the ITIL library. Unlike the current ITIL processes, this extension does not simply improve coordination or standardise communication around known tasks in IT operations. Instead, it actually helps to automate tasks and thereby fundamentally changes the way IT is operated, all on the basis of well-tested and well-placed ITIL processes.
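The position of the rule library between the monitoring data stream and the manual ITIL processes can be sketched as a dispatcher. Events with a known automated handler are dealt with completely, and all others are consolidated into one ticket per configuration item for manual handling. The event format and the handler mapping are assumptions made for this illustration.

```python
from collections import defaultdict


def dispatch(events, automated):
    """Route monitoring events to an automated handler or to a manual queue.

    events: list of dicts with "ci" and "type" keys (hypothetical format).
    automated: mapping from event type to a callable that handles it fully.
    Returns (results of automated actions, consolidated tickets for humans).
    """
    done = []
    pending = defaultdict(list)
    for event in events:
        handler = automated.get(event["type"])
        if handler is not None:
            done.append(handler(event))               # task taken over completely
        else:
            pending[event["ci"]].append(event["type"])  # input for a manual task
    # One ticket per configuration item instead of one per raw event.
    tickets = [{"ci": ci, "events": types, "count": len(types)}
               for ci, types in sorted(pending.items())]
    return done, tickets
```

Both outcomes described above appear here: a fully automated result, and a consolidated ticket that spares the Incident Manager from reading every raw alarm individually.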