Automation is Knowledge Conservation

Automation, Business Impact of Automation, Market, Social Impact of Automation

Warning: This post contains just as much sarcasm as it contains serious content.

In many discussions I have found that the concept of automation is alien to most people's mindset. Are you one of them? Do you really prefer working your butt off on seriously dull stuff to sitting in an armchair with a cocktail? Or, if you are not that lazy, do you really prefer mind-numbing repetitive tasks to trying out thrilling new things or finding an elegant solution to a tricky problem? (Well, if you answered yes to any of these, please go and visit some soap opera or sitcom blog instead and never ask yourself why your life is soooo boring that you need to tune into life somewhere else….)

So you are still reading? Glad to meet you. I do believe that most great inventions were made because we are a lazy kind of animal. The only thing that can get us out of our laziness is something that stimulates our brains; everything else we try to get rid of. Usually we start with the low-hanging fruit and move on to more complex problems from there. E.g. inventing the wheel meant no longer having to carry everything on our backs or needing many people to move a heavy item; it meant that fewer people or animals could do the same job (an ancient form of cost cutting, and let us not talk about the invention of sliced bread here). An example of a more complex problem would be managing a Web portal with 1.2 million transactions a day that is connected to three different ERP systems using two different SOA approaches and so on…

Corporate Culture without Automation

Are we back to the point where you say “that cannot or should not be automated”? Yes, it can, and yes, it should be automated, because once you know how to handle the everyday hiccups of even this complex IT environment, you become very bored with it. Well, you might say, if that really is automated, then the job of administering this stuff will be gone. So what? So were the jobs of the people who used to carry the bricks to the pyramids when they all of a sudden started using wheels and carts. And guess what…. Since then the population and average wealth of people have increased greatly. And one more interesting piece of information… The people who started using the wheels right away got much richer, or at least had much more fun, than those “traditionalists” who said carrying bricks is supposed to be done manually. Why is that? Well, because management liked to get things done quicker and cheaper… Sound familiar? Well, management has much fancier titles today than “just” pharaoh.
Well, back to serious business. I guess you get the point: progress in IT administration is on its way and stopping it is not an option, especially not in the current economic situation.
So what do all these great inventions that really took work off our backs do? They conserve knowledge collected through hard work and experience, and apply and reapply it.

So conserving knowledge of how complex IT environments are managed is what we set out to do when developing the arago Automation Engine (aAE). Looking back at our operations, we have done quite well. We are now able to handle roughly 68% of all issues coming up during the day automatically and only deal with the interesting ones manually. This is also why our administrators actually have an interesting job, compared to the ones who do the same thing over and over and over again just to keep busy.

So what do we do? We take a model of the IT environment and collect all the tiny steps necessary to keep this environment up and running at all times. These tiny steps are then generalized so they can be applied and reapplied as needed. The big invention behind this is the algorithm that analyzes incoming issues and works out which of the tiny administrative steps need to be combined to resolve them. So automated IT operation is the conservation of IT experience and knowledge, plus a fairly smart machine (not quite as cool as the wheel, but getting there) that knows how and when to apply this experience.

PS: Downloaded and actually licensed that cartoon from www.CartoonStock.com… Really love it.

The Evolution of Automation Tools

Automation, Automation Technology Architect View, Business Impact of Automation, Clouds

The history of delivering IT services is certainly an evolutionary process, and that is without even considering the huge evolution in the technology available to deliver such services. The evolution in IT delivery or IT operation is more or less an evolution of tools. It began with the host operating systems, where much of the software that came with the computer was only used to manage the machine itself. Skipping many steps, these tools went through the various stages of network and system management to business service management or business transaction management tools. The latter's claim to fame is actually achieving what business service management set out to do: making IT manageable from a business point of view.

Automation Auto Pilot

Speaking abstractly, all these tools are automation tools. They automate several steps of work that an IT operator, administrator or delivery manager previously had to perform manually. But they are still just tools. They make life easier for whoever is doing the job, but would you call an industrial hammer an automation tool? Therefore I think it is time to take a look into the fish tank of (IT) tools and approaches available today and show how evolution points towards engines (rather than tools) that actually decide what to do and then take action autonomously, only asking for permission, reassurance or assistance if required by process or if no solution is available to them. Such an engine could be called an automation auto pilot; it would sit on top of all the tools available to IT experts today.

We have been developing and using such an engine for more than ten years now and have achieved very good results in quality improvement, availability of documentation as part of compliance, and cost cutting. But why do I so strongly believe that this is not an exotic idea but the logical next step?

If we focus on two dimensions of any IT management tool that can take actions automatically, or that facilitates taking complex actions on a complex IT and application landscape, we end up with a trigger axis and an approach axis. The trigger axis describes under what conditions an action or tool invocation is triggered. The approach axis describes what kind of action will be taken and how flexibly these actions can take the trigger conditions into account.

On the trigger axis (x) we place “scheduled” at the left, “event triggered” in the middle and “automated” at the right. This means that a tool positioned to the far left of the trigger axis will take action at a predefined time, tools placed in the middle will take action if certain events occur, and tools to the far right will take action as it becomes necessary. On the approach axis we place “standardized” at the bottom, “rationalized” in the middle and “dynamic” at the top. This means that tools that perform predefined actions without reacting to any information gathered while executing (e.g. cron scripts) are placed at the bottom, tools that follow a predefined process but build branches into it that take current conditions into account are placed in the middle, and tools that assemble the best process for the given situation from a pool of possible actions are placed on top.
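The two axes can be written down as a small data structure. The following is a minimal Python sketch; it encodes only the two placements the text itself fixes (a cron script at the bottom left, the auto pilot at the top right), and the names `Placement` and `distance_from_batch` are hypothetical helpers for illustration, not part of any product.

```python
from dataclasses import dataclass
from enum import IntEnum


class Trigger(IntEnum):
    """Trigger axis (x), from left to right."""
    SCHEDULED = 0        # acts at a predefined time
    EVENT_TRIGGERED = 1  # acts when certain events occur
    AUTOMATED = 2        # acts whenever action becomes necessary


class Approach(IntEnum):
    """Approach axis (y), from bottom to top."""
    STANDARDIZED = 0  # fixed actions, ignores information gathered while running
    RATIONALIZED = 1  # predefined process with condition-dependent branches
    DYNAMIC = 2       # assembles the best process from a pool of possible actions


@dataclass(frozen=True)
class Placement:
    tool: str
    trigger: Trigger
    approach: Approach

    def position(self) -> str:
        """Verbal position on the chart, e.g. 'bottom-left'."""
        vertical = ("bottom", "middle", "top")[self.approach]
        horizontal = ("left", "middle", "right")[self.trigger]
        return f"{vertical}-{horizontal}"

    def distance_from_batch(self) -> int:
        """Hypothetical score: steps away from the scheduled, standardized corner."""
        return int(self.trigger) + int(self.approach)


cron_script = Placement("cron script", Trigger.SCHEDULED, Approach.STANDARDIZED)
auto_pilot = Placement("automation auto pilot", Trigger.AUTOMATED, Approach.DYNAMIC)

for p in sorted([auto_pilot, cron_script], key=Placement.distance_from_batch):
    print(f"{p.tool}: {p.position()} (score {p.distance_from_batch()})")
```

Using ordered enums for the axes keeps the chart's left-to-right and bottom-to-top ordering explicit, so any other tool can be placed by picking one value per axis.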

Tool Classification Dimensions

Placing the tools and concepts currently on the market onto these axes shows a clear evolutionary development from a scheduled, standardized batch process to an engine that combines possible actions into a solution as the situation requires. The auto pilot I was talking about earlier is such a tool; it would be placed at the top right of our chart of automation evolution.

In the chart presented below, “hot” topics such as data center automation, workload automation and even run book automation turn out to be much more “old school” in their approaches and are placed accordingly. Our auto pilot engine clearly takes up the “new approach” position, with one notable difference: we have been running a successful business on this model for a long time. So this is not a fancy idea but a valid approach, and current trends in management software point to exactly this approach.

Automation Auto Pilot as Trend

Maybe this “sorting of the tools” article has helped a little to put other thoughts on automation published here into context. It will certainly be necessary when we look at why dynamic automation becomes more and more unavoidable as complexity and the rate of change increase. E.g. following the current discussions on cloud computing, from the Atlanta cloud camp organized by John Willis to the dynamically evolving enterprise clouds described by Mark Masterson, an automation auto pilot is the only way to keep track of an IT landscape that is fully distributed and dynamic. Solving the problem of distributed computing and dynamic resources from an OS point of view, by creating good cloud managers or VMs, does not solve the problem of keeping business applications alive and available with proper execution quality and correct business results. If you have ever successfully configured, e.g., the Tivoli Correlation Engine in an Enterprise Console, you know how much work that is. Putting your environment in a cloud would essentially mean reviewing all correlation rules every time the cloud manager changes your environment. Not possible, you say? Well, that was only the correlation engine; no other system management, IT service management or business service management tool or visualization has even been touched. So you see, something will have to be done to keep the actual delivery of business services up and running when moving to a fully dynamic environment, and this something is an autonomous automation engine, or automation auto pilot.

The Difference between Automation and AUTOMATION – Part I

Automation, Automation Technology Architect View, Business Impact of Automation, DataCenters, Events

Talking about automation on the way to IBM PULSE 2009 gave me some interesting insights. I knew that most people do not really feel comfortable when a machine acts autonomously. But that most people would expect to get a tool that forces them to click through their whole IT infrastructure and application landscape in order to DO something really astonished me.

Why? Because we have been working so differently for many years now. So I feel obliged to give some examples of automating tasks in an environment like the one I have been writing about for almost a year now. Maybe some of the things you found interesting will become clearer after reading this practical example.

OK, let us assume you want to automate the task of deleting a user across your IT landscape, a rule we entered years ago and have been using ever since. In an automated environment like ours, all you have to do is place an “issue” on the automation engine bus that says “delete user XYZ”. This issue is set upon the graph of the IT dependency model and looks for any node that has rules attached to it that know how to handle “something” with the data “user” and the action “delete”. For this issue, the graph of the IT dependency model reduces itself to the nodes that know how to handle it. The issue then maps out a road through these nodes; this is what the engine does, and it is the real secret behind automation. Each node the issue visits performs some action, in accordance with its rules, and returns some input to the engine. From this input the engine works out whether it needs to add nodes to the issue's travel list, remove nodes (because some other action has already taken care of what the issue requires), or whether the issue is resolved because there are no more actions to be taken. In the latter case, the engine looks for any other issue that may be able to use our issue's data and, if there is none, dismisses the task as completed.
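The routing loop described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the aAE algorithm: `Issue`, `Node` and `run_issue` are hypothetical names, each rule is assumed to return a pair of sets (nodes to add, nodes already taken care of), and the final hand-over of data to other issues is left out. All node names are made up.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Issue:
    """An issue placed on the engine bus, e.g. 'delete user XYZ'."""
    data: str       # what the issue is about, e.g. "user"
    action: str     # what should happen to it, e.g. "delete"
    payload: dict   # details such as {"user": "XYZ"}
    log: list = field(default_factory=list)  # documentation of every step taken


# A rule acts on the issue and reports back two sets of node names:
# nodes to add to the travel list, and nodes whose work is already done.
Rule = Callable[[Issue], tuple]


@dataclass
class Node:
    name: str
    rules: dict[tuple[str, str], Rule] = field(default_factory=dict)

    def handles(self, issue: Issue) -> bool:
        return (issue.data, issue.action) in self.rules


def run_issue(issue: Issue, graph: dict, max_steps: int = 100) -> list:
    """Route an issue through the dependency graph until no actions remain."""
    # Reduce the graph to the nodes that have a rule for this data/action pair.
    travel = [name for name, node in graph.items() if node.handles(issue)]
    visited = []
    while travel:
        if len(visited) >= max_steps:
            raise RuntimeError("issue did not converge; hand it to an expert")
        name = travel.pop(0)
        add, done = graph[name].rules[(issue.data, issue.action)](issue)
        visited.append(name)
        issue.log.append(f"{name}: {issue.action} {issue.data} {issue.payload}")
        # Drop nodes whose demands another action has already taken care of.
        travel = [n for n in travel if n not in done]
        # Append newly relevant nodes that can handle the issue, once each.
        for n in sorted(add):
            if n in graph and graph[n].handles(issue) and n not in visited and n not in travel:
                travel.append(n)
    # An empty travel list means the issue is resolved.
    return visited


def no_follow_up(issue):
    return set(), set()


graph = {
    # Hypothetical: linux02 authenticates only against LDAP, so the LDAP rule covers it.
    "ldap": Node("ldap", {("user", "delete"): lambda issue: (set(), {"linux02"})}),
    "linux01": Node("linux01", {("user", "delete"): no_follow_up}),
    "linux02": Node("linux02", {("user", "delete"): no_follow_up}),
    "web01": Node("web01", {("service", "restart"): no_follow_up}),
}

issue = Issue("user", "delete", {"user": "XYZ"})
print(run_issue(issue, graph))  # ['ldap', 'linux01']
```

The web server never sees the issue because it has no rule for deleting users, and linux02 is dropped from the travel list once the LDAP rule reports that its work is already done.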

So what does this mean practically?

For every OS you have to write one action rule that specifies how to delete a user. For every kind of directory or IAM application you are running, you write one such rule as well. That's it! These are probably scripts you have anyway, and you simply upload them into the engine with the rules. The engine determines which nodes each rule should attach itself to and executes the rule for any issue that seems suitable.
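How one rule per OS or directory type ends up attached to many nodes can be illustrated with a short sketch. It assumes that the model knows each node's kind and that a rule declares which kind it was written for; `ActionRule`, `ModelNode` and `attach_rules` are hypothetical names, and the command strings are simplified placeholders (a real `ldapdelete` call would also need bind options).

```python
from dataclasses import dataclass, field


@dataclass
class ActionRule:
    data: str
    action: str
    applies_to: str  # kind of node the rule was written for, e.g. "linux"
    script: str      # existing admin script, uploaded as-is ({user} is filled in later)


@dataclass
class ModelNode:
    name: str
    kind: str
    rules: list = field(default_factory=list)


def attach_rules(nodes, rules):
    """Attach every uploaded rule to all nodes of the kind it was written for."""
    attached = 0
    for rule in rules:
        for node in nodes:
            if node.kind == rule.applies_to:
                node.rules.append(rule)
                attached += 1
    return attached


rules = [
    ActionRule("user", "delete", "linux", "userdel {user}"),
    ActionRule("user", "delete", "windows", "net user {user} /delete"),
    ActionRule("user", "delete", "ldap", "ldapdelete uid={user},ou=people,dc=example,dc=com"),
]
nodes = [
    ModelNode("linux01", "linux"),
    ModelNode("linux02", "linux"),
    ModelNode("win01", "windows"),
    ModelNode("ldap01", "ldap"),
    ModelNode("db01", "oracle"),  # no rule written for this kind yet
]

attached = attach_rules(nodes, rules)
capable = [n.name for n in nodes
           if any(r.data == "user" and r.action == "delete" for r in n.rules)]
print(attached, capable)  # 4 ['linux01', 'linux02', 'win01', 'ldap01']
```

Three rules cover four nodes here; adding another Linux machine to the model requires no new rule at all.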

Compared to a system where you have to define what to do where before it will delete a user across your infrastructure, this is remarkably simple. Not only does the time for deleting a user go down from 40 minutes to 1, as some other vendors say, but the time for installing this neat gadget goes down from 2 days to 0, because the rule already exists for most OS and IAM solutions. If you really want to add some exotic system, you will probably need 10 minutes to do so.

So the next logical step in automation is not just improving the tool that lets you execute some commands, maybe remotely or with a good archive of scripts, but having an intelligent tool that actually works for you. You tell it the result you want, in this case “remove user”, and it finds out how to achieve that result.

Deleting a user is a change, and most likely an unplanned one. The same technology can also be applied when reacting to incidents, problems or user error reports. Then you can tell the engine that the desired end result is for the problem to go away. It will find out by itself what to do where in your IT infrastructure, and it will do it. For some critical actions it may ask you for permission through integration into a process management system (that is the way we do it), but other than that it goes on and does the job: it figures out what to do, follows through and documents all actions taken.

So from a “do I have to be afraid” point of view, there is not much difference between what you use as automation today and what AUTOMATION can actually do. But there is a great difference in results. An automation technology that actually figures things out will align much better with business requests, work with a changing IT landscape and integrate into all the ITIL operating processes.

Got you interested? See some examples at IBM PULSE 2009 tomorrow. Conference Center 123, 3:30-4:40pm. See you there….

Integrating ITIL and Automation

Automation Technology Architect View

I finally find the time to write to you about integrating automated IT operation into today's working environment. One might think that automation means just another tool installed into the void of the IT service management jungle, which some administrators use to become a lot better and a lot faster. That is exactly what I am NOT talking about. If you are interested in my opinion on the “automation market”, you will find an article here soon. What I am talking about is more of an auto pilot: a machine that actually looks at problems and chooses to take action.

The question I am always being asked is ‘does this integrate with an established working environment and established processes, e.g. ITIL?’ Yes, it does. In fact, it needs established processes to function properly. An automation engine like ours (aAE) basically replaces the initial contact with IT experts.

Fig. 1 - Classic ITIL incident management

Let us look at the ITIL V2 and V3 incident management process as an example. As you probably know (see figure 1), a normal ITIL incident management process is initiated either by an alarm from some monitoring system or by a user contacting the helpdesk. The helpdesk handles all the bureaucracy and then passes the incident on to the IT experts, who perform further analysis (collecting additional data and analyzing further if required) and then either take immediate action to produce a solution or initiate a change process to take this action. As you can see in figure 2, in an automated incident management process the automation engine takes the place of these IT experts. This includes the engine communicating with the helpdesk, performing additional analysis, requesting additional data, documenting its actions and so forth. When the automation engine cannot find a solution, it contacts the IT experts and asks them to step in. Only this time the experts get a well-analyzed incident with most of the boring work and analysis already done and well documented, so they can work on something new and interesting.
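The automated branch of this process can be summarized as a small decision function. The sketch below assumes a hypothetical knowledge base (`KNOWN_SOLUTIONS`) that maps an analyzed symptom to an action and to whether that action needs a change process; the symptoms and actions are invented, and real analysis and helpdesk communication are reduced to worklog entries.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    ident: str
    source: str    # "monitoring alarm" or "helpdesk call"
    symptom: str
    status: str = "open"
    assignee: str = "automation engine"
    worklog: list = field(default_factory=list)


# Hypothetical knowledge base: symptom -> (action, needs a change process?)
KNOWN_SOLUTIONS = {
    "filesystem full": ("compress and archive old logs", False),
    "service down": ("restart service", False),
    "certificate expired": ("deploy renewed certificate", True),
}


def handle_incident(incident: Incident) -> Incident:
    """Take the place of the IT experts in the incident process, escalating if stuck."""
    incident.worklog.append(f"received via {incident.source}: {incident.symptom}")
    known = KNOWN_SOLUTIONS.get(incident.symptom)
    if known is None:
        # No solution found: hand over to the experts with the analysis documented.
        incident.status = "escalated"
        incident.assignee = "IT experts"
        incident.worklog.append("no known solution, escalated with analysis attached")
        return incident
    action, needs_change = known
    if needs_change:
        incident.status = "awaiting change approval"
        incident.worklog.append(f"change request raised: {action}")
    else:
        incident.status = "resolved"
        incident.worklog.append(f"action taken: {action}")
    return incident


for inc in [Incident("INC1", "monitoring alarm", "filesystem full"),
            Incident("INC2", "helpdesk call", "certificate expired"),
            Incident("INC3", "helpdesk call", "report totals look wrong")]:
    handled = handle_incident(inc)
    print(handled.ident, handled.status, handled.assignee)
```

Only the third incident reaches a human, and it arrives with a worklog that already records what was received and tried.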

Fig. 2 - ITIL incident management with automation

This is how we introduced this auto pilot into our own ITIL-compliant IT service management unit. We promised the real technical experts that they would never be bored to death by everyday tasks and tedious busywork. Instead, the engine puts only those problems on their desk where an expert is actually required and can use his or her talent instead of just keeping mindlessly occupied. If you want to read more on the human element and the concerns connected with introducing automation, you might want to look at the article “Plays Well with Others” written by Ellen Fussell Policastro last August. In this article automation is considered not in an IT sense but in an industrial sense. That environment deals with change more practically than IT does and is therefore probably an early adopter of the automation change now on its way.

So you can see that in an environment with well-defined processes it is very easy to place an automation engine or an IT operating auto pilot. In an organization that does not have IT operating processes in place yet, just finding the proper interfaces for the automation engine and redefining the roles of the IT experts is probably a labor of Sisyphus.

Fig. 3 - ITIL integration of automation

Incident management is just one example of how an automation engine that acts like an auto pilot can be integrated to dramatically reduce the cost of IT operation while simultaneously increasing quality and making the jobs of IT experts much more interesting. As can be seen from figure 3, the automation engine places itself between the CMDB and enterprise monitoring systems on one side and the process layer involving IT experts on the other. This is valid not only for reactive ITIL processes like incident or problem management but also for proactive processes such as availability or capacity management, where our autopilot engine itself invokes workload automation tools in order to scale an IT environment up or down according to predicted usage and demand.
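For the proactive case, the decision the engine hands to a workload automation tool can be as simple as "how many instances should be added or removed". The sketch below is a hypothetical illustration: `plan_capacity` is not a real tool's API, the forecast is a naive moving average standing in for a proper demand prediction, and the 20% headroom is an arbitrary assumption.

```python
import math


def plan_capacity(recent_load, per_instance_capacity, current_instances,
                  headroom=0.2, min_instances=1):
    """Return how many instances to add (positive) or remove (negative).

    The forecast is a naive moving average of recent load; a real capacity
    management setup would use a proper demand prediction.
    """
    if not recent_load:
        raise ValueError("need at least one load sample")
    predicted = sum(recent_load) / len(recent_load)
    needed = math.ceil(predicted * (1 + headroom) / per_instance_capacity)
    return max(min_instances, needed) - current_instances


# Requests per second over the last three intervals, 100 req/s per instance.
print(plan_capacity([300, 450, 600], 100, current_instances=4))  # 2 (scale up)
print(plan_capacity([100, 120, 140], 100, current_instances=4))  # -2 (scale down)
```

The engine only decides the delta; executing it (provisioning or releasing instances) stays with the workload automation tool, which matches the division of labor described above.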

This level of integration into established processes and into the behavioral patterns of technically advanced staff is very rare for a tool that radically changes the workload of IT operating teams and service managers. So this approach is one of the few roads available to actually move a step ahead in an environment that produces ever more complex IT applications, more interdependencies between IT services and a faster pace of change. Just think about the kind of pressure a fully cloud-computing-based banking data center would put on administrators…

They would have to cope with a dynamically changing environment, changing dependencies and rapidly changing communication matrices. Automation could handle the ordinary tasks in such an environment without being pressured by speed and changing preconditions, and contact IT experts with exact and well-documented information when an unknown issue occurs, relieving them of the pressure generated by the dynamic IT environment and making much better use of their actual expertise. Even if that is not what we want (I cannot really see why it would not be), it is certainly what we need to keep up with a changing world without further damaging the image of the IT-Crowd.

Welcome to 2009 – A Year of Great Change and a Year Loaded with Opportunity for Technology

Automation, Automation Technology Architect View, Business Impact of Automation, Clouds, Market, Social Impact of Automation

I wish you all a happy new year. This may sound hollow, as the year is starting out with immeasurable uncertainties. A recession is unavoidable as the economic mechanisms work their way through the different economic sectors and into everyday life. Given the origin of this recession, the financial industry, with capital being one of the three pillars of our economic system, even systemic change may be in store. The greatest problem is that decisions are being held back due to these uncertainties, thereby creating an even greater economic impact. So what we are definitely feeling as a crisis is also a powerful well of change. This well will flood through the economy, society and, of course, technology. We will need strong decision makers and innovators, real entrepreneurs, to embrace change and use its power to tackle some of the grand challenges built up over the last 50 years.

For those of us promoting new technologies, the lack of willingness to embrace change is often the biggest obstacle to putting these technologies to use. Think about the argument that cloud computing cannot be a good thing because it changes the relationship between our data and our computations that we are so used to. Or think about bringing the concept of automatic system operation to administrators, who will no longer be just operators but will turn into system experts. All these high-tech concepts require a dramatically changed way of approaching everyday problems, and those of us implementing these new technologies know that inventing the technology is less than 50% of the way. The biggest challenge is attracting enough interest from all the players the new technology touches to make them embrace the change required to use it effectively. The current situation may prove to be one of the most potent accelerators of technological change possible. So to all of you, those who invent, implement, decide upon or just make use of new technologies: make wise, well-thought-out and brave decisions embracing change. You will be the ones who contribute to a speedy way out of the current uncertain situation.

After giving you so much leeway ( ;-) ) by posting a few personal stories from last summer and autumn, we are all back to business, and in the articles coming up this week I want to share some of the reading and thinking I did during the quiet time between Christmas and New Year's Eve. I will start with a little catching up on the “clouds are bad” discussion started by Richard Stallman in an interview given to the Guardian in September 2008. I do believe there was a good deal of stubbornness and corporate mistrust behind condemning the cloud concept, as you will read. I will then continue with a post on integrating the concept of automation, rather than just tools, into IT operation processes and tool infrastructure. After reading Roland's post “Automating What?” from November, you may be interested in how the concept of automation is integrated into everyday IT service management and how our concept of, e.g., automated incident management is incorporated into a working IT environment. Following this post, I will try to show a landscape of technologies and tools and how ongoing development is focusing on automation as a concept. This process started when tools were used to ease the manual process of maintaining system functionality (e.g. system management tools) and continued with automation tools that enable complex changes to be performed by entering a simple command (e.g. change automation or run book automation tools). The process is now at a point where decisions are actually taken by the automation software (e.g. workload automation tools decide which hardware is used for which tasks) and will finally arrive at tools that use all the experience of system administrators to decide automatically how to keep systems alive, thus automating incident, problem, capacity and availability management.
This kind of tool is what we have been using and developing for quite some time now (see the aAE), and the post will show how this kind of tool integrates with the whole landscape of tasks and tools involved in IT service management.

IT Automation – All the Things We Are Talking About

Automation, Automation Technology Architect View, Business Impact of Automation

Reading and writing about IT automation, I keep learning about the subject. Lately I found that there are so many flavors of automation around the operating processes of IT that misunderstanding seems inevitable. So I want to make a point here of describing the different kinds of automation one can use to maintain a high-quality IT environment.

Types of Automation Tasks

  1. Incident-, Problem-, Capacity- and Availability Management
    Automation engines specialized in analyzing and handling events that occur in an IT environment and that may lead to, or themselves represent, malfunctions, loss of quality and the like. These engines target both reactive operation (automated reaction to an incoming event) and proactive operation (automated actions taken to prevent events from occurring). Automation engines that handle “fault operating” are either embedded into the ITIL processes (see blog entry on extending ITIL with automation), like our automation engine (aAE), or embedded into system components or management systems with a narrow scope, e.g. redundancy activation.
  2. Change Management
    Automation engines specialized in performing changes that modify or extend an IT environment automatically. Some of these engines insert an abstraction layer above the tasks that need to be performed (adding users, restarting a component and the like), which allows an administrator to perform tasks on many machines or on different platforms simply by interacting with the automation engine. An example of this kind of engine is the Puppet framework, with its very structured approach to abstraction. Other engines focus on scaling an IT environment by dynamically adding resources or automatically installing or modifying a system, as the Tivoli Provisioning Manager or VMware Virtual Center do.
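The abstraction-layer style of change automation can be illustrated with a desired-state sketch in the spirit of, but not using, Puppet: the administrator states the end result once, and the engine translates it into platform-specific commands. The `BACKENDS` command templates and the `converge` function are hypothetical; real Puppet configurations are written in Puppet's own language.

```python
# Hypothetical command templates: one abstract task, several platforms.
BACKENDS = {
    "linux": {"add_user": "useradd {user}", "remove_user": "userdel {user}"},
    "windows": {"add_user": "net user {user} /add", "remove_user": "net user {user} /delete"},
}


def converge(hosts, desired_users):
    """Plan the commands that bring every host's user list to the desired state.

    Only differences generate commands, so planning against an environment
    that is already in the desired state yields an empty plan.
    """
    plan = []
    for host in hosts:
        templates = BACKENDS[host["platform"]]
        current = set(host["users"])
        for user in sorted(desired_users - current):
            plan.append((host["name"], templates["add_user"].format(user=user)))
        for user in sorted(current - desired_users):
            plan.append((host["name"], templates["remove_user"].format(user=user)))
    return plan


hosts = [
    {"name": "web01", "platform": "linux", "users": ["alice", "bob"]},
    {"name": "win01", "platform": "windows", "users": ["alice"]},
]
for host, command in converge(hosts, {"alice", "carol"}):
    print(host, command)
```

Because the plan is computed from differences, repeated runs are harmless, which is what makes this kind of abstraction safe to apply across many machines at once.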

I really do hope (and not just to save you some consulting fees) to have helped avoid misunderstandings when you talk to others about automation, and, even better, maybe I could point out some additional techniques you can look at to make life easier.

Implementing Automation – the Inevitable Step after Implementing ITIL Processes

Automation, Automation Technology Architect View, Business Impact of Automation

Some time ago I published an article on the future of IT operation after we are through with all the ITIL implementations (still) taking place. Assuming that all the nice failure-handling, proactive failure-avoidance and communication processes like Incident, Problem, Capacity and Availability Management are in place, implementing automation is the logical way to move ahead. Compared to implementing ITIL, automation actually changes what is done and how it is done. As you may have guessed, this statement alone was fertile ground for interesting and heated debate.

Generally the article concluded that implementing the ITIL processes concentrates on the interfaces between IT experts, clients, business requirements and the like, whereas automation concentrates on the way IT operation is actually “produced” (in the industrial sense of the word). Even though the two may be viewed separately, the article shows how an automation environment depends heavily on monitoring and IT component data. An ITIL environment provides a valid definition of both data sources for a complete IT environment and is therefore a good foundation for implementing automation.

Automation integrated into ITIL

An IT operations environment with implemented ITIL processes also has common interfaces to the acting staff members. This makes it very easy to “inject” a new entity, like an automation engine, into the whole system. In such an approach the automation engine wraps itself around the data sources of the CMDB and monitoring systems. All communication that would today be directed towards human recipients is handled by the automation engine first. Only if the automation engine is not able to complete the task are the IT experts involved.

This short description reveals how well an ITIL implementation prepares an IT organization for implementing automation. It also shows how automation is made completely transparent to the business using the IT, as the automation engine acts like any human entity taking part in the ITIL processes.

The article itself gives a short overview of the “operational” ITIL processes and how their implementation builds the foundation for automation. If you are interested you may read the whole text here.

A Simplistic Approach to IT Dependency Modeling: M-A-R-S

Automation, Automation Technology Architect View

On this blog you have seen a number of abstract articles talking about the “interdependency model” of an IT environment that is necessary to automate across operational silos. Building this model is actually the main challenge in implementing automation.

I have seen customer situations where building the dependency model was such an extensive effort that the actual goal of implementing automation was completely lost. An interdependency model is supposed to answer the question “how does one entity in an IT environment depend upon or influence other objects?” or “Why the @)!(/Q$§ doesn't it work anymore after someone I don't even know changed something I don't even care about?”.

Simple as they are, these questions leave a long trail across an IT organization when they have to be answered without a good CMDB in place that can be queried at an expert level. Since these key questions are asked for every failure, and should be asked for every change, an interdependency model is obviously something one REALLY wants to have.

As this model is also one of the key input streams to an automation engine that operates IT across silo and competency boundaries, we have put quite some thought into the art of modeling. Typical techies that we are, our first approach was to build THE BEST of all possible models. We started building our methodology on the basis of economic dependency models. Well, what can I say… We got a great model, but maintaining it just for fun would have cost us an arm and a leg, and our technical teams would never have been able to maintain it.

So we went back to the think tank with the preliminary assumption, that we would be willing to simplify the model – if we could produce one that could and would be maintained by the technical staff themselves (acceptance is an important factor).

We arrived at something we call the arago M-A-R-S model.

M-A-R-S Model Description

The “M” is for “Machine”

A machine (real or virtual, cloud compartment or actual operating system) is still the basic component of any IT application. Machines can be servers, network components or anything else of that kind. Administrators tasked with keeping the infrastructure alive normally do not know much about the business applications running on their “machines”, but they know their machines on a first-name basis. So a machine is a basic building block of IT infrastructure as well as a component very close to the technical staff. Thus the machine entity fulfills both of our prerequisites (simple to understand and maintainable by the IT staff).

The “A” is for “Application”

An application is something that is used in a business process. Or, technically speaking, an application is the “thing” a “user” complains about when interfacing with (talking to) the IT department. An application is therefore the basic building block from “the other side”. Where a machine is the basic building block of the technical view of IT, applications are the basic building blocks of the business view of IT.
Naturally, an application uses machines or, even better, the “S”ervices offered by them.

The “R” is for “Resource”

As you may imagine, building a dependency model by listing all the services offered by a number of machines and then listing all the services used by an application may leave you with very long lists. So we decided to introduce an additional layer of abstraction into our approach to dependency modeling, which we call the “resource” layer. A resource combines a number of services that have a 100% dependency on each other (e.g. an SAP service will never run without a database service, so there would be an SAP resource combining the two services into one entity, thereby reducing the complexity of an application's dependency tree).
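To make the resource idea concrete, here is a minimal Python sketch. It assumes that a resource is simply a named group of services that must all be running at the same time. The class names and the service and machine names (`sap-app`, `oracle-db`, `srv01`, `srv02`) are hypothetical illustrations and are not taken from the actual arago implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Service:
    name: str
    machine: str  # hypothetical: the machine offering this service


@dataclass
class Resource:
    """A named group of services with a 100% dependency on each other."""
    name: str
    services: list = field(default_factory=list)

    def is_available(self, running: set) -> bool:
        # 100% dependency: the resource is usable only if every member service is running.
        return all(s.name in running for s in self.services)


# Hypothetical example: the SAP application server cannot run without its database.
sap = Resource("sap-erp", [Service("sap-app", "srv01"), Service("oracle-db", "srv02")])
print(sap.is_available({"sap-app", "oracle-db"}))  # True
print(sap.is_available({"sap-app"}))               # False
```

An application that uses SAP now points at the single `sap-erp` resource instead of at two separate services. That is the reduction of the dependency tree the resource layer is meant to achieve.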

The “S” is for “Service”

A service defines some functionality that is offered by a machine or a cluster of machines (for redundancy and availability reasons). In this sense, a service is an IT term describing a simple building block of software running on a machine. A service can be anything ranging from an operating system or a network connection through a simple application such as a web server or database all the way up to an SAP system or an individually programmed piece of software. Services are often talked about in the IT organization and usually have developers or vendors attached to them.

As simple as this may sound, this four-layer model allows for a real connection of all silos, both in technology and in the organization. The model can be maintained manually or – better – imported from a good CMDB.
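The four layers can be sketched as plain mappings and then used for a simple impact analysis. The Python sketch below rests on a few assumptions of my own. A service is down only when every machine offering it is down, which covers the cluster case. A resource is down as soon as one of its services is down. An application is affected when anything it uses is down. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MarsModel:
    # S: service -> machines (M) offering it; more than one machine means a cluster
    service_hosts: dict
    # R: resource -> services it bundles with a 100% dependency
    resource_services: dict
    # A: application -> resources or services it uses
    app_uses: dict

    def failed_services(self, down_machines: set) -> set:
        # A clustered service survives as long as one of its machines is up.
        return {s for s, hosts in self.service_hosts.items() if hosts <= down_machines}

    def failed_resources(self, failed_svcs: set) -> set:
        # One failed member service breaks the whole resource.
        return {r for r, svcs in self.resource_services.items() if svcs & failed_svcs}

    def impacted_applications(self, down_machines: set) -> set:
        # Walk the model upwards: machines -> services -> resources -> applications.
        svcs = self.failed_services(down_machines)
        broken = svcs | self.failed_resources(svcs)
        return {a for a, uses in self.app_uses.items() if uses & broken}


model = MarsModel(
    service_hosts={"sap-app": {"m1"}, "oracle-db": {"m2", "m3"}, "web": {"m4"}},
    resource_services={"sap-erp": {"sap-app", "oracle-db"}},
    app_uses={"order-entry": {"sap-erp"}, "intranet": {"web"}},
)
print(sorted(model.impacted_applications({"m2"})))        # []
print(sorted(model.impacted_applications({"m2", "m3"})))  # ['order-entry']
```

Losing one node of the database cluster affects no application. Losing both nodes takes down the SAP resource and, through it, the order entry application.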

Combining monitoring information with this model (i.e. hooking up monitoring and master data to the nodes of the model) is the basic input for an automated environment (see also the input stream and naming convention articles on this blog). You can read the full article on “Measurement as the Prerequisite for Automation” here.

I am sure you will find other good applications for a simple model – or, even better, maybe you have some suggestions on how to improve and/or further simplify it.

Input to an Automation Engine – Namespace as a Starting Point

Automation, Automation Technology Architect View

I have been talking quite a bit about the technology that drives an automation engine. There are actually many possible approaches to the technology that evaluates conditions and chooses the right actions to execute. Our technology takes a “divide and conquer” approach in a highly distributed system and thereby simulates the behavior of a good human administrator. Other technologies take a “drill-down” or “boil-up” approach. All of these technologies produce automation results, and each is normally used for specific tasks; a drill-down approach, for example, is focused on straightforward root cause analysis.

Apart from all these technologies and the very important backend decisions they involve, the question of what goes into an automation engine is paramount to the actual results of automation. I have written a blog entry on the basic I/O model of an automation engine that emphasizes this point. As you may remember, I proposed two different streams of preliminary input data to the automation engine. First, there is the model data that builds up the space within which automation takes place. Second, there is the monitoring data representing the actual condition of each node in the model. These two basic data streams are evaluated by the rules engine.
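The following Python sketch is deliberately naive and shows how the two streams meet in a single rule. Model data (node attributes) decides which nodes a rule applies to. Monitoring data decides whether the rule fires. This is a simple loop and not our distributed engine. The attribute, metric, rule and action names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    node_id: str
    attributes: dict  # model data stream: master data attached to the node


@dataclass
class Rule:
    name: str
    attribute: str                     # model attribute that selects the nodes
    value: str                         # required value of that attribute
    condition: Callable[[dict], bool]  # test on the node's monitoring data
    action: str                        # what the engine would do if the rule fires


def evaluate(rules, nodes, monitoring):
    """Model data selects the nodes a rule applies to; monitoring data decides if it fires."""
    fired = []
    for node in nodes:
        metrics = monitoring.get(node.node_id, {})
        for rule in rules:
            if node.attributes.get(rule.attribute) == rule.value and rule.condition(metrics):
                fired.append((node.node_id, rule.name, rule.action))
    return fired


nodes = [Node("srv01", {"ROLE": "webserver"}), Node("srv02", {"ROLE": "database"})]
monitoring = {"srv01": {"disk_used_pct": 97.0}, "srv02": {"disk_used_pct": 99.0}}
rules = [Rule("clean-web-tmp", "ROLE", "webserver",
              lambda m: m.get("disk_used_pct", 0.0) > 90.0, "clean_tmp_dir")]
print(evaluate(rules, nodes, monitoring))
# [('srv01', 'clean-web-tmp', 'clean_tmp_dir')]
```

The database server has an even fuller disk, but the rule does not fire there because the model data says the rule does not apply to that node. Neither stream alone would produce the right result.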

The data streams have to fulfill certain preconditions in order to produce proper automation output. In this article I will discuss the attributes of the model data stream in more detail. The monitoring data stream holds either event-driven or time-series data. Finding a way to normalize this stream, so that a rule can evaluate monitoring data at any given point in time, will be the subject of an upcoming entry here.
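The normalization itself is a topic for that upcoming entry. To illustrate the problem, here is one common approach, sketched in Python under my own assumptions. Every sample or event is kept with its timestamp. The question "what was the value at time t?" is answered with the last value recorded at or before t (sample-and-hold). This lets event-driven data (a state recorded only when it changes) and time-series data (periodic samples) be queried in the same way. The class and method names are hypothetical.

```python
import bisect
from typing import List, Optional


class MetricHistory:
    """Timestamped values of one metric on one node (hypothetical helper)."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._values: List[float] = []

    def add(self, ts: float, value: float) -> None:
        # Insert in time order, so late-arriving events are handled too.
        i = bisect.bisect_right(self._times, ts)
        self._times.insert(i, ts)
        self._values.insert(i, value)

    def value_at(self, ts: float) -> Optional[float]:
        # Last value recorded at or before ts; None if nothing is known yet.
        i = bisect.bisect_right(self._times, ts) - 1
        return self._values[i] if i >= 0 else None


# Event-driven stream: service state recorded only when it changes (1 = up, 0 = down).
state = MetricHistory()
state.add(100.0, 1.0)
state.add(160.0, 0.0)
print(state.value_at(150.0))  # 1.0
print(state.value_at(200.0))  # 0.0
print(state.value_at(50.0))   # None
```

A real implementation would probably also need a staleness limit, so that a value last seen hours ago is not treated as the current condition of the node.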

The IT model is a representation of the interdependencies between IT entities or, in ITIL terms, between configuration items. There are a lot of approaches to building such a model. Depending on the approach, the model has a different number of layers and dimensions as well as different kinds of relations between its nodes. An up-to-date model is key to automation as well as to orderly IT processes, but its complexity and accuracy always have to be traded off against its maintainability. Many vendors are trying to reduce the need for this compromise by building auto-discovery solutions such as IBM’s TADDM. Still, the more complex the model, the lower the user acceptance of every process and technology based upon it.

Behind the model of interdependencies are the interdependent nodes themselves. These nodes describe IT entities using meta-information, which is recorded in attributes, and these attributes can either save the world or be the cause of all evil.

Therefore, the actual values of the attributes deserve some thought. Surely there are simple attributes like HOSTNAME that we do not have to think much about. But even a simple attribute such as OS can be a bigger challenge than expected. When you simply assign “Windows” or “Linux” to OS, you will only be able to match that exact value when building conditions for automation. When you assign something like “server.windows.2003”, where the first part describes the OS usage, the second the OS family and the third the actual system, you can match all Windows servers with a condition like “server.windows..*”, or select all Linux systems (regardless of desktop or server) with a condition like “.*.linux..*”.
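The matching described above can be tried with Python's standard `re` module. The node names and OS values below are made up for illustration. In the code, the dots that separate namespace levels are escaped so that they match only a literal dot.

```python
import re

# Hypothetical OS attribute values following the usage.family.system namespace.
os_values = {
    "srv01": "server.windows.2003",
    "srv02": "server.windows.2008",
    "srv03": "server.linux.rhel",
    "pc01": "desktop.linux.ubuntu",
    "pc02": "desktop.windows.xp",
}


def match(pattern: str) -> list:
    """Return the sorted node names whose OS value fully matches the pattern."""
    rx = re.compile(pattern)
    return sorted(node for node, value in os_values.items() if rx.fullmatch(value))


print(match(r"server\.windows\..*"))  # ['srv01', 'srv02']
print(match(r".*\.linux\..*"))        # ['pc01', 'srv03']
```

A flat value such as “Windows” could not tell `srv01` and `pc02` apart. With the tree-like value, a single pattern can select along any level.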

Maybe this little example shows the power of building proper namespace systems. So what kind of system is appropriate for automation? A simple “name” solution (like the first example) is not good for anything but a quick and dirty test of an automation engine algorithm. The second approach shown above (a tree-like structure) is very powerful and very close to XML (which most people use to declare structured data these days). Such tree-like structures lend themselves to expression matching and are therefore good as an input stream to automation engines. When using them, you first have to build a clear understanding of which trees to use. As you can see from many discussions (one of the most competent being between Van Wiles and William Vambenepe), the problem of agreeing on applicable and technically usable naming conventions is still up in the air. This is true even though it can be one of the major causes of CMDB projects failing and definitely has a major impact on any automation engine. Each vendor has eloquently elaborated its own naming conventions and definitions, but unfortunately no one has looked at the problem from higher ground. The closest I have found so far is a chapter in the book Implementing ITIL Configuration Management by Larry Klosterboer.

I have had many encounters with strange approaches to naming conventions and namespaces and have therefore made sure that our algorithms can work with any kind of namespace (with varying degrees of performance). If you want to “do it right”, I strongly suggest sticking to the following principles when building your own CMDB or model of interdependencies:

  1. If you are willing to tie yourself to a vendor (not just for the CMDB, but for most products supporting the ITIL processes), stick with that vendor's naming conventions. The guys there usually have put some thought into them. If this is not possible for you (because you strategically have to play large vendors against each other, because you like your software zoo, or just because…), build up your own namespace completely.
  2. Use a tree-like structure for everything and keep that structure fixed, meaning that each depth level in the tree always corresponds to the same sub-attribute. This may mean that you will have to “fill” some levels in the tree for some nodes (like “windows.windows.2003”). This will save you from extensive misinterpretation by people who do not use your namespace every day.
  3. Do not include versions in the tree-structured attributes. Versions are a secondary decision criterion and are used AFTER you know what you are dealing with. Our automation engine, like many other tools, uses different parts of the same rule for different versions of the same environment, so performance increases when you keep versions separate.
  4. Do not “outperform” yourself when building or using naming conventions. Whether you use a vendor's approach (which has to be very flexible) or your own (which you may want to make scientifically tight), only fill in or use the attributes and sub-attributes that make sense for the task at hand. If you stick to the proper structure, you can always add data later on as you need it. Data in place has to be of some use, because just by being there it creates costs.

Just by sticking to these principles (1 and 4 being the most important, as they bracket the other two), you can make sure that your IT model is easily understood, has low maintenance costs and can be used for something innovative like automation right away.
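A fixed-depth namespace (principle 2) with versions kept outside the tree (principle 3) could be enforced at import time with something like the following Python sketch. The three-level layout `usage.family.system` comes from the OS example above. The class and function names and the `SP2` version string are hypothetical.

```python
from typing import NamedTuple, Optional

# Fixed layout: each depth level always holds the same sub-attribute (principle 2).
LEVELS = ("usage", "family", "system")


class OsAttr(NamedTuple):
    usage: str
    family: str
    system: str
    version: Optional[str] = None  # kept outside the tree (principle 3)


def parse_os(path: str, version: Optional[str] = None) -> OsAttr:
    parts = path.split(".")
    if len(parts) != len(LEVELS):
        # Reject paths with missing or extra levels instead of guessing what they mean.
        raise ValueError(f"expected {len(LEVELS)} levels, got {len(parts)}: {path!r}")
    return OsAttr(*parts, version=version)


a = parse_os("server.windows.2003", version="SP2")
b = parse_os("server.windows.2003")
print(a.family, a.version)  # windows SP2
print(a[:3] == b[:3])       # True: same position in the tree, different version
```

Because the version is not part of the path, both nodes land on the same tree position and can be selected by the same rule. The version is checked only afterwards.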

Who is automated “away”

Automation, Social Impact of Automation

As discussed before, automation in IT operations definitely has a strong social impact. In the end, it is how IT professionals deal with this change that will make the difference.

Since I spent most of last week at an American university, I obviously had quite a few discussions about how automation impacts the lives of IT administrators. There seems to be a lot of personal discomfort (understandably). Unfortunately, these personal issues get mixed up with the technical ones. Many people have asked me questions like “Do you trust the machine to stop a service, restart a machine or even allocate resources dynamically?” Well, yes I do. For quite some time I have trusted my system to allocate memory and disk space for me, and so have you. We also trust computer programs to land planes and to control elevators and life support systems in an ER. So why – WHY – should we not trust a machine to do something radical like rebooting a server?

In my opinion, a machine has two major advantages over a human administrator in standard situations. First, it never executes radical commands out of “gut feeling” (like “a reboot feels good”). Second, it documents the path it took to reach the conclusion that executing specific commands is a good idea. So you do have documentation (hello to all you SOX consultants out there), and if there really is an error, you know where to look and can change your rule set accordingly.

Ok, so maybe we can solve the problem of trust through logical argument. Unfortunately, some people are very resistant to logic. So another approach we sometimes take is a dry run: we install the automation engine, disable all execution and redirect the execution commands so that everything the engine would do is documented in a trouble ticket. As soon as administrators start pasting commands out of the tickets, you know it is time to enable real automation.
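The dry-run switch is easy to picture. In the minimal Python sketch below, every command the engine decides on is written to a ticket together with the reason for it, which is the documentation mentioned above. Nothing actually runs until a real `execute` hook is plugged in. This is my own illustration and not our product's interface. The hook, node name, command and reason text are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Executor:
    # Hypothetical hook that really runs a command on a node (e.g. via ssh).
    # None means dry-run mode: nothing is executed.
    execute: Optional[Callable[[str, str], None]] = None
    ticket: List[str] = field(default_factory=list)  # stand-in for a trouble ticket

    def run(self, node: str, command: str, reason: str) -> None:
        entry = f"on {node}: {command} (reason: {reason})"
        if self.execute is None:
            # Dry run: only document what the engine would have done.
            self.ticket.append("WOULD RUN " + entry)
        else:
            # Real run: document first, then execute.
            self.ticket.append("RUNNING " + entry)
            self.execute(node, command)


dry = Executor()
dry.run("srv01", "systemctl restart httpd", "HTTP check failed three times")
print(dry.ticket[0])
# WOULD RUN on srv01: systemctl restart httpd (reason: HTTP check failed three times)
```

Switching from dry run to real automation then means passing in an `execute` hook. The rules and the documentation trail stay the same.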

But let us get down to the actual administrators and the consequences all that automation has for them. There is this geek shirt that says “Go away, or I will replace you with a very small shell script”. By the way, the guy in the picture is actually one of our administrators, one of the guys who really DO automation. I think the shirt was made to scare off users. But nowadays this is actually what will happen to administrators who do not want to be part of this changing world. In my vision of the future, there will only be two kinds of administrative staff close to a data center: real IT experts (the gurus) and janitors. The experts are today's administrators who want to get rid of all the boring – I have done that about 10,000 times – tasks and deal with the exciting stuff instead. Well, the others…..

To get it straight: I actually do not think that there will be fewer jobs in IT administration in the future, mainly because IT is an ever-growing plant. I do think that there will be a lot less “boring” and unqualified work in IT, as we have seen before in all other industries.

So, is that really a bad thing? More exciting tasks, more real results, happier administrators? I don't think so… Let's get it on, guys!
