Archive for the 'Automation Technology Architect View' Category

IT Automation – All the Things We Are Talking About

Automation, Automation Technology Architect View, Business Impact of Automation 1 Comment »

Reading and writing about IT automation, I keep on learning about the subject. Lately I found that there are so many flavors of automation around the operating processes of IT that misunderstanding seems inevitable. So I will try to lay out here the different kinds of automation one can use to maintain a high-quality IT environment.

Types of Automation Tasks

  1. Incident-, Problem-, Capacity- and Availability Management
    Automation engines specialized in analyzing and handling events that occur in an IT environment and that may lead to, or themselves represent, malfunctions, loss of quality and the like. These engines target both reactive automation (an automated reaction to an incoming event) and proactive automation (automated actions taken to prevent events from occurring). Automation engines that handle this “fault operating” are either embedded into the ITIL processes (see blog entry on extending ITIL with automation), like our automation engine (aAe), or embedded into system components or management systems with a narrow scope, e.g. redundancy activation.
  2. Change Management
    Automation engines specialized in performing changes that modify or extend an IT environment automatically. Some of these engines insert an abstraction layer above the tasks that need to be performed (like adding users, restarting a component and the like), so that an administrator can perform tasks on many machines or on different platforms simply by interacting with the automation engine. An example of this kind of engine is the Puppet framework with its very structured approach to abstraction. Other engines focus on scaling an IT environment by dynamically adding resources or automatically installing or modifying a system, like the Tivoli Provisioning Manager or VMware Virtual Center do.
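To make the “abstraction layer” idea a little more tangible, here is a minimal sketch in Python. It is only an illustration and not how Puppet or any other product works: the `COMMANDS` table, the `plan` function and the host names are assumptions of mine. One abstract task is translated into the concrete command for each platform.

```python
from typing import Dict, List

# Hypothetical catalogue: one abstract task, one concrete command per platform.
COMMANDS: Dict[str, Dict[str, str]] = {
    "add_user": {
        "linux": "useradd {name}",
        "windows": "net user {name} /add",
    },
}


def plan(task: str, hosts: Dict[str, str], **params: str) -> List[str]:
    """Translate one abstract task into a concrete command for every host."""
    return [
        f"{host}: {COMMANDS[task][platform].format(**params)}"
        for host, platform in sorted(hosts.items())
    ]


# Hypothetical hosts: the administrator states the task once, not per machine.
commands = plan("add_user", {"web01": "linux", "ad01": "windows"}, name="alice")
for line in commands:
    print(line)
```

The administrator only talks to `plan`; which command ends up on which platform is the engine's business.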

I really do hope (not just to save you some consulting fees) that this helps avoid misunderstandings when you are talking to others about automation, and even better, that I could point out some additional techniques you can look at to make life easier.

Implementing Automation – the Inevitable Step after Implementing ITIL Processes

Automation, Automation Technology Architect View, Business Impact of Automation 2 Comments »

Some time ago I published an article on the future of IT operation after we are through with all the ITIL implementations (still) taking place. Assuming that all the nice failure handling, proactive failure avoidance and communication processes like Incident, Problem, Capacity and Availability Management are in place, implementing automation is the logical way to move ahead. Compared to implementing ITIL, automation actually changes the things that are done and the way they are done. As you may have guessed, this statement alone was fertile ground for interesting and heated debate.

Generally the article concluded that implementing the ITIL processes concentrates on the interfaces between IT experts, clients, business requirements and the like, whereas automation concentrates on the way IT operation is actually “produced” (in an industrial meaning of the word). Even though these two may be viewed separately, the article shows how an automation environment highly depends on monitoring and IT component data. An ITIL environment puts forth a valid definition of both data sources for a complete IT environment and is therefore a good foundation to start implementing automation.

Automation integrated into ITIL

An IT operations environment with implemented ITIL processes also has common interfaces to the acting staff members. This makes it very easy to “inject” a new entity - like an automation engine - into the whole system. In such an approach the automation engine wraps itself around the data sources of CMDB and monitoring systems. All communication that would today be directed towards human recipients is handled by the automation engine first. Only if the automation engine is not able to complete the task are the IT experts involved.
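The “automation first, experts second” routing described above can be sketched in a few lines of Python. This is a toy under my own assumptions: the `Ticket` class, the `route` function and the restart-only engine are hypothetical and stand in for whatever service management interface is actually in place.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Ticket:
    """A request that would otherwise go straight to a human recipient."""
    summary: str
    handled_by: Optional[str] = None


# Hypothetical handler type: returns True if the engine completed the task.
Handler = Callable[[Ticket], bool]


def route(ticket: Ticket, automation: Handler, human_queue: List[Ticket]) -> Ticket:
    """Let the automation engine try first; involve the experts only on failure."""
    if automation(ticket):
        ticket.handled_by = "automation engine"
    else:
        ticket.handled_by = "IT expert"
        human_queue.append(ticket)
    return ticket


def restart_only_engine(ticket: Ticket) -> bool:
    # Toy stand-in for a real engine: it only knows how to handle restarts.
    return "restart" in ticket.summary.lower()


queue: List[Ticket] = []
first = route(Ticket("Restart web server on host-a"), restart_only_engine, queue)
second = route(Ticket("Database corruption on host-b"), restart_only_engine, queue)
```

From the point of view of whoever opened the ticket nothing changes; only the `handled_by` field reveals who did the work.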

This short description reveals how well an ITIL implementation prepares an IT organization for implementing automation.  It also shows how automation is made completely transparent to the business using the IT - as the automation engine acts like any human entity taking part in the ITIL processes.

The article itself gives a short overview of the “operational” ITIL processes and how their implementation builds the foundation for automation. If you are interested you may read the whole text here.

A Simplistic Approach to IT Dependency Modeling: M-A-R-S

Automation, Automation Technology Architect View 3 Comments »

On this blog you have seen a number of abstract articles talking about the “interdependency model” of an IT environment that is necessary to actually automate across operational silos. Building this model is actually the main challenge in implementing automation.

I have seen customer situations where building the dependency model was such an extensive effort that the actual goal of implementing automation was completely lost from sight. An interdependency model is supposed to answer the question “How does one entity in an IT environment depend upon or influence other objects?” or “Why the @)!(/Q$§ doesn't it work anymore after someone I don't even know changed something I don't even care about?”.

Simple as these questions are, answering them without having a good CMDB in place and without being able to query that CMDB on an expert level leaves a long trail across an IT organization. Since these key questions are asked for every failure and should be asked for every change, an interdependency model is obviously something one REALLY wants to have.

As this model is also one of the key input streams to an automation engine that actually operates IT across silo and competency boundaries, we have put quite some thought into the art of modeling. Typical techies that we are, our first approach was to build THE BEST of all possible models. We started to build our methodology on the basis of economic dependency models. Well, what can I say… We got a great model, but maintaining it just for fun would have cost us an arm and a leg, and maintenance by our technical teams would never have been possible.

So we went back to the think tank with the preliminary assumption, that we would be willing to simplify the model - if we could produce one that could and would be maintained by the technical staff themselves (acceptance is an important factor).

We arrived at something we call the arago M-A-R-S model.

M-A-R-S Model Description

The “M” is for “Machine”

A machine (real or virtual, cloud compartment or actual operating system) is still the basic component of any IT application. Machines can be servers, network components and anything else. Administrators tasked with keeping the infrastructure alive do not normally know much about the business applications running on their “machines”, but they know their machines on a first name basis. So a machine is a basic building block of IT infrastructure as well as a component very close to the technical staff. Thus the entity of a machine fulfills both our prerequisites (simple to understand and maintainable by the IT staff).

The “A” is for “Application”

An application is something that is used in a business process. Or technically speaking, an application is the “thing” a “user” complains about when interfacing with (talking to) the IT department. An application is therefore the basic building block from “the other side”. Where a machine is the basic building block of the technical view of IT, applications are the basic building blocks of the business view of IT.
Naturally an application uses machines or much better “-S-ervices” offered by these.

The “R” is for “Resource”

As you may imagine building up a dependency model by listing all the services offered by a number of machines and then listing all the services used by an application may leave you with very long lists. So we decided to introduce one layer of abstraction into our approach to dependency modeling. This is called the “resource” layer. A resource combines a number of services with a 100% dependency (e.g. an SAP service will never run without a database service, thus there would be an SAP resource combining the two services into one entity and thereby reducing the complexity of the dependency tree of an application).

The “S” is for “Service”

A service defines some functionality that is offered by a machine or a cluster of machines (for redundancy and availability reasons). A service in this case is an IT term that describes a simple building block of software running on a machine. A service can be anything ranging from an operating system or a network connection through a simple application such as a web server or database all the way up to an SAP system or an individually programmed piece of software. A service is something often talked about in the IT organization and usually something that has developers or vendors attached to it.

As simple as this may sound: this four-layer model allows for a real connection of all silos within technology and organization. This model can be maintained manually or - better - can be imported from a good CMDB.
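For readers who think in code, here is a minimal sketch of the four layers as Python dataclasses. This is not the arago implementation; the class layout, the `machines_behind` helper and all instance names are my own assumptions for illustration. It reuses the SAP example from the resource section: one resource bundles the SAP service with its database service.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Machine:
    name: str


@dataclass
class Service:
    name: str
    machines: List[Machine]  # one machine, or a cluster for redundancy


@dataclass
class Resource:
    name: str
    services: List[Service]  # services with a 100% dependency on each other


@dataclass
class Application:
    name: str
    resources: List[Resource]


def machines_behind(app: Application) -> Set[str]:
    """Walk A -> R -> S -> M and collect every machine an application relies on."""
    return {
        machine.name
        for resource in app.resources
        for service in resource.services
        for machine in service.machines
    }


# Hypothetical landscape: the SAP service never runs without its database.
sap_service = Service("sap-app", [Machine("sap01")])
db_service = Service("sap-db", [Machine("db01"), Machine("db02")])
sap = Resource("sap", [sap_service, db_service])
billing = Application("billing", [sap])

print(sorted(machines_behind(billing)))
```

The application only lists one resource, yet the walk still reaches all three machines, which is exactly the reduction in complexity the resource layer is meant to buy.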

Combining monitoring information with this model (i.e. hooking up monitoring and master data with the nodes of the model) is the basic input for an automated environment (also see input stream and naming convention articles on this blog). You can read the full article on “Measurement as the Prerequisite for Automation” here.

I am sure you will find other good applications for a simple model - or even better maybe you do have some suggestions on how to improve and/or further simplify this model.

Input to an Automation Engine – Namespace as a Starting Point

Automation, Automation Technology Architect View No Comments »

I have been talking quite a bit about the technology that drives an automation engine. Actually there could be many approaches for the technology that evaluates conditions and chooses the right actions to execute. Our technology takes a “divide and conquer” approach in a very distributed system and therefore simulates the behavior of a good human administrator. Other technologies take a “drill-down” or “boil-up” approach. All the technologies produce automation results and normally they are used for special tasks. E.g. a drill-down approach is focused on straightforward root cause analysis.

Apart from all these technologies and very important backend decisions, the question of what goes into an automation engine is paramount to the actual results of automation. I have written a blog entry on the basic IO model of an automation engine emphasizing this point. As you may remember, I proposed two different streams of preliminary input data to the automation engine. First there is the model data that builds up the space automation is to take place within, and second there is the monitoring data representing the actual condition of each node in the model. These two basic data streams are evaluated by the rules engine.

The data streams have to fulfill certain preconditions in order to produce proper automation output. I will talk about the attributes of the model data stream in this article in more detail. The monitoring data stream holds either event-driven or time series data. Finding a way to normalize this stream so a rule can evaluate monitoring data at any given point in time will be the subject of an entry here soon.

The IT model is described as a representation of the interdependencies between IT entities or, in an ITIL way of speaking, between configuration items. There are a lot of approaches towards building such a model. Depending on the approach the model has a different number of layers and dimensions as well as different kinds of relations between its nodes. Just as an up-to-date model is key to automation as well as to orderly IT processes, the complexity and accuracy of the model will have to be balanced against its maintainability. Many vendors are trying to reduce this need to compromise by building auto discovery solutions such as IBM’s TADDM. Still, the complexity of the model directly affects the user acceptance of every process and technology based upon this model.

Behind the model of interdependencies are the nodes that are interdependent. And these nodes have to describe IT entities using meta information. This meta information is put down into attributes, and these attributes can either save the world or be the cause of all evil.

Therefore building the actual values for the attributes should be worth some thought. Surely there are simple attributes like HOSTNAME or the like and we do not have to think much about them. Yet a simple attribute such as OS can be a bigger challenge than would be expected. When you simply assign “Windows” or “Linux” to OS, then you will only be able to match this exact value when building conditions for automation. When you assign something like “server.windows.2003” where the first part describes the OS usage, the second the OS family and the third the actual system, you can match other Windows servers by building a condition like “server\.windows\..*” or you will be able to select all Linux systems (regardless of desktop or server) by building a condition like “.*\.linux\..*”.
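The two conditions quoted above can be tried out directly with Python's `re` module. A small sketch: apart from “server.windows.2003” the host names and OS values below are made up for the example, and the `select` helper is hypothetical.

```python
import re

# Hypothetical inventory; only "server.windows.2003" comes from the text above.
os_values = {
    "web01": "server.windows.2003",
    "db01": "server.linux.debian",
    "pc17": "desktop.linux.ubuntu",
    "pc18": "desktop.windows.xp",
}


def select(pattern: str) -> list:
    """Return the hosts whose complete OS attribute matches the condition."""
    condition = re.compile(pattern)
    return sorted(
        host for host, value in os_values.items() if condition.fullmatch(value)
    )


windows_servers = select(r"server\.windows\..*")
all_linux = select(r".*\.linux\..*")

print(windows_servers)  # ['web01']
print(all_linux)        # ['db01', 'pc17']
```

With a flat value such as “Windows” neither selection would be possible without listing every host by hand.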

Maybe this little example shows the power of building up proper systems for namespacing. So what kind of system is appropriate for automation? A simple “name” solution (like the first example) is not good for anything but a quick and dirty test of an automation engine algorithm. The second approach shown above (a tree-like structure) is very powerful and very close to XML (which most people use to declare structured data these days). These tree-like structures are good for expression matching and therefore good as an input stream to automation engines. When using these structures you first have to build up a clear understanding of the trees to use. As you can see from many discussions (one of the most competent between Van Wiles and William Vambenepe), the problem of agreeing on applicable and technically usable naming conventions is still up in the air, even though it can be one of the major causes for CMDB projects failing and definitely has major impact on any automation engine. Each vendor has their own naming conventions and definitions eloquently elaborated, but unfortunately no one has looked upon the problem from higher ground. The closest I have found so far is a chapter in the book Implementing ITIL Configuration Management by Larry Klosterboer.

I have had many encounters with strange approaches towards the issue of naming conventions and namespace and therefore made sure that our algorithms can work with any kind of namespace (with varying degrees of performance). If you want to “do it right” I would strongly suggest sticking to the following principles when building up your personal CMDB or model of interdependencies:

  1. If you are willing to attach yourself to a vendor (not just for the CMDB, but most products delivering towards the ITIL processes), stick with the naming conventions of this vendor. The guys usually have put some thought into them. If this is not possible for you (either because you strategically have to place large vendors against each other, because you like your software zoo or just because…) completely build up your own space.
  2. Use a treelike structure for everything and make this tree structure fixed, meaning that each depth level in the tree always correlates to the same sub attribute. This may mean that you will have to “fill” some levels in the tree for some nodes (like “windows.windows.2003”). This will save you from extensive misinterpretation by people who do not use your namespace every day.
  3. Do not include versions in the tree-structured attributes. Versions are a secondary decision criterion and are used AFTER you know what you are dealing with. Our automation engine is not the only tool that uses the same rule, with different parts of it, for different versions of the same environment; many other tools do too. Performance therefore increases when you keep versions separate.
  4. Do not “outperform” yourself when building or using naming conventions. In any case (whether you use a vendor's approach, which has to be very flexible, or your own, which you may want to make scientifically tight), only fill in or use the attributes and sub attributes that make sense for the task at hand. If you stick to the proper structure you can always enter additional data later on (as you need it). Data in place has to be of some use, as just by being there it creates costs.

Just by sticking to these (1. and 4. being the most important to bracket things up) you can make sure that your IT model is easily understood, has low maintenance cost and can be used for something innovative like automation right away.
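Principles 2 and 3 are easy to enforce mechanically. The following Python sketch shows one possible way, under my own assumptions: the level names in `LEVELS`, the `OsAttribute` class and the sample values are hypothetical, not part of any product or standard. The tree has a fixed depth and the version lives in its own field.

```python
from dataclasses import dataclass

# Assumed level names for the OS attribute: every depth has a fixed meaning.
LEVELS = ("usage", "family", "system")


@dataclass(frozen=True)
class OsAttribute:
    usage: str
    family: str
    system: str
    version: str = ""  # kept outside the tree, as principle 3 suggests


def is_valid(path: str) -> bool:
    """A path is valid only if every level of the fixed tree is filled."""
    parts = path.split(".")
    return len(parts) == len(LEVELS) and all(parts)


def parse(path: str, version: str = "") -> OsAttribute:
    """Split a dotted path into its fixed levels; reject short or long paths."""
    if not is_valid(path):
        raise ValueError(f"expected {len(LEVELS)} levels, got {path!r}")
    return OsAttribute(**dict(zip(LEVELS, path.split("."))), version=version)


debian = parse("server.linux.debian", version="4.0")
print(debian.family, debian.version)
print(is_valid("windows.2003"))  # a level is missing, so this is rejected
```

Rejecting “windows.2003” early is the whole point: whoever enters it has to decide which level is being “filled” instead of leaving the interpretation to the next reader.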

Cloud Computing needs automation

Automation, Automation Technology Architect View, Business Impact of Automation, Uncategorized 2 Comments »

Yesterday I had the chance to get a feeling for one of the hottest topics in IT infrastructure. A panel session at IBM PULSE 2008 was dedicated to the topic of Cloud Computing (even though IBM marketing people don't seem to like the term and have come up with quite some innovative words – words no one uses, so let us stick with the cloud). The panel was buzzing with intelligence; unfortunately we as the audience could not really match up. So we listened to a pretty much directed discussion on how cloud computing would replace today's approach to hardware and infrastructure in general. Well, I do agree: no one needs dedicated servers when resources can be allocated dynamically and come preconfigured and interconnected.

Kristin Hansen stripped the key features of a cloud down to simplicity (users do not care how their resources are set up, they just use them), mobility (obviously use is possible from anywhere and even a large computing cluster could be controlled from a phone-like device) and elasticity (you only set up or pay for what you really need). Sounds fine to everyone, and Google and Amazon have definitely shown to the world that this concept works in a closed shop environment. According to Dave Lindquist, IBM is working on a methodology and technology to make most applications “cloudable”.

The most interesting remark I heard during the discussion was that “Cloud Computing is the combination of technology (virtualization and automation) and discipline (a stringent way of breaking down the offered services into small blocks in order to recombine them quickly and automatically upon the user's request as well as defining standards or service catalogues to be offered)”. I guess the discipline part will lead to a great deal of discussion between process consultants and methodology consultants, and in the end there will certainly be a couple of good ways to set things up. Just as certainly there will be the need to standardize these processes and methodologies in the end, so clouds are not proprietary but stay mobile even between cloud providers.

Naturally I am more interested in the technology part that is needed behind cloud computing. Technology - in this case - not referring to the cloud management servers and agents themselves, but the technology surrounding them. The first technology that comes to mind is virtualization, as without this core there will be no cloud, at least no cloud that can integrate legacy applications rather than working in a very tightly closed universe like Google does. There are quite some good approaches to virtualization – commercial as well as open source – and the approach taken should really depend on the needs of the applications to be run on a specific part of the cloud. It probably even makes sense to combine the available virtualization technologies within one cloud. It might make sense to use containers built into the operating system or complete hardware virtualization depending on the kind of application to be run, and therefore a cloud manager will have to deal with all kinds of virtualization technology.

More in my focus is the service management side of cloud computing, and I strongly believe that automated operating is a key component of a good cloud infrastructure. Definitely the cloud infrastructure and management components will take care of automatic provisioning and resource management, but as soon as legacy applications – that do not really know that they are running on a beautifully scalable environment – are involved, manual administration of these applications would mean chasing an ever changing rabbit across a chameleon planet – an image most amusing to bystanders but funny neither to administrators nor to the ones paying them. So in my opinion an automation engine could be fed IT model data and monitoring feeds directly from the cloud manager and could thus deal with the ever changing environment and keep the application automation rules up to date with the cloud components currently in use. This automation engine cannot use a drill-down approach, because the infrastructure might not even support drill-downs and can change every so often. The automation engine assuring a good foundation for quality service and professional service management will have to use a more human “circle in” or divide and conquer approach.

Does this sound familiar? By the way, check out the articles on the “Blue Cloud”; technical pioneers at work (other bloggers also think about the blue cloud)…. Also interesting is the cooperation between Google and IBM on producing cloud standards

I/O scheme of an automation engine – or the Importance of having a correct IT Model

Automation Technology Architect View 2 Comments »

The automation engine is a computer program and as such it follows the simple scheme of “Input - Processing - Output”. The engine takes care of the processing part. So in order to talk about the quality of such a program, we have to examine the I/O scheme of the automation engine.

[Figure: automation engine IO scheme]

There are two lanes of external input into the automation engine. First there is the model of the IT infrastructure and application landscape the automation engine is to work on. The second input stream is monitoring or event data from all the components of the infrastructure described by the first input stream. The automation engine only has one internal input or configuration stream. This stream defines the rule set of the engine in an appropriate format. This rule set will enable the “black box” to determine which action has to be executed automatically as well as the conditions permitting the execution of a certain action. This configuration stream also includes the actions themselves or links to a repository of actions - e.g. scripts written by administrators.

The automation engine will produce two output streams. One is an external stream documenting the actions taken by the automation engine to a service management or similar system as well as exporting skill management data to interface with manual operations effectively. The main output of the automation engine is a stream of commands to the components described in the IT model previously mentioned.
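The scheme can be written down as a single function with three inputs and two outputs. The Python sketch below is a deliberately naive illustration under my own assumptions (the `Rule` class, `run_engine`, the attribute names and the sample data are all hypothetical) and says nothing about how a real engine evaluates its rules internally.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Rule:
    """Configuration stream: a condition plus the action it permits."""
    name: str
    condition: Callable[[dict, dict], bool]  # (model attributes, monitoring data)
    command: str


def run_engine(
    model: Dict[str, dict],       # external input 1: the IT model
    monitoring: Dict[str, dict],  # external input 2: monitoring or event data
    rules: List[Rule],            # internal input: the rule set
) -> Tuple[List[str], List[str]]:
    """Return (commands for the components, documentation of actions taken)."""
    commands: List[str] = []
    log: List[str] = []
    for node, attributes in model.items():
        data = monitoring.get(node, {})
        for rule in rules:
            if rule.condition(attributes, data):
                commands.append(f"{node}: {rule.command}")
                log.append(f"rule '{rule.name}' fired for {node}")
    return commands, log


# Hypothetical sample data for two nodes of the model.
model = {"web01": {"service": "apache"}, "db01": {"service": "mysql"}}
monitoring = {"web01": {"http_ok": False}, "db01": {"db_ok": True}}
rules = [
    Rule(
        name="restart-apache",
        condition=lambda attrs, data: attrs.get("service") == "apache"
        and data.get("http_ok") is False,
        command="/etc/init.d/apache restart",
    )
]

commands, log = run_engine(model, monitoring, rules)
print(commands)
print(log)
```

Note that the rule needs both inputs at once: without the model it would not know that web01 runs Apache, and without the monitoring data it would not know that anything is wrong.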

The internal input and output of the automation engine is the primary processing of the software and therefore part of the implementation of an automation engine. Proper functionality of the engine strongly relies on the quality of the external input streams (IT model and event or monitoring data). The integration of the automation engine into the manual administration processes, or rather the control of the manual processes through the automation engine, determines the effectiveness of the operational workforce and the degree of automation that can be reached.

Thus the input data and the integration of the output produced by the automation engine are issues that must not be underestimated. It is not enough to have some sort of CMDB to import components of the IT infrastructure to be operated upon. The input has to include relationship and attribute information at the necessary level of detail, and the monitoring streams have to be connected to the configuration items and their relations. Otherwise the automation engine will not operate at the desired level of effectiveness or, even worse, will generate false commands to be executed.

I have found that in many cases the configuration of the automation engine - especially hooking it up to the IT infrastructure and the available monitoring environment - is best done manually. Many CMDB implementations can be used to cross check the configuration but I am still looking for a CMDB implementation that will give an automation engine a good IT model and access to the required monitoring and event data.

First look at an automation engine

Automation, Automation Technology Architect View No Comments »

As you - hopefully - have read, automation is not magic, not even black magic. It is the execution of actions based on conditions. As this does not sound all that difficult, what do we need to integrate this concept into everyday IT maintenance life? Simple, we need some sort of machine - an automation engine - that will sit on all the IT components of our environment and execute actions if some conditions we have programmed the machine with become true.

Simple may be a pretty misleading term. The concept of this machine is very simple, but this automation engine has to monitor all data available in our IT environment in order to match any conditions, and on the other hand this engine will have to find the right action to execute. The concept behind this approach is simple; the technical problems to be solved in order to make this machine work are numerous and have to be dealt with. Let me take a glimpse at a few of the most prominent ones:

  1. Mass of data to be processed
    As you may remember, we are looking at all the system management, KPI and quality data we can get our hands on for all the IT around us. So there is a lot of data and we have to deal with all of it.
  2. Mass of conditions
    Besides all that data there are a lot of conditions that have to be evaluated upon the available data series. The automation engine is a very elaborate version of a rule engine, because it is dealing with a highly interconnected logic tree (the IT model) and many conditions on a large data space. So typical approaches like decision matrices do not work as a shortcut for rule evaluation.
  3. Unknown rules
    If we wanted to put everything that needs to be executed automatically into an explicit rule, building the rule system would take a lifetime and the problem “mass of conditions” would become ever more influential. Building implicit rules is too complicated for the user. So the automation engine has to adopt a behavior of encircling the problem. This is a divide and conquer approach instead of asking a user to enter every circumstance and every reaction, because this kind of “brain dump” is simply not invented yet. I know this is very abstract and I am sure that I will find a little more time soon to elaborate on the way an automation engine has to find the proper actions to take to solve a real life problem.

By the way, most computer systems and approaches in system management software take a simple approach to tackle these challenges. Techniques like root cause analysis or autonomic systems try to move down the dependency tree and find the problem somewhere down there. Why is this approach practical? Well, it quickly narrows down the number of possible data sources and actions that can be taken, and in that way a computer system can actually work out the resulting simple problem. And why do these approaches fall short? Well, they simply don’t work with complex problems that show symptoms in some remote location of the IT environment or problems that are caused by multiple sources. Most problems in modern IT systems are of the latter kind, and therefore the common top-down approaches execute quite some actions, but solve not nearly as much as an automation engine should. Or would you expect your best system administrator to simply go down the logical tree of connected systems while trying to find out why your ecommerce application is not working? No, not really, because good administrators encircle the cause of a problem, using their experience to exclude great parts of the IT environment as possible causes, and then concentrate only on the “relevant” remainder.
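The difference between the two strategies can be shown on a toy dependency graph. The Python sketch below is my own loose interpretation, not the algorithm of any product: the graph, the set of nodes that monitoring reports as healthy, and both functions are hypothetical. `drill_down` follows a single path down the tree, while `encircle` looks at everything the application depends on and excludes what is known to be fine.

```python
from typing import Dict, List, Set

# Hypothetical dependency graph of an ecommerce application.
DEPENDS_ON: Dict[str, List[str]] = {
    "shop": ["web", "payment"],
    "web": ["cache"],
    "payment": ["gateway", "db"],
    "cache": [],
    "gateway": [],
    "db": [],
}

# Hypothetical monitoring result: nodes currently reported as healthy.
HEALTHY: Set[str] = {"web", "db"}


def drill_down(node: str) -> str:
    """Follow the first unhealthy dependency downwards; return where the walk ends."""
    for child in DEPENDS_ON[node]:
        if child not in HEALTHY:
            return drill_down(child)
    return node


def encircle(node: str) -> Set[str]:
    """Collect everything the node depends on, then exclude the healthy parts."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(DEPENDS_ON[current])
    return seen - HEALTHY - {node}


single_suspect = drill_down("shop")
remaining_suspects = encircle("shop")
print(single_suspect)
print(sorted(remaining_suspects))
```

In this toy case the drill-down ends at the gateway and never looks at the cache, because the cache sits below a web tier that reports itself healthy. The exclusion approach keeps the cache among the suspects, which is the kind of remote or multi-source problem described above.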

The simple concepts behind automation

Automation, Automation Technology Architect View No Comments »

I should be the last person to say that automation is a simple concept - I make my money on automation. And often people really think that the “technical concept behind automation” is the hard part - well, let me tell you: it is not. The technical concept of automation is something deeply embedded in the binary way our IT works. The technical principle behind automation simply is

IF (a complex condition) IS TRUE THEN (do something)

So does that sound familiar? And we have not even introduced the concept of ELSE, ELIF or CASE yet :-). Well, we can put this into slightly more technical terms by saying:

Automation is the condition-based execution of actions to ensure the quality of service of an IT environment, where conditions can be combined from expressions covering all aspects of the IT environment in question and actions can be one or a series of command executions in one or many locations of the IT environment regarded.
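Written as code, the definition above is hardly longer than the IF/THEN line. In this Python sketch the helper names (`all_of`, `any_of`, `evaluate`), the measurement names and the thresholds are all made up for illustration: a condition is combined from simple expressions, and an action is a series of commands in several locations.

```python
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, float]              # data about the IT environment
Expression = Callable[[Facts], bool]  # one expression over that data
Action = List[Tuple[str, str]]        # series of (location, command) pairs


def all_of(*expressions: Expression) -> Expression:
    """Combine expressions so that every one of them has to hold."""
    return lambda facts: all(expr(facts) for expr in expressions)


def any_of(*expressions: Expression) -> Expression:
    """Combine expressions so that at least one of them has to hold."""
    return lambda facts: any(expr(facts) for expr in expressions)


def evaluate(condition: Expression, action: Action, facts: Facts) -> Action:
    """IF (a complex condition) IS TRUE THEN (do something), else do nothing."""
    return action if condition(facts) else []


# Hypothetical thresholds and hosts.
overloaded = all_of(
    lambda facts: facts["cpu_load"] > 0.9,
    lambda facts: facts["response_ms"] > 1000,
)
restart_web_tier: Action = [
    ("web01", "/etc/init.d/apache restart"),
    ("web02", "/etc/init.d/apache restart"),
]

to_execute = evaluate(overloaded, restart_web_tier, {"cpu_load": 0.95, "response_ms": 1800})
nothing = evaluate(overloaded, restart_web_tier, {"cpu_load": 0.20, "response_ms": 1800})
```

Here `to_execute` holds both restart commands, while `nothing` is empty because only one of the two expressions is true.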

So to go back to the divide and conquer theme that was so useful in solving many IT problems, we have to ask ourselves three questions:

  1. What is the IT environment and what are the interdependencies within this environment?
  2. What are the expressions “a complex condition” is composed of and what is the data evaluated in these expressions?
  3. What are the actions to be taken and where are they to be executed?

So let us try to answer the three questions. First, the IT environment and its interdependencies can be modeled. The entities the environment is composed of are all the “configuration items” that are part of the environment in question. The interdependencies are relations between these entities. The “detail questions” to be solved are: At what level of detail do we model, and what kind of relationship model will we use? Well, answering those will take us into a specific implementation of IT automation, and we are right now looking at the concept behind all these implementations, so let us stay at this level of abstraction.

Second, the expressions evaluated in order to know which actions to execute are embedded in the knowledge of the administrators doing exactly this job today. So the expressions and conditions could be classified as the knowledge database put into machine-readable form. The data needed to evaluate the expressions - after we know what they are - is all data available on the IT environment we are looking at. This includes technical monitoring data, end-to-end monitoring data, data processing information and transaction monitoring, but it also includes quality of service information, KPIs, SLA information and business impact data - basically anything we can get hold of.

And third, actions are the things administrators and gurus enter via keyboard, mouse or telepathic network link in order to make the “bad condition” go away. An action can be a simple command on one system, but it can also be a series of commands (maybe with conditional execution) or even scripts of commands distributed to many systems. So an action can be as simple as /etc/init.d/apache restart or it can be something as complex as a 10,000-line program, some SQL scripts and a shell script executed on a dozen machines. But in the end these actions are already put together today - as scripts and how-tos in the system administrators’ dens of the world.

So you see. Automation is something simple: We should know about our IT environment and the interdependencies of its entities anyway. We know about the conditions - or at least we can find out - and then execute actions (some of which we have already put into scripts or programs to make life easier). So automation is just a centralization and connection of things we are already doing.
