Archive for April, 2008

Finally spring

Private Life No Comments »

This year the local weather was not exactly kind. Loads of rain and cold winds did a good job of keeping me inside. But this weekend the tide has turned: for four days in a row the weather has been beautiful, and I love it. I have spent four days outdoors with Eudemis, or just out on a blanket in the grass… Spring is an experience and a feeling that obviously only poets can write about, so I will spare you. But let me say again: spring has finally arrived in good old Germany. The only problem with warmer weather and horses is all the flies coming out. But as you can see from the picture, Eudemis has found a good way to deal with this inconvenience.

A CMDB that can deliver the model

Uncategorized No Comments »

As we are not really consultants for modeling IT infrastructure, we are always looking for a good way to minimize our manual effort when installing our automation engine. I actually thought it should be easy to load the necessary M-A-R-S information out of any CMDB, but so far that has proven much more difficult than expected. Most CMDBs we have looked at either did not supply the relationship and interdependency data we need or did not contain the static node information we need to bind rules. But yesterday we had a workshop at the IBM briefing center in Mainz to take a look at the IBM CCMDB. And I tell you what: it looks like we have found a CMDB that actually contains all the data we need to load the IT interdependency model. Even if some organizations keep attributes we need for rule binding in Excel sheets or other strange data sources, we can load them from the IBM CCMDB through its federation technology.

But that is not the whole story. We were quite exuberant about the depth of the relationships and interdependencies stored in the CMDB, but it really got amazing when we saw, in an actual environment, that most of the interdependencies were detected automatically. Someone at IBM actually did the work of modeling quite a few ssh connections and scripts to pull this information out of netstat and other system calls. Admittedly, going through firewalls without losing the network angle seems to be difficult, which means that actual real-time detection across different zones of trust is not really possible, but what we are getting out here is much better than anything we have seen before. It will save us about 80% of the time needed to implement automation for highly complex applications. The time our customers need to keep the model up to date while they change their IT landscape is probably greatly reduced as well. We will look into creating a persistent interface to the IBM CCMDB and, while we are at it, to their event bus and execution facilities as well.

You know my comments on other CMDBs and our difficulties in reading anything more than SML out of them. Normally I am quite taken aback and don't say much, but this time I am really happy.

The arago Automation Engine (aAE)

Automation Technology Deep Insight No Comments »

You must have suspected that I am not just philosophizing about building an automation engine, but have already done so (or at least I have designed the concepts and we at arago have built it, and you also know the software architect behind the whole scene: Jens “Cy” Bartsch). So after introducing you to the concepts of automation engines and my ideas on the social impact of automation, I would like to give a short overview of the technology we use to actually have an engine that performs the system administration tasks mostly done manually today. We have built an engine that learns and is instructed by system administrators to increase its operational abilities every day. Actually we have been working on developing this engine since 1995 and are currently at major release 4 of the engine.

As you will understand, I cannot reveal too much technical detail, but I still want to give a short look at the concepts we use. The key input to our engine is an IT infrastructure and application model based on the four-layer M-A-R-S approach described earlier. The nodes of this model are enhanced with “static” data on the node, such as software version, log file location and everything else that can be found on the subject. Of course different data will appear in different kinds of nodes (obviously a machine node does not need a software version :-)). This model is read into the automation engine and represents a basic graph. All the nodes of the model are connected according to their interdependencies. The real-time event and monitoring information is then connected to the nodes.

Rules connect to the nodes of this basic graph, which represents the actual IT infrastructure as a model together with all the monitoring and event data the engine is to work on. These rules can be simple threshold rules or complex constructs built from conditions across many IT components. When such a rule matches, an issue object is created. An issue is a sort of “pre-incident” that tells us something may be going astray. An issue object can now travel the graph. This travel is directed by the issue's urge to collect new data in order to match an action rule that will allow the issue to perform an action, either to collect more data or to resolve the issue automatically. While travelling the graph, the issue collects more and more data from the nodes it visits and relates to other issues, which gives it access to different branches of the graph or to additional data. The travel algorithm focuses on achieving the maximum number of actions available to the issue. Of course issues can also be injected into the graph, for example by reporting an incident.
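To make the idea of a travelling issue a bit more tangible, here is a minimal sketch in Python. It is not the aAE implementation: the names `Issue`, `ActionRule` and `travel`, the example nodes and the data keys are all made up for illustration, and a plain breadth-first walk stands in for the real travel algorithm, which is not described in detail here.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class ActionRule:
    """Hypothetical action rule: it matches once all required data keys are known."""
    name: str
    required: Set[str]


@dataclass
class Issue:
    """Hypothetical issue object that collects data while it travels the graph."""
    origin: str
    data: Dict[str, str] = field(default_factory=dict)
    visited: List[str] = field(default_factory=list)


def travel(issue: Issue,
           edges: Dict[str, List[str]],
           node_data: Dict[str, Dict[str, str]],
           action_rules: List[ActionRule]) -> Optional[str]:
    """Walk the graph breadth-first from the issue's origin and return the
    name of the first action rule whose required data has been collected."""
    queue = deque([issue.origin])
    seen = {issue.origin}
    while queue:
        node = queue.popleft()
        issue.visited.append(node)
        # Collect whatever static or monitoring data this node offers.
        issue.data.update(node_data.get(node, {}))
        for rule in action_rules:
            if rule.required.issubset(issue.data):
                return rule.name
        for neighbour in edges.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return None  # no action rule could be matched


# Assumed example model: an application depending on two components.
edges = {
    "shop_app": ["app_server", "database"],
    "app_server": ["host_a"],
    "database": ["host_b"],
}
node_data = {
    "shop_app": {"symptom": "timeout"},
    "app_server": {"app_log": "/var/log/app.log"},
    "database": {"db_state": "connections exhausted"},
}
action_rules = [ActionRule("restart_db_pool", {"symptom", "db_state"})]

issue = Issue(origin="shop_app")
result = travel(issue, edges, node_data, action_rules)
print(result, issue.visited)
```

The issue starts at the node where the rule matched, picks up data on the way and stops as soon as an action rule has everything it needs. A real engine would choose the next node by how many actions it is likely to open up rather than by simple breadth-first order.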

Compared to the top-down rule evaluation or aggregation used by so-called root cause analysis systems, an issue in our automation engine can circle in on a problem, testing the functionality it is looking for from different angles of the IT infrastructure. It thus finds the spot of the actual problem not by drill-down, but by a divide and conquer approach, like a good system administrator would. Also, by relating issues to each other, problems that are spread across the infrastructure and would not normally be found by system management software or a specialized administrator can be identified as such and then be solved by the same mechanism. The automatic relation of issues, whenever their combined data opens up new automation actions, also creates many implicit rules: someone creating rules does not necessarily have to know all the actions that have to be taken throughout the infrastructure, because the automation engine will find all the connected actions by itself. A good example is the fact that many dependent systems require restarts after a centralized component or service broker has been changed. The people generating the rules surrounding the change of the central service broker do not know anything about the other components, and the people maintaining the dependent components do not know anything about the change processes of the central service broker. This is not a problem for the graph approach we have chosen, because the issues created by the change at the central service broker and the issues of the dependent components meet on the IT interdependency model, relate to each other, and thus derive combined or correlated actions to be taken without any explicit rule.
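The relation of issues can be illustrated with a small, loosely hedged sketch. Everything in it is an assumption made for the example: the `relate` function, the rule that issues relate when they have visited a common node, and the data keys. It only shows the principle that two issues which each cannot trigger anything on their own may unlock an action once their data is combined.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ActionRule:
    """Hypothetical action rule keyed on the data an issue must hold."""
    name: str
    required: Set[str]


@dataclass
class Issue:
    """Hypothetical issue: the nodes it has visited and the data it collected."""
    name: str
    nodes: Set[str]
    data: Dict[str, str]


def matching(rules: List[ActionRule], data: Dict[str, str]) -> Set[str]:
    """Names of all action rules satisfied by the given data."""
    return {rule.name for rule in rules if rule.required.issubset(data)}


def relate(a: Issue, b: Issue, rules: List[ActionRule]) -> Set[str]:
    """Actions that open up only when two issues that met on a shared
    node of the model combine their data."""
    if not (a.nodes & b.nodes):
        return set()  # the issues never met, so they are not related
    combined = {**a.data, **b.data}
    return matching(rules, combined) - matching(rules, a.data) - matching(rules, b.data)


# Issue created by the team that changed the central service broker.
broker_change = Issue(
    name="broker_change",
    nodes={"service_broker"},
    data={"dependency_changed": "service_broker"},
)
# Issue created on a dependent component that travelled up to the broker.
stale_client = Issue(
    name="stale_client",
    nodes={"order_service", "service_broker"},
    data={"connection_state": "stale"},
)
# Written by the component team in terms of its own dependency only.
rules = [ActionRule("restart_order_service", {"dependency_changed", "connection_state"})]

print(relate(broker_change, stale_client, rules))
```

Neither team had to describe the other team's process: the restart becomes available only because the two issues met on the service broker node and pooled what they knew.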

I/O scheme of an automation engine – or the Importance of having a correct IT Model

Automation Technology Architect View 2 Comments »

The automation engine is a computer program and as such it follows the simple scheme of “Input - Processing - Output”. The engine takes care of the processing part. So in order to talk about the quality of such a program, we have to examine the I/O scheme of the automation engine.

There are two lanes of external input into the automation engine. First there is the model of the IT infrastructure and application landscape the automation engine is to work on. The second input stream is monitoring or event data from all the components of the infrastructure described by the first input stream. The automation engine has only one internal input or configuration stream. This stream defines the rule set of the engine in an appropriate format. This rule set enables the “black box” to determine which action has to be executed automatically as well as the conditions permitting the execution of a certain action. This configuration stream also includes the actions themselves or links to a repository of actions, e.g. scripts written by administrators.

The automation engine will produce two output streams. One is an external stream documenting the actions taken by the automation engine to a service management or similar system as well as exporting skill management data to interface with manual operations effectively. The main output of the automation engine is a stream of commands to the components described in the IT model previously mentioned.
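The input and output streams described so far can be written down as a function signature. The following Python sketch is only an illustration under simple assumptions: `run_engine`, `Event` and `Rule` are invented names, the model is reduced to a dictionary of nodes, and a rule is a single threshold on one metric.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Event:
    """One item of the monitoring/event input stream (assumed shape)."""
    node: str
    metric: str
    value: float


@dataclass
class Rule:
    """One item of the configuration stream: a condition plus an action."""
    metric: str
    condition: Callable[[float], bool]
    command: str


def run_engine(model: Dict[str, List[str]],
               events: List[Event],
               rules: List[Rule]) -> Tuple[List[str], List[str]]:
    """Inputs: IT model, event stream, rule set.
    Outputs: commands for the components and a documentation stream."""
    commands: List[str] = []
    audit_log: List[str] = []
    for event in events:
        if event.node not in model:
            # Monitoring data that cannot be tied to the model is useless.
            audit_log.append(f"ignored event for unknown node {event.node}")
            continue
        for rule in rules:
            if rule.metric == event.metric and rule.condition(event.value):
                commands.append(f"{event.node}: {rule.command}")
                audit_log.append(
                    f"ran '{rule.command}' on {event.node} "
                    f"({event.metric}={event.value})"
                )
    return commands, audit_log


model = {"web01": ["db01"], "db01": []}
events = [Event("web01", "cpu_pct", 97.0), Event("ghost", "cpu_pct", 99.0)]
rules = [Rule("cpu_pct", lambda v: v > 95.0, "restart worker pool")]

commands, audit_log = run_engine(model, events, rules)
print(commands)
print(audit_log)
```

The second event shows why the two external inputs have to fit together: an event for a node that is missing from the model cannot lead to any command.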

The internal input and output of the automation engine is the primary processing of the software and therefore part of the implementation of an automation engine. Proper functionality of the engine strongly relies on the quality of the external input streams (IT model and event or monitoring data). The integration of the automation engine into the manual administration processes, or rather the control of the manual processes through the automation engine, determines the effectiveness of the operational workforce and the degree of automation that can be reached.

Thus the input data and the integration of the output produced by the automation engine are issues that must not be underestimated. It is not enough to have some sort of CMDB from which to import the components of the IT infrastructure to be operated upon. The input has to include relationship and attribute information at the necessary level of detail, and the monitoring streams have to be connected to the configuration items and their relations. Otherwise the automation engine will not operate at the desired level of effectiveness or, even worse, will generate false commands to be executed.
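A simple way to picture these requirements is a check that runs before the model is handed to the engine. The sketch below is an assumption-laden example, not a description of any CMDB interface: `validate_model` and all item, attribute and stream names are hypothetical, and a single set of required attributes is used for every item, although in practice the required attributes differ by kind of node.

```python
from typing import Dict, List, Set, Tuple


def validate_model(attributes: Dict[str, Dict[str, str]],
                   relations: List[Tuple[str, str]],
                   streams: Dict[str, str],
                   required: Set[str]) -> List[str]:
    """Return a list of problems that would keep the engine from working
    reliably on this model (an empty list means the checks passed)."""
    problems: List[str] = []
    # 1. Every configuration item needs the attributes rules are bound to.
    for item, attrs in attributes.items():
        missing = sorted(required.difference(attrs))
        if missing:
            problems.append(f"{item}: missing attributes {missing}")
    # 2. Every relation has to connect two known configuration items.
    for source, target in relations:
        if source not in attributes or target not in attributes:
            problems.append(f"relation {source} -> {target} references an unknown item")
    # 3. Every monitoring stream has to be attached to a known item.
    for stream, item in streams.items():
        if item not in attributes:
            problems.append(f"stream {stream} is attached to unknown item {item}")
    return problems


attributes = {
    "web01": {"software_version": "2.4", "log_file": "/var/log/web.log"},
    "db01": {"software_version": "9.1"},
}
relations = [("web01", "db01"), ("web01", "cache01")]
streams = {"web01.cpu": "web01", "lb01.latency": "lb01"}
required = {"software_version", "log_file"}

for problem in validate_model(attributes, relations, streams, required):
    print(problem)
```

Each reported problem corresponds to one of the gaps named above: missing attribute detail, a dangling relationship, or a monitoring stream that is not connected to a configuration item.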

I have found that in many cases the configuration of the automation engine - especially hooking it up to the IT infrastructure and the available monitoring environment - is best done manually. Many CMDB implementations can be used to cross check the configuration but I am still looking for a CMDB implementation that will give an automation engine a good IT model and access to the required monitoring and event data.

First look at an automation engine

Automation, Automation Technology Architect View No Comments »

As you - hopefully - have read, automation is not magic, not even black magic. It is the execution of actions based on conditions. As this does not sound all that difficult, what do we need to integrate this concept into everyday IT maintenance life? Simple: we need some sort of machine - an automation engine - that will sit on all the IT components of our environment and execute actions when conditions we have programmed into the machine become true.
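In its most basic form, "execution of actions based on conditions" fits into a few lines of Python. This is a minimal sketch with invented names (`Rule`, `evaluate`) and invented metrics and actions; the actions are plain strings here instead of real commands.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Rule:
    """A condition on the current metrics paired with an action to take."""
    name: str
    condition: Callable[[Metrics], bool]
    action: Callable[[Metrics], str]


def evaluate(rules: List[Rule], metrics: Metrics) -> List[str]:
    """Run the action of every rule whose condition holds for the metrics."""
    return [rule.action(metrics) for rule in rules if rule.condition(metrics)]


rules = [
    Rule(
        name="disk_almost_full",
        condition=lambda m: m.get("disk_used_pct", 0.0) > 90.0,
        action=lambda m: "rotate logs",
    ),
    Rule(
        name="service_down",
        condition=lambda m: m.get("http_ok", 1.0) == 0.0,
        action=lambda m: "restart web server",
    ),
]

print(evaluate(rules, {"disk_used_pct": 95.0, "http_ok": 1.0}))
```

With two rules and two metrics this is trivial. The difficulties start when the same loop has to cope with the whole IT environment.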

Simple may be a pretty misleading term. The concept of this machine is very simple, but the automation engine has to monitor all the data available in our IT environment in order to match any conditions, and it also has to find the right action to execute. So while the concept is simple, the technical problems to be solved in order to make this machine work are numerous and have to be dealt with. Let me take a glimpse at a few of the most pressing ones:

  1. Mass of data to be processed
    As you may remember, we are looking at all the system management, KPI and quality data we can get our hands on for all the IT around us. So there is a lot of data and we have to deal with all of it.
  2. Mass of conditions
    Besides all that data there are a lot of conditions that have to be evaluated on the available data series. The automation engine is a very elaborate version of a rule engine, because it is dealing with a highly interconnected logic tree (the IT model) and many conditions on a large data space. So typical approaches to shortcutting rule evaluation, like decision matrices, do not work.
  3. Unknown rules
    If we wanted to put everything that needs to be executed automatically into an explicit rule, building the rule system would take a lifetime and the “mass of conditions” problem would become ever more significant. Building implicit rules is too complicated for the user. So the automation engine has to adopt a behavior of encircling the problem. This is a divide and conquer approach that replaces asking a user to enter every circumstance and every reaction, because that kind of “brain dump” simply has not been invented yet. I know this is very abstract, and I am sure that I will find a little more time soon to elaborate on the way an automation engine has to find the proper actions to take to solve a real-life problem.

By the way, most computer systems and approaches in system management software take a simple approach to tackling these challenges. Techniques like root cause analysis or autonomic systems try to move down the dependency tree and find the problem somewhere down there. Why is this approach practical? Well, it quickly narrows down the number of possible data sources and actions that can be taken, and in that way a computer system can actually work out the simple problem that results. And why do these approaches fall short? Well, they simply don’t work with complex problems that show symptoms in some remote location of the IT environment, or with problems that are caused by multiple sources. Most problems in modern IT systems are of the latter kind, and therefore the common top-down approaches execute quite a few actions, but solve not nearly as many problems as an automation engine should. Or would you expect your best system administrator to simply go down the logical tree of connected systems while trying to find out why your ecommerce application is not working? No, not really, because good administrators encircle the cause of a problem: through their experience they exclude great parts of the IT environment as possible causes and then concentrate only on the “relevant” remainder.
