Archive for the 'Automation Technology Deep Insight' Category

An Administrator's First Contact with Automation

Automation, Automation Technology Deep Insight, Social Impact of Automation

Surfing our intranet, I was surprised to find that one of our administrators, Thomas “thommy” Neuderth, had written about his first contact with automation. I am really happy that one of the best IT experts I have had the pleasure of working with has “no fear of being automated away” and instead sees automation as a way to actually live the life of an “IT expert” rather than an “IT nanny”.

Automating a simple task like archiving log files evidently convinced a “real techie” that there is more than a little upside to using an automation engine. Of course, implementing automation forced quite a bit of rethinking of the common ways of administration, and “thommy” describes the skepticism at first contact and the actual adoption of change in a down-to-earth way. If you are interested, you may read the whole document here.

Taking a Look inside aAE (arago Automation Engine)

Automation, Automation Technology Deep Insight

Time and again people ask me what they will see after implementing an automation engine. My answer usually was “well, nothing really, you will see that your applications have a better uptime…”, but obviously that is not what people want to hear. The whole idea of an automation engine is that things happen in the background and no one has to sit in front of a console watching lights turn red.

Still, people want to see what is going on. And as automation is a matter of trust (the trust of system administrators and managers that such an engine will improve IT service instead of messing it up), it probably is a good idea to allow a peek under the hood of the machine. Since we use a graph algorithm approach to find the automatic steps to be taken in order to resolve a problem, it makes sense to show a graph of the whole thing.

So that is just what we have done. The attached screenshot shows the prototype of “aAE Visualizer”. This Java application displays the IT model and the issues and events travelling through it. On the model graph you can see where issues are created and how they travel through the engine in order to find actions to take. But this visualization is not just a pretty way to let interested people “look at what is happening” in our automation engine; it also makes it easy to locate hotspots in an automated IT landscape. Hotspots always indicate a challenge. Either a hotspot is an error in the model (a place where problems travel in circles without finding any resolution), or it is an actual bottleneck in the IT infrastructure that is not visible from a capacity management point of view.

So I am very happy to announce that this visualization application will not only make my job of explaining how “automation works” easier, but will also allow our administrators to locate model problems or IT landscape problems with much less effort than before.
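
To make the hotspot idea a little more concrete, here is a minimal sketch in Python. It is not how aAE Visualizer works internally; it only assumes that the path each issue took through the model has been recorded as a list of node names. The node names, the `threshold` parameter and both function names are made up for illustration.

```python
from collections import Counter


def find_hotspots(paths, threshold):
    """Return nodes visited at least `threshold` times across all issue paths."""
    visits = Counter(node for path in paths for node in path)
    return {node: count for node, count in visits.items() if count >= threshold}


def circles(path):
    """True if a single issue visited any node more than once."""
    return len(set(path)) < len(path)


# Hypothetical travel paths recorded for three issues.
paths = [
    ["webapp", "appserver", "database"],
    ["batch", "database", "storage", "database", "storage", "database"],
    ["reporting", "database"],
]
hotspots = find_hotspots(paths, threshold=3)
looping = [p[0] for p in paths if circles(p)]
```

In this toy data the heavily visited node would show up as a hotspot, and the issue that keeps bouncing between two nodes hints at a model error rather than a bottleneck. Telling the two cases apart reliably is exactly where a visual view of the graph helps.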

A CMDB that can deliver the model

Automation, Automation Technology Deep Insight

As we are not really consultants for modeling IT infrastructure, we are always looking for a good way to minimize our manual effort when installing our automation engine. I thought it should be easy to load the necessary M-A-R-S information out of any CMDB, but so far that has proven much more difficult than expected. Most CMDBs we have looked at either did not supply the needed relationship and interdependency data or did not contain the static node information we need to bind rules. But yesterday we had a workshop at the IBM briefing center in Mainz to take a look at the IBM CCMDB. And I tell you what: it looks like we have found a CMDB that actually contains all the data we need to load the IT interdependency model. Even if some organizations keep attributes we need for rule binding in Excel sheets or other strange data sources, we can load them from the IBM CCMDB through its federation technology.

But that is not the whole story. We were quite exuberant about the depth of relationships and interdependencies stored in the CMDB, but it really got amazing when we saw, in an actual environment, that most of the interdependencies were detected automatically. Someone at IBM actually did the work of modeling quite a few ssh connections and scripts to pull this information out of netstat and other system calls. Going through firewalls without losing the network angle still seems to be difficult, which means that real-time detection of different zones of trust is not really possible, but what we get here is much better than anything we have seen before. It will save us about 80% of the time needed to implement automation for highly complex applications. The time our customers need to keep the model up to date while they change their IT landscape will probably be greatly reduced as well. We will look into creating a persistent interface to the IBM CCMDB and, while we are at it, to their event bus and execution facilities as well.

You know my comments on other CMDBs and our difficulties in reading anything more than SML out of them. Normally I am quite taken aback and don't say much, but this time I am really happy.

The arago Automation Engine (aAE)

Automation Technology Deep Insight

You must have suspected that I am not just philosophizing about building an automation engine, but have already done so (or at least I have designed the concepts and we at arago have built it, and you also know the software architect behind the whole scene, Jens “Cy” Bartsch). So, after introducing you to the concepts of automation engines and my ideas on the social impact of automation, I would like to give a short overview of the technology we use to actually have an engine that performs the system administration tasks mostly done manually today. We have built an engine that learns and is instructed by system administrators to increase its operational abilities every day. We have been developing this engine since 1995 and are currently at major release 4.

As you will understand, I cannot reveal too much technical detail, but I still want to give a short look at the concepts we use. The key input to our engine is an IT infrastructure and application model based on the four-layer M-A-R-S approach described earlier. The nodes of this model are enhanced with “static” data on the node, such as software version, log file location and everything else that can be found on the subject. Of course, different data will appear in different kinds of nodes (obviously a machine node does not need a software version :-)). This model is read into the automation engine and represents a basic graph. All the nodes of the model are connected according to their interdependencies. The real-time event and monitoring information is then attached to the nodes.
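
For readers who prefer code to prose, the basic graph described above could be sketched roughly as follows. This is only an illustration in Python, not the engine's actual data model: the class names, the `kind` labels, the attribute keys and the sample hosts are all assumptions of mine.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of the IT model; `kind` and the dict keys are illustrative."""
    name: str
    kind: str                                    # e.g. "machine" or "software"
    static: dict = field(default_factory=dict)   # static node data
    live: dict = field(default_factory=dict)     # latest monitoring/event values


class ModelGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = {}                          # node name -> set of neighbour names

    def add_node(self, node):
        self.nodes[node.name] = node
        self.edges.setdefault(node.name, set())

    def connect(self, a, b):
        # Interdependencies are stored undirected so they can be followed either way.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def attach_event(self, name, key, value):
        # Real-time monitoring data is bound to the node it describes.
        self.nodes[name].live[key] = value


graph = ModelGraph()
graph.add_node(Node("host01", "machine", {"log_dir": "/var/log"}))
graph.add_node(Node("webserver", "software", {"version": "2.2", "log_file": "/var/log/web.log"}))
graph.connect("webserver", "host01")
graph.attach_event("host01", "disk_used_pct", 93)
```

Keeping static data and live data in separate dictionaries mirrors the distinction made in the text: the static part comes from the model, the live part from monitoring.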

Rules connect to the nodes of this basic graph, which represents the actual IT infrastructure as a model together with all the monitoring and event data available to the engine. These rules can be simple threshold rules or complex constructs built from conditions across many IT components. When such a rule matches, an issue object is created. An issue is a sort of “pre-incident” that tells us something may be going astray. An issue object can now travel the graph. This travel is directed by the issue's urge to collect new data in order to match an action rule that will allow the issue to perform an action, either to collect more data or to resolve the issue automatically. While travelling the graph, the issue collects more and more data from the nodes it visits and relates to other issues, which supply access to different branches of the graph or additional data. The travel algorithm focuses on achieving the maximum number of actions available to the issue. Of course, issues can also be injected into the graph, for example by reporting an incident.

Compared to the top-down rule evaluation or aggregation used by so-called root cause analysis systems, an issue in our automation engine can circle in on a problem, testing the functionality it is looking for from different angles of the IT infrastructure. It thus finds the spot of the actual problem not by drill-down but by a divide-and-conquer approach, like a good system administrator would. Also, by relating issues, problems that are spread across the infrastructure and would not normally be found by system management software or a specialized administrator can be identified as such and then solved by the same mechanism. The automatic relation of issues whenever their combined data opens up new automation actions also creates many implicit rules, i.e. someone creating rules does not necessarily have to know all the actions that have to be taken throughout the infrastructure; the automation engine will find all the connected actions by itself. A good example is the fact that many dependent systems require restarts after a centralized component or service broker has been changed. The people writing the rules surrounding the change of the central service broker do not know anything about the other components, and the people maintaining the dependent components do not know anything about the change processes of the central service broker. This is not a problem for the graph approach we have chosen, because the issues created by the change at the central service broker meet on the IT interdependency model, relate to each other, and thus derive combined or correlated actions to be taken without any explicit rule.
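
The service broker example can be played through in a few lines of Python. Again this is only a sketch under assumptions: how the engine really relates issues is not described in this post, so here two issues simply count as related when their nodes are directly dependent, and the service names and data keys are invented.

```python
# Hypothetical interdependency model: the broker is shared by two dependent services.
DEPENDS_ON = {
    "billing": {"service_broker"},
    "reporting": {"service_broker"},
    "service_broker": set(),
}

# Issues as plain dicts; each team only contributes knowledge about its own component.
issues = [
    {"node": "service_broker", "data": {"changed": True}},
    {"node": "billing", "data": {"restart_on_dependency_change": True}},
    {"node": "reporting", "data": {"restart_on_dependency_change": False}},
]


def related(a, b):
    """Two issues relate if their nodes are the same or directly dependent."""
    return (a["node"] == b["node"]
            or b["node"] in DEPENDS_ON[a["node"]]
            or a["node"] in DEPENDS_ON[b["node"]])


def derive_actions(issues):
    """Combine the data of related issues and see which actions that opens up."""
    actions = []
    for i, a in enumerate(issues):
        for b in issues[i + 1:]:
            if not related(a, b):
                continue
            combined = {**a["data"], **b["data"]}
            if combined.get("changed") and combined.get("restart_on_dependency_change"):
                # The restart target is whichever issue is not on the broker itself.
                target = b["node"] if a["node"] == "service_broker" else a["node"]
                actions.append(("restart", target))
    return actions


actions = derive_actions(issues)
```

Nobody wrote a rule saying “restart billing when the broker changes”. The action only appears because the two issues meet on the dependency model and their combined data happens to be enough.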
