Archive for the 'Automation' Category

Taking a Look inside aAE (arago Automation Engine)

Automation, Automation Technology Deep Insight 1 Comment »

Time and again people ask me what they will see after implementing an automation engine. My answer usually was “well, nothing really, you will see that your applications have a better uptime…”, but obviously that is not what people want to hear. The whole idea of an automation engine is that things happen in the background and no one has to sit in front of some console watching lights turn red.

[Screenshot: aAE Visualizer]

Still, people want to see what is going on. And as automation is a matter of trust – the trust of system administrators and managers that such an engine will improve IT service instead of messing it up – it probably is a good idea to enable a peek under the hood of the machine. Actually, as we are using a graph algorithm approach to find the automatic steps to be taken in order to resolve a problem, it sounds like we should show a graph of the whole thing.

So that is just what we have done. In the screenshot attached you see the prototype of “aAE Visualizer”. This Java application displays the IT model and the issues and events travelling the model. On the model graph it is possible to see where issues are created and how they travel the engine in order to find actions to take. But this visualization application is not just a pretty way to let interested people “look at what is happening” in our automation engine; it also makes it easy to locate hotspots in an automated IT landscape. Hotspots always indicate a challenge. Either a hotspot is an error in the model – a place where problems travel in circles without finding any resolution – or a hotspot is an actual bottleneck in the IT infrastructure that is not visible from a capacity management point of view.
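To make the hotspot idea a little more concrete, here is a minimal Python sketch. It is not how the aAE Visualizer works internally; it assumes a hypothetical trace format (issue id mapped to the list of model nodes the issue visited) and simply counts visits. The function names, the threshold and the sample trace are all made up for illustration.

```python
from collections import Counter

def find_hotspots(issue_paths, threshold):
    """Return nodes visited at least `threshold` times across all issue paths."""
    visits = Counter(node for path in issue_paths.values() for node in path)
    return {node: count for node, count in visits.items() if count >= threshold}

def circling_issues(issue_paths, max_revisits=2):
    """Return ids of issues that visit any single node more than `max_revisits` times."""
    circling = []
    for issue_id, path in issue_paths.items():
        if any(count > max_revisits for count in Counter(path).values()):
            circling.append(issue_id)
    return circling

# Hypothetical trace: issue id -> sequence of model nodes it travelled
paths = {
    "issue-1": ["web", "app", "db", "app", "db", "app", "db"],
    "issue-2": ["web", "app", "storage"],
}
hotspots = find_hotspots(paths, threshold=4)   # nodes many issues pass through
loops = circling_issues(paths)                 # issues travelling in circles
```

A node that shows up in `hotspots` without any circling issue is a candidate for a real bottleneck; an issue listed in `loops` rather hints at an error in the model.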

So I am very happy to announce that this visualization application will not only make my job of explaining how “automation works” easier, but will also allow our administrators to locate model problems or IT landscape problems with much less effort than before.

Virtual Datacenter insights

Automation No Comments »

Ok, I may not have been too enthusiastic in writing about our Pulse visit up until now, but as we just enjoyed a really great session about virtualized datacenters I thought this would be a good reason to start doing so. Ok, it’s quite a different kind of automation compared to what we usually deal with, but it was quite impressive to see a reporting service being installed “on demand” on a virtualization cluster during the demo part of the session by means of just a few clicks in Tivoli Provisioning Manager. Ok, so you may wonder what is so exciting about the automated installation of a virtual system? Well, aside from the fact that you felt that the speaker (Vanja Gorazi) was really deep into the technical stuff behind it, it was probably that the way this happened was simply done right. Instead of specifying that you need an xOS with ySQL to support the zReporting system, you simply choose the type of service needed and do not need to worry about all the packaging and stuff behind the scenes. Oh, and did I mention that automation didn’t stop with the installation but continued with the automatic addition of required resources or removal of unused ones by means of a component named Intelligent Orchestrator? Of course, the rules engine behind it sounded quite basic, so I guess there may well be some more interesting perspectives around here… guess I’ll need some time to think about that when this is over…

Cloud Computing needs automation

Automation, Automation Technology Architect View, Business Impact of Automation 2 Comments »

Yesterday I had the chance to get a feeling for one of the hottest topics in IT infrastructure. A panel session at IBM PULSE 2008 was dedicated to the topic of Cloud Computing (even though IBM marketing people don’t seem to like the term and have come up with quite some innovative words – words no one uses, so let us stick with the cloud). The panel was buzzing with intelligence; unfortunately we as the audience could not really match up. So we listened to a pretty much directed discussion on how cloud computing would replace today’s approach to hardware and infrastructure in general. Well, I do agree, no one needs dedicated servers when resources can be allocated dynamically and come preconfigured and interconnected. Kristin Hansen stripped the key features of a cloud down to simplicity (users do not care how their resources are set up, they just use them), mobility (obviously use is possible from anywhere and even a large computing cluster could be controlled from a phone-like device) and elasticity (you only set up and pay for what you really need). Sounds fine to everyone, and Google and Amazon have definitely shown the world that this concept works in a closed shop environment. According to Dave Lindquist, IBM is working on a methodology and technology to make most applications “cloudable”. The most interesting remark I heard during the discussion was that “Cloud Computing is the combination of technology (virtualization and automation) and discipline (a stringent way of breaking down the offered services into small blocks in order to recombine them quickly and automatically upon the user’s request as well as defining standards or service catalogues to be offered)”. I guess the discipline part will spark a great deal of discussion between process consultants and methodology consultants and in the end there will certainly be a couple of good ways to set things up. Just as certainly there will be the need to standardize these processes and methodologies in the end, so that clouds are not proprietary but stay mobile even between cloud providers.

Naturally I am more interested in the technology part that is needed behind cloud computing. Technology, in this case, does not refer to the cloud management servers and agents themselves, but to the technology surrounding them. The first technology that comes to mind is virtualization, as without this core there will be no cloud, at least no cloud that can integrate legacy applications rather than working in a very tightly closed universe like Google does. There are quite a few good approaches to virtualization – commercial as well as open source – and the approach taken should really depend on the needs of the applications to be run on a specific part of the cloud. It probably even makes sense to combine the available virtualization technologies within one cloud. It might make sense to use containers built into the operating system or complete hardware virtualization depending on the kind of application to be run, and therefore a cloud manager will have to deal with all kinds of virtualization technology.

My own focus is more on the service management side of cloud computing, and I strongly believe that automated operation is a key component of a good cloud infrastructure. The cloud infrastructure and management components will definitely take care of automatic provisioning and resource management, but as soon as legacy applications – that do not really know that they are running on a beautifully scalable environment – are involved, manual administration of these applications would mean chasing an ever changing rabbit across a chameleon planet – an image most amusing to bystanders but funny neither to administrators nor to the ones paying them. So in my opinion an automation engine could be fed IT model data and monitoring feeds directly from the cloud manager and could thus deal with the ever changing environment and keep the application automation rules up to date with the cloud components currently in use. This automation engine cannot use a drill down approach, because the infrastructure might not even support drill downs and can change every so often. The automation engine that assures a good foundation for quality service and professional service management will have to use a more human “circle in” or divide and conquer approach.

Does this sound familiar? By the way, check out the articles on the “Blue Cloud”; technical pioneers at work (other bloggers also think about the blue cloud)… Also interesting is the cooperation between Google and IBM on producing cloud standards.

A CMDB that can deliver the model

Automation, Automation Technology Deep Insight No Comments »

As we are not really consultants for modeling IT infrastructure, we are always looking for a good way to minimize our manual effort when installing our automation engine. I actually thought it should be easy to load the necessary M-A-R-S information out of any CMDB, but so far that has proven much more difficult than expected. Most CMDBs we have looked at either did not supply the needed relationship and interdependency data or did not contain the static node information we need to bind rules. BUT yesterday we had a workshop at the IBM briefing center in Mainz to take a look at the IBM CCMDB. And tell you what: It looks like we found a CMDB that actually contains all the data we need to load the IT interdependency model. Even if some organizations keep attributes we need for rule binding in Excel sheets or other strange data sources, we can load them off the IBM CCMDB through its federation technology.

But that is not the whole story. We were quite exuberant about the depth of relationships and interdependencies stored in the CMDB, but it really got amazing when we saw in an actual environment that most of the interdependencies were detected automatically. Someone at IBM actually did the work of modeling quite a few SSH connections and scripts to pull this information out of netstat and other system calls. Well, going through firewalls without losing the network angle seems to be a difficulty, which means that actual real time detection across different zones of trust is not really possible, but what we are getting out here is much better than anything we have seen before. It will save us about 80% of the time needed to implement automation for highly complex applications. Also, the time our customers need to maintain the model while they change their IT landscape is probably greatly reduced. We will look into creating a persistent interface to the IBM CCMDB and, while we are at it, to their event bus and execution facilities as well.
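Just to illustrate the general idea of pulling interdependencies out of netstat, here is a small Python sketch. It says nothing about how the IBM CCMDB actually does its discovery; it only assumes the usual column layout of Linux `netstat -tn` output, and the sample text, the function name and the "lower port number is the service side" heuristic are my own assumptions.

```python
def dependencies_from_netstat(output):
    """Turn `netstat -tn` style text into sorted (client, server, port) tuples.

    Heuristic (an assumption, not a rule): the side of an established
    connection with the lower port number is treated as the service side.
    """
    deps = set()
    for line in output.splitlines():
        parts = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(parts) < 6 or not parts[0].startswith("tcp") or parts[5] != "ESTABLISHED":
            continue
        local_ip, _, local_port = parts[3].rpartition(":")
        remote_ip, _, remote_port = parts[4].rpartition(":")
        if int(local_port) < int(remote_port):
            deps.add((remote_ip, local_ip, int(local_port)))
        else:
            deps.add((local_ip, remote_ip, int(remote_port)))
    return sorted(deps)

# Hypothetical sample output, as it might be collected from one host via SSH
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:44312          10.0.0.9:5432           ESTABLISHED
tcp        0      0 10.0.0.5:22             10.0.0.1:51514          ESTABLISHED
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
"""
edges = dependencies_from_netstat(SAMPLE)
```

Each tuple is a candidate relation for the interdependency model; a real discovery would of course still have to map addresses to configuration items.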

You know my comments on other CMDBs and our difficulties of reading anything more than SML out of them. Normally I am quite taken aback and don’t say much, but this time I am really happy.

The arago Automation Engine (aAE)

Automation Technology Deep Insight No Comments »

You must have suspected that I am not just philosophizing about building an automation engine, but have already done so (or at least I have designed the concepts and we at arago have built it – and you also know the software architect behind the whole scene – Jens “Cy” Bartsch). So after introducing you to the concepts of automation engines and my ideas on the social impact of automation, I would like to give a short overview of the technology we use to actually have an engine that performs the system administration tasks mostly done manually today. We have built an engine that will learn and is instructed by system administrators to increase its operational abilities every day. Actually we have been working on developing this engine since 1995 and are currently at major release 4 of the engine.

As you will understand, I cannot reveal too much technical detail, but I still want to give a short look at the concepts we use. The key input to our engine is an IT infrastructure and application model based on the four layer M-A-R-S approach described earlier. The nodes of this model are enhanced with “static” data on the node, such as software version, log file location and everything else that can be found on the subject. Of course different data will appear in different kinds of nodes (obviously a machine node does not need a software version :-)). This model is read into the automation engine and represents a basic graph. All the nodes of the model are connected according to their interdependencies. The real time event and monitoring information is then connected to the nodes.

Rules connect to the nodes of this basic graph, which represents the actual IT infrastructure as a model together with all available monitoring and event data the engine is to work on. These rules can be simple threshold rules or complex constructs built from conditions across many IT components. When such a rule matches, an issue object is created. An issue is sort of a “pre incident” that tells us something may be going astray. An issue object can now travel the graph. This travel is directed by the issue’s urge to collect new data in order to match an action rule that will allow the issue to perform an action – either to collect more data or to resolve the issue automatically. While travelling the graph the issue collects more and more data from the nodes it visits and relates to other issues, which gives it access to different branches of the graph or additional data. The travel algorithm focuses on achieving the maximum number of actions available to the issue. Of course issues can be injected into the graph – for example by reporting an incident – as well.
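The following Python sketch shows the bare idea of an issue travelling the graph and collecting data until an action rule matches. It is a toy and not the aAE algorithm: it uses a plain breadth-first walk instead of a travel strategy that maximizes available actions, and the model, the node data and the single action rule are invented for the example.

```python
from collections import deque

# Hypothetical model: node -> neighbouring nodes (interdependencies)
GRAPH = {
    "shop-app": ["app-server"],
    "app-server": ["shop-app", "database", "host-1"],
    "database": ["app-server", "host-2"],
    "host-1": ["app-server"],
    "host-2": ["database"],
}
# Hypothetical static and monitoring data attached to each node
NODE_DATA = {
    "shop-app": {"response_ms": 9000},
    "app-server": {"threads_busy": 12},
    "database": {"connections": 500, "max_connections": 500},
    "host-1": {"load": 0.4},
    "host-2": {"load": 0.7},
}
# Action rules: (name, condition on the data collected so far, action to take)
ACTION_RULES = [
    ("db-pool-exhausted",
     lambda d: "connections" in d and d["connections"] >= d["max_connections"],
     "restart connection pool on database"),
]

def travel(start, graph, node_data, rules):
    """Walk the graph from `start`, collecting data until an action rule matches."""
    collected, visited = {}, []
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        visited.append(node)
        collected.update(node_data.get(node, {}))   # the issue picks up data here
        for _name, condition, action in rules:
            if condition(collected):
                return action, visited
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return None, visited                            # no automatic action found

action, path = travel("shop-app", GRAPH, NODE_DATA, ACTION_RULES)
```

In this example the issue is created at the slow shop application and only finds an action once it has reached the database node and picked up the connection figures there.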

Compared to the top down rules evaluation or aggregation used by so called root cause analysis systems, an issue in our automation engine can circle in on a problem, testing the functionality it is looking for from different angles of the IT infrastructure. It thus finds the spot of the actual problem not by drill down, but by a divide and conquer approach, like a good system administrator would do. Also, by relating issues, problems that are spread across the infrastructure and would not normally be found by system management software or a specialized administrator can be identified as such and then be solved by the same mechanism. The automatic relation of issues when their combined data opens new automation actions to be taken also creates many implicit rules, i.e. someone creating rules does not necessarily have to know all actions that have to be taken throughout the infrastructure, but the automation engine will find all the connected actions by itself. A good example is the fact that many dependent systems require restarts after a centralized component or service broker has been changed. The people generating the rules surrounding the change of the central service broker do not know anything about the other components, and the people maintaining the dependent components do not know anything about the change processes of the central service broker. This is not a problem for the graph approach we have chosen, because the issues created by the change at the central service broker meet on the IT interdependency model, relate to each other and thus derive combined or correlated actions to be taken without any explicit rule.
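The service broker example can be sketched in a few lines of Python. Again this is only an illustration under my own assumptions, not the engine's mechanism: the broker team's rule and the local restart rules are written independently, and the only thing connecting them is a hypothetical dependency table standing in for the IT model.

```python
# Hypothetical dependency model: component -> components it depends on
DEPENDS_ON = {"billing": ["broker"], "reporting": ["broker"], "wiki": []}

def broker_rule(event):
    """Rule owned by the broker team; it knows nothing about dependents."""
    if event["node"] == "broker" and event["type"] == "config_changed":
        return {"node": "broker", "fact": "changed"}
    return None

# Rules owned by the component teams: what to do when something they
# depend on has changed. They know nothing about the broker's change process.
LOCAL_RULES = {"billing": "restart billing", "reporting": "restart reporting"}

def correlate(issue, depends_on, local_rules):
    """Derive actions for every component that depends on the issue's node."""
    actions = []
    for node, deps in sorted(depends_on.items()):
        if issue["node"] in deps and node in local_rules:
            actions.append(local_rules[node])
    return actions

issue = broker_rule({"node": "broker", "type": "config_changed"})
derived = correlate(issue, DEPENDS_ON, LOCAL_RULES)
```

Nobody wrote a rule saying "after a broker change restart billing and reporting"; the combination falls out of the model.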

I/O scheme of an automation engine – or the Importance of having a correct IT Model

Automation Technology Architect View 2 Comments »

The automation engine is a computer program and as such it follows the simple scheme of “Input - Processing - Output”. The engine takes care of the processing part. So in order to talk about the quality of such a program, we have to examine the I/O scheme of the automation engine.

[Figure: automation engine I/O scheme]

There are two lanes of external input into the automation engine. First there is the model of the IT infrastructure and application landscape the automation engine is to work on. The second input stream is monitoring or event data from all the components of the infrastructure described by the first input stream. The automation engine only has one internal input or configuration stream. This stream defines the rule set of the engine in an appropriate format. This rule set will enable the “black box” to determine which action has to be executed automatically as well as the conditions permitting the execution of a certain action. This configuration stream also includes the actions themselves or links to a repository of actions - e.g. scripts written by administrators.

The automation engine will produce two output streams. One is an external stream documenting the actions taken by the automation engine to a service management or similar system as well as exporting skill management data to interface with manual operations effectively. The main output of the automation engine is a stream of commands to the components described in the IT model previously mentioned.
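The input and output streams described above can be written down as a tiny Python skeleton. This is a sketch of the I/O scheme only, with hypothetical class and field names; the processing in the middle is reduced to evaluating each rule against a single event.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Rule:
    """Internal configuration stream: a condition plus the action it permits."""
    name: str
    condition: Callable[[dict], bool]
    action: str

@dataclass
class Engine:
    model: dict                                     # external input 1: IT model
    rules: list                                     # internal configuration: rule set
    commands: list = field(default_factory=list)    # main output: commands to components
    audit_log: list = field(default_factory=list)   # output to service management

    def on_event(self, node, data):
        """External input 2: a monitoring or event record for a model node."""
        if node not in self.model:
            raise KeyError(f"event for node missing from the IT model: {node}")
        for rule in self.rules:
            if rule.condition(data):
                self.commands.append((node, rule.action))
                self.audit_log.append(f"{rule.name}: {rule.action} on {node}")

engine = Engine(
    model={"web-1": []},
    rules=[Rule("disk-full", lambda d: d.get("disk_pct", 0) > 90, "clean /tmp")],
)
engine.on_event("web-1", {"disk_pct": 95})
```

Note that an event for a node that is not in the model is rejected outright, which is one simple way of expressing how much the engine depends on the quality of its external input.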

The internal input and output of the automation engine is the primary processing of the software and therefore part of the implementation of an automation engine. Proper functionality of the engine strongly relies on the quality of the external input streams (IT model and event or monitoring data). The integration of the automation engine into the manual administration processes, or rather the control of the manual processes through the automation engine, determines the effectiveness of the operational workforce and the degree of automation that can be reached.

Thus the input data and the integration of the output produced by the automation engine are issues that must not be underestimated. It is not enough to have some sort of CMDB to import components of the IT infrastructure to be operated upon. The input has to include relationship and attribute information at the necessary level of detail, and the monitoring streams have to be connected to the configuration items and their relations. Otherwise the automation engine will not operate at the desired level of effectiveness or, even worse, will generate false commands to be executed.
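A simple sanity check of the input can already catch a lot. The Python sketch below assumes a hypothetical export format (a set of node names, a list of relation pairs and the set of nodes that have a monitoring stream) and reports the gaps discussed above; the function name and messages are illustrative only.

```python
def validate_model(nodes, relations, monitored):
    """Return a list of human-readable problems found in the model input."""
    problems = []
    # Relations must only reference known configuration items
    for src, dst in relations:
        for end in (src, dst):
            if end not in nodes:
                problems.append(f"relation references unknown node: {end}")
    # Every configuration item needs a monitoring stream attached
    for node in sorted(nodes):
        if node not in monitored:
            problems.append(f"no monitoring stream for node: {node}")
    # Items without any relation cannot take part in issue travel
    connected = {end for relation in relations for end in relation}
    for node in sorted(nodes - connected):
        problems.append(f"node has no relations: {node}")
    return problems

# Hypothetical CMDB export with three typical gaps
report = validate_model(
    nodes={"shop", "db", "host-1"},
    relations=[("shop", "db"), ("db", "host-2")],
    monitored={"shop", "db"},
)
```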

I have found that in many cases the configuration of the automation engine - especially hooking it up to the IT infrastructure and the available monitoring environment - is best done manually. Many CMDB implementations can be used to cross check the configuration but I am still looking for a CMDB implementation that will give an automation engine a good IT model and access to the required monitoring and event data.

First look at an automation engine

Automation, Automation Technology Architect View No Comments »

As you - hopefully - have read, automation is not magic, not even black magic. It is the execution of actions based on conditions. As this does not sound all that difficult, what do we need to integrate this concept into everyday IT maintenance life? Simple, we need some sort of machine - an automation engine - that will sit on all the IT components of our environment and execute actions if some conditions we have programmed the machine with become true.

Simple may be a pretty misleading term. The concept of this machine is very simple, but on the one hand this automation engine has to monitor all data available in our IT environment in order to match any conditions, and on the other hand it will have to find the right action to execute. The concept behind this approach is simple; the technical problems to be solved in order to make this machine work are numerous and have to be dealt with. Let me take a glimpse at a few of the most pressing ones:

  1. Mass of data to be processed
    As you may remember, we are looking at all the system management, KPI and quality data we can get our hands on for all the IT around us. So there is a lot of data and we have to deal with all of it.
  2. Mass of conditions
    Besides all that data there are a lot of conditions that have to be evaluated upon the available data series. The automation engine is a very elaborate version of a rule engine, because it is dealing with a highly interconnected logic tree (the IT model) and many conditions on a large data space. So typical approaches like decision matrices do not work for cutting short on rule evaluation.
  3. Unknown rules
    If we wanted to put everything that needs to be executed automatically into an explicit rule, building the rule system would take a lifetime and the problem “mass of conditions” would become ever more influential. Building implicit rules is too complicated for the user. So the automation engine has to adopt a behavior of encircling the problem. This is a divide and conquer approach instead of asking a user to enter every circumstance and every reaction, because this kind of “brain dump” has simply not been invented yet. I know this is very abstract and I am sure that I will find a little more time soon to elaborate on the way an automation engine has to find the proper actions to take to solve a real life problem.

By the way, most computer systems and approaches in system management software take a simple approach to tackle these challenges. Techniques like root cause analysis or autonomic systems try to move down the dependency tree and find the problem somewhere down there. Why is this approach practical? Well, it quickly narrows down the number of possible data sources and actions that can be taken, and in that way a computer system can actually work out the resulting simple problem. And why do these approaches fall short? Well, they simply don’t work with complex problems that show symptoms in some remote location of the IT environment or problems that are caused by multiple sources. Most problems in modern IT systems are of the latter kind and therefore the common top down approaches execute quite some actions, but not nearly as many as an automation engine should. Or would you expect your best system administrator to simply go down the logical tree of connected systems while trying to find out why your ecommerce application is not working? No, not really, because good administrators encircle the cause of a problem and thus, through their experience, exclude great parts of the IT environment as possible causes and then only concentrate on the “relevant” remainders.
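Encircling can be illustrated with a few lines of Python. This is a deliberately naive sketch and not the engine's method: it assumes we have a handful of hypothetical probes, each of which exercises a known set of components and either passes or fails, and it assumes a single cause (which, as said above, is often not true in real life).

```python
def encircle(probes):
    """Narrow down suspects from probes given as (components_used, passed).

    Components used by a passing probe are cleared; the suspects are the
    components common to all failing probes, minus the cleared ones.
    """
    suspects = None
    cleared = set()
    for components, passed in probes:
        if passed:
            cleared |= components
        elif suspects is None:
            suspects = set(components)
        else:
            suspects &= components
    return (suspects or set()) - cleared

# Hypothetical probes looking at the landscape from different angles
probes = [
    ({"web", "app", "db", "storage"}, False),   # checkout fails
    ({"web", "app"}, True),                     # static page works
    ({"batch", "db", "storage"}, False),        # nightly report fails
    ({"backup", "storage"}, True),              # backup works
]
remaining = encircle(probes)
```

Four probes from different corners of the landscape are enough to exclude everything but the database, without ever walking down a dependency tree.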

The simple concepts behind automation

Automation, Automation Technology Architect View No Comments »

I should be the last person to say that automation is a simple concept - I make my money on automation. And often people really think that “a technical concept behind automation” is the hard part - well, let me tell you: it is not. The technical concept of automation is something deeply embedded in the binary way our IT works. The technical principle behind automation simply is

IF (a complex condition) IS TRUE THEN (do something)

So does that sound familiar? And we have not even introduced the concept of ELSE, ELIF or CASE yet :-). Well, we can put this into a little more technical terms by saying:

Automation is the condition based execution of actions to ensure the quality of service of an IT environment, where conditions can be combined from expressions covering all aspects of the IT environment in question and actions can be one command execution or a series of command executions in one or many locations of the IT environment in question.
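As a small illustration of this definition, here is a Python sketch in which a condition is combined from expressions on different components and the action is a series of commands in several locations. The helper names, node names and thresholds are hypothetical, and the commands are only planned, not executed.

```python
def metric_above(node, metric, limit):
    """Expression: true if `metric` on `node` exceeds `limit` in the environment data."""
    return lambda env: env.get(node, {}).get(metric, 0) > limit

def all_of(*conditions):
    """Combine expressions into one complex condition (logical AND)."""
    return lambda env: all(condition(env) for condition in conditions)

def run_rule(condition, actions, env):
    """IF (a complex condition) IS TRUE THEN (return the commands to execute)."""
    return list(actions) if condition(env) else []

# Condition spanning two components; action spanning two locations
condition = all_of(
    metric_above("web-1", "error_rate", 0.05),
    metric_above("db-1", "connections", 450),
)
actions = [
    ("db-1", "restart connection pool"),
    ("web-1", "/etc/init.d/apache restart"),
]

env = {"web-1": {"error_rate": 0.2}, "db-1": {"connections": 500}}
planned = run_rule(condition, actions, env)
```

With healthy values in `env` the same call returns an empty list, so nothing would be executed.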

So to go back to the divide and conquer theme that was so useful in solving many IT problems, we have to ask ourselves three questions:

  1. What is the IT environment and what are the interdependencies within this environment?
  2. What are the expressions “a complex condition” is composed of and what is the data evaluated in these expressions?
  3. What are the actions to be taken and where are they to be executed?

So let us try to answer the three questions. First, the IT environment and its interdependencies can be modeled. The entities the environment is composed of are all the “configuration items” that are part of the environment in question. The interdependencies are relations between these entities. The “detail questions” to be solved are: At what level of detail do we model, and what kind of relationship model will we use? Well, answering those will take us into a specific implementation of IT automation, and we are right now looking at the concept behind all these implementations, so let us stay at this level of abstraction.

Second, the expressions evaluated in order to know which actions to execute are embedded in the knowledge of the administrators doing exactly this job today. So the expressions and conditions could be classified as the knowledge database put into machine-readable form. The data needed to evaluate the expressions - after we know what they are - is all data available on the IT environment we are looking at. This includes technical monitoring data, end-to-end monitoring data, data processing information and transaction monitoring, but it also includes quality of service information, KPIs, SLA information and business impact data - basically anything we can get hold of.

And third, actions are the things administrators and gurus enter via keyboard, mouse or telepathic network link in order to make the “bad condition” go away. An action can be a simple command on one system but it can also be a series of commands (maybe with conditional execution) or even scripts of commands distributed to many systems. So an action can be as simple as /etc/init.d/apache restart or it can be something as complex as a 10,000-line program, some SQL scripts and a shell script executed on a dozen machines. But in the end these actions are already put together today - as scripts and how-tos in the system administrators’ dens of the world.

So you see. Automation is something simple: We should know about our IT environment and the interdependencies of its entities anyway. We know about the conditions - or at least we can find out - and then execute actions (some of which we have already put into scripts or programs to make life easier). So automation is just a centralization and connection of things we are already doing.

Is automation black magic?

Automation, Social Impact of Automation 1 Comment »

Often automating IT is handled as an obscure art. Maybe some regard it as the black magic of the 21st century. When I don’t understand things, I tend to divide and then conquer them, so in this case why black and why magic? Maybe black, because automation is regarded as something evil by quite a few IT people. Good techies could lose their jobs or at least their “God” status when automation actually works. And maybe magic, because automation is clear to us when viewed on a single system - i.e. things you didn’t want to do manually are put into a script and voila the system does them automatically - but in a large IT environment, all of a sudden things seem to happen by themselves.

But let me tell you, IT automation is neither black nor magic. It is not magic, because after all it can be broken down to just that simple script example above. So if you divide the automation of a large IT environment you will in the end arrive at one - or maybe more - scripts being executed under certain conditions. So the question - I guess we will be talking about that in a little while - is which script or scripts to execute under what condition. And automation is not black, because “people losing their jobs or their current status” is nothing evil but the way our world works. Change is the driving force of everything and anybody trying to position himself against the power of change will definitely lose in the long term. So I would recommend embracing the ideas of automation rather than putting it down there with devils and demons - and by the way we do have enough of the latter around in IT anyway.

Introducing the Social Challenge to the IT Crowd

Automation, Social Impact of Automation 2 Comments »

Automation in IT is about the last part of the IT landscape not under constant renewal - and this will come to an end soon.

It is about time technology becomes part of the change process…

IT has gained tremendous influence in everyday life and “we” - the IT crowd - are proud of “our” technology changing the world. But to put it out bluntly, IT itself has not changed that much and it is about time to pick up the pace.

Sure, marketing has improved very much, but most good concepts in IT have a very long life (e.g. virtual machines). On the other hand many IT trends are focused on an oscillating movement between polar architectural concepts, e.g. host vs. PC vs. Client Server vs. Web vs. Web Services vs. Web 2.0 and so on. Still the way the technical community handles IT did not change much. You may say that there have been a lot of changes all the while and that is certainly true. But these changes were mostly focused on development and creating cool interfaces. How about the concepts, techniques and social patterns applying to the operational environment? How about those of us who keep “IT” running? Not much change detected here…

Either we change or we are changed (away?)…

Do you know anyone (not from an IT profession) who is really happy with the way the IT around him works? Do you know people who believe their system administrator to be the greatest guy since Frank Sinatra? No, or why else would we need “Sysadmin appreciation day”? It seems that the people in and around IT operation are simply detached from the rest of the world. As IT becomes an integral part of “the rest of the world”, this very world will not accept depending on totally alienated concepts and people.

In IT it is still en vogue to be “god” of the system. Would anyone in any other industry accept the fact that the guy who runs the assembly line claims to be “god” of the place? No way, and if there were such a guy - well, in a positive environment he would be set up for counseling and in a bad world he would simply be sacked. So let’s face it, IT has worked its way into everyday life and cannot be alien anymore. Change is in the air. And if user complaints cannot do the job, controllers certainly will. Cost cutting initiatives have reduced IT operating budgets drastically. Still, just maintaining the status quo swallows up roughly 70% of all available IT budgets.

On the other hand, the guys I am just writing about are being bored to tears with everyday work and keep their sanity by building fancy tools just for themselves. Still for that rare occasion, when sh*t really hits the fan we are all happy these people are around, people who really know their way and understand the system. What an incredible squandering of talent, creativity and know how….

… a glimpse at the future

So finally we have arrived at a point where the way things are handled will be changed. IT operation no longer is a question of finding a cool new tool to sit in front of, but it is a question of having IT maintenance controlled by tools and processes - just like the work of so many other productive and creative people is controlled by machine driven processes.

In all other industries this is called automation and it is catching up with us. So if we want to be the gurus, we have to be part of the change process, we have to drive it and we have to find the technologies to enable the automation of IT operation and maintenance - after we are content with the fact that change is a good (and unavoidable) development.
