Archive for the 'Automation' Category

Automation what?

Automation, Market 1 Comment »

After blogging some months about automation, I thought it might be a good idea to talk about the definition of automation. Nearly everyone seems to have an “Automation solution” in place. So what is that Automation-hype all about?

The word automation is derived from ancient Greek and means that something operates or moves of its own accord, which gives quite a good idea of what we are looking at.

Automation might have started with the invention of the wheel millennia ago, is omnipresent in many branches and industries and is a substantial factor in producing any kind of goods and services today. Robots and automated manufacturing systems are the obvious examples. During my journey through the world of IT Service Management, I encountered various kinds of automation. From my point of view most vendors will agree to the following categorization:

Automated IT-Service Management / ITSM Process Automation

This is an umbrella term for solutions focused on supporting Service Management workflows, usually based on best practices and standards like ITIL or COBIT. Subordinate terms are Support Automation and Run-Book-Automation.

Support Automation

Support Automation refers to software packages that are focused on supporting the routine work of help desk personnel. Think of it as a kind of script integration into an existing service desk, a CRM application or even a knowledge base application for automated self service. Examples for this category are products like CA SupportBridge or mValent Integrity, which is focused on Change Management Automation.

Run-Book-Automation

Products belonging to this category are very popular nowadays. They allow you to define a set of ITSM workflows through a graphical user interface. Good products offer a multitude of connectors and interfaces to existing ITSM suites like OpenView, Tivoli or Unicenter. Examples of this kind of product are Opalis Integration Server, BMC Realops or HP/Opsware/IConclude Opsforce.

IT-Workload Automation

These concepts stem from the early (mainframe) days of computing, when batch processing and job scheduling were a big improvement, allowing operators to “automate” recurring tasks. Modern products, though, have evolved considerably: they offer multi-platform compatibility, event triggering and policy-based execution, and are configured through smart, coloured visual GUIs. These products are gaining ground in modern service-oriented environments and are represented by products from big vendors like CA/Cybermation and IBM Tivoli or smaller competitors like ASG and UC4.

Data Center Automation

This is the hottest topic today, as companies have started to deploy myriads of servers into an extremely fast growing number of data centers all over the world, bringing high demand for automated tools to provision, change and manage vast numbers of components. Every large vendor offers such a tool or suite and - you guessed it - this is where the bucks go. HP knows that story. Products in this category are the former Opsware Server Automation System, BMC BladeLogic, IBM Tivoli Provisioning Manager and, to bring in some cloudy haze, modern and cool products/players like Elastra or 3Tera/Applogic, which allow you to mix data center and cloud offerings.

Roland

IT Automation – All the Things We Are Talking About

Automation, Automation Technology Architect View, Business Impact of Automation 1 Comment »

Reading and writing about IT automation, I keep on learning about the subject. Lately I found that there are so many flavors of automation around the operating processes of IT that misunderstanding seems inevitable. So I will try to lay out here the different kinds of automation one can use for maintaining a high quality IT environment.

Types of Automation Tasks

  1. Incident-, Problem-, Capacity- and Availability Management
    Automation engines specialized in analyzing and handling events that occur in an IT environment and that may lead to, or themselves represent, malfunctions, loss of quality and the like. Both reactive handling (automated reaction to an incoming event) and proactive handling (automated actions taken to prevent events from occurring) are targets of these engines. Automation engines that handle this “fault operating” are either embedded into the ITIL processes (see blog entry on extending ITIL with automation), like our automation engine (aAE), or are embedded into system components or management systems with a narrow scope, e.g. redundancy activation.
  2. Change Management
    Automation engines specialized in performing changes that modify or extend an IT environment automatically. These engines come in two kinds. The first kind inserts an abstraction layer above the tasks that need to be performed (like adding users, restarting a component and the like), so that an administrator can perform tasks on many machines or on different platforms simply by interacting with the automation engine. An example of this kind of engine is the Puppet framework with its very structured approach to abstraction. The second kind focuses on scaling an IT environment by dynamically adding resources or automatically installing or modifying a system, as the Tivoli Provisioning Manager or VMware VirtualCenter does.

I really do hope (not just to save you some consulting fees) to have helped avoid misunderstandings, when you are talking to others about automation and even better maybe I could point out some additional techniques you can look at to make life easier.

An Administrator's First Contact with Automation

Automation, Automation Technology Deep Insight, Social Impact of Automation 2 Comments »

Surfing our intranet I was totally surprised to find one of our administrators – Thomas “thommy” Neuderth - writing about his first contact with automation. I am really happy that one of the best IT experts I had the pleasure of working with has found himself having “no fear of being automated away” and rather interprets automation as a good way to actually live the life of an “IT expert” instead of being an “IT nanny”.

The automation of a simple task like archiving logfiles obviously convinced a “real techie” that there is more than just a little upside to using an automation engine. Of course the implementation of automation actually forced quite a bit of rethinking of the common ways of administration, and “thommy” describes the skepticism, the first contact and the actual adoption of change in a down to earth way. If you are interested, you may read the whole document here.

Implementing Automation – the Inevitable Step after Implementing ITIL Processes

Automation, Automation Technology Architect View, Business Impact of Automation 2 Comments »

Some time ago I published an article on the future of IT operation after we are through with all the ITIL implementations (still) taking place. Assuming that all the nice failure handling, proactive failure avoiding and communication processes like Incident, Problem, Capacity and Availability Management are in place, implementing automation is the logical way to move ahead. Compared to implementing ITIL, automation actually changes the things that are done and the way they are done. As you may have guessed, this statement alone was fertile ground for interesting and heated debate.

Generally the article concluded that implementing the ITIL processes concentrates on the interfaces between IT experts, clients, business requirements and the like, whereas automation concentrates on the way IT operation is actually “produced” (in an industrial meaning of the word). Even though these two may be viewed separately, the article shows how an automation environment highly depends on monitoring and IT component data. An ITIL environment puts forth a valid definition of both data sources for a complete IT environment and is therefore a good foundation to start implementing automation.

Automation integrated into ITIL

An IT operations environment with implemented ITIL processes also has common interfaces to the acting staff members. This makes it very easy to “inject” a new entity - like an automation engine - into the whole system. In such an approach the automation engine wraps itself around the data sources of CMDB and monitoring systems. All communication that would today be directed towards human recipients is handled by the automation engine first. Only if the automation engine is not able to complete the task are the IT experts involved.
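The "engine first, experts only on failure" flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` fields, the `restart_service` handler and the `dispatch` function are hypothetical names invented for this sketch and are not part of aAE or of any ITIL tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Event:
    """A message that would normally be routed to a human recipient."""
    source: str   # hypothetical configuration item name taken from the CMDB
    symptom: str  # hypothetical symptom reported by the monitoring system


# A handler returns a description of what it did, or None if it cannot help.
Handler = Callable[[Event], Optional[str]]


def restart_service(event: Event) -> Optional[str]:
    """Hypothetical automated remedy that only knows one symptom."""
    if event.symptom == "service_down":
        return f"restarted service on {event.source}"
    return None


def dispatch(event: Event, handlers: List[Handler]) -> str:
    """Let automation try first; involve an IT expert only if nothing worked."""
    for handler in handlers:
        result = handler(event)
        if result is not None:
            return f"automated: {result}"
    return f"escalated to IT expert: {event.symptom} on {event.source}"


print(dispatch(Event("web01", "service_down"), [restart_service]))
print(dispatch(Event("web01", "disk_failure"), [restart_service]))
```

The point of the design is that the caller cannot tell whether a human or the engine answered, which is what makes the engine transparent to the rest of the process.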

This short description reveals how well an ITIL implementation prepares an IT organization for implementing automation.  It also shows how automation is made completely transparent to the business using the IT - as the automation engine acts like any human entity taking part in the ITIL processes.

The article itself gives a short overview of the “operational” ITIL processes and how their implementation builds the foundation for automation. If you are interested you may read the whole text here.

A Simplistic Approach to IT Dependency Modeling: M-A-R-S

Automation, Automation Technology Architect View 3 Comments »

On this blog you have seen a number of abstract articles talking about the “interdependency model” of an IT environment that is necessary to actually automate across operational silos. Building this model is the main challenge in implementing automation.

I have seen customer situations where building the dependency model was such an extensive effort that the actual goal of implementing automation was completely lost from sight. An interdependency model is supposed to answer the question “how does one entity in an IT environment depend upon or influence other objects” or “Why the @)!(/Q$§ doesn't it work anymore after someone I don't even know changed something I don't even care about?”.

Though the questions are simple, answering them without having a good CMDB in place and being able to query that CMDB on an expert level leaves a long trail across an IT organization. Since these key questions are asked for every failure and should be asked for every change, an interdependency model is obviously something one REALLY wants to have.

As this model is also one of the key input streams to an automation engine that actually operates IT across silo and competency boundaries, we have put quite some thought into the art of modeling. Typical techies that we are, our first approach was to build THE BEST of all possible models. We started to build our methodology on the basis of economic dependency models. Well, what can I say… We got a great model, but maintaining it would have cost us an arm and a leg, and our technical teams would never have been able to do the maintenance themselves.

So we went back to the think tank with the preliminary assumption, that we would be willing to simplify the model - if we could produce one that could and would be maintained by the technical staff themselves (acceptance is an important factor).

We arrived at something we call the arago M-A-R-S model.

M-A-R-S Model Description

The “M” is for “Machine”

A machine (real or virtual, cloud compartment or actual operating system) is still the basic component of any IT application. Machines can be servers, network components and anything else. Administrators tasked with keeping the infrastructure alive normally do not know much about the business applications running on their “machines”, but they know their machines on a first name basis. So a machine is a basic building block of IT infrastructure as well as a component very close to the technical staff. Thus the entity of a machine fulfills both our prerequisites (simple to understand and maintainable by the IT staff).

The “A” is for “Application”

An application is something that is used in a business process. Or technically speaking an application is the “thing” a “user” complains about when interfacing (talking to) the IT department. An application is therefore the basic building block from “the other side”. Where a machine is the basic building block of the technical view of IT, applications are the basic building blocks of the business view of IT.
Naturally an application uses machines or much better “-S-ervices” offered by these.

The “R” is for “Resource”

As you may imagine building up a dependency model by listing all the services offered by a number of machines and then listing all the services used by an application may leave you with very long lists. So we decided to introduce one layer of abstraction into our approach to dependency modeling. This is called the “resource” layer. A resource combines a number of services with a 100% dependency (e.g. an SAP service will never run without a database service, thus there would be an SAP resource combining the two services into one entity and thereby reducing the complexity of the dependency tree of an application).

The “S” is for “Service”

A service defines some functionality that is offered by a machine or a cluster of machines (for redundancy and availability reasons). A service in this case is an IT term that describes a simple building block of software running on a machine. A service can be anything ranging from an operating system or a network connection through a simple application such as a web server or database all the way up to an SAP system or an individually programmed piece of software. A service is something often talked about in the IT organization and usually something that has developers or vendors attached to it.

As simple as this may sound: this four-layer model allows for a real connection of all silos within technology and organization. This model can be maintained manually or - better - can be imported from a good CMDB.
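To make the four layers concrete, here is a minimal Python sketch of the M-A-R-S model as plain data classes, together with one typical dependency question ("which applications depend on this machine?"). The class layout, the helper `affected_applications` and all example names (`db01`, `billing` and so on) are assumptions made for illustration; the article does not prescribe any particular data format.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Machine:          # "M": real or virtual machine
    name: str


@dataclass
class Service:          # "S": functionality offered by one or more machines
    name: str
    machines: List[Machine] = field(default_factory=list)


@dataclass
class Resource:         # "R": services that always depend on each other
    name: str
    services: List[Service] = field(default_factory=list)


@dataclass
class Application:      # "A": what the business user actually talks about
    name: str
    resources: List[Resource] = field(default_factory=list)


def affected_applications(machine_name: str, apps: List[Application]) -> Set[str]:
    """Return the names of all applications that depend on the given machine."""
    hit: Set[str] = set()
    for app in apps:
        for res in app.resources:
            for svc in res.services:
                if any(m.name == machine_name for m in svc.machines):
                    hit.add(app.name)
    return hit


# Hypothetical example data: an SAP resource bundles the SAP and database services.
database = Service("database", [Machine("db01")])
sap = Service("sap", [Machine("app01")])
webserver = Service("webserver", [Machine("web01")])
billing = Application("billing", [Resource("sap", [sap, database])])
intranet = Application("intranet", [Resource("web", [webserver])])

print(affected_applications("db01", [billing, intranet]))
```

Note that the sketch only answers "depends on", not "fails with": a service offered by a cluster of machines may survive the loss of a single machine, and deciding that would need the monitoring data as well.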

Combining monitoring information with this model (i.e. hooking up monitoring and master data with the nodes of the model) is the basic input for an automated environment (also see input stream and naming convention articles on this blog). You can read the full article on “Measurement as the Prerequisite for Automation” here.

I am sure you will find other good applications for a simple model - or even better maybe you do have some suggestions on how to improve and/or further simplify this model.

Input to an Automation Engine – Namespace as a Starting Point

Automation, Automation Technology Architect View No Comments »

I have been talking quite a bit about the technology that drives an automation engine. Actually there could be many approaches for the technology that evaluates conditions and chooses the right actions to execute. Our technology takes a “divide and conquer approach” in a very distributed system and therefore simulates the behavior of a good human administrator. Other technologies take a “drill-down” or “boil-up” approach. All the technologies produce automation results and normally they are used for special tasks. E.g. a drill-down approach is focused on a straight forward root cause analysis approach.

Apart from all these technologies and very important backend decisions, the question of what goes into an automation engine is paramount to the actual results of automation. I have written a blog entry on the basic IO model of an automation engine emphasizing this point. As you may remember, I proposed two different streams of preliminary input data to the automation engine. First there is the model data that builds up the space automation is to take place within, and second there is the monitoring data representing the actual condition of each node in the model. These two basic data streams are evaluated by the rules engine.

The data streams have to fulfill certain pre conditions in order to produce proper automation output.  I will talk about the attributes of the model data stream in this article in more detail. The monitoring data stream holds either event driven or time series data. Finding a way to normalize this stream so a rule can evaluate monitoring data at any given point in time will be the contents of an entry here soon.

The IT model is described as a representation of the interdependencies between IT entities or, in ITIL terms, between configuration items. There are a lot of approaches towards building such a model. Depending on the approach, the model has a different number of layers and dimensions as well as different kinds of relations between its nodes. As much as an up to date model is key to automation and to orderly IT processes, the complexity and accuracy of the model will have to compromise with its maintainability. Many vendors are trying to reduce this need to compromise by building auto discovery solutions such as IBM’s TADDM. Still, the complexity of the model is inversely proportional to the user acceptance of every process and technology based upon this model.

Behind the model of interdependencies are the nodes that are interdependent. And these nodes have to describe IT entities using meta-information. This meta information is put down into attributes and these attributes can either save the world or be the cause of all evil.

Therefore building the actual values for the attributes should be worth some thought. Surely there are simple attributes like HOSTNAME that we do not have to think much about. Yet even a simple attribute such as OS can be a bigger challenge than expected. When you simply assign “Windows” or “Linux” to OS, you will only be able to match this exact value when building conditions for automation. When you assign something like “server.windows.2003”, where the first part describes the OS usage, the second the OS family and the third the actual system, you can match all Windows servers by building a condition like “server\.windows\..*”, or you can select all Linux systems (regardless of desktop or server) by building a condition like “.*\.linux\..*”.
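The two conditions quoted above can be tried out directly with Python's standard `re` module. The host names and OS values in the dictionary are invented for this sketch, and `select` is a hypothetical helper; only the two patterns come from the text.

```python
import re

# Hypothetical OS attribute values in "usage.family.system" form.
systems = {
    "mail01": "server.windows.2003",
    "file01": "server.windows.2000",
    "web01": "server.linux.debian",
    "dev07": "desktop.linux.ubuntu",
}


def select(pattern: str) -> list:
    """Return the sorted host names whose OS attribute matches the whole pattern."""
    rx = re.compile(pattern)
    return sorted(host for host, os_attr in systems.items() if rx.fullmatch(os_attr))


windows_servers = select(r"server\.windows\..*")  # all Windows servers
all_linux = select(r".*\.linux\..*")              # Linux, desktop or server

print(windows_servers)
print(all_linux)
```

With a flat value such as “Windows” neither selection would be possible without maintaining a second attribute, which is the point of the tree-like value.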

Maybe this little example shows the power of building up proper systems for namespacing. So what kind of system is appropriate for automation? A simple “name” solution (like the first example) is not good for anything but a quick and dirty test of an automation engine algorithm. The second approach shown above (a tree-like structure) is very powerful and very close to XML (which most people use to declare structured data these days). These tree-like structures are good for expression matching and therefore good as an input stream to automation engines. When using these structures you first have to build up a clear understanding of the trees to use. As you can see from many discussions (one of the most competent being between Van Wiles and William Vambenepe), the problem of agreeing on applicable and technically usable naming conventions is still up in the air, even though it can be one of the major causes of CMDB projects failing and definitely has a major impact on any automation engine. Each vendor has their own naming conventions and definitions eloquently elaborated, but unfortunately no one has looked upon the problem from higher ground. The closest I have found so far is a chapter in the book Implementing ITIL Configuration Management by Larry Klosterboer.

I have had many encounters with strange approaches towards the issue of naming conventions and namespaces and therefore made sure that our algorithms can work with any kind of namespace (with varying degrees of performance). If you want to “do it right” I would strongly suggest sticking to the following principles when building up your personal CMDB or model of interdependencies:

  1. If you are willing to attach yourself to a vendor (not just for the CMDB, but for most products delivering towards the ITIL processes), stick with the naming conventions of this vendor. The guys usually have put some thought into them. If this is not possible for you (either because you strategically have to place large vendors against each other, because you like your software zoo or just because…), build up your own space completely.
  2. Use a tree-like structure for everything and make this tree structure fixed, meaning that each depth level in the tree always correlates to the same sub attribute. This may mean that you will have to “fill” some levels in the tree for some nodes (like “windows.windows.2003”). This will save you from extensive misinterpretation by people who do not use your namespace every day.
  3. Do not include versions in the tree-structured attributes. Versions are a secondary decision criterion and are used AFTER you know what you are dealing with. Our automation engine is not the only tool that uses the same rule, with different parts of it, for different versions of the same environment; many other tools do so too. Performance therefore increases when you keep versions separate.
  4. Do not “outperform” yourself when building or using naming conventions. In any case, whether you use a vendor's approach (which has to be very flexible) or your own (which you may want to make scientifically tight), only fill in or use the attributes and sub attributes that make sense for the task at hand. If you stick to the proper structure you can always enter additional data later on (as you need it). Data in place has to be of some use, as it creates costs just by being there.

Just by sticking to these (1. and 4. being the most important to bracket things up) you can make sure that your IT model is easily understood, has low maintenance cost and can be used for something innovative like automation right away.
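Principles 2 and 3 can be enforced mechanically. The following Python sketch assumes a three-level tree (usage, family, system) as in the OS example and keeps the version in a separate field; the level names, the `OsAttribute` class, the `parse` helper and the "SP2" version value are assumptions made for this illustration, not a convention taken from any vendor.

```python
from dataclasses import dataclass

# Assumed fixed meaning of each depth level (principle 2).
LEVELS = ("usage", "family", "system")


@dataclass(frozen=True)
class OsAttribute:
    usage: str
    family: str
    system: str
    version: str = ""  # kept outside the tree (principle 3)

    @property
    def path(self) -> str:
        """The tree-structured value used for expression matching."""
        return ".".join((self.usage, self.family, self.system))


def parse(path: str, version: str = "") -> OsAttribute:
    """Accept only values that fill every level of the fixed tree."""
    parts = path.split(".")
    if len(parts) != len(LEVELS) or not all(parts):
        raise ValueError(f"expected {len(LEVELS)} non-empty levels, got {path!r}")
    return OsAttribute(*parts, version=version)


attr = parse("server.windows.2003", version="SP2")
print(attr.path, attr.version)
```

A value with a missing level, such as `"windows.2003"`, is rejected with a `ValueError` instead of being silently shifted to the wrong depth, which is the kind of misinterpretation principle 2 is meant to prevent.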

Can Automation be Trusted - Or How to Build Trust on Laziness

Automation, Social Impact of Automation 4 Comments »

Well, what a very basic question… Should we be discussing automation engines, when we should not have trust in them automatically taking action? Surely not, and obviously we are discussing automation engines.

So why do I hear so much about the lack of trust towards automated actions? It may be a stunning change in the field of system administration, that some entity takes automatic action where normally a system administrator would have typed in a couple of commands up to now. And change always induces fear and prejudice. Questions like “do you really trust the engine to restart this business critical service?” are not really uncommon. Well why should the machine not do that? After all the only action a system administrator would have taken is to restart the whole machine instead of just the service?

This simple every day example shows the real problem: Trust

We seem to have a problem when faced with the necessity to trust a machine or some lower level of reactive “intelligence”. Maybe this is just due to the many science fiction books we have read on robots and machines gone mad. In the end we are the ones who gave the engine the rule set by which it acts.

Actually we trust in automation every day we step into a lift. Much more than that, we rely on hard wired automation when we breathe or when our heart beats. I think none of us would be too happy about the idea of having to think and act out every breath and heartbeat consciously and willingly. Not much difference in automated actions in IT administration - and just like you can hold your breath automated actions can be overridden at any time.

This sounds very logical, doesn't it? But logic is not the drink for “unsinkable rubber ducks” (the term true believer nowadays is too closely connected to politics - and besides much less enjoyable). So a good argument usually does not help much. In order to get on with automation, either management uses force or we try to employ man's oldest habit - laziness (maybe we could get entangled in a discussion on whether greed or laziness was around first). And do not get me wrong, great things like the wheel were invented because of laziness. And on the way, we build trust towards automation in a non-intrusive way - i.e. everyone involved can discover for himself that automation helps and is not evil. So this is how it is done:

  1. Set up the automation engine in full.
  2. Disable all automated commands and redirect them to a trouble ticket or service management tool.
  3. Have administrators use this tool and hence make them see what the engine would have done.
  4. After a while people will start to copy and paste the commands from the trouble ticket or service management tool into the various command lines.
  5. This is the time to enable automatic command execution. The connection to the service management or trouble ticket system stays as it is. So the commands executed are not in any way “black boxed”.
  6. There will not be mistrust and all the discussions, bad feelings and politics attached to it.
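The six steps above boil down to one switch in the engine: every command is always written to the ticket system, and execution is enabled later. A toy Python sketch follows, assuming a hypothetical `Engine` class with an in-memory ticket list; a real setup would talk to an actual trouble ticket tool and a real command executor.

```python
from typing import Callable, List


class Engine:
    """Toy engine: every command is ticketed; execution is optional."""

    def __init__(self, run: Callable[[str], None], execute: bool = False):
        self.run = run                 # hypothetical command executor
        self.execute = execute         # False = dry run (steps 2 to 4)
        self.tickets: List[str] = []   # stands in for the trouble ticket tool

    def act(self, command: str, reason: str) -> None:
        mode = "EXECUTED" if self.execute else "PROPOSED"
        # The ticket is written in both modes, so nothing is "black boxed".
        self.tickets.append(f"{mode}: {command} (reason: {reason})")
        if self.execute:
            self.run(command)


executed: List[str] = []               # records what would really have been run
engine = Engine(run=executed.append)   # dry run: nothing is executed yet
engine.act("service httpd restart", "httpd not responding")

engine.execute = True                  # step 5: enable automatic execution
engine.act("service httpd restart", "httpd not responding")

print(engine.tickets)
print(executed)
```

Because the ticket trail is identical before and after the switch, administrators keep seeing exactly what they were copying and pasting before.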

Hot Topic: Automation and Compliance

Automation, Business Impact of Automation No Comments »

We are all moved by compliance issues. Mainly storage vendors, consultants and auditors are having a feast. For most corporations introducing the new rules is quite a drain on resources. Besides this, changes in the working processes are the main cause for discomfort in the workforce and management of the entities affected by the rules.

Automation actually solves one big problem compliance poses for IT operation. However, it may also make an old one reappear.

So let us take a look at the good news first. One demand often posed by auditors and clearly stated in all new compliance rule sets is, that all actions and the reasoning behind taking the actions should be well documented and archived. In a normal working environment this usually means getting on the case of everybody and forcing them to type explanations of what they did into some documentation system after the system has behaved like big brother and logged the technical parts of the doing. This can become tedious and does not have much positive effect on day-to-day business. So most explanations in these systems look like ‘fixed the ABC problem’ and the reasoning part is lost forever. This is where an automation engine really helps. An automation engine will document each action it takes, archive the data and the rules that have caused the action to be taken and reveal the planned next steps and all related actions and problems. So there is one big relief for everybody working on or auditing IT operations. Great, isn’t it?
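What such an automatically written record might contain can be sketched as follows. The field names, the rule identifier "R-0042", the author "jdoe" and the JSON-lines format are all assumptions for illustration; the text only states that the action, the triggering data, the rule and the planned next steps are documented and archived.

```python
import json
from datetime import datetime, timezone


def audit_record(action, rule_id, rule_author, trigger_data, next_steps):
    """Build one archivable record tying an action to the rule and data behind it."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "rule_id": rule_id,            # which rule fired (the "reasoning")
        "rule_author": rule_author,    # who is accountable for the rule
        "trigger_data": trigger_data,  # monitoring data that matched the rule
        "planned_next_steps": next_steps,
    }


record = audit_record(
    action="service httpd restart",
    rule_id="R-0042",
    rule_author="jdoe",
    trigger_data={"host": "web01", "check": "http", "state": "critical"},
    next_steps=["re-check http after 60s", "escalate if still critical"],
)

# One JSON document per line is easy to append to an archive and to search later.
line = json.dumps(record, sort_keys=True)
print(line)
```

Compared to a hand-typed 'fixed the ABC problem', the record keeps the reasoning (rule and trigger data) next to the action itself.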

The second topic is the way roles and rights are managed along compliance rule sets. In the dark ages, there was a super user (many administrators are still worshippers of this creed). According to the new rules, an administrator may only have the rights to perform manipulations on exactly the entities he is attached to. A database administrator for example should only be able to talk to his database, and if he needs some different system settings because his database requires more semaphores, he will have to create a change request to the OS administrators. At least that is how it works in theory or whenever administrators want to slow each other down dramatically. I think the intention of the new rules is clear and unarguable: One human should only be able to have influence on the direct area he is dedicated to. Everything else can produce unpredictable risks and should thus be avoided. All fine and good, and most corporations (at least the larger ones) have implemented ‘the admin silo view’ by using simple mechanisms like ‘sudo’ or more complicated rights management systems. Once an automation engine is inserted into this environment, however, any administrator who can create a reusable rule could cause command executions outside the rule author’s area of competence.

Well one would argue that is exactly what we want. We want to reuse the expert knowledge of someone who solved a problem in different environments. Auditors probably would say ‘no this is exactly what we do not want’….. A big dilemma?

I do not really think so. And I do think that we really want the knowledge to be distributed and here is why:

  1. The ones who are writing rules are experts. Like the expert we call in when we really cannot find the cause of or remedy for a problem.
  2. The guy who wrote the rule will always be identifiable from the engine's point of view, and that was the original intent of the compliance rules (make sure we know what was done by whom and where).
  3. One could restrict rule attachment by group signatures and the like (additional parameter in the IT model) to create peace and quiet, but should one really dismiss the power of implicit rules if every action and its originator is well documented? (Maybe someone really into the field of compliance could answer this question for me???).

So all in all automation may cause some auditors or process consultants some headaches, but heck - this is what they are paid for, isn't it? On the other hand an automation engine produces well formed documentation and reasoning for the auditors, which is something that any kind of silo restriction on the human workforce cannot guarantee.

Who is automated “away”?

Automation, Social Impact of Automation No Comments »

As discussed before, automation in IT operations definitely has a strong social impact. It is a question of how IT professionals deal with the change that will make the difference in the end.

As I spent most of last week at an American university, I obviously had quite some discussions on how automation impacts the lives of IT administrators. There seems to be a lot of personal discomfort (understandably). Unfortunately these personal issues get mixed up with the technical ones. Many people have asked me questions like “do you trust the machine to stop a service, restart a machine or even allocate resources dynamically?” Well, yes I do. I have trusted my system for quite some time to allocate memory and disk space for me, and so have you. We also trust computer programs to land planes and to control elevators and life support systems in an ER. So why – WHY – should we not trust a machine to do something radical like rebooting a server?

In my opinion a machine has two major advantages over a human administrator in standard situations. First, it never executes radical commands due to “gut feeling” (like “a reboot feels good”), and second, it documents the path it took to reach the conclusion that executing specific commands is a good idea. So you do have documentation (hello to all you SOX consultants out there), and if there really is an error you know where to look and you will be able to change your rule set accordingly.

Ok, so maybe we can solve the problem of trust through logical argument. Unfortunately some people are very much resistant to logic. So another approach we sometimes take is to do a dry run. That means we install the automation engine, disable all execution and redirect the commands so that everything the engine would do is documented in a trouble ticket. As soon as administrators start pasting commands out of the tickets, you know it is time to enable the real automation.

But let us get down to the actual administrators and the consequences all that automation has on them. There is this geek shirt “Go away, or I will replace you with a very small shell script”. By the way, the guy in the picture is actually one of our administrators - one of the guys who really DO automation. I think the shirt was done to scare off users. But nowadays this is actually what will happen to administrators who do not want to be part of this changing world. In my vision of the future there will only be two kinds of administrative staff close to a data center: Real IT experts (the Gurus) and janitors. The experts are today's administrators who want to get rid of all the boring – I have done that about 10,000 times – tasks and deal with the exciting stuff instead. Well the others …..

To get it straight: I actually do not think that there will be fewer jobs in IT administration in the future, mainly because IT is an ever growing plant. I do think that there will be a lot less “boring” and unqualified work in IT – as we have seen in all other industries before.

So, is that really a bad thing? More exciting tasks, more real results, more happy administrators? I don't think so… Let's get it on, guys.

Taking a Look inside aAE (arago Automation Engine)

Automation, Automation Technology Deep Insight No Comments »

Time and again people ask me what they will see after implementing an automation engine. My answer usually was “well nothing really, you will see that your applications have a better uptime…”, but obviously that is not what people want to hear. The whole idea of an automation engine is that things happen in the background and no one has to sit in front of some console watching lights turn red.

Still people want to see what is going on. And as automation is a matter of trust – the trust of system administrators and managers that such an engine will improve IT service instead of messing it up – it probably is a good idea to enable a peek under the hood of the machine. Actually, as we are using a graph algorithm approach to finding the automatic steps to be taken in order to resolve a problem, it sounds like we should show a graph of the whole thing.

So that is just what we have done. In the attached screenshot you see the prototype of “aAE Visualizer”. This Java application displays the IT model and the issues and events travelling through the model. On the model graph it is possible to see where issues are created and how they travel through the engine in order to find actions to take. But this visualization application is not just a pretty way to let interested people “look at what is happening” in our automation engine; it also makes it easy to locate hot spots in an automated IT landscape. Hot spots always indicate a challenge. Either a hot spot is an error in the model – a place where problems travel in circles without finding any resolution – or a hot spot is an actual bottleneck in the IT infrastructure that is not visible from a capacity management point of view.
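The first kind of hot spot, a place where problems travel in circles, corresponds to a cycle in the graph along which issues are forwarded. How aAE Visualizer finds hot spots is not described here; the following Python sketch merely shows one standard technique that could be used for it, a depth-first search for a cycle in a directed graph. The `routing` graph and its node names are invented for the example.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a node list (start node repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2       # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    stack: List[str] = []              # nodes on the current search path

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:          # back edge: we have walked in a circle
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical issue routing: "resource" and "service" hand an issue back and forth.
routing = {
    "app": ["resource"],
    "resource": ["service"],
    "service": ["resource"],
    "machine": [],
}
print(find_cycle(routing))
```

The second kind of hot spot, an infrastructure bottleneck, would not show up as a cycle; detecting it would require counting how many issues pass through each node over time.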

So I am very happy to announce, that this visualization application will not only make my job of explaining how “automation works” easier, but will also allow our administrators to locate model problems or IT landscape problems with much less effort than before.
