IT Automation – All the Things We Are Talking About

Reading and writing about IT automation, I keep learning about the subject. Lately I have found that there are so many flavors of automation around the operating processes of IT that misunderstanding seems inevitable. So I want to use this post to lay out the different kinds of automation one can use to maintain a high-quality IT environment.

Types of Automation Tasks

  1. Incident-, Problem-, Capacity- and Availability Management
    Automation engines specialized in analyzing and handling events that occur in an IT environment and that may lead to, or themselves represent, malfunctions, loss of quality and the like. These engines target both reactive automation (an automated reaction to an incoming event) and proactive automation (automated actions taken to prevent events from occurring). Automation engines that handle this “fault operating” are either embedded into the ITIL processes (see blog entry on extending ITIL with automation), like our automation engine (aAe), or embedded into system components or management systems with a narrow scope, e.g. redundancy activation.
  2. Change Management
    Automation engines specialized in performing changes that modify or extend an IT environment automatically. These engines come in two kinds. The first kind inserts an abstraction layer above the tasks that need to be performed (like adding users, restarting a component and the like), so that an administrator can perform tasks on many machines or on different platforms just by interacting with the automation engine. An example of this kind of engine is the Puppet framework, with its very structured approach to abstraction. The second kind focuses on scaling an IT environment by dynamically adding resources or automatically installing or modifying a system, as the Tivoli Provisioning Manager or VMware VirtualCenter does.
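
To make the idea of an abstraction layer above tasks a little more concrete, here is a minimal Python sketch. It is not how Puppet or any other product works internally; the task names, the host list and the command templates are assumptions made up for illustration, and nothing is executed.

```python
# Hypothetical mapping from an abstract task to platform-specific commands.
# The command strings are illustrative and not taken from any real tool.
TASK_COMMANDS = {
    "restart_service": {
        "linux": "/etc/init.d/{name} restart",
        "windows": "net stop {name} && net start {name}",
    },
    "add_user": {
        "linux": "useradd {name}",
        "windows": "net user {name} /add",
    },
}


def plan_task(task, name, hosts):
    """Return (host, command) pairs for one abstract task; nothing is executed."""
    plan = []
    for host, platform in hosts.items():
        template = TASK_COMMANDS[task][platform]
        plan.append((host, template.format(name=name)))
    return plan


hosts = {"web01": "linux", "web02": "linux", "app01": "windows"}
for host, command in plan_task("restart_service", "apache", hosts):
    print(f"{host}: {command}")
```

The administrator only states the task once; the per-platform details live in one table instead of in everybody's head.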

I really do hope (and not just to save you some consulting fees) that this helps to avoid misunderstandings when you talk to others about automation. Even better, maybe I could point out some additional techniques you can look at to make life easier.

Automation as a Strategic Issue at HP

The topic of automation seems to be of great interest, and not just to those of us who have to deal with the day-to-day operation of IT. Naturally, the interest of people maintaining systems and services becomes the interest of vendors. I had the pleasure of attending the HP BTO Talk in Frankfurt and was glad to find out that automation itself is the main focus of HP’s system management efforts.

For the first time since HP acquired OpsWare in 2007 I was actually able to see the platform in a customer environment. Swisscom attended the event and demonstrated their efforts in network automation. More impressive was the presentation of Mr. Rossa from Wien IT, who was able to show how standard changes and standard procedures in provisioning were captured into the automation suite.

I have seen more complex provisioning environments, but in the HP presentation on the OpsWare platform I could get a glimpse of the visualization and reporting offered behind the scenes. Coming up from the network layer, they really found a very intuitive way to show what is actually available and going on in an IT infrastructure.

The strategic presentation offered by Mr. Winkler from HP put forward automation as the key to the HP software strategy. I consider this corporate understanding to be a major advantage in market development – much more than all the thousand features we techies like to talk about every day. So in my opinion HP’s view of the future is absolutely correct:

Good IT operation is, when you see nothing of it

I was a little amazed to see that the actual automation of operational tasks, as well as of tasks dealing with incidents and problems, is still in a fairly basic state. All the cute things we have been talking about in this blog are still vision only. Simple rules and actions can be applied, but that is all. Compared to the field of automated deployment, standard changes and predefined tasks, the automated reaction to upcoming problems is not at an advanced stage, even though there obviously is a really fancy interface for cross-platform command execution. This interface could actually be hooked up to an automation engine like aAE and voilà, commands would go out to the world. I actually think we will give this a try.

All in all I have to say that the visualization is impressive and the strategic alignment of the software stack is convincing. I will keep a close eye on the things happening around there, even though integrating all the new acquisitions may still take some time.

Can Automation be Trusted - Or How to Build Trust on Laziness

Well, what a very basic question… Should we be discussing automation engines at all if we cannot trust them to take action automatically? Surely not, and obviously we are discussing automation engines.

So why do I hear so much about the lack of trust in automated actions? It may be a stunning change in the field of system administration that some entity takes automatic action where, up to now, a system administrator would have typed in a couple of commands. And change always induces fear and prejudice. Questions like “do you really trust the engine to restart this business critical service?” are not uncommon. Well, why should the machine not do that? After all, the only action a system administrator would have taken is to restart the whole machine instead of just the service.

This simple everyday example shows the real problem: Trust

We seem to have a problem when faced with the necessity to trust a machine or some lower level of reactive “intelligence”. Maybe this is just due to the many science fiction books we have read on robots and machines gone mad. In the end we are the ones who gave the engine the rule set by which it acts.

Actually we trust in automation every day we step into a lift. Much more than that, we rely on hard-wired automation when we breathe or when our heart beats. I think none of us would be too happy about the idea of having to think about and consciously carry out every breath and heartbeat. Automated actions in IT administration are not much different, and just like you can hold your breath, automated actions can be overridden at any time.

This sounds very logical, doesn’t it? But logic is not the drink for “unsinkable rubber ducks” (the term “true believer” is nowadays too closely connected to politics, and besides much less enjoyable). So a good argument usually does not help much. In order to get on with automation, management either uses force or tries to employ man’s oldest habit: laziness (maybe we could get entangled in a discussion on whether greed or laziness was around first). And do not get me wrong, great things like the wheel were invented because of laziness. And on the way, we build trust in automation in a non-intrusive way, i.e. everyone involved can discover for himself that automation helps and is not evil. So this is how it is done:

  1. Set up the automation engine in full.
  2. Disable all automated commands and redirect them to a trouble ticket or service management tool.
  3. Have administrators use this tool and hence make them see what the engine would have done.
  4. After a while people will start to copy and paste the commands from the trouble ticket or service management tool into the various command lines.
  5. This is the time to enable automatic command execution. The connection to the service management or trouble ticket system stays as it is, so the commands executed are not in any way “black boxed”.
  6. There will be no mistrust, and none of the discussions, bad feelings and politics attached to it.
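
The steps above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the Engine class, its fields and the ticket format are invented for illustration, and the "execution" is merely recorded instead of run.

```python
from dataclasses import dataclass, field


@dataclass
class Engine:
    """Toy engine: every command is ticketed; execution is optional."""
    execute_enabled: bool = False            # steps 1-4: dry run only
    tickets: list = field(default_factory=list)
    executed: list = field(default_factory=list)

    def handle(self, host, command, reason):
        # The ticket is always written, so nothing is ever black-boxed (step 5).
        self.tickets.append({"host": host, "command": command, "reason": reason})
        if self.execute_enabled:
            # A real engine would run the command here; this sketch only records it.
            self.executed.append((host, command))


engine = Engine()
engine.handle("web01", "/etc/init.d/apache restart", "HTTP check failed")
engine.execute_enabled = True                # step 5: trust has been built
engine.handle("web02", "/etc/init.d/apache restart", "HTTP check failed")
print(len(engine.tickets), "tickets,", len(engine.executed), "executed")
```

The point of the design is that flipping one switch changes who types the command, while the ticket trail looks exactly the same before and after.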

Hot Topic: Automation and Compliance

We are all moved by compliance issues. Mainly storage vendors, consultants and auditors are having a feast. For most corporations introducing the new rules is quite a drain on resources. Besides this, changes in the working processes are the main cause for discomfort in the workforce and management of the entities affected by the rules.

Automation actually solves one big problem that compliance poses for IT operation. However, it may also make an old one reappear.

So let us take a look at the good news first. One demand often posed by auditors and clearly stated in all new compliance rule sets is that all actions, and the reasoning behind taking them, should be well documented and archived. In a normal working environment this usually means getting on everybody’s case and forcing them to type explanations of what they did into some documentation system, after the system has behaved like Big Brother and logged the technical parts of what was done. This can become tedious and does not have much positive effect on day-to-day business. So most explanations in these systems look like ‘fixed the ABC problem’ and the reasoning part is lost forever. This is where an automation engine really helps. An automation engine will document each action it takes, archive the data and the rules that caused the action to be taken, and reveal the planned next steps and all related actions and problems. So there is one big relief for everybody working on or auditing IT operations. Great, isn’t it?
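
What such a documented action could look like is sketched below in Python. The field names, the rule id and the sample values are assumptions chosen for illustration; a real engine will have its own archive format.

```python
import json
from datetime import datetime, timezone


def audit_record(rule_id, rule_author, condition, observed, action, next_steps):
    """Build one archive entry that keeps the reasoning next to the action."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "rule_id": rule_id,
        "rule_author": rule_author,
        "condition": condition,      # why the engine acted
        "observed": observed,        # the data that made the condition true
        "action": action,            # what the engine did
        "next_steps": next_steps,    # what it plans to do afterwards
    }


record = audit_record(
    rule_id="disk-cleanup-01",
    rule_author="dba_jane",
    condition="disk_usage_percent > 90",
    observed={"host": "db01", "disk_usage_percent": 94},
    action="rotate and compress old logs",
    next_steps=["re-check disk usage in 5 minutes"],
)
print(json.dumps(record, indent=2))
```

Compare this with ‘fixed the ABC problem’: the condition and the observed data are exactly the reasoning part that usually gets lost.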

The second topic is the way roles and rights are managed under compliance rule sets. In the dark ages, there was a super user (many administrators are still worshippers of this creed). According to the new rules, an administrator may only have the rights to manipulate exactly the entities he is assigned to. A database administrator, for example, should only be able to talk to his database, and if he needs some different system settings because his database requires more semaphores, he will have to create a change request to the OS administrators. At least that is how it works in theory, or whenever administrators want to slow each other down dramatically. I think the intention of the new rules is clear and unarguable: One human should only be able to influence the area he is directly dedicated to. Everything else can produce unpredictable risks and should thus be avoided. All fine and good, and most corporations (at least the larger ones) have implemented ‘the admin silo view’ by using simple mechanisms like ‘sudo’ or more complicated rights management systems. Once an automation engine is inserted into this environment, however, any administrator who can create a reusable rule could cause commands to be executed outside the rule author’s area of competence.

Well, one could argue that this is exactly what we want. We want to reuse the expert knowledge of someone who solved a problem in different environments. Auditors would probably say ‘no, this is exactly what we do not want’….. A big dilemma?

I do not really think so. And I do think that we really want the knowledge to be distributed and here is why:

  1. The ones who are writing rules are experts, like the expert we call in when we really cannot find the cause of or remedy for a problem.
  2. The guy who wrote the rule will always be identifiable from the engine’s point of view, and that was the original intent of the compliance rules (make sure we know what was done by whom and where).
  3. One could restrict rule attachment by group signatures and the like (an additional parameter in the IT model) to create peace and quiet, but should one really dismiss the power of implicit rules if every action and its originator is well documented? (Maybe someone really into the field of compliance could answer this question for me???)
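
The two options from the last point, restricting rule attachment by group or relying on the documented originator, could be sketched roughly like this in Python. The group names, the rule structure and the may_attach helper are hypothetical and only serve to show the trade-off.

```python
def may_attach(rule, target, enforce_groups=True):
    """Decide whether a rule may act on a target configuration item."""
    if not enforce_groups:
        return True                       # implicit rules: rely on the audit trail
    return target["group"] in rule["allowed_groups"]


rule = {"id": "raise-semaphores", "author": "dba_jane", "allowed_groups": {"database"}}
database = {"name": "orders-db", "group": "database"}
os_host = {"name": "db01-os", "group": "os"}

for target in (database, os_host):
    verdict = "allowed" if may_attach(rule, target) else "needs a change request"
    # The author is reported either way, so the originator stays identifiable.
    print(f"{rule['id']} by {rule['author']} on {target['name']}: {verdict}")
```

With enforce_groups switched off the silo disappears, but the author of every rule is still on record, which is the question put to the compliance people above.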

So all in all automation may cause some auditors or process consultants some headaches, but heck, this is what they are paid for, isn’t it? On the other hand, an automation engine produces well-formed documentation and reasoning for the auditors, which is something that any kind of silo restriction on the human workforce cannot guarantee.

Who is automated “away”

As discussed before, automation in IT operations definitely has a strong social impact. How IT professionals deal with the change is what will make the difference in the end.

As I spent most of last week at an American university, I obviously had quite a few discussions on how automation impacts the lives of IT administrators. There seems to be a lot of personal discomfort (understandably). Unfortunately these personal issues get mixed up with the technical ones. Many people have asked me questions like “do you trust the machine to stop a service, restart a machine or even allocate resources dynamically?” Well, yes I do. I have trusted my system for quite some time to allocate memory and disk space for me, and so have you, and we trust computer programs to land planes and to control elevators and life support systems in an ER. So why – WHY – should we not trust a machine to do something radical like rebooting a server?

In my opinion a machine has two major advantages over a human administrator in standard situations. First, it never executes radical commands due to “gut feeling” (like “a reboot feels good”), and second, it documents the path it took to reach the conclusion that executing specific commands is a good idea. So you do have documentation (hello to all you SOX consultants out there), and if there really is an error you know where to look and you will be able to change your rule set accordingly.

Ok, so maybe we can solve the problem of trust through logical argument. Unfortunately some people are very much resistant to logic. So another approach we sometimes take is to do a dry run. That means we install the automation engine, disable all execution and redirect every command it would have executed into a trouble ticket for documentation. As soon as administrators start pasting commands out of the tickets, you know it is time to enable the real automation.

But let us get down to the actual administrators and the consequences all that automation has for them. There is this geek shirt “Go away, or I will replace you with a very small shell script”. By the way, the guy in the picture is actually one of our administrators - one of the guys who really DO automation. I think the shirt was made to scare off users. But nowadays this is actually what will happen to administrators who do not want to be part of this changing world. In my vision of the future there will only be two kinds of administrative staff close to a data center: real IT experts (the gurus) and janitors. The experts are today’s administrators who want to get rid of all the boring “I have done that about 10,000 times” tasks and deal with the exciting stuff instead. Well, the others …..

To get it straight: I actually do not think that there will be fewer jobs in IT administration in the future, mainly because IT is an ever-growing plant. I do think that there will be a lot less “boring” and unqualified work in IT, as we have seen in all other industries before.

So, is that really a bad thing? More exciting tasks, more real results, more happy administrators? I don’t think so… Let’s get it on, guys.

Automation makes green IT possible

Considering that only 30-40% of the energy consumed by a data center is used by the actual computational equipment, and that another 30%-40% of the energy consumed by the IT equipment is converted to heat, only 21%-24% of the energy eaten up in a data center is actually converted into computing power. Looking at these numbers from the other side, this means that for each Euro spent on “computational energy” 4.76 Euro are spent on “overhead”. This ratio can be improved by optimizing air conditioning, getting rid of hot spots or generally using modern, energy-efficient equipment. Still, it seems unlikely that this will help to get anywhere below 3 Euros of “overhead” for each Euro spent on “computational energy”. These numbers are taken from the keynote presentation given by Steven Sams at IBM PULSE 2008 (also check out the “Raised Floor Blog”, where Steven Sams is one of the authors).
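
For readers who want to retrace the arithmetic, here is a small Python sketch. It assumes that the 21% and 24% come from pairing 30% with 30% and 40% with 40%, which is only a guess at how the keynote figures were combined. Under that reading the quoted 4.76 matches 1 / 0.21, which is the total spent per Euro of useful computation; the overhead alone would then be that value minus one.

```python
def useful_share(it_share, heat_share):
    """Fraction of total data center energy that ends up as computing power."""
    return it_share * (1.0 - heat_share)


# The two pairings that reproduce the 21% and 24% quoted in the text.
low = useful_share(it_share=0.30, heat_share=0.30)    # 0.30 * 0.70
high = useful_share(it_share=0.40, heat_share=0.40)   # 0.40 * 0.60

for share in (low, high):
    total_per_useful = 1.0 / share           # total spend per Euro of useful computation
    overhead_per_useful = total_per_useful - 1.0
    print(f"useful {share:.0%}: total {total_per_useful:.2f}, overhead {overhead_per_useful:.2f}")
```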

On the other hand this means that reducing the energy needed for computational equipment will, in absolute numbers, decrease the excess energy consumption by a factor of more than 4. So improving the facility is a good start, but reducing the energy actually needed by computational equipment is the real prize. The way to reduce the energy needed follows directly from capacity management. Generally speaking this means, in the best case, turning off as many components as possible or, if that is not possible, at least cutting their energy usage by slowing the CPUs or putting virtual instances into suspend mode until their service is really required. Does this sound easy? Well it does, but how does it work? Virtualization certainly is the key technology, but what good would virtual machines be if their resources could not be allocated automatically depending on their actual use or, if you want to be cautious, on their predicted load and therefore their predicted usage? A specialized set of rules is put behind process and operational automation to perform the scale-down and scale-up of the virtual machine resources. This automation can even decide to turn off hardware that is currently not needed, or at least to slow the CPUs of hardware that cannot be turned off but is in little use.
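
A specialized rule set of this kind might, in its simplest form, look like the following Python sketch. The component fields, the 20% load threshold and the action names are assumptions for illustration only; real capacity management would work with predicted load curves rather than a single number.

```python
def power_action(component):
    """Pick the cheapest safe power state for one component (illustrative rules)."""
    load = component["predicted_load"]        # 0.0 .. 1.0, from capacity management
    if load == 0.0 and component["can_power_off"]:
        return "power_off"
    if load == 0.0 and component["is_virtual"]:
        return "suspend"
    if load < 0.2:
        return "slow_cpu"
    return "leave_running"


components = [
    {"name": "batch-host", "predicted_load": 0.0, "can_power_off": True, "is_virtual": False},
    {"name": "report-vm", "predicted_load": 0.0, "can_power_off": False, "is_virtual": True},
    {"name": "legacy-db", "predicted_load": 0.1, "can_power_off": False, "is_virtual": False},
    {"name": "web-front", "predicted_load": 0.7, "can_power_off": False, "is_virtual": True},
]
for component in components:
    print(component["name"], "->", power_action(component))
```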

Modern or “very green” systems come along with special agents that detect energy consumption and usage and derive possible actions. But how about all the legacy applications – the applications that are running on more than 95% of all the components, using up energy in our data centers today? An automation engine that actually acts like an operator (someone who could manually cut down on power use) could examine the equipment in the IT landscape it is acting upon and execute general rules to reduce energy usage. By combining both technologies – the more effective combination of modern hardware and specialized software for new applications and a general automation engine for all the legacy applications – the power of virtualized components can actually be converted to green power. This is not just a fabulous business case, but it also is a good thing for the environment and hence for all of us.

Cloud Computing needs automation

Yesterday I had the chance to get a feeling for one of the hottest topics in IT infrastructure. A panel session at IBM PULSE 2008 was dedicated to the topic of Cloud Computing (even though IBM marketing people don’t seem to like the term and have come up with quite a few innovative words – words no one uses, so let us stick with the cloud). The panel was buzzing with intelligence; unfortunately we as the audience could not really match up. So we listened to a pretty much directed discussion on how cloud computing would replace today’s approach to hardware and infrastructure in general. Well, I do agree: no one needs dedicated servers when resources can be allocated dynamically and come preconfigured and interconnected.

Kristin Hansen stripped the key features of a cloud down to simplicity (users do not care how their resources are set up, they just use them), mobility (obviously use is possible from anywhere, and even a large computing cluster could be controlled from a phone-like device) and elasticity (you only set up or pay for what you really need). Sounds fine to everyone, and Google and Amazon have definitely shown the world that this concept works in a closed shop environment. According to Dave Lindquist, IBM is working on a methodology and technology to make most applications “cloudable”.

The most interesting remark I heard during the discussion was that “Cloud Computing is the combination of technology (virtualization and automation) and discipline (a stringent way of breaking down the offered services into small blocks in order to recombine them quickly and automatically upon the user’s request as well as defining standards or service catalogues to be offered)”. I guess the discipline part will bring forth a great deal of discussion between process consultants and methodology consultants, and in the end there will certainly be a couple of good ways to set things up. Just as certainly there will be the need to standardize these processes and methodologies in the end, so that clouds are not proprietary but stay mobile even between cloud providers.

Naturally I am more interested in the technology part that is needed behind cloud computing. Technology, in this case, refers not to the cloud management servers and agents themselves but to the technology surrounding them. The first technology that comes to mind is virtualization, as without this core there will be no cloud, at least no cloud that can integrate legacy applications rather than working in a very tightly closed universe like Google does. There are quite a few good approaches to virtualization, commercial as well as open source, and the approach taken should really depend on the needs of the applications to be run on a specific part of the cloud. It probably even makes sense to mix the available virtualization technologies within one cloud. It might make sense to use containers built into the operating system or complete hardware virtualization, depending on the kind of application to be run, and therefore a cloud manager will have to deal with all kinds of virtualization technology.

More in my focus is the service management side of cloud computing, and I strongly believe that automated operating is a key component of a good cloud infrastructure. The cloud infrastructure and management components will definitely take care of automatic provisioning and resource management. But as soon as legacy applications are involved (applications that do not really know that they are running on a beautifully scalable environment), manual administration of these applications would mean chasing an ever-changing rabbit across a chameleon planet: an image most amusing to bystanders, but funny neither to administrators nor to the ones paying them. So in my opinion an automation engine could be fed IT model data and monitoring feeds directly from the cloud manager and could thus deal with the ever-changing environment and keep the application automation rules up to date with the cloud components currently in use. This automation engine cannot use a drill-down approach, because the infrastructure might not even support drill-downs and can change every so often. An automation engine that is to provide a good foundation for quality service and professional service management will have to use a more human “circle in” or divide and conquer approach.
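
Keeping the automation rules in step with the components currently in use could be sketched like this in Python. The inventory snapshot, the role names and the reconcile helper are hypothetical; no real cloud manager API is implied.

```python
def reconcile(attachments, inventory, rules_by_role):
    """Return the rule attachments that match the components currently in use."""
    updated = {}
    for component, role in inventory.items():
        updated[component] = rules_by_role.get(role, [])
    removed = sorted(set(attachments) - set(updated))
    added = sorted(set(updated) - set(attachments))
    return updated, added, removed


rules_by_role = {"web": ["restart-on-http-failure"], "db": ["extend-tablespace"]}
attachments = {"vm-17": ["restart-on-http-failure"], "vm-18": ["extend-tablespace"]}
# Hypothetical snapshot from the cloud manager: vm-18 is gone, vm-23 is new.
inventory = {"vm-17": "web", "vm-23": "web"}

attachments, added, removed = reconcile(attachments, inventory, rules_by_role)
print("added:", added, "removed:", removed)
```

Because rules are attached by role rather than by machine name, the engine never needs to drill down to a fixed component; it simply re-derives the attachments whenever the feed changes.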

Does this sound familiar? By the way, check out the articles on the “Blue Cloud”; technical pioneers at work (other bloggers also think about the blue cloud)…. Also interesting is the cooperation between Google and IBM on producing cloud standards

The simple concepts behind automation

I should be the last person to say that automation is a simple concept - I make my money on automation. And often people really think that “a technical concept behind automation” is the hard part - well let me tell you: it is not. The technical concept of automation is something deeply embedded in the binary way our IT works. The technical principle behind automation simply is

IF (a complex condition) IS TRUE THEN (do something)

So does that sound familiar? And we have not even introduced the concept of ELSE, ELIF or CASE yet :-). Well, we can put this in slightly more technical terms by saying:

Automation is the condition-based execution of actions to ensure the quality of service of an IT environment, where conditions can be combined from expressions covering all aspects of the IT environment in question, and actions can be one or a series of command executions in one or many locations of the IT environment in question.
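
Put into code, this definition fits into a handful of lines. The following Python sketch is only an illustration of the principle, not the design of any particular engine; the Rule class, the fact names and the thresholds are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    condition: Callable[[Dict], bool]   # IF (a complex condition) IS TRUE
    actions: List[str]                  # THEN (do something)


def evaluate(rules, facts):
    """Return the actions of every rule whose condition holds for the facts."""
    planned = []
    for rule in rules:
        if rule.condition(facts):
            planned.extend(rule.actions)
    return planned


rules = [
    Rule(
        name="restart-apache",
        # Expressions are combined from different aspects of the environment.
        condition=lambda f: f["http_status"] != 200 and f["cpu_load"] < 0.9,
        actions=["/etc/init.d/apache restart"],
    ),
]
facts = {"http_status": 503, "cpu_load": 0.35}
print(evaluate(rules, facts))
```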

So to go back to the divide and conquer approach that has been so useful in solving many IT problems, we have to ask ourselves three questions:

  1. What is the IT environment and what are the interdependencies within this environment?
  2. What are the expressions “a complex condition” is composed of and what is the data evaluated in these expressions?
  3. What are the actions to be taken and where are they to be executed?

So let us try to answer the three questions. First, the IT environment and its interdependencies can be modeled. The entities the environment is composed of are all the “configuration items” that are part of the environment in question. The interdependencies are relations between these entities. The “detail questions” to be solved are: At what level of detail do we model, and what kind of relationship model will we use? Well, answering those will take us into a specific implementation of IT automation, and right now we are looking at the concept behind all these implementations, so let us stay at this level of abstraction.
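
Staying at this level of abstraction, a model of configuration items and their relations can be as small as a dictionary. The Python sketch below uses a made-up environment and a single “depends on” relation; which relations and which level of detail a real model needs is exactly the open detail question.

```python
from collections import deque

# Hypothetical configuration items and "depends on" relations.
depends_on = {
    "webshop": ["apache", "orders-db"],
    "apache": ["web01"],
    "orders-db": ["db01"],
    "web01": [],
    "db01": [],
}


def affected_by(failed_item):
    """Return every configuration item that directly or indirectly depends on failed_item."""
    # Invert the relation so we can walk from a component up to the services.
    dependents = {item: [] for item in depends_on}
    for item, requirements in depends_on.items():
        for requirement in requirements:
            dependents[requirement].append(item)

    affected, queue = set(), deque([failed_item])
    while queue:
        for parent in dependents[queue.popleft()]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return sorted(affected)


print(affected_by("db01"))
```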

Second, the expressions evaluated in order to know which actions to execute are embedded in the knowledge of the administrators doing exactly this job today. So the expressions and conditions could be classified as the knowledge database put into machine-readable form. The data needed to evaluate the expressions - after we know what they are - is all data available on the IT environment we are looking at. This includes technical monitoring data, end-to-end monitoring data, data processing information and transaction monitoring, but it also includes quality of service information, KPIs, SLA information and business impact data - basically anything we can get hold of.

And third, actions are the things administrators and gurus enter via keyboard, mouse or telepathic network link in order to make the “bad condition” go away. An action can be a simple command on one system, but it can also be a series of commands (maybe with conditional execution) or even scripts of commands distributed to many systems. So an action can be as simple as /etc/init.d/apache restart or it can be something as complex as a 10,000-line program, some SQL scripts and a shell script executed on a dozen machines. But in the end these actions are already put together today, as scripts and how-tos in the system administrators’ dens of the world.
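
An action made of a series of commands with conditional execution, distributed to several systems, could be sketched like this in Python. The runner is a stand-in that only prints and pretends; how commands really reach a host (SSH, an agent, something else) is deliberately left open, and the host names are made up.

```python
def run_action(steps, hosts, runner):
    """Run each step on each host; stop on a host as soon as one step fails."""
    results = {}
    for host in hosts:
        results[host] = []
        for command in steps:
            ok = runner(host, command)
            results[host].append((command, ok))
            if not ok:
                break                      # conditional execution: skip the rest
    return results


def fake_runner(host, command):
    """Stand-in for SSH or an agent call; pretends the config test fails on web02."""
    print(f"[{host}] {command}")
    return not (host == "web02" and command == "apachectl configtest")


steps = ["apachectl configtest", "/etc/init.d/apache restart"]
results = run_action(steps, ["web01", "web02"], fake_runner)
```

On web02 the restart is never attempted, because the configuration test in front of it failed. That is the kind of caution an administrator applies by hand today.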

So you see, automation is something simple: We should know about our IT environment and the interdependencies of its entities anyway. We know about the conditions - or at least we can find out - and then execute actions (some of which we have already put into scripts or programs to make life easier). So automation is just a centralization and connection of things we are already doing.

Is automation black magic?

Often automating IT is treated as an obscure art. Maybe some regard it as the black magic of the 21st century. When I don’t understand things, I tend to divide and then conquer them, so in this case: why black and why magic? Maybe black, because automation is regarded as something evil by quite a few IT people. Good techies could lose their jobs or at least their “God” status when automation actually works. And maybe magic, because automation is clear to us when viewed on a single system - i.e. things you didn’t want to do manually are put into a script and voilà, the system does them automatically - but in a large IT environment, all of a sudden things seem to happen by themselves.

But let me tell you, IT automation is neither black nor magic. It is not magic, because after all it can be broken down to just that simple script example above. So if you divide the automation of a large IT environment you will in the end arrive at one - or maybe more - scripts being executed under certain conditions. So the question - I guess we will be talking about that in a little while - is which script or scripts to execute under what condition. And automation is not black, because “people losing their jobs or their current status” is nothing evil but the way our world works. Change is the driving force of everything and anybody trying to position himself against the power of change will definitely lose in the long term. So I would recommend embracing the ideas of automation rather than putting it down there with devils and demons - and by the way we do have enough of the latter around in IT anyway.