Input to an Automation Engine – Namespace as a Starting Point
I have been talking quite a bit about the technology that drives an automation engine. There are in fact many possible approaches to the technology that evaluates conditions and chooses the right actions to execute. Our technology takes a “divide and conquer” approach in a highly distributed system and thereby simulates the behavior of a good human administrator. Other technologies take a “drill-down” or “boil-up” approach. All of these technologies produce automation results, and each is normally used for specific tasks; a drill-down approach, for example, focuses on straightforward root cause analysis.
Apart from all these technologies and the very important backend decisions behind them, the question of what goes into an automation engine is paramount to the actual results of automation. I have written a blog entry on the basic IO model of an automation engine emphasizing this point. As you may remember, I proposed two different streams of preliminary input data to the automation engine. First, there is the model data that builds up the space within which automation takes place; second, there is the monitoring data representing the actual condition of each node in the model. These two basic data streams are evaluated by the rules engine.
The data streams have to fulfill certain preconditions in order to produce proper automation output. In this article I will talk about the attributes of the model data stream in more detail. The monitoring data stream holds either event-driven or time series data. Finding a way to normalize this stream so that a rule can evaluate monitoring data at any given point in time will be the subject of an upcoming entry.
The IT model is a representation of the interdependencies between IT entities or, in ITIL terms, between configuration items. There are a lot of approaches to building such a model. Depending on the approach, the model has a different number of layers and dimensions as well as different kinds of relations between its nodes. An up-to-date model is key to automation as well as to orderly IT processes, but the complexity and accuracy of the model always have to be traded off against its maintainability. Many vendors are trying to reduce this need to compromise by building auto-discovery solutions such as IBM’s TADDM. Still, the complexity of the model directly affects the user acceptance of every process and technology based upon it.
Behind the model of interdependencies are the nodes that are interdependent. These nodes have to describe IT entities using meta-information. This meta-information is stored in attributes, and these attributes can either save the world or be the cause of all evil.
Building the actual values for the attributes is therefore worth some thought. Surely there are simple attributes like HOSTNAME that we do not have to think much about. Yet even a seemingly simple attribute such as OS can be a bigger challenge than expected. When you simply assign “Windows” or “Linux” to OS, you will only be able to match on exactly that value when building conditions for automation. When you assign something like “server.windows.2003”, where the first part describes the OS usage, the second the OS family and the third the actual system, you can match all Windows servers by building a condition like “server\.windows\..*”, or select all Linux systems (regardless of desktop or server) by building a condition like “.*\.linux\..*”.
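The two conditions above can be tried out with a few lines of Python. This is only a minimal sketch of the matching idea, not how our engine evaluates rules: the host names and the values “debian”, “ubuntu” and “xp” are made up for illustration, and the `select` helper is hypothetical.

```python
import re

# Hypothetical nodes; each OS value follows the usage.family.system
# pattern described in the text. Host names are invented.
nodes = {
    "web01": "server.windows.2003",
    "db01": "server.linux.debian",
    "pc17": "desktop.linux.ubuntu",
    "pc18": "desktop.windows.xp",
}


def select(pattern):
    """Return the sorted host names whose OS attribute matches the whole pattern."""
    rx = re.compile(pattern)
    return sorted(host for host, os_attr in nodes.items() if rx.fullmatch(os_attr))


# All Windows servers, regardless of the actual system.
windows_servers = select(r"server\.windows\..*")

# All Linux systems, regardless of desktop or server usage.
all_linux = select(r".*\.linux\..*")

print(windows_servers)
print(all_linux)
```

With a flat value such as “Windows” in the OS attribute, neither of these selections would be possible; the condition could only test for that one exact string.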
Maybe this little example shows the power of building a proper namespace system. So what kind of system is appropriate for automation? A simple “name” solution (like the first example) is not good for anything but a quick and dirty test of an automation engine algorithm. The second approach shown above (a tree-like structure) is very powerful and very close to XML (which most people use to declare structured data these days). Such tree-like structures are good for expression matching and therefore good as an input stream to automation engines. Before using them, you first have to build a clear understanding of which trees to use. As you can see from many discussions (one of the most competent being between Van Wiles and William Vambenepe), the problem of agreeing on applicable and technically usable naming conventions is still up in the air, even though it can be one of the major causes of CMDB projects failing and definitely has a major impact on any automation engine. Each vendor has its own naming conventions and definitions eloquently elaborated, but unfortunately no one has looked at the problem from higher ground. The closest I have found so far is a chapter in the book Implementing ITIL Configuration Management by Larry Klosterboer.
I have had many encounters with strange approaches to the issue of naming conventions and namespaces, and have therefore made sure that our algorithms can work with any kind of namespace (with varying degrees of performance). If you want to “do it right”, I would strongly suggest sticking to the following principles when building your own CMDB or model of interdependencies:
- If you are willing to attach yourself to a vendor (not just for the CMDB, but for most products supporting the ITIL processes), stick with the naming conventions of this vendor. Vendors have usually put some thought into them. If this is not possible for you (either because you strategically have to play large vendors off against each other, because you like your software zoo or just because…), build your own namespace from scratch.
- Use a tree-like structure for everything and make this tree structure fixed, meaning that each depth level in the tree always corresponds to the same sub-attribute. This may mean that you will have to “fill” some levels in the tree for some nodes (like “windows.windows.2003”). This will save you from extensive misinterpretation by people who do not use your namespace every day.
- Do not include versions in the tree-structured attributes. Versions are a secondary decision criterion and are used AFTER you know what you are dealing with. Our automation engine, like many other tools, uses the same rule (though different parts of it) for different versions of the same environment; performance therefore increases when you keep versions separate.
- Do not “outperform” yourself when building or using naming conventions. Whether you use a vendor's approach (which has to be very flexible) or your own (which you may want to make scientifically tight), only fill in or use the attributes and sub-attributes that make sense for the task at hand. If you stick to the proper structure, you can always enter additional data later on as you need it. Data in place has to be of some use, because it creates costs just by being there.
Just by sticking to these principles (the first and the fourth being the most important to bracket things up), you can make sure that your IT model is easily understood, has low maintenance costs and can be used for something innovative like automation right away.
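To make the second and third principles more tangible, here is a minimal Python sketch of a fixed-depth attribute with the version kept outside the tree. The level names (usage, family, system) are taken from the OS example earlier in this article; the `OsAttribute` class, the `parts` method and the “SP2” version value are assumptions made purely for illustration and do not describe any vendor's data model.

```python
from dataclasses import dataclass

# Assumed level names, borrowed from the OS example (usage.family.system).
LEVELS = ("usage", "family", "system")


@dataclass(frozen=True)
class OsAttribute:
    path: str          # fixed-depth tree value, e.g. "server.windows.2003"
    version: str = ""  # kept outside the tree, e.g. a service pack level

    def parts(self):
        """Map each fixed level name to its value; reject a wrong depth."""
        values = self.path.split(".")
        if len(values) != len(LEVELS) or not all(values):
            raise ValueError(
                f"{self.path!r} must have exactly {len(LEVELS)} non-empty levels"
            )
        return dict(zip(LEVELS, values))


# A well-formed attribute: every depth level has the same meaning for every node.
attr = OsAttribute("server.windows.2003", version="SP2")
print(attr.parts())

# A value with a missing level is rejected instead of being silently misread.
try:
    OsAttribute("windows.2003").parts()
except ValueError as err:
    print(err)
```

Because the depth is enforced, a condition written against the second level always tests the OS family, and the version can be checked afterwards as a secondary criterion without touching the tree.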