The question of how to make better use of IT resources is currently the focus of a host of buzzwords: capacity management, Green IT, energy management and, above all, Cloud computing. The following piece outlines why these themes have apparently landed out of the blue, and with great force, on the desks of CIOs and are being pumped out by marketing machines the world over. Above all, it examines what they mean, which approaches you can choose to get involved with, and which challenges you will face in the process.
The fallacy of the command economy, as illustrated by IT capacity
In most companies the required infrastructure is purchased together with the IT projects it serves. Because it forms part of the project budget, such infrastructure is used only for that particular project. This also means that the infrastructure is amortised over three to five years and must therefore be sized from the outset to meet the demands of the IT solutions created in the project for that entire period, without any expansion worth mentioning. This alone leads to enormous overcapacity: assumptions about growth and infrastructure demands naturally err on the side of caution, so overcapacity is likely even at the end of the period, let alone at the start of production, for which a massive reserve is held.
Statistically, hardware and software investments (including maintenance) of less than 20% of the total IT budget can still be defended. But once you factor in the energy costs the hardware generates after it has been set up, amounting to another 20%, together with analysts' predictions of sharp increases in energy prices, the conclusion is that the deliberate creation of overcapacity has to stop immediately. Energy is of particular relevance here because only about one sixth of the total energy consumed actually powers the IT solution; the rest is lost as waste heat or spent on cooling and other data-center overhead. Moreover, the hardware's energy consumption depends only marginally on how heavily the hardware is actually used.
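To make the arithmetic concrete, here is a toy calculation using the one-sixth figure from above; the load and price values are invented for illustration, and the function is a sketch, not a real library call:

```python
# Toy estimate of annual data-center energy cost. Per the article,
# only ~1/6 of the energy consumed reaches the IT load; the rest is
# lost to cooling and other overhead. All numbers are illustrative.

def annual_energy_cost(it_load_kw, price_per_kwh, useful_fraction=1/6):
    """Total yearly energy cost for a given average IT load in kW."""
    total_draw_kw = it_load_kw / useful_fraction  # overhead included
    hours_per_year = 24 * 365
    return total_draw_kw * hours_per_year * price_per_kwh

# Example: 100 kW of average IT load at an assumed 0.15 EUR/kWh
# draws 600 kW in total, i.e. roughly 788,400 EUR per year.
cost = annual_energy_cost(100, 0.15)
```

The point of the sketch is that every kilowatt of idle overcapacity is billed six times over, which is why dormant hardware suddenly matters financially.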
It is for these reasons that capacity management and energy management are such important elements of a modern IT strategy.
Capacity management – an overarching approach requiring action at the organisational level
In the energy management field there are some approaches, very typical of IT, in which management software is combined with electricity management hardware to reduce the energy consumed by existing hardware <Link to Sun-Wiki, only TOC available>. This can have a positive effect, but it does not really compare with the capacity approach, under which unnecessary resources are never acquired in the first place and therefore never tie up electricity or administration capacity, or require additional hardware investment.
Capacity management is therefore the clean approach. It assumes, however, that before any new technical solutions are acquired, the hardware acquisition budget is moved from the IT projects to the specialist departments, which then pay the IT department over time for their actual use of IT resources. Secondly, and likewise before any technical measures are taken, the IT architects must be given plausible reasons to stop planning hardware buffers into their architectures, since such buffers immediately negate the positive effects of capacity management. Once these organisational and psychological steps have been successfully negotiated, it is worth exploring the questions surrounding the implementation of capacity management in greater depth.
Capacity management and its organisational implementation
Current implementation strategies for capacity management very frequently call for migration to a completely new platform. If you look instead at the successful capacity management environments of Google or Amazon, among others, you will be struck by the fact that both organisations consistently pursued the express objective of using existing hardware for as long as possible. This measure seems worth imitating, and it raises the question of which applications and environments are relevant to capacity management in the first place.
For the architect this question is easily answered: ALL OF THEM. In practice, however, it is worth defining a few clear rules that determine the sequence in which applications are migrated to a capacity management environment, which hardware should continue to be used, under which conditions hardware should be disposed of and, accordingly, under which restrictive conditions new hardware may be acquired. If you assume a current overcapacity of at least 80%, the proportion of hardware requiring decommissioning is significant.
Such a body of rules might look like the following:
All hardware that will not be amortised within the next six months must continue to be used.
All hardware that has already been amortised and already regularly requires extended maintenance is to be decommissioned.
All applications that have registered at least one incident in the last 12 months arising from capacity problems are to be migrated with priority 1.
All applications for which new hardware acquisitions have already been decided but not yet implemented are to be migrated with priority 1.
All applications running at less than 5% of capacity are to be treated as a pool for these priority 1 migrations and added to the migrated applications until the required peak capacity has been reached.
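A body of rules like this is mechanical enough to be evaluated in code. The following sketch is one possible encoding; all field names, data structures and the six-month approximation are assumptions for illustration:

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Host:
    amortised_on: date              # date when amortisation completes
    needs_extended_maintenance: bool

@dataclass
class App:
    capacity_incidents_12m: int    # incidents caused by capacity problems
    hardware_order_pending: bool   # new hardware decided but not yet bought
    utilisation: float             # fraction of capacity actually in use

def keep_hardware(host: Host, today: date) -> bool:
    # Hardware not amortised within the next six months must stay in use.
    return host.amortised_on > today + timedelta(days=182)

def decommission_hardware(host: Host, today: date) -> bool:
    # Amortised hardware that regularly needs extended maintenance goes.
    return host.amortised_on <= today and host.needs_extended_maintenance

def migration_priority(app: App):
    # Capacity incidents or pending hardware purchases -> priority 1.
    if app.capacity_incidents_12m >= 1 or app.hardware_order_pending:
        return 1
    # Nearly idle applications form the fill-up pool for those migrations.
    if app.utilisation < 0.05:
        return "pool"
    return None
```

Encoding the rules this way also forces the organisation to agree on the underlying data (amortisation dates, incident records, utilisation figures) before any migration starts.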
Just how the actual migration is to be effected without a 1:1 manual migration of every IT application is a significant challenge that falls outside the scope of this contribution. This much can be said, however: without large-scale automation such a project is not realistic and should not be attempted in the first place.
The architecture of an environment with capacity management
The first obvious tool for implementing an environment in line with the requirements of capacity management – one that flexibly makes existing IT resources available to applications as and when needed – is virtualisation. It is no coincidence that these technologies have seen significant growth in recent years. Alongside virtualisation itself, which makes it possible to simulate several virtual resources on one physical one and thus to drive existing resources to full utilisation, the question also arises of how to administer such an environment, how to monitor and predict the required capacities and, not least, how to operate the newly created environment.
Let’s first take a look at the virtualisation platform itself. For the applications to be migrated in the first stage to a capacity-managed environment, the assumption is that various different platforms will be involved. This means you either need to use different virtualisation technologies (e.g. VMware for Linux and Windows environments, Solaris 10 for SPARC environments …) or to migrate applications onto another environment before deploying virtualisation – and the latter seems unrealistic if short-term results are expected.
A good architecture for an environment that supports virtualisation across different technologies, and thus makes capacity management possible throughout, must support different platforms and define a simple interface for placing a system on any of them. This is the only way to keep existing in-house virtualisation initiatives from colliding destructively and to combine them smoothly in pursuit of the larger aim of capacity management. Such an approach has the further advantage of offering one methodology irrespective of platform, which opens up the possibility, depending on market developments, of replacing one virtualisation technology with another and one platform provider with another – for instance if you later want to consider external providers rather than your own data center.
A good architecture that brings together different platforms and providers for one and the same platform under one roof, with stable interfaces and procedures, is therefore a prerequisite. Firstly because this is the only way to guarantee long-term supervision of the newly-designed IT landscape and secondly because such interfaces represent the only way to automate transitions from one platform to another in normal operation and, above all, in the course of migration.
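What such a stable placement interface might look like can be sketched as a thin abstraction layer. The class and method names below are invented for illustration; a real back end would wrap the respective provider's API:

```python
from abc import ABC, abstractmethod

class VirtualisationPlatform(ABC):
    """Uniform facade over heterogeneous virtualisation back ends
    (e.g. VMware for x86 environments, Solaris for SPARC)."""

    @abstractmethod
    def place(self, system_image: str, cpu: int, ram_gb: int) -> str:
        """Deploy a system and return a handle for later operations."""

    @abstractmethod
    def retire(self, handle: str) -> None:
        """Release the resources held by a placed system."""

class VMwareBackend(VirtualisationPlatform):
    def place(self, system_image, cpu, ram_gb):
        # A real implementation would call the provider-specific API;
        # stubbed here to keep the sketch self-contained.
        return f"vmware:{system_image}"

    def retire(self, handle):
        pass  # provider-specific teardown would go here

# Because every back end honours the same interface, moving a system
# means retiring it on one platform and placing it on another - the
# automated transition the architecture is meant to enable.
```

The design choice is the usual one: the stable interface, not any single virtualisation product, is the long-term asset, since it is what allows platforms and providers to be swapped later.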
Capacity management, workload and the Cloud
When planning the size of such a capacity-managed environment you are automatically faced with the question of required total capacity. It will be at this point, if not before, that you will realise that Cloud approaches which independently assume the function of resource allocation are essential.
Because it is hard to get used to the idea that IT infrastructure is no longer important – a psychological problem given a rational face by discussions about security – the first point of contact with such an environment will logically be the “private cloud”. This means implementing the technical concepts of virtualisation, dynamic resource allocation etc. on a platform that you control 100% or that you actually own before starting to think about whether it makes sense – or is even possible – to buy in external IT resources.
The combination of dynamic requirements with the unequal distribution of resources therefore gives rise to Cloud technology even in cases where the physical hardware remains the property of your company. But the question arises of how much of the possible capacity reduction can actually be brought about in this way.
The mean capacity requirement is easily calculated by averaging the IT load in terms of CPU, memory, storage and bandwidth. The maximum capacity requirement is obtained by summing the time series of the capacity required by all applications and taking the absolute maximum of the resulting series. In a normal company the variance – the spread between maximum, minimum and mean required capacity – is relatively high. This is the inevitable consequence of the fact that the IT usage of a company with a single business model, even a global one, always follows certain cycles. Invoicing, for example, always takes place at month end; 90% of all transactions are executed in one market; and so on. The maximum IT load and the associated capacity reserves therefore follow the same cycle, in which the required capacity of many systems peaks at the same time.
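The calculation just described can be illustrated with a small sketch. The load figures are invented, and both applications deliberately share the same month-end spike, as in the invoicing example:

```python
# Mean vs. peak capacity over summed per-application load time
# series (invented figures; units could be CPU cores per period).

app_a = [10, 12, 11, 40, 10, 11]   # month-end invoicing spike
app_b = [20, 21, 22, 55, 20, 21]   # same business cycle -> same spike

# Sum the time series element-wise, then take mean and maximum.
total = [a + b for a, b in zip(app_a, app_b)]
mean_required = sum(total) / len(total)
peak_required = max(total)

# The gap between peak and mean is capacity that sits idle most
# of the time when all workloads follow one and the same cycle.
idle_reserve = peak_required - mean_required
```

With workloads from different business models, whose peaks fall in different periods, the summed series flattens and this idle reserve shrinks, which is exactly the effect the next paragraph relies on.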
To derive the maximum benefit from capacity management it is therefore necessary to bundle the load of completely different business models on one physical IT platform. This can be done either by the company itself becoming a Cloud provider (as Amazon did) or by handing the physical platforms over, at least in the medium term, to one or more Cloud providers. Since the required spread of IT resource demand is relatively large, a Cloud provider must have reached a certain minimum size and, at that size, must also be able to influence the mix of the different business models using its platforms.
Placing one’s own IT under the control of capacity management and aspiring to ensure the availability at any given point of only those IT resources that are actually required at the time is a logical step that only requires the jettisoning of existing infrastructure which is not being used. The question of why this has not long been on the agenda of every business has a simple answer: the predicted massive rises in energy costs have for the first time made the possible financial damage caused by dormant IT resources significant enough to be worth looking at.
It can also be seen that the technical implementation of capacity management presupposes initial organisational steps to prevent the creation of further overcapacities and to separate out the budgets for infrastructure and projects.
If the aim is to create an environment under the control of capacity management, the attainment of the desired flexibility presupposes the development of an architecture that has standard interfaces to allow the coordination of different platforms and providers. Furthermore, from a technical point of view, a combination of virtualisation and management technologies for dynamic resource management – Cloud technology – is required for the implementation of such an undertaking.
In order to derive the maximum benefit, users of different business models will of necessity have to share one common physical infrastructure in order to reduce the variance between mean and maximum resource requirements. At this point it is not enough just to deploy Cloud technology – the use of Cloud providers is also essential.
As a final remark, capacity management, with its attendant reductions in energy and investment costs, can and should be a key factor behind corporate decisions to turn to Cloud computing: here lies low-hanging fruit that is easily picked, along with the experience that will be needed to derive further positive effects from these technologies.