Blog Moved

Future posts related to technology are directly published to LinkedIn

Wednesday, December 9, 2009

sizing a data center management solution


A typical data center management solution will have a "data store" for storing
a. configuration data
b. monitoring data

At regular intervals, it collects the data and uploads it to the central data store.

Hosts, databases, listeners, application server instances, etc., are the managed entities that need monitoring and management.

Before implementing the solution, every organization will try to "size" the infrastructure requirements for the solution.

Sizing involves estimating the solution's resource consumption in terms of:
1. disk storage
2. memory
3. CPU cycles
4. network bandwidth

The Problem:
The complexity of sizing a data center management solution comes mainly from a lack of clear definitions.
For example, a customer wants to monitor 10 databases on 10 servers (one database on each server).
The number of managed entities varies with the database version, whether the database uses ASM for storage or runs on a cold failover cluster, and several other considerations. A database can have a single tablespace with a single datafile, or it can have 10K tablespaces with 100K datafiles. If the customer wants to monitor database space usage by tablespace, a database with 10K tablespaces will produce 10K times more data than a database with 1 tablespace.

Each metric (the monitored data point) can be collected as often as once every 5 minutes or as rarely as once an hour. The collection frequency varies with customer requirements.

A simple formula: the number of metrics (counted at the lowest granularity) * the number of collections per day gives the total number of metric values per managed entity per day. Multiplying this number by the average number of bytes needed to store one metric value gives the bytes required to store one day of metric values per managed entity.
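A minimal Python sketch of this formula. All counts are hypothetical: a database is assumed to have 20 instance-level metrics plus 3 metrics per tablespace, collected every 15 minutes, and each stored value is assumed to take 100 bytes on average. Because the per-tablespace metrics are counted at the lowest granule, they are what make a 10K-tablespace database so much larger than a single-tablespace one.

```python
def metric_values_per_day(metrics_at_lowest_granule: int, collections_per_day: int) -> int:
    """Metric values one managed entity produces in a day."""
    return metrics_at_lowest_granule * collections_per_day


def bytes_per_entity_per_day(metrics_at_lowest_granule: int,
                             collections_per_day: int,
                             avg_bytes_per_value: int) -> int:
    """Bytes needed to store one day of metric values for one managed entity."""
    values = metric_values_per_day(metrics_at_lowest_granule, collections_per_day)
    return values * avg_bytes_per_value


def db_metric_count(tablespaces: int, base_metrics: int = 20, per_tablespace: int = 3) -> int:
    """Hypothetical metric count for one database, counted at the lowest granule."""
    return base_metrics + per_tablespace * tablespaces


COLLECTIONS_PER_DAY = 24 * 4  # one collection every 15 minutes
AVG_BYTES_PER_VALUE = 100     # assumed average size of one stored metric value

small_db = bytes_per_entity_per_day(db_metric_count(1), COLLECTIONS_PER_DAY, AVG_BYTES_PER_VALUE)
large_db = bytes_per_entity_per_day(db_metric_count(10_000), COLLECTIONS_PER_DAY, AVG_BYTES_PER_VALUE)
print(f"1 tablespace: {small_db:,} bytes/day; 10K tablespaces: {large_db:,} bytes/day")
```

The per-tablespace part grows exactly 10K-fold. The fixed instance-level metrics make the overall ratio somewhat smaller.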

If the customer wants to keep this data at raw granularity for a week, the storage requirement is the number of days of retention at raw granularity * the bytes required to store one day's data. Multiplying this by the number of managed entities gives the total storage requirement for this type of managed entity (e.g., database).
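The retention step as a small Python function. The inputs are hypothetical: 10 databases, each writing 220,800 bytes a day, with the data kept at raw granularity for 7 days.

```python
def raw_storage_bytes(bytes_per_entity_per_day: int, retention_days: int, entity_count: int) -> int:
    """Storage for one entity type at raw granularity:
    days of retention * one day's data * number of entities."""
    return retention_days * bytes_per_entity_per_day * entity_count


# Hypothetical inputs for illustration only.
db_storage = raw_storage_bytes(220_800, retention_days=7, entity_count=10)
print(f"{db_storage:,} bytes (about {db_storage / 1024**2:.1f} MiB)")
```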

The same exercise needs to be repeated for every type of managed entity. Adding up the results gives the total storage requirement.
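Repeating the calculation across entity types is a sum over an inventory. In this sketch the `EntityType` record and all the byte figures are hypothetical. Only the formula (retention days * bytes per entity per day * entity count) comes from the post.

```python
from dataclasses import dataclass


@dataclass
class EntityType:
    name: str
    count: int                     # number of managed entities of this type
    bytes_per_entity_per_day: int  # metrics * collections/day * bytes/value
    retention_days: int            # days kept at raw granularity

    def raw_storage_bytes(self) -> int:
        return self.retention_days * self.bytes_per_entity_per_day * self.count


def total_raw_storage(types: list[EntityType]) -> int:
    """Sum the per-type storage requirements."""
    return sum(t.raw_storage_bytes() for t in types)


# Hypothetical inventory; the per-day byte figures are made up for illustration.
inventory = [
    EntityType("host", count=10, bytes_per_entity_per_day=500_000, retention_days=7),
    EntityType("database", count=10, bytes_per_entity_per_day=220_800, retention_days=7),
    EntityType("listener", count=10, bytes_per_entity_per_day=20_000, retention_days=7),
]
for t in inventory:
    print(f"{t.name:<10}{t.raw_storage_bytes():>15,}")
print(f"{'total':<10}{total_raw_storage(inventory):>15,}")
```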

Collecting that many bytes every day means transferring them over the network from the host where each managed entity resides to the host where the management data store resides. That daily volume gives the network bandwidth requirement.
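One way to turn the daily volume into a bandwidth figure is to spread it evenly over the day. The inputs here are hypothetical: 1000 entities, each producing 86,400 bytes a day and uploading every 15 minutes. This gives only an average. If many agents upload at the same moment, the peak will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_bandwidth_bps(bytes_per_day: int) -> float:
    """Average bits per second needed to ship one day's collected data."""
    return bytes_per_day * 8 / SECONDS_PER_DAY


def bytes_per_upload(bytes_per_day: int, uploads_per_day: int) -> float:
    """Size of each upload if the daily volume is split evenly across uploads."""
    return bytes_per_day / uploads_per_day


# Hypothetical: 1000 entities, each producing 86,400 bytes/day, uploaded every 15 minutes.
daily_bytes = 1000 * 86_400
print(average_bandwidth_bps(daily_bytes))  # average across the whole day, in bits/s
print(bytes_per_upload(86_400, 24 * 4))    # per entity, per upload, in bytes
```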

This data needs to be rolled up into hourly and daily averages so it can be kept for historical trending. One also needs to calculate the space and processing requirements of this rollup process.
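A minimal in-memory sketch of the rollup, which groups (timestamp, value) samples by hour or by day and averages each group. The sample data is hypothetical: two hours of 15-minute readings for a single metric. A real data store would typically run the equivalent aggregation in SQL.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from statistics import mean


def rollup(samples, bucket_key):
    """Average (timestamp, value) samples per bucket, e.g. per hour or per day."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[bucket_key(ts)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


def hour_key(ts: datetime) -> datetime:
    return ts.replace(minute=0, second=0, microsecond=0)


def day_key(ts: datetime) -> date:
    return ts.date()


# Hypothetical raw data: two hours of 15-minute samples for one metric.
start = datetime(2009, 12, 9, 0, 0)
raw = [(start + timedelta(minutes=15 * i), 10 * (i + 1)) for i in range(8)]

hourly = rollup(raw, hour_key)  # 2 rows instead of 8
daily = rollup(raw, day_key)    # 1 row
print(hourly)
print(daily)
```

At a 15-minute frequency, each metric produces 96 raw values a day but only 24 hourly rows and 1 daily row. The rollup tables are therefore small compared with the raw data. The main cost is the grouping pass over every raw value.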

Old data needs to be purged from the data store, which also consumes processing cycles.
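A sketch of a retention-based purge. The retention periods per granularity are hypothetical, since the post only mentions keeping raw data for a week. In a real data store this would usually be a bulk delete or the dropping of old partitions. The in-memory filter here only illustrates the rule, and it shows why purging costs processing: every row has to be examined.

```python
from datetime import datetime, timedelta

# Hypothetical retention policy per granularity.
RETENTION = {
    "raw": timedelta(days=7),
    "hourly": timedelta(days=31),
    "daily": timedelta(days=365),
}


def purge(rows, now, tier):
    """Drop (timestamp, value) rows older than the tier's retention.

    Returns the surviving rows and the number of rows purged.
    """
    cutoff = now - RETENTION[tier]
    kept = [(ts, value) for ts, value in rows if ts >= cutoff]
    return kept, len(rows) - len(kept)


now = datetime(2009, 12, 9)
raw_rows = [(now - timedelta(days=age), float(age)) for age in range(10)]  # 0..9 days old
kept, purged = purge(raw_rows, now, "raw")
print(len(kept), purged)
```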

All the managed entities also have a set of configuration data. This data needs to be collected, compared on a regular basis, and kept up to date. One needs to compute the resources required to collect, compare, and store the original and changed values of the different configurations as they evolve.
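One way to store "the original and changed values" is to keep the first snapshot and, at each later collection, record only the settings that differ. This is a hypothetical sketch. `ConfigHistory` and `diff_config` are illustrative names, and the parameter values are made up. Storage then grows with the number of changes, not with the number of collections.

```python
def diff_config(previous: dict, current: dict) -> dict:
    """Return {setting: (old, new)} for settings added, removed, or changed.

    None stands for "not present" in this sketch.
    """
    changes = {}
    for key in sorted(previous.keys() | current.keys()):
        old, new = previous.get(key), current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


class ConfigHistory:
    """Keeps the original snapshot plus only the changes seen at each collection."""

    def __init__(self, baseline: dict):
        self.baseline = dict(baseline)
        self.latest = dict(baseline)
        self.changes = []  # list of (collection_id, {setting: (old, new)})

    def record(self, collection_id, snapshot: dict) -> dict:
        delta = diff_config(self.latest, snapshot)
        if delta:
            self.changes.append((collection_id, delta))
        self.latest = dict(snapshot)
        return delta


# Hypothetical database parameters; the values are made up.
history = ConfigHistory({"sga_target": "2G", "processes": "150"})
print(history.record(1, {"sga_target": "2G", "processes": "150"}))
print(history.record(2, {"sga_target": "4G", "processes": "150", "open_cursors": "300"}))
```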

Managing all the "heartbeats" between the central management solution and the monitored hosts requires a proactive two-way mechanism, which needs processing capacity.

Monitoring websites with synthetic transactions, i.e., periodically replaying a pre-recorded transaction to make sure the application works, requires additional processing, storage, memory, etc. This needs to be added to the above estimate.

A provisioning solution that is used to deploy new database and application server instances needs the GOLD IMAGES to be stored somewhere. Patching the enterprise applications needs a considerable amount of space for patch downloads, as well as regular propagation of those patches. This also needs to be considered.

Another thing to consider is the periodic tasks (jobs) that run to perform routine operations. Each job may produce a few KB of output that needs to be collected and stored in the central store for a period of time.

The next thing to consider is the number of users of the application, the number of concurrent users, and the amount of data they retrieve from the data store to perform their operations.

Collecting all this information and adding the overheads of block headers, segment headers, etc., in the physical structure of the target solution is a complex task. By the time this exercise is almost complete, the tool vendor will have released the next version of the data center management tool with some modifications!!

The Solution:
It is nearly impossible to "size" any application with last-byte accuracy for storage, memory, network, and CPU utilization.

No managed entity should generate more than 25 MB of data in the monitoring store, and no managed entity should have more than 100 KB of configuration data.
So, provisioning 250 GB of storage for 1000 monitored entities and configuring the solution so that it does not exceed this budget is a wise man's solution.
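A quick arithmetic check of these limits in Python. It uses decimal units, because the post does not say whether MB and GB are decimal or binary. At the per-entity caps, 1000 entities produce about 25 GB, so 250 GB leaves roughly tenfold headroom. The post does not say what that headroom is for. Rollups, indexes, and the block and segment overheads mentioned above are plausible consumers, but that is an assumption.

```python
# Decimal units for simplicity; binary units would change the figures slightly.
KB, MB, GB = 10**3, 10**6, 10**9

MAX_MONITORING_BYTES = 25 * MB  # per managed entity, from the post
MAX_CONFIG_BYTES = 100 * KB     # per managed entity, from the post
PER_ENTITY_CAP = MAX_MONITORING_BYTES + MAX_CONFIG_BYTES


def worst_case_bytes(entities: int) -> int:
    """Data generated if every entity hits both per-entity limits."""
    return entities * PER_ENTITY_CAP


def headroom_factor(provisioned_bytes: int, entities: int) -> float:
    """How many times the worst-case data fits into the provisioned storage."""
    return provisioned_bytes / worst_case_bytes(entities)


def max_entities(provisioned_bytes: int) -> int:
    """Entities the storage can hold if each one uses its full cap."""
    return provisioned_bytes // PER_ENTITY_CAP


print(worst_case_bytes(1000) / GB)  # worst case for 1000 entities, in GB
print(round(headroom_factor(250 * GB, 1000), 2))
print(max_entities(250 * GB))
```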

Assume that 1000 monitored entities will have at most 100 administrators, with at most 30 of them logged in concurrently. This is an average database OLTP application with a 250 GB database and 24 * 4 * 1000 = 96,000 transactions per day (4 transactions an hour, i.e., a 15-minute collection frequency, for each of the 1000 entities). It would require a 2-processor DB server and a 2-processor application server, with 4 GB of RAM each.
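The transaction count above, worked out in Python. The average arrival rate is modest. If every entity uploads on the same 15-minute boundary, however, the database sees bursts of up to 1000 uploads at once. That burst pattern is an assumption about agent scheduling, not something the post states. The function assumes the interval divides evenly into a day.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60


def collections_per_day(entities: int, interval_minutes: int) -> int:
    """Uploads per day for all entities at a fixed collection interval."""
    return entities * (MINUTES_PER_DAY // interval_minutes)


def average_rate_per_second(per_day: int) -> float:
    """Uploads per second if they were spread evenly over the day."""
    return per_day / SECONDS_PER_DAY


daily = collections_per_day(1000, 15)  # 24 * 4 * 1000
print(daily, round(average_rate_per_second(daily), 2))
```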

Starting with this configuration and implementing the solution iteratively is the best approach to balancing up-front sizing against the risk of impacting the service level of this critical service.


With virtualization and cloud technology, it is easier to build scalable, grid-like applications, and horizontal scaling is a trend that is here to stay and will gain even more momentum. Data center management solutions are not exempt from this trend. For every additional 1000 monitored entities, we will have to add a node to the DB grid and another node to the application server grid, along with additional storage in the storage grid.
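The scale-out rule as a small Python function. It treats each block of 1000 entities as one DB node, one application server node, and 250 GB of storage, taken from the figures earlier in the post. It also assumes, as a hypothetical simplification, that each added node matches the initial 2-processor configuration.

```python
import math

ENTITIES_PER_BLOCK = 1000   # one DB node + one app node per 1000 entities
STORAGE_GB_PER_BLOCK = 250  # 250 GB per 1000 entities


def grid_size(entities: int) -> dict:
    """Nodes and storage for a given number of monitored entities (at least one block)."""
    blocks = max(1, math.ceil(entities / ENTITIES_PER_BLOCK))
    return {
        "db_nodes": blocks,
        "app_nodes": blocks,
        "storage_gb": blocks * STORAGE_GB_PER_BLOCK,
    }


for n in (800, 1000, 1001, 2500):
    print(n, grid_size(n))
```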
