Challenges of the Traditional Approach to Management by Jim Metzler

In my last blog I discussed the importance of IT organizations being able to have end-to-end visibility. By end-to-end visibility I mean the ability of the IT organization to monitor both the application and the network infrastructure from the moment the user hits ENTER or clicks the mouse button until a response comes back from the application. In this blog I am going to discuss some of the factors that make it difficult for IT organizations to get that visibility.

Due to factors such as the adoption of virtualization, cloud computing and mobility, the IT environment has changed dramatically in the last few years. Unfortunately, over that same time frame the approach to management used by most IT organizations hasn’t changed anywhere near as much. This is summed up in an old cliché in which two IT professionals are talking about some aspect of management. One of the professionals asks the other “Why do we do it that way?” The second professional responds “Because that is the way we have always done it.”

If you look back at the development of most Network Performance Management Systems (NPMS), these systems had their origins in monitoring the performance of telecommunications carriers to verify that IT organizations were getting the service levels they paid for. These systems are based on a combination of the Simple Network Management Protocol (SNMP) and the Internet Control Message Protocol (ICMP, also known as “ping”). A traditional NPMS measured how long it took a packet to travel from the data center to the branch office network and back, thus determining the Round Trip Time (RTT). If the return packet did not arrive within a few seconds, the original packet was deemed lost, and this is how packet loss was measured. Over time the limitations of these systems became apparent. Those limitations include the fact that early NPMS only measured from the central data center to the edge of the branch office network. Problems inside the branch office network typically went unreported until end users complained.
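
To make that measurement concrete, here is a minimal sketch of the kind of RTT and packet loss probing a traditional NPMS performed. It is a sketch only: it shells out to the system ping utility (assuming Linux-style flags), and the branch office router address is hypothetical.

```python
import re
import subprocess

def probe_rtt(host: str, count: int = 5, timeout_s: int = 2) -> tuple[list[float], float]:
    """Probe a host the way a traditional NPMS probed a branch office router:
    send ICMP echo requests, record the round trip time (RTT) of each reply,
    and count any request that gets no reply within the timeout as lost."""
    rtts = []
    for _ in range(count):
        # Assumes a Linux-style ping(8); the -W (timeout) flag differs on other platforms.
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            capture_output=True,
            text=True,
        )
        match = re.search(r"time=([\d.]+)\s*ms", result.stdout)
        if result.returncode == 0 and match:
            rtts.append(float(match.group(1)))
    loss_pct = 100.0 * (count - len(rtts)) / count
    return rtts, loss_pct

if __name__ == "__main__":
    # 192.0.2.10 is a hypothetical branch office router address.
    rtts, loss = probe_rtt("192.0.2.10")
    print(f"RTT samples (ms): {rtts}  packet loss: {loss:.0f}%")
```

Note that everything such a probe can see stops at the address it pings; if that address is the edge of the branch office network, the probe is blind to anything happening beyond it.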

Over the last few years, IT organizations’ interest in managing the performance of applications such as n-tier web based applications has grown significantly. In an n-tier web based application, the user interacts with the presentation tier; the presentation tier in turn communicates with the logic tier, which in turn communicates with the data tier. Each tier uses servers that are optimized for the characteristics of that tier. A presentation tier server, for example, is optimized for network I/O and web traffic; e.g., multiple network cards, large network buffers, etc. A logic tier server is optimized for logic computations; e.g., high speed CPUs, large memory size, etc. A data tier server is optimized for database operations; e.g., multiple disk I/O controllers, large disk cache, large memory size, etc.
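
The following sketch illustrates that call chain in miniature. The tier functions and the order lookup are hypothetical, not from any particular application, but they show how a request fans down from the presentation tier through the logic tier to the data tier and back.

```python
# Hypothetical three-tier call chain; each function stands in for a
# whole tier of servers in a real n-tier web application.

def data_tier_fetch_order(order_id: int) -> dict:
    """Data tier: would run on servers optimized for disk I/O and caching;
    in a real system this is a database query."""
    return {"order_id": order_id, "items": 3, "total": 49.97}

def logic_tier_order_summary(order_id: int) -> str:
    """Logic tier: would run on CPU- and memory-optimized servers;
    applies the business rules to the raw data."""
    order = data_tier_fetch_order(order_id)
    return f"Order {order['order_id']}: {order['items']} items, ${order['total']:.2f}"

def presentation_tier_handle_request(order_id: int) -> str:
    """Presentation tier: would run on network-I/O-optimized servers;
    formats the result for delivery to the user."""
    return f"<html><body>{logic_tier_order_summary(order_id)}</body></html>"

print(presentation_tier_handle_request(42))
```

A single user transaction thus touches three different classes of servers, which is why monitoring any one tier in isolation gives an incomplete picture.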

Similar to traditional NPMS, traditional application performance management solutions have limitations. Those limitations include the fact that traditional application performance management solutions cannot attribute CPU, disk I/O, network I/O, or memory utilization to specific classes of transactions. Only aggregate server performance information is available. This makes it difficult to effectively monitor n-tier applications.
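
As an illustration, the sketch below gathers the kind of aggregate, server-wide counters that traditional application performance management relied on, using the third-party psutil library. Nothing in these numbers can be attributed to a specific class of transactions, which is exactly the limitation described above.

```python
import psutil  # third-party library: pip install psutil

def aggregate_server_metrics() -> dict:
    """Collect server-wide counters. Nothing here can say which class of
    transactions consumed the CPU, memory, disk I/O, or network I/O."""
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),      # whole-server CPU
        "memory_percent": psutil.virtual_memory().percent,  # whole-server memory
        "disk_read_bytes": psutil.disk_io_counters().read_bytes,
        "net_bytes_sent": psutil.net_io_counters().bytes_sent,
    }

print(aggregate_server_metrics())
```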

While there is no doubt that early network and application management systems had technical limitations, there is also no doubt that process issues are at least as great a barrier to IT organizations having the type of end-to-end visibility that I mentioned in my last blog. For example, traditional application performance management was typically performed separately from network performance management. When application degradation occurs, the triage process typically assigns the incident to either the network or server teams for resolution. Each team then examines its basic internal measurements of network and server performance, and a pronouncement is made that the source of the issue is the network, the application server, both, or neither. Since these tasks are typically done by different parts of the IT organization using different toolsets and management frameworks, it is quite possible that conflicting answers are given for the source of application performance issues. The result is a further lengthening of the time it takes to identify and resolve problems.

In my next few blogs I will come back to the issue of end-to-end visibility. I will continue to look at both the challenges that stand in the way of IT organizations having that visibility and what IT organizations can do to overcome those challenges.
