| White Paper |


Maximising ROI by virtualising and consolidating the data center through end user performance management

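As networks, applications and services grow more complex, users become more mobile and businesses demand lower costs without compromising performance, data center consolidation is becoming increasingly urgent for enterprises. To realise its benefits, however, organisations must manage the consolidated architecture strictly against performance metrics. Traditional performance management tools require multiple platforms and lack the scope, perspective and timing needed to manage a consolidated environment. Additional complexities arise when applications are virtualised, making it difficult to measure performance from the end user perspective.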

Managing the consolidated data center efficiently and continuously is the key to starting to realise the project's return on investment. Organisations need a solution which can manage performance from the perspectives of all stakeholders – business units, IT and end users – to ensure the consolidation project delivers the required ROI.

The drivers of data center consolidation

Data centers are increasingly transforming from a traditional, distributed infrastructure to a consolidated, service-oriented structure. The business case is compelling. Network infrastructures, applications and services are becoming more and more complex, while users are increasingly mobile and demanding, expecting network performance that will enhance their productivity in any task and whatever their location.

At the same time the challenging economic climate is driving organisations to reduce capital and operating costs without sacrificing quality. Engineers need to get the most from every switch, router, server and hypervisor. Data center consolidation enables organisations to implement more advanced protocols and management strategies that maximise bandwidth utilisation and performance of both network and applications. It also creates the opportunity to implement application virtualisation – separating applications from physical servers – which offers further benefits.

Other benefits of data center consolidation include:
  • Improved security through reducing the number of sites and assets that have to be managed and laying the foundation for more sophisticated risk mitigation strategies
  • Improved compliance through promoting automation and encouraging the implementation of a comprehensive auditing capability
  • Reduced hardware and software requirements and reduced power consumption, facility and transport requirements, reducing capital and operating costs and the organisation’s carbon footprint
To achieve these objectives, servers are being virtualized and 40 Gigabit links installed to support the consolidation of bandwidth hungry applications, which drive up the cost of network interfaces and put greater demands on the cabling infrastructure.

Planning consolidation to maximise benefits

Despite the potential benefits of data center consolidation, organisations need appropriate planning and evaluation if they are to achieve the potential return on investment (ROI). Changes must be made seamlessly, with minimal downtime to production business applications, and the resulting consolidated data center must deliver increased performance to justify the time and capital required to implement the project.

Average cost of data center downtime per minute: $5,600
Average reported downtime: 90 minutes
Average cost of incident: $505,000
Total data center outage, average recovery time 134 minutes: approx. $680,000
Partial data center outage, averaging 59 minutes: approx. $258,000

Source: Ponemon Institute, 2011 [1].
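
The per-minute figure above can be used for a rough sanity check of what an outage might cost. The sketch below assumes a simple linear model (cost = minutes × per-minute rate); this is an illustration only, not the method used in the Ponemon study, and the function name is hypothetical.

```python
# Average cost per minute of downtime, taken from the figures above.
COST_PER_MINUTE_USD = 5_600


def estimated_outage_cost(minutes: float,
                          cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Linear estimate: assumes every minute of downtime costs the same."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


# 90 minutes (the average reported downtime) gives 504,000,
# close to the reported average incident cost of $505,000.
average_incident = estimated_outage_cost(90)
```

For the 134-minute and 59-minute cases the linear estimate ($750,400 and $330,400) is higher than the reported averages, so it is best treated as a rough upper guide for planning rather than a prediction.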

To achieve the full benefits of consolidation and avoid expensive downtime, organisations need to follow a clear process:

  • Obtain an in-depth understanding of their existing network, applications and services and benchmark their performance
  • Set metrics for the desired performance of the consolidated data center
  • Plan the transition
  • Implement the transition to the new operating environment with minimum downtime
  • Monitor and manage the updated architecture to ensure it achieves the required metrics

According to a survey from analysts Forrester Research, consolidation projects typically take 18-24 months to complete [2]. During this time, organisations need to dedicate resources and budget to provide staff with the hardware and software to assess the existing operating environment, plan the migration, bring the new architecture online and manage performance.

Clear benchmarks are vital. Without metrics for pre- and post-consolidation performance, organisations cannot measure the ROI. These need to look at the impact on all stakeholders – business unit owners, IT and operations staff, corporate management and end users. If applications are virtualised, measuring performance from an end user perspective becomes more difficult.

So what are the key areas to benchmark? When Forrester asked 147 US enterprises that had completed or were actively implementing a data center consolidation project for the top five metrics they were using to measure success, 52% cited operational cost, followed closely by total cost of ownership (44%), percentage of budget saved (38%), application performance versus infrastructure cost (35%) and performance per CPU core (34%).

To achieve – and demonstrate the achievement of – the ROI for a consolidation project, organisations need to address three areas: reporting, performance management and personnel. In the rest of this paper we will focus on two of these – reporting and performance management, which are closely interlinked.

Implementation challenges

1. Reporting
With data center consolidation, resources that were previously distributed across the enterprise are gathered into a common pool. As a result, business units that once managed and maintained their own networks, applications and services have to relinquish control to a central team.

The business units now in effect become internal customers of the consolidated data center. To continue supporting it, they need to be assured that their critical applications are performing at or above the same levels as when they were controlled by the business unit. This means establishing internally facing service level agreements between the data center and the business units. Metrics such as application availability and end user response time for transactions between the desktop and the data center should be compiled, tracked and reported regularly to provide the evidence necessary to keep business unit owners on board.
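
As an illustration of how the two metrics named above might be checked against an internal SLA, here is a minimal Python sketch. The `SlaTarget` fields, the choice of a 95th-percentile response time and the nearest-rank percentile method are all assumptions made for the example, not requirements stated in this paper.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SlaTarget:
    """Hypothetical internal SLA for one application."""
    min_availability_pct: float        # e.g. 99.9
    max_response_ms: float             # limit for the chosen percentile
    response_percentile: float = 95.0  # assumed reporting percentile


def availability_pct(period_minutes: float, downtime_minutes: float) -> float:
    """Share of the reporting period during which the application was up."""
    if period_minutes <= 0 or not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("invalid period or downtime")
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def nearest_rank_percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of the measured response times."""
    if not values or not 0 < pct <= 100:
        raise ValueError("need at least one value and 0 < pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100.0)
    return ordered[rank - 1]


def sla_report(target: SlaTarget, period_minutes: float,
               downtime_minutes: float,
               response_times_ms: Sequence[float]) -> dict:
    """Compare measured availability and response time with the target."""
    avail = availability_pct(period_minutes, downtime_minutes)
    resp = nearest_rank_percentile(response_times_ms, target.response_percentile)
    return {
        "availability_pct": avail,
        "response_ms": resp,
        "availability_met": avail >= target.min_availability_pct,
        "response_met": resp <= target.max_response_ms,
    }
```

A report of this kind, produced per application and per business unit at a regular interval, is one way to give business unit owners the evidence described above.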

Why business units care about network performance

  • Is the POS system performance retaining customers or losing them? Forty per cent of customers will abandon a website after one or two bad experiences.
  • In a dealing room in New York, London or Hong Kong, 1 ms of network delay can cause a $1 million difference in each transaction.
Service level metrics are also required for usage and billing. Business unit owners will naturally only wish to pay for the resources they actually use, rather than subsidising other business units by paying an evenly divided share of data center costs. Reporting should therefore include usage assessment and the corresponding chargeback for all networks, applications and services consumed by each business unit.
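
A usage-based chargeback of this kind can be sketched in a few lines of Python. The example assumes a single usage measure per business unit (for instance, bytes transferred or CPU hours) and a straightforward proportional split; real chargeback models may weight several resources differently. The business unit names are hypothetical.

```python
def chargeback(total_cost: float, usage_by_unit: dict[str, float]) -> dict[str, float]:
    """Split the data center cost in proportion to each unit's measured usage."""
    if any(used < 0 for used in usage_by_unit.values()):
        raise ValueError("usage cannot be negative")
    total_usage = sum(usage_by_unit.values())
    if total_usage <= 0:
        raise ValueError("no usage recorded")
    return {unit: total_cost * used / total_usage
            for unit, used in usage_by_unit.items()}


# Hypothetical month: three business units sharing a cost of 100,000.
bill = chargeback(100_000, {"retail": 600, "trading": 300, "hr": 100})
# retail pays 60,000, trading 30,000 and hr 10,000, instead of an
# evenly divided share of about 33,333 each.
```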

2. Performance management
While data center consolidation and the application virtualisation that often accompanies it may streamline enterprise architecture, they introduce management complexity. As more services are virtualised, it becomes increasingly difficult to provide a single view of application usage from data center to desktop, because a single physical server can host multiple virtual machines. With database servers, application servers, email servers, print servers and file servers all potentially sharing the same piece of hardware, tracking network, application and service performance becomes much more difficult.

Finding the right management tool (or tools) is another challenge. Most legacy performance management tools operate best in a silo as they focus on a specific application, service, geographical or logical slice of the network. This approach may be acceptable in a distributed architecture – although problems can hide between an NMS without comprehensive information and complex packet capture tools – but causes problems in a consolidated data center, where the number of silos will grow with the addition of application virtualisation management tools which have not yet been integrated with the legacy performance management tools.

In this situation, network engineers have to rely on a set of disparate tools, each with its own unique capabilities and user interface. They have to use their collective experience and expertise to manually correlate information in order to identify, isolate and resolve problems.

In the best case scenario, performance management is carried out in a similar manner to the distributed environment, bypassing the opportunity to capitalise on collocated information and personnel. In the worst case, it results in finger pointing between the operations and IT teams and lowers the efficiency of anomaly resolution, causing problems for both end users and management.

To address these issues, and unlock the full potential of consolidation, organisations need to find a better way of managing performance and reporting.

Consolidating performance management

A consolidated performance management solution will provide information on all aspects of the network to all parties. This will assist in effective problem resolution without finger pointing, as well as providing the data to calculate reporting and management metrics such as SLA performance, usage and billing.

However, performance management is the most difficult step in the consolidation process. Legacy performance management tools were designed for a distributed environment and cannot handle the complexities of a consolidated and virtualised architecture. Tools such as application flow monitoring, transactional views, packet analysis, SNMP polling and stream-to-disk (S2D) archiving require multiple platforms and thus potentially erode the advantages of consolidation.

Businesses need an end-to-end solution with the scalability, breadth and depth to acquire, integrate, present and retain information that truly reflects the performance of the networks, applications and services from the business unit, IT and, most importantly, end user perspective. To be effective, it needs three critical characteristics: scope, perspective and timing.

Traditional performance management tools fall into two categories. Some take a high level approach and skim the surface in data gathering and assessment. They generate dashboards that can be shared with senior management to track overall performance, but do not give visibility into specific areas or assist with problem-solving. The alternatives take a much narrower, deep-dive approach, focusing on a specific segment of the network and capturing packets, examining individual transactions and delivering detailed, real-time analytics.

Ideally, IT teams need a combination of the two approaches. Flow, transactional and SNMP data enables them to examine the overall experience, while packet analysis and S2D capabilities assist in troubleshooting and compliance. They need both the breadth and depth of analysis, but without the manual effort and time associated with point products.

Legacy performance management tools are limited by both the information they provide and the way they present that information.

Network and application viewpoints help to identify the root cause of a problem and resolve it but are not always sufficient, particularly in a consolidated data center where business unit owners require service level metrics.

For example, when an internal or external customer reports unacceptably slow application response times, the best way to confirm the situation and diagnose the problem is for the network engineer to view the network from the user’s perspective. This only becomes possible if the performance management solution has the breadth and depth of analysis discussed above.

In an ideal world, when performance problems arise the root cause is identified quickly and the situation rapidly resolved. However, this becomes more difficult in the complexity of a consolidated data center, particularly if performance degrades slowly over time or problems are intermittent.

The network engineer needs to gather granular performance information from all data sources across the entire network over an extended time period and present the information from the end user’s perspective. This enables operations and IT staff to carry out real-time analysis and to go back in time to discrete points in order to assess and correlate the conditions associated with intermittent error reports. It also supports the development of short, medium and long term performance baselines, enabling deviations to be identified and addressed as early as possible.
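
The idea of a baseline with deviation detection can be illustrated with a short Python sketch. It assumes a rolling baseline (mean and standard deviation of the preceding `window` samples) and a fixed threshold in standard deviations; both the method and the default values are assumptions chosen for the example, and a short, medium or long term baseline would simply use a different window.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def deviations(samples: Sequence[float], window: int = 5,
               threshold: float = 3.0) -> List[int]:
    """Return indices of samples that deviate from the rolling baseline.

    The baseline for sample i is built from the `window` samples before it.
    A sample is flagged when it lies more than `threshold` standard
    deviations from the baseline mean.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    flagged = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = pstdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change counts as a deviation.
            if samples[i] != mu:
                flagged.append(i)
        elif abs(samples[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical response times in milliseconds; the spike at index 6 is flagged.
response_ms = [100, 102, 98, 101, 99, 100, 250, 101]
spikes = deviations(response_ms)  # [6]
```

Note that a single spike inflates the baseline that follows it, so a production tool would typically exclude flagged samples or use a more robust statistic.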

An end-to-end performance management solution should address all three of these issues. It needs to collect, aggregate, correlate and mediate all data, including flow, SNMP data and information gathered from other devices, with granularity as fine as one millisecond. This data should be displayed through a single, user-configurable dashboard. This will enable performance to be measured and issues to be identified and resolved quickly, and will provide the visibility needed to support network optimisation. By implementing an appropriate performance management solution prior to data center consolidation, the IT team can ensure that performance is at a minimum maintained and ideally improved following the consolidation project.

Virtualization adds performance management complexity

The additional layer of abstraction inherent to application virtualisation makes performance management more difficult because there is usually less physical evidence available than in the traditional environment in which servers and applications are tightly coupled.

Migration to a virtual network infrastructure requires network engineers to adopt new configuration and monitoring methodologies, as there are fewer physical switches and routers to support. There is also an ongoing debate on whether virtualisation makes the system more or less secure. If one system in the virtual environment is compromised, will that give access to all the other systems? In addition, the increased physical traffic on a single hardware platform places greater demands on the cabling infrastructure as a whole. This infrastructure should be tested and fully certified before rolling out virtualised services.

Visibility and security within the virtual environment are huge concerns. Before, during, and after migration, it is critical to use SNMP, NetFlow, and virtual taps to monitor the health, connectivity, and usage of these now-virtual systems. On some platforms, servers are moved and migrated automatically to hardware resources that can be used more efficiently. Server inventory and location should therefore be monitored closely.
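
One simple way to keep track of server inventory and location is to compare successive snapshots of where each virtual server is running. The Python sketch below assumes each snapshot is a mapping from server name to physical host name; how the snapshot is collected (SNMP polling, a hypervisor management interface or another source) is outside the sketch, and all names are hypothetical.

```python
from typing import Dict


def placement_changes(previous: Dict[str, str], current: Dict[str, str]) -> dict:
    """Compare two {server: host} snapshots and report what changed."""
    common = previous.keys() & current.keys()
    return {
        # Servers present in both snapshots but now on a different host.
        "moved": sorted((name, previous[name], current[name])
                        for name in common if previous[name] != current[name]),
        # Servers that appeared since the previous snapshot.
        "added": sorted(current.keys() - previous.keys()),
        # Servers that disappeared since the previous snapshot.
        "removed": sorted(previous.keys() - current.keys()),
    }


before = {"db01": "hostA", "web01": "hostA", "mail01": "hostB"}
after = {"db01": "hostB", "web01": "hostA", "print01": "hostC"}
changes = placement_changes(before, after)
# db01 moved from hostA to hostB, print01 was added, mail01 was removed.
```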

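Using a virtual tap or traffic mirroring port, application traffic should be monitored and analysed for server response time and irregular behaviour. To some extent, because virtualisation is new compared with other systems in the IT organisation, it is often blamed when problems occur. For this reason, around-the-clock monitoring tools should be deployed that can quickly isolate a problem to either the physical network or the virtual environment.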

There are specific challenges in managing the user experience when accessing applications held in a virtualised Citrix environment. Due to the added complexity of this environment, the entire transaction, from the user through multiple tiers of application structure, must be monitored, baselined and managed. Understanding this requires an understanding of how Citrix changes the application architecture.

As a user enters a VDI (virtual desktop infrastructure) session, they engage with the Citrix XenDesktop/XenApp server or servers, which host virtual sessions with configured access to specific services as defined and configured by the administrator. These rights often rely on outside interaction with Active Directory, a separate transaction where the Citrix access gateway (through the advanced access control web server) is now the client in a separate request. From there, the user gains access to a session with their Citrix solution. This session is largely ‘screen scrapes’, or the emulation layer of traffic.

Within the payload, there is additional insight into how the end user is interacting with their virtual desktop. Those interactions generate additional transactions in the subsequent application tiers with the Citrix server, acting as client to a more standard n-tier application interaction within the established service architecture.

As the user transactions are handed off, end-to-end correlation of transactions can be difficult, due to the proxied nature of the application architecture. Information indicating the user and user actions is contained in ICA traffic to the Citrix XenApp servers, but it is nested within the payload. Once Citrix generates sessions to back-end application infrastructures, the only real way to correlate is by time and the applications that were accessed.
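
Correlation by time and application, as described above, might look like the following Python sketch. The record types, field names and the two-second window are assumptions for illustration; they do not describe the ICA protocol or any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class FrontEndAction:
    """A user action observed in the front-end (Citrix) session."""
    user: str
    app: str
    timestamp: float  # seconds since some common reference


@dataclass(frozen=True)
class BackEndTransaction:
    """A transaction from the Citrix server to a back-end application tier."""
    app: str
    timestamp: float
    response_ms: float


def correlate(actions: Sequence[FrontEndAction],
              transactions: Sequence[BackEndTransaction],
              window_s: float = 2.0) -> Dict[FrontEndAction, List[BackEndTransaction]]:
    """Match each action to same-application transactions that start
    within `window_s` seconds after it."""
    return {
        action: [t for t in transactions
                 if t.app == action.app
                 and 0 <= t.timestamp - action.timestamp <= window_s]
        for action in actions
    }
```

Because the match relies only on time and application, two users working in the same application within the same window cannot be told apart at the back end, which is exactly the limitation the proxied architecture imposes.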

This means that when the user calls the Helpdesk saying that they believe the network to be down, they may actually be experiencing delays with an application hosted through Citrix. It can take an engineer up to an hour to find out what is actually happening. The only way to understand what may be impacting the end user experience is to implement performance monitoring of these applications from the perspective of the network. In a consolidated environment, this is the transport at the back end between the end user and the data center.

Solutions such as those from VMware provide tools to monitor the virtualised environment and servers but not the end user experience and the network. In contrast, NETSCOUT has developed solutions which measure from the end user into the data center, enabling users to understand what is happening on the network from the end user perspective and hence identify and resolve issues more quickly.

These solutions have the ability to provide visibility into this front tier performance, as aggregated by site with per user comparisons, as well as views into the interactions the user had within their session, and per published application performance metrics. This is then correlated with transactions that are generated by the Citrix environment to the standard n-tier application architectures. It both saves time when troubleshooting issues and helps network engineers to become proactive in managing the performance of this application delivery scenario.

[1] Understanding the Cost of Data Center Downtime - Emerson Network Power & Ponemon Institute

[2] Cost analysis and measurement help ensure consolidation success - Forrester Research, January 2009.
