Data Center Architecture

A Resilient Architecture for Automated Fault Tolerance in Virtualized Data Centers

Wai-Leong Yeow, Cédric Westphal, and Ulaş C. Kozat

DoCoMo USA Labs, 3240 Hillview Ave, Palo Alto, CA 94304, USA
e-mail: wlyeow@ieee.org, {cwestphal,kozat}@docomolabs-usa.com

Abstract—Virtualization is a key enabler of autonomic management of hosted services in data centers. We show that it can also be used to manage the reliability of these virtual entities through virtual backups. We propose an architecture that autonomously manages and allocates the physical resources, ensures reliability guarantees, and manages the pools of virtual backups for failure recovery and resource conservation. The architecture is fault tolerant by design, so that component failures do not bring down the entire data center.

I. INTRODUCTION

Virtualization technology has changed the way hosted services are managed in today's data centers. Service providers realize significant cost savings because the resources of the physical infrastructure are used more efficiently when they are pooled together and shared across the hosted servers. More importantly, virtualization is a key enabler of autonomic management of hosted services in a data center [1]. Planned or unplanned maintenance, asynchronous backups, and service migration can all be achieved easily with virtualization [2]. For more details on network virtualization, see the survey in [3].

The many benefits of virtualization can be extended to managing the reliability of hosted services in a virtualized data center. Load balancing across k over-provisioned service replicas has typically been the common and straightforward way to provide fault tolerance. However, this approach is unsuitable for “stateful” services, in which a failure interrupts the service. With asynchronous backups of the virtual hosted entities, the states of active services can be saved to backup nodes that have complete fail-over bandwidth reserved for them, so that reliability can be guaranteed. Furthermore, the backup nodes are themselves virtual entities that also reside on the pooled infrastructure. This allows efficient resource utilization while still providing redundancy for fault tolerance, and it allows different levels of reliability guarantees to be supported on the same physical infrastructure.

In this position paper, we propose a management architecture that autonomously manages the reliability guarantees and resources of virtual entities (hosted services) in a virtualized data center. Under this architecture, the additional virtual backup nodes and their associated links are adjusted to meet any desired level of reliability guarantee. These redundancy pools are managed collectively across the entire data center, so that more physical resources remain available to new incoming services despite the presence of idle, redundant nodes. Furthermore, the architecture itself is designed to be resilient against faults, so that the failure of some of its components does not bring down the entire data center.
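To make the idea of tuning the number of virtual backups to a reliability level more concrete, here is a minimal sketch. It is an illustration under simplifying assumptions, not the authors' method. It assumes that each of the n + b virtual nodes of a service (n active, b backups) fails independently with the same probability p during some interval, and that any surviving backup can take over any failed node, ignoring the bandwidth and placement constraints discussed above. Under these assumptions the service survives when at most b nodes fail, so its reliability is the binomial tail R(n, b, p) = Σ_{f=0}^{b} C(n+b, f) p^f (1 − p)^(n+b−f). The function names `survival_probability` and `min_backups` are hypothetical.

```python
from math import comb


def survival_probability(n_active: int, n_backup: int, p_fail: float) -> float:
    """Probability that at most n_backup of the n_active + n_backup nodes fail.

    Hypothetical model: node failures are independent and identically
    distributed, and any backup can replace any failed node.
    """
    total = n_active + n_backup
    # Sum the binomial probabilities of 0, 1, ..., n_backup simultaneous failures.
    return sum(
        comb(total, f) * p_fail**f * (1 - p_fail) ** (total - f)
        for f in range(n_backup + 1)
    )


def min_backups(n_active: int, p_fail: float, target: float, max_backups: int = 64) -> int:
    """Smallest number of virtual backups whose survival probability meets target."""
    # Linear search is valid because survival_probability grows with n_backup.
    for b in range(max_backups + 1):
        if survival_probability(n_active, b, p_fail) >= target:
            return b
    raise ValueError("target reliability not reachable within max_backups")


if __name__ == "__main__":
    # Example: 10 active virtual nodes, 1% failure probability per node,
    # and a 99.99% reliability target.
    print(min_backups(10, 0.01, 0.9999))
```

The linear search is safe because adding one backup can only help: if at most b of n + b nodes fail, then at most b + 1 of n + b + 1 nodes fail. Correlated failures, such as a whole rack going down, would need a different model than this independent-failure sketch.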

Due to space limitations, we provide only an overview of the architecture, though with as much detail as possible. The remainder of the paper is organized as follows. Section II describes the redundancy mechanism for supporting fault tolerance on a per-customer basis in a virtualized data center. Section III explains the fault-resilient control architecture that manages these redundancies in an automated manner. Section IV concludes the paper.

II. FAULT TOLERANCE FOR A VIRTUALIZED DATA CENTER

This section explains how reliability is managed. In particular, it describes the redundancy mechanism for providing fault tolerance to customers and gives an overview of how resources are managed, both for primary requests and for the additional redundancies. We begin by describing a general model for a resource request.

A. Resource Request Model

We assume a virtualized data center that leases its physical resources, as Amazon EC2 and other cloud service providers do [4], [5]. Rather than leasing independent server instances, we consider...