A Control Theory Foundation for Self-Managing Computing Systems Yixin Diao, Member, IEEE, Joseph L. Hellerstein, Senior Member, IEEE, Sujay Parekh, Student Member, IEEE, Rean Griffith, Gail E. Kaiser, Senior Member, IEEE, and Dan Phung
Abstract—The high cost of operating large computing installations has motivated broad interest in reducing the need for human intervention by making systems self-managing. This paper explores the extent to which control theory can provide an architectural and analytic foundation for building self-managing systems. Control theory provides a rich set of methodologies for building automated self-diagnosing and self-repairing systems with properties such as stability, short settling times, and accurate regulation. However, applying control theory to computing systems poses challenges, such as developing effective resource models, handling sensor delays, and addressing lead times in effector actions. We propose a deployable testbed for autonomic computing (DTAC) that we believe will lower the barriers to investigating research problems in applying control theory to computing systems. The initial DTAC architecture is described along with several problems that it can be used to investigate.
Index Terms—Actuator, closed loop control, dynamics, resource management, sensor, testbed.
Fig. 1. Architecture for autonomic computing.
THE HIGH COST of ownership of computing systems has resulted in a number of industry initiatives to reduce the burden of operations and management. Examples include IBM’s Autonomic Computing, HP’s Adaptive Infrastructure, and Microsoft’s Dynamic Systems Initiative. All of these efforts seek to reduce operations costs through increased automation, ideally making systems self-managing without any human intervention (since operator error has been identified as a major source of system failures ). While the concept of automated operations has existed for two decades (e.g., ) as a way to adapt to changing workloads, failures, and (more recently) attacks, the scope of automation remains limited. We believe this is in part due to the absence of a fundamental understanding of how automated actions affect system behavior, especially system stability. Other disciplines such as mechanical, electrical, and aeronautical engineering use control theory to design feedback systems. This paper uses control theory to identify a number of requirements for and challenges in building self-managing systems.
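To make the stability concern concrete, the sketch below simulates a minimal feedback loop. It assumes a hypothetical first-order resource model y(k+1) = a·y(k) + b·u(k), where y is a measured output (for example, response time) and u is a control input, regulated by an integral controller u(k) = u(k-1) + K·(r - y(k)). Neither the model nor the parameter values come from this paper; they are illustrative only. For this loop the closed-loop characteristic polynomial is z² - (1 + a - Kb)z + a, and the Jury conditions give stability when |a| < 1, b > 0, and 0 < K < 2(1 + a)/b.

```python
"""Illustrative only: integral control of a hypothetical first-order resource model."""


def max_stable_gain(a: float, b: float) -> float:
    """Upper bound on K for y(k+1) = a*y(k) + b*u(k) under integral control.

    Valid for |a| < 1 and b > 0; the loop is stable for 0 < K < 2(1 + a)/b.
    """
    return 2.0 * (1.0 + a) / b


def simulate(gain: float, a: float = 0.5, b: float = 1.0,
             reference: float = 1.0, steps: int = 50) -> list[float]:
    """Return y(1), ..., y(steps), starting from y(0) = 0 and u(-1) = 0."""
    y, u = 0.0, 0.0
    outputs = []
    for _ in range(steps):
        error = reference - y        # regulation error r - y(k)
        u += gain * error            # integral control law: u(k) = u(k-1) + K*e(k)
        y = a * y + b * u            # hypothetical plant: y(k+1) = a*y(k) + b*u(k)
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    bound = max_stable_gain(0.5, 1.0)                  # 3.0 for a = 0.5, b = 1
    print(f"stable for 0 < K < {bound}")
    print("K = 0.5 ->", round(simulate(0.5)[-1], 6))  # settles at the reference
    print("K = 3.5 ->", simulate(3.5)[-1])             # grows without bound
```

With a = 0.5 and b = 1, a gain of 0.5 settles at the reference, while a gain of 3.5 exceeds the bound of 3 and the output oscillates with growing amplitude: the same control law, tuned differently, turns a helpful automated action into a destabilizing one.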
Manuscript received June 30, 2005; revised July 20, 2005. The work of the Programming Systems Laboratory is supported in part by the National Science Foundation under Grant CNS-0426623, Grant CCR-0203876, and Grant EIA0202063, and in part by Microsoft Research. Y. Diao, J. L. Hellerstein, and S. Parekh are with the IBM Thomas J. Watson Research Center, Hawthorne, NY 10532 USA (e-mail: email@example.com; firstname.lastname@example.org; email@example.com). R. Griffith, G. E. Kaiser, and D. Phung are with the Computer Science Department, Columbia University, New York, NY 10027-7003 USA (e-mail: firstname.lastname@example.org; email@example.com; firstname.lastname@example.org).
The IBM autonomic computing architecture  provides a framework in which to build self-managing systems. We use this architecture since it is broadly consistent with other approaches that have been developed (e.g., ). Fig. 1 depicts the components and key interactions for a single autonomic manager and a single resource. The resource (sometimes called a managed resource) is what is being made more self-managing. This could be a single system...
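The interaction between an autonomic manager and a managed resource can be sketched as a loop in which the manager repeatedly reads the resource and adjusts it. The sketch below assumes a sensor interface (`sense`) and an effector interface (`effect`) on the resource, and organizes each manager step as monitor, analyze, plan, and execute, the cycle commonly associated with the IBM architecture. The resource model (response time = load / capacity), the class and method names, and the gain value are hypothetical illustrations, not the interfaces defined by the architecture.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedResource:
    """Hypothetical resource whose response time is load / capacity."""
    load: float = 4.0       # offered work (illustrative units)
    capacity: float = 1.0   # setting the effector can change

    def sense(self) -> float:
        """Sensor: report the current response time."""
        return self.load / self.capacity

    def effect(self, capacity: float) -> None:
        """Effector: apply a new capacity, kept strictly positive."""
        self.capacity = max(capacity, 0.1)


@dataclass
class AutonomicManager:
    target: float                                         # desired response time
    gain: float                                           # integral gain (hypothetical)
    knowledge: list[float] = field(default_factory=list)  # past measurements

    def step(self, resource: ManagedResource) -> None:
        measured = resource.sense()                       # monitor via the sensor
        self.knowledge.append(measured)                   # record in shared knowledge
        error = measured - self.target                    # analyze: deviation from target
        planned = resource.capacity + self.gain * error   # plan: integral update
        resource.effect(planned)                          # execute via the effector


if __name__ == "__main__":
    server = ManagedResource()
    manager = AutonomicManager(target=1.0, gain=2.0)
    for _ in range(30):
        manager.step(server)
    print(round(server.capacity, 3), round(server.sense(), 3))
```

Here the manager drives capacity toward load / target = 4, where the measured response time equals the target. The gain still matters: linearizing this hypothetical model near that operating point suggests the update converges only for gains between 0 and 2·load/target² (8 with these values), which is the kind of design question control theory answers.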