Learn-as-you-go with Megh: Efficient Live Migration of Virtual Machines

No Thumbnail Available
Date
2017-04-04
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Cloud providers leverage live migration of virtual machines to reduce energy consumption and allocate resources efficiently in data centers. Each migration decision depends on three questions: when to move a virtual machine, which virtual machine to move and where to move it? Dynamic, uncertain and heterogeneous workloads running on virtual machines make such decisions difficult. Knowledge-based and heuristics-based algorithms are commonly used to tackle this problem. Knowledgebased algorithms, such as MaxWeight scheduling algorithms, are dependent on the specifics and the dynamics of the targeted Cloud architectures and applications. Heuristics-based algorithms, such as MMT algorithms, suffer from high variance and poor convergence because of their greedy approach. We propose a reinforcement learning approach. This approach does not require prior knowledge. It learns the dynamics of the workload as-itgoes. We formulate the problem of energy- and performance efficient resource management during live migration as a Markov decision process. While several learning algorithms are proposed to solve this problem, these algorithms remain confined to the academic realm as they face the curse of dimensionality. They are either not scalable in real-time, as it is the case of MadVM, or need an elaborate offline training, as it is the case of Q-learning. We propose an actor-critic algorithm, Megh, to overcome these deficiencies. Megh uses a novel dimensionality reduction scheme to project the combinatorially explosive state-action space to a polynomial dimensional space with a sparse basis. Megh has the capacity to learn uncertain dynamics and the ability to work in real-time. Megh is both scalable and robust. We implement Megh using the CloudSim toolkit and empirically evaluate its performance with the PlanetLab and the Google Cluster workloads. Experiments validate that Megh is more costeffective, incurs smaller execution overhead and is more scalable than MadVM and MMT. We explicate our choice of parameters through a sensitivity analysis.
Description
Keywords
Citation