Advanced
An Adaptive Virtual Machine Location Selection Mechanism in Distributed Cloud
An Adaptive Virtual Machine Location Selection Mechanism in Distributed Cloud
KSII Transactions on Internet and Information Systems (TIIS). 2015. Dec, 9(12): 4776-4798
Copyright © 2015, Korean Society For Internet Information
  • Received : July 07, 2015
  • Accepted : October 18, 2015
  • Published : December 31, 2015
Download
PDF
e-PUB
PubReader
PPT
Export by style
Share
Article
Author
Metrics
Cited by
About the Authors
Shukun Liu
School of Information Science and Engineering, Central South University, 410083 Changsha, China
Weijia Jia
Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200240 Shanghai, China

Abstract
The location selection of virtual machines in distributed cloud is difficult because of the physical resource distribution, allocation of multi-dimensional resources, and resource unit cost. In this study, we propose a multi-object virtual machine location selection algorithm (MOVMLSA) based on group information, doubly linked list structure and genetic algorithm. On the basis of the collaboration of multi-dimensional resources, a fitness function is designed using fuzzy logic control parameters, which can be used to optimize search space solutions. In the location selection process, an orderly information code based on group and resource information can be generated by adopting the memory mechanism of biological immune systems. This approach, along with the dominant elite strategy, enables the updating of the population. The tournament selection method is used to optimize the operator mechanisms of the single-point crossover and X-point mutation during the population selection. Such a method can be used to obtain an optimal solution for the rapid location selection of virtual machines. Experimental results show that the proposed algorithm is effective in reducing the number of used physical machines and in improving the resource utilization of physical machines. The algorithm improves the utilization degree of multi-dimensional resource synergy and reduces the comprehensive unit cost of resources.
Keywords
1. Introduction
W ith the growth of network application services, information technology architecture and a variety of resources should be effectively integrated to manage physical resources effectively, improve the utilization rate of resources, and reduce resource unit costs [1] . Virtual machine technology is a key to virtualization, and it is widely applied to distributed cloud [2] . With the rapid popularity of cloud computing, the unlimited use of limited resources can be achieved by users in the future. Under this condition, users can obtain the physical resources they actually require, similar to how people purchase fuel or natural gas for their daily routines. However, users need to select the most appropriate interface to obtain such resources; otherwise, a considerable amount of resources will be wasted. The selection of the proper purchase window has thus become a key and fundamental problem that necessitates urgent solutions. Nowadays, the resources needed by cloud users are mainly embodied in the form of a virtual machine. To make a virtual machine perform efficiently, the host resource is mapped to the application layer, and the resource scheduling process is encapsulated in the search process of the virtual machine [3 , 4] . Thus, the key problem in the resource allocation process is the rapid and proper selection of a virtual machine for the corresponding physical nodes under the premise of satisfying all service-level targets for different applications [5] .
Physical resource utilization and user satisfaction can be greatly improved with an adaptive selection mechanism that allows users to automatically select a virtual machine according to comprehensive factors, which depend on a physical machine, during the allocation of virtual machine resources. Therefore, as the key issue in virtual machine deployment, the location selection of virtual machines needs to be solved urgently.
In this study, we first formulate our optimization problem as a bin packing based on multi-dimensional resource utility to determine the optimal location selection of virtual machines to physical machines, with consideration of the requirements for dependability. This method is different from the traditional way. Second, this study proposes a new population updating method based on immune memory and on a new multi-object genetic algorithm for the location selection of virtual machines based on the structure of a doubly linked list. Finally, during the coding process of virtual and physical machines, we used a mechanism of information grouping mode, through which the similar scale virtual machines can be allocated to the proper locations in shorter time compared with the traditional way.
The rest of this paper is organized as follows. Section 2 surveys related works. Section 3 states the problem that we will address and presents related definitions. Section 4 describes the detail of gene encoding and evaluation function design. Section 5 describes the MOVMLSA algorithm. Section 6 presents the experimental evaluation and analysis. Finally, Section 7 concludes this paper and suggests a future research direction.
2. Related Work
The mapping problem of virtual machines to physical nodes can be regarded as a multi-dimensional vector packing problem [6 8] and as a NP-hard problem. At present, this problem is mainly solved with a heuristic algorithm [8 , 9] . However, the current research on the location selection of virtual machines in a cloud platform is mainly aimed at the optimization of only one particular dimension. For example, existing research only aims to guarantee service-level objectives, minimize the number of physical nodes, and reduce virtual machine migration and energy consumption [10] . The optimization goals in some cases are inconsistent and contradicting. For example, a virtual machine is placed on a minimal number of physical nodes to reduce the number of used nodes; in this way, idle nodes can be saved, and energy consumption and management cost can be reduced [11] . Nevertheless, extensive virtual machine migration occurs. If the goal is to reduce such migration, the number of used physical nodes is likely to increase.
In [12 , 13] , a genetic algorithm was used to tackle the problem of the static placement of virtual machines without consideration of the overhead costs involved in the virtual machine migration. In [14] , node virtualization integration was described as a packing and random optimization problem in the cloud data center, but the considerations focused only on processor resources and not on other dimensions, such as memory and input/output (I/O). In [15 , 16] , the authors proposed a management framework for virtual machine placement in cloud computing, but they failed to consider resource costs and system energy consumption. In [17 , 18] , the authors proposed a scheduling strategy for virtual machines based on a genetic algorithm, the historical data of the cloud computing system, and the current state of the system [19] . This method achieves an ideal load balance and incurs minimal overhead for virtual machine migration; however, it ignores resource utilization and energy consumption in the data center. In [13 , 15] , the problem of the location selection of virtual machines was divided into a multi-objective optimization problem and a bin packing problem but without the consideration of the cost of virtual machine migration; the study focused only on the static placement of virtual machines and disregarded dynamic deployment based on virtual machine migration. In [18 , 20] , the problem was considered as a combinatorial optimization problem based on bin packing; similarly, the authors only considered the static placement of virtual machines and disregarded dynamic placement based on virtual machine migration.
Most optimization methods for virtual machine placement are implemented in several phases to solve a single-objective optimization problem. Multiple targets are rarely optimized simultaneously. Consequently, only a local rather than a global optimization solution is obtained. In sum, extensive research results have been obtained in the cloud placement of virtual machines, but several serious problems, including those enumerated below, have yet to be solved.
(1) Most studies determine data centers by selecting the required physical nodes and not the most appropriate nodes from the cloud data center. A complete strategy for the location selection of virtual machines should be divided into two levels, namely, data center and physical machine.
(2) Many of the existing studies on the virtual machine placement strategy are mainly based on a single dimension of target optimization under certain rules. Consequently, the generated optimal placement method may be based on only one certain condition. An efficient strategy for the location selection of virtual machines should consider the dependencies between dimensional constraints and comprehensive balance.
(3) Many new challenges need to be addressed to dynamically allocate and manage the shared physical and virtual resources of data centers. For example, an adaptive framework for the location selection of effective virtual machines is lacking. In location selection, the cost of resources, system performance, energy consumption, and other factors must be considered. An efficient algorithm for the location selection of virtual machines must also be designed such that it is adaptable to different user goals and business needs.
3. Problem Statement and Related Definitions
The total amount of resources utilized by users in a cloud computing platform is equivalent to the particular resources of a virtual machine. Each user application runs in its own, independent virtual machine. The efficient use of cloud computing resources and the reduction in the cost of such resources are key academic research issues.
- 3.1 Problem Statement
The multi-dimensional collaborative problem of the location selection of virtual machines can be considered a multi-objective combinatorial optimization problem. The available resources of each physical machine, such as the central processing unit (CPU), memory, disk, and I/O devices, can be used as multi-dimensional vectors. Every dimension is a physical resource for physical machines. Each resource required by a virtual machine corresponds to a multi-dimensional vector. The goal of the location selection of virtual machines is to place different virtual machines to multiple physical nodes according to the different needs of multiple users. The location selection process must be based on an effective adaptive framework. The problem of the location selection of virtual machines, which is based on multiple targets, can be described as follows.
In order to describe the problem and definitions conveniently and clearly, we designed two notation tables in which the basic meanings of the main symbols are defined accordingly.
Notation table (1)
PPT Slide
Lager Image
Notation table (1)
Notation table (2)
PPT Slide
Lager Image
Notation table (2)
A cloud platform comprises a number of data centers. In this study, we suppose that H data centers exist in a cloud platform and denote all resources in the data centers as
PPT Slide
Lager Image
One data center comprises n physical nodes. R = { r 1 , r 2 , r 3 ...... ri ......, rn } denotes the total resources of one data center.
PPT Slide
Lager Image
denotes a set of q different dimensions resource of mi . During an entire service period, which is denoted as L , the useful service time obeys the Poisson distribution [21] , and all the resources are independent of one another, i.e.,
PPT Slide
Lager Image
In the resource set of a data center, which is denoted as R , the useful service vector is denoted as S = { s 1 , s 2 ,......, sn }, and the useful service rate of ri is denoted as si (0 ≤ i n ).
A virtual machine set is denoted as V = { v 1 , v 2 , v 3 ,...... vm }, which is placed on physical nodes. The set comprises m virtual machines. In this study, we suppose that vj only has one task tj jtat any time. The kth task, which is denoted as
PPT Slide
Lager Image
includes q different attributes. The task generation rate of a virtual machine can be described as W = { w 1 , w 2 , w 3 , w 4 ,... wj ..., wm }, 0≤ j m where wj denotes the task production rate of the jth virtual machine. All independent tasks satisfy
PPT Slide
Lager Image
If ri can satisfy all the requirements of tj , then tj can be assigned to ri , and all operations can be run. In some cases, many ri can satisfy the requirements of tj . To achieve an effective balance among all resources, tj is allocated to ri with a probability of Pij . The probability distribution matrix of data resources is denoted as Pn×m = ( Pij ) n×m , with the condition that Pij ≥ 0 and
PPT Slide
Lager Image
The cloud resource scheduling matrix is denoted as Sn×m = ( Sij ) n×m . If vj is assigned to mi , then Sij = 1 ; otherwise, Sij = 0. During all the allocation processes, the following conditions must be fulfilled:
PPT Slide
Lager Image
In this work, we discuss the proper placement of virtual machines according to the usage information of multi-dimensional vectors of such virtual machines in the location selection process. We evaluate the resulting performance by referring to Microsoft’s 2008 report on virtual machine management techniques in the evaluation standards for physical server resources, i.e., according to the CPU, memory, network bandwidth, and disk I/O [15] . The problem is described as follows:
(1) In n physical machine nodes ( M ={ m 1 , m 2 , m 3 ,......, mn }), n refers to the number of physical nodes in the physical cluster. The main resources include memory, CPU, disk, bandwidth, and I/O.
(2) In m virtual machine nodes ( V ={ v 1 , v 2 , v 3 ,......, vm }), m refers to the number of virtual machines. The main resource requirements of a virtual machine include memory, CPU, disk, bandwidth, and I/O.
(3) A mapping between virtual and physical machines must be established to satisfy all the requirements of virtual machines and to reduce the physical nodes assigned to these virtual machines. During the mapping process, the sum of the assigned.
- 3.2 Related Definitions
Definition 1: The largest resource vector of mi can supply
PPT Slide
Lager Image
denotes the maximum number of dimension resource supplied by mi , 1 ≤ i n . The remaining resource matrix of the physical machine node set can be defined as follows:
PPT Slide
Lager Image
Definition 2: Given an adequate amount of physical resources, a virtual machine runs appropriately. The required amount of resources of vj can be denoted as
PPT Slide
Lager Image
denotes the required number of qth dimension resource, 1 ≤ j m . The required resource matrix of a virtual machine is denoted as follows:
PPT Slide
Lager Image
Definition 3: Resource cost
The required amount of resources is billed. The billed vector can be marked with
PPT Slide
Lager Image
denotes the cost of obtainingthq dimension resource from mi .
Definition 4: Mapping matrix of vj to mi
PPT Slide
Lager Image
Here, 1 ≤ i n , 1 ≤ j m , and gij ∈ {0,1} . gij denotes whether vj is placed on mi . gij = 1 indicates that Vj is placed on the mi ; otherwise, gij = 0.
Definition 5: Let xi ∈ {0,1}. xi = 1 indicates that a virtual machine migrates from one physical node to another physical node. Otherwise, the virtual machine does not migrate. We suppose that S denotes the number of migrations undertaken by a virtual machine and that
PPT Slide
Lager Image
xi ∈ {0,1}. According to this definition, the overhead of placing a virtual machine on a physical node can be denoted as Y , where Y=F T × G × Ci .
Definition 6:
PPT Slide
Lager Image
pi = 1 indicates that at least one virtual machine is assigned to a physical node. Otherwise, no virtual machine is assigned to a physical node.
Definition 7:
PPT Slide
Lager Image
denotes the balanced load variance of all the physical machines. Di denotes the ith dimension variance, and q denotes the number of dimensions.
PPT Slide
Lager Image
where n denotes the number of physical nodes.
PPT Slide
Lager Image
denotes the average value of the performance of the ith dimension for all physical nodes. pij is the ith performance value of the mi . All the performance values are normalized. In this study, the optimization goal for virtual machine placement is as follows:
PPT Slide
Lager Image
Constrained conditions:
At any time, the total amount of all types of resources required by all virtual machines cannot exceed the total amount of resources of all physical machines. The amount of resources required for each virtual machine should also not exceed that required for each physical machine. The concrete constrained conditions can be formalized as follows:
PPT Slide
Lager Image
Ti,q denotes the threshold of the resource q for mi , and
PPT Slide
Lager Image
denotes the useful ratio of the resource q in vj , which is assigned to mi . The inequality (e) above indicates that the qth dimension resource (i.e. CPU, memory, disk space, bandwidth, and I/O) of the virtual machine that are assigned to the physical machine mi cannot exceed those of the physical machine mi . If a user application can be placed on X virtual machines, then the performance factors of the virtual machines supplied to the application P must satisfy service-level agreements (SLAs).
4. Chromosome Gene Encoding and Evaluation Function Design
In this part, we will introduce the process of chromosome gene encoding and evaluation function design. Using the GA (Genetic Algorithm) technology, we can explain the coding information of virtual machine and physical machine clearly and easily. In addition, evaluation function can be expressed combined with the coding information.
- 4.1 Encoding
Encoding is generally the first factor that can affect the search efficiency of evolutionary algorithms. Encoding can reflect the mapping relations of solutions to chromosomes [16] . For the location selection of virtual machines, the solution coding of the physical nodes to which virtual machines are assigned can be considered a chromosome. The assigned virtual machine is the gene value. The advantage of this coding is that the number of chromosome genes is determined by the total number of physical servers [22] . Therefore, this situation cannot reduce computing speed because of the emergence of extensive coding. During the subsequent crossover and mutation operations, the hardware constraints among all servers remain the same [22] . In the present study, the problem of the location selection of virtual machines can be regarded as a bin packing problem. Three chromosome encoding mechanisms are available: coding based on cases, coding based on items, and coding based on groups [3] . Coding based on cases and coding based on groups focus on individual items. The goal function of bin packing depends on item groups. In this study, the encoding mechanism is mainly based on item groups. According to the combined redundancies of group encoding [23] , a doubly linked list coding method is proposed.
M virtual machines are assigned to N physical nodes. M is generally larger than N . A random virtual sequence that includes M virtual machines is generated. For chromosome encoding, a priority heuristic algorithm is employed to place a random virtual machine sequence on physical nodes. This priority heuristic algorithm selects a physical node from used physical nodes. If the selected node can satisfy the five resource requirements (CPU, memory, network bandwidth, disk, and I/O) of the first virtual machine, then the virtual machine can be placed on the physical node. Otherwise, the subsequent physical nodes are selected until the physical node that satisfies such requirements is found (let the resources of N physical machines satisfy the resource requirements of M virtual machines). If no proper physical node from the used physical nodes can satisfy the resource requirement, then the first new physical node that has not been used is selected for the placement of the virtual machine (when the algorithm is implemented initially, all physical machine nodes are not used). According to the description above, all virtual machines can be placed on physical nodes. The concrete algorithm is described as follows:
PPT Slide
Lager Image
Priority heuristic algorithm
The encoding process of assigning M virtual machines to N physical nodes, which is based on group information and a doubly linked list, is shown in the example. Let us assume the availability of M virtual machines, which need to be allocated at a certain period, as well as a sequence number of 1,2,3,......, m-1,m . The sequence number is random. Virtual machines are assigned to physical nodes randomly according to the priori algorithm, and the initial code can then be generated. The initial code is optimized according to the doubly linked list packing method.
PPT Slide
Lager Image
Encoding process
- 4.2 Evaluation Function
The optimal goals are a minimal number of used physical machines, low unit price costs of resources for users, a high performance of an application system achieved with a balanced load, and a low useful ratio of physical nodes. An evaluation function can be used to evaluate individuals. The first parameter is a key factor to evaluate the chromosome load performance of used physical nodes. A small load variance denotes the good load performance of used physical nodes. The second parameter is the lowest possible resource cost for users. The third parameter is used to evaluate the energy consumption level in a chromosome using the number of physical nodes. A minimal number of used physical nodes denote insignificant energy consumption. The fourth parameter is used to evaluate the physical machine utilization of a specific resource type. In this study, the fitness function is defined as follows:
PPT Slide
Lager Image
- 4.3 Evolution Operators
For the problem of assigning a virtual machine to physical machine, a new coding method based on group information and doubly linked list nodes is used.
- 4.3.1 Crossover operator
The virtual machine code contains the precursor and subsequent nodes. A chromosome gene is expressed in different groups and in an internal bidirectional chain structure. Considering that the arrival of the virtual machine sequence is random, the sequence length is not fixed. The number of virtual machines that each physical machine can accommodate also differs. The variable length of the chromosomes referred by the crossover operator is not fixed as well.
The genetic algorithm involves physical machine coding and virtual machine group coding. Based on the proposed coding method, a single-point crossover method with the longest doubly linked list is put forward in this study. The crossover process in the genetic algorithm mainly allows the offspring to inherit excellent genes from parent generation. The crossover process is divided into two parts. One part is the process based on cross-group coding to minimize the number of used physical machines. The other part is the process based on cross-resource coding to maximize the utility of physical machine resources. In the first part (as shown in the Fig. 2 below), the main steps of the single-point crossover optimization method for sub-individual coding optimization based on the largest chain length (for the optimization of individual species) are as follows:
PPT Slide
Lager Image
Single-point crossover process
Step a) Two parent nodes are selected randomly. The longest group in parent individual A is selected and replaced with the corresponding group of parent individual B.
Step b) After the crossing operation for parent individual B, the corresponding physical nodes that comprise repeated virtual machines are removed from the collection of used physical nodes and are then added to the unused physical node set; in this case, the virtual machines are deleted [16] .
Step c) The unallocated virtual machine series is generated on the basis of the retained virtual machine chain structure of parent individual B.
Step d) The priority heuristic algorithm is used to generate new child entities from the unallocated virtual machine sequence.
Step e) A group is randomly selected from parent individual B and is then used to replace the corresponding group in parent individual A. Individuals B are generated according to Steps a)–d).
For example, 12 virtual machines and 4 physical nodes are simulated. The process of the single-point crossover method [16] is shown in Fig. 2 .
- 4.3.2 Mutation operator
Two scenarios emerge in chromosome variation. One is based on the group encoding variations of a doubly linked list, i.e., deleting a virtual machine in the parent node list randomly. The other scenario refers to resource coding variants, in which we can delete a doubly linked list node of a virtual machine. After such deletion, the precursor and subsequent nodes change simultaneously. The occurrence of these two types of mutation does not follow a specific discipline; they may occur in the form of a single individual or in a specific moment. When mutation occurs, X physical nodes are randomly selected from the chromosome (where X < M / 2). X physical nodes are removed from the collection of used physical nodes and are then added to the collection of unused physical nodes. The virtual machines deployed on the physical nodes are also deleted. The unallocated virtual machine series is then generated on the basis of the chain structure of individual virtual machines, which preserves individual variation. Finally, the unallocated sequence of virtual machines is redistributed in accordance with the distribution of the priority heuristic algorithm.
The value of parameter X, which can affect the performance of the multi-objective evolutionary algorithm, depends on mutation rate. The value is determined with an experiment [16] . We take note of the following details. 1.When virtual machines have been integrated, we will ensure that the amount of resources of the physical machine is less than or equal to the threshold of each type of resource value during the process of crossover or mutation. 2. In the process of crossover or mutation based on resource coding, the remaining resources of the cross section selected by the physical machine must be greater than those of the selected virtual machine. The process of the mutation operator method is shown in Fig. 3 .
PPT Slide
Lager Image
Mutation process
- 4.4.3 Population update method
The ε dominant relation is a weakened form of the Pareto dominance relation [24] . In this study, the ε form is added for the given ε ∈ R k , ε i > 0, ∀i ∈ k . The concrete definition of this form is as follows:
Definition 8: ε dominant relation
Let f : Rm Rk , x 1, x 2 ∈ Ω ⊆ Rm , ε ∈ Rk , ε i > 0, which is called x 1 ε dominate x 2, if and only if
PPT Slide
Lager Image
which is marked as x 1 ε iX 2 .
Definition 9: ε-Pareto optimal approximate solution set
Set X a is called a ε-Pareto optimal approximate solution set of X if and only if for any x X , the attribute x' X a is always true and x' εx is always true [24] .
Definition 10: ε-Pareto solution set
Set Xε is called a ε-Pareto solution set of Xε if and only if Xε is a ε-Pareto optimal approximate solution set of X and Xε X* [24] .
Definition 11: Immune memory
Immune memory is an important characteristic of the immune system. When the body is exposed to an antigen for the second time, the incubation period of the antibodies relative to the first response time is obviously short. In this case, the antibody levels rise rapidly. When the same antigen invades the body again, a strong primary immune system develops; the phenomenon in which antibodies gain a higher degree of affinity is referred to as immune memory [25] .
The current non-dominant individuals are stored on the basis of the memory set. ε−dominant is an effective form of the relaxation mechanism of the Pareto dominance, which is used to maintain the uniform distribution of a solution. This form is widely applied to maintain diversity domain. ε−dominant is adopted to update the memory population. For a multi-goal optimization problem that includes m goal functions, if 1 ≤ fi K , (1 = 1,2,,..., m ), then the goal space can be divided into (( K -1)/ ε ) m sub-spaces according to the rules of ε−dominant . In every sub-space, only one individual exists, and other individuals are deleted from the population space.
The ε−dominant mechanism is sensitive to a variety of problems. Given such variety, different numbers of antibodies are kept.
From a practical viewpoint, deciders cannot know the geometric distribution form of the Pareto frontier in advance. If a target value of multiple antibodies is less than a certain value, deciders do not view two antibodies as different. In this case, the use of the ε−dominant mechanism is appropriate. We adopt a control mechanism to provide decision rights that can meet such a situation and to maintain population diversity. In practice, decision makers can dynamically control the values of a vector. Allocating an identification vector for individuals is a simple method that can be used to divide the antibodies of a living space [25] . The identification vector can be defined as B ( Xa ) = ( B 1 ( Xa ), B 2 ( Xa ), ..., Bm ( Xa )) and Bi ( Xa ) = fi ( Xa )/ ε , ( i =1,2,..., m ). The current memory set can be denoted as MS ( t ) = ( MS 1 , MS 2 , ..., MSn ). A new generation population is generated after the crossover and mutation operations and is then crossed with sub-populations. The result is sorted with the rule of ε non-dominated sorting. According to the domination relationship among individuals, the memory information of antibodies, the Euclidean distance, and super volume [26] , the subsequent generation of species evolves further. The update mechanism of concrete super body populations is described as follows:
a) If the current generation is the first generation, then the non-dominant individual should be selected in the initial population. Otherwise, the non-dominant individual of the current population and the parent population is assigned to the individual population set. The large individual population is sorted after being overlaid according to the dominant relation. Let the grade number be n , and let the non-dominant set be denoted as L 1, L 2,......., Ln . The new identification vector for the non-dominant population is assigned. If some of the identification vectors of some individuals are the same, then they belong to the same super individual. All identification vectors can be used to extract and identify super individuals.
b) Let the population scale be M , and let the individual number of L 1 be M . The set of L 1 can be regarded as the subsequent generation.
c) If the individual number of L 1 is more than M , then all the memory information of the antibodies and the contribution value of the super individuals are calculated in L 1, except for some individuals with goal limit values. The n - 2 contribution values that are relatively large are selected, and the individuals whose values satisfy the goal value are selected as the subsequent generation population and marked as Z ( g +1).
d) If the individual number of L 1 is less than M , then all the individuals of M are added to the subsequent generation marked as Z ( g +1). The individuals of L 2, L 3,......., Ln are added to Z ( g +1) until the scale of the population reaches M . After such addition, if the number of individuals of L i is more than M , then the Euclidean distance of the vector individuals with the same super individuals is calculated. The individuals with small values of Euclidean distance join the subsequent generation until the population scale reaches M .
5. Immune Memory-based Algorithm for the Location Selection of Virtual Machines
The optimization process with the multi-objective optimization algorithm has obvious disadvantages despite the wide use of the non-dominated sorting genetic algorithm 2 and the double F-shaped resonator (DFR). For the problem of multi-target optimization, improved performances and optimal results can be achieved for different frontal shapes. In this study, the basic evolution operators of cloud resource allocation are thus embedded in SMS-EMOA [16] . We propose a new multi-objective virtual machine location selection algorithm (MOVMLSA) according to the multi-objective selection process based on dominated hyper-volume [27] . The main process of the algorithm is described as follows:
a) According to the group information based on doubly linked list encoding, parent individuals are initialized to form the parent population with a scale of M.
b) The fitness evaluation function is calculated in view of the population individuals. The parent population that has been evaluated is sorted with the ε non-dominated rule. The control level of the individuals in the population is divided.
c) Two parent individuals are selected from populations using the tournament selection method, and a hybrid individual is generated using the single-point crossover method for the two parent individuals.
d) Intelligent hybrid individual variation occurs, and individuals are generated. The child individuals are first evaluated with the fitness evaluation function. The child individuals are then added to the population pool. This step is repeated until the population size reaches M.
e) The best M individuals are selected as the subsequent generation of the parent population on the basis of the update mechanism of the immune memory.
f) For the new generation of the parent population, steps c)–f) are repeated until the evolution condition is met (maximum number of iterations). The algorithm is then terminated.
PPT Slide
Lager Image
Algorithm flow
6. Experimental Results and Analysis
The experimental results show that the proposed MOVMLSA based on immune memory can determine the reasonable placement of virtual machines, improve the utilization of physical resources, and reduce the cost of resources for users. The algorithm features a certain degree of feasibility and high accuracy.
A priority heuristic algorithm (i.e., the proportional hazards model (PHM)), the classic NSG-2 algorithm, the multi-dimensional dominant resource fairness (DRF) scheduling algorithm [28] , and the multi-objective evolutionary algorithm based on immune memory are all simulated in this study to verify the performance of the immune system-based multi-dimensional strategy for the location selection of virtual machines. The number of used physical machines, the load balance of the physical machines, and the resource cost are all recorded; these values can be used to compare the performance of the different algorithms. The simulation experimental platform is the CloudSim, which was proposed by the teams from the Grid Laboratory of the University of Melbourne in Australia and the Gridbus Project [6] . Some corresponding classes based on the base classes of CloudSim are modified to implement all the algorithms. Several inherited classes and methods are also designed. The existing useful resource vector of every physical machine can be obtained through the host class. The required resource vector of a virtual machine can be generated from the Datacenter class. The VMProvisioner class is used to achieve the mapping from a physical machine to a virtual machine. The VMAllocation policy is an abstract class that is used to achieve the location selection of virtual machines. The HostForVm allocation method can realize the placement of a special virtual machine to a fixed physical machine.
The inherited VMAllocationPolicy class can provide the assignment strategy of virtual machines. The allocation algorithm of virtual machines based on the multi-objective genetic immune memory can be implemented by editing the user-defined inheritance class named VMAllocation Policy. With this class, virtual machines can select proper physical machines to locate. With the method of extending classes in CloudSim, all classes can be rebuilt.
Forty physical nodes are simulated in the CloudSim simulation platform. Every node is equipped with two processors (Intel(R) Core(TM) i5-3317U 1.7 GHz), a 4 GB memory storage, 8 MB L2 cache, and two disks of 7,200 turns with 500 GB. The virtual numbers are 30, 45, 55, 65, and 80 in turn.
The population size of the evolution algorithm and the evolution number are set to 50 and 1,000, respectively. The crossover and mutation ratios are set to 0.6 and 0.01, respectively.
Crosser and mutation rates have very important influence on the experimental results. For general generic algorithm, the crosser and mutation rates are always constant. However, determining the concrete value of both rates is difficult. If the value is extremely small, an optimal solution is always more approximately achieved; however, ensuring that the solution is the global optimal solution is difficult. On the other hand, if the value of crosser and mutation rates are too large, the number of iterations also become large. Moreover, the searching ability becomes extremely low. At the same time, the algorithm is difficult to converge. To determine the experimental parameters, we conducted experiments, which can be used to determine the correlation between mutation rate and convergence speed, 1000 times.
An evolution experiment is conducted according to the above parameters in Table 3 and Table 4 . The experimental results are shown in Table 5 .
Parameter design of the algorithm for the location selection of virtual machines in the CloudSim platform
PPT Slide
Lager Image
Parameter design of the algorithm for the location selection of virtual machines in the CloudSim platform
Class crossover and mutation rates
PPT Slide
Lager Image
Class crossover and mutation rates
Performance comparison of four algorithms
PPT Slide
Lager Image
Performance comparison of four algorithms
The intuitive data graph shown in Figs. 5 7 is used to further analyze the algorithm performance data shown in Table 4 . The numbers of the enabled physical nodes in the location selection algorithm based on immune memory and the DFR algorithm are approximately similar, but they are lower than those in the PHM and NSG-2 algorithms. A low number of enabled physical nodes indicates considerable savings in energy and resources. For the load performance of a cluster, the load performance variance in the location selection algorithm based on the multi-objective genetic algorithm is less than those in the other three algorithms. A small load performance variance denotes the good effect of the load-balancing server cluster. The preceding analysis implies that the location selection algorithm based on the multi-objective genetic algorithm can greatly reduce the number of used physical servers and can achieve a good load-balancing server cluster effect.
PPT Slide
Lager Image
Comparison of resource utility
To obtain the comparison results of PHM, NSG-2, DFR, and MOVMLSA, we performed five independent group experiments (the required numbers of virtual machines are separately set to 30, 45, 55, 65, and 80). Thus, the physical machine number, resource load balance, and resource utility can be analyzed adequately.
(1) Analysis comparison of comprehensive resource utility: The comprehensive resource denotes CPU, memory, network, and I/O resource. From the comparative experiments of the first group data to the fifth group data, the changes were recorded according to the changes in the number of virtual machines. The resource utility situation of PHM, NSG-2, DRF, and MOVMLSA are shown in Fig. 5 . As shown in the figure, under the same conditions, the physical resource utility is the highest when MOVMLSA is used. When the number of virtual machines is adjusted from 30 to 80, the comprehensive resource utility using MOVMLSA is 10%–14% higher than that of PHM, NSG-2, and DRF. Thus, MOVMLSA can obviously improve resource utility and save energy.
(2)Analysis comparison of the number of enabled physical machine: The number of virtual machines is adjusted, and the required physical machine number is shown in Fig. 6 . The figure reflects the difference between NSG-2, DRF, and MOVMLSA. When different algorithms are used to assign the virtual machine to the physical machine, MOVMLSA can assign the virtual machine efficiently, which is reflected by using physical machines at least under the same conditions. Of course, the number of required physical machines increases with the increase in the number of virtual machines. The experimental results show that compared with PHM, which consumes plenty of resources, MOVMLSA can degrade the comprehensive resource utility from 35% to 62%. Compared with NSG-2 and DRF, MOVMLSA can degrade the comprehensive resource utility as well. Thus, MOVMLSA can improve resource utilization.
PPT Slide
Lager Image
Comparison of the number of enabled physical machines
(3) Analysis comparison of resource load balance: In the experiments, resource threshold value, attributes of physical machines, and virtual machine tasks are consistent with one another, except for the deployment method. The cross and mutation rates are set at 0.6 and 0.01, respectively, and the maximum genetic algebra is set at 1000. The virtual machine deployment process was recorded from 30 virtual machines to 80 virtual machines in the experiment. The main goal of this experiment is to obtain the comparison of the result of resource load balance in a fixed number of virtual machines with different location selection algorithms. The comparison results are presented in Fig. 7 . The experiments show that with the increase in the number of virtual machines, the algorithm can decrease the unbalanced degree of the resources. From the longitudinal contrast, under the same conditions relative to PHM, NSG-2, and DFR, MOVMSLA can reduce the resource imbalance degree from 0.03% to 0.23%. Compared with three other algorithms, MOVMLSA contributes more effectively to the performance of resource load balance. However, this advantage is not particularly obvious because many uncertain factors in determining genetic algorithm parameters exist. In our future work, the aspects of optimization parameters based on the characteristics of the algorithm itself would be improved.
PPT Slide
Lager Image
Comparison of load balance degrees
In theory, the multi-object virtual machine location selection problem, which is proposed in this paper, is a combinatorial optimization problem. In cloud computing environment, however, combinatorial optimization problems are likely to lead to combination explosion. Prioritization process between targets can be avoided because the genetic algorithm can be processed parallel to each target. Thus, the genetic algorithm can effectively reduce the invalid combination. Therefore, the algorithm is suitable for solving a multi-object optimization problem [10] . MOVMLSA is a multi-object location selection algorithm based on immune memory, which is also classified as a kind of genetic algorithm. MOVMLSA is effective for solving multi-object combinatorial optimization problems. However, at present, the genetic algorithm faces the problem of slow convergence speed. To improve the convergence speed of the genetic algorithm, MOVMLSA can choose excellent individuals from each generation population; then, the immune information is extracted from the excellent individuals, and a vaccination for immunization is prescribed to the diagnosed offspring. Immune memory can speed up the breeding of good modes and repair the damaged modes by crossover and mutation model of excellence. Virtual machine population and immune memory information can interact and cooperate with one another, which can greatly improve the convergence speed of MOVMLSA. MOVMLSA can automatically obtain the immunization information of an individual, and this information is dynamically updated with the evolution of the population. In this way, we can find the subspace that may be included in the optimum solution and improve the searching efficiency of the algorithm. The scale of immune memory library is an important factor that can influence the efficiency of the algorithm. However, in this study, we did not examine the relationship between the population size and the immunization type database scale, which will be investigated in our future work.
7. Conclusion
The MOVMLSA based on the multi-objective evolutionary algorithm and that based on immune memory are discussed in this work. To solve the problem of mapping physical machines to virtual machines, we transform the problem of the location selection of virtual machines into a multi-objective packing problem according to the multi-dimensional resource characteristics of virtual machines and the characteristics of optimization goals. The solution of the multi-objective genetic algorithm is obtained on the basis of the genetic memory information. A chromosome evaluation function and a group chain code are designed according to the doubly linked list and group information. The maximum lengths of the cross operator, single-point crossover operator, and mutation operator of the X-point mutation are designed according to the code information. Compared with other evolutionary algorithms, the multi-objective genetic algorithm based on immune memory shows a better performance for different frontal shapes. An algorithm for the location selection of virtual machines based on immune factors is designed in a cloud computing platform. The algorithm update policy is based on the volume of the update mechanism, which can ensure the population quality and diversity. PHM, NSG-2, DFR, and MOVMLSA are simulated to verify the performance of the new algorithm. The experimental results show that the MOVMLSA based on immune operators performs better than the other algorithms in terms of resource utilization, cluster load balance, and resource cost.
BIO
Shukun Liu received the M.S. degree in computer science and technology from University of South China, PR China, in 2007. Currently, He has been a Ph.D. candidate at Central South University since September 2013, China. His major research interests include cloud computing, virtualization technology, performance analysis, computer networks, database technology, data mining and software engineering. He has published nearly fifteen papers in related journals. And he is a member of CCF and a member of ACM.
Prof. Weijia Jia is a Zhiyuan Chair Professor in the Department of Computer Science and Engineering, Shanghai Jiaotong University. He joined German National Research Center for Information Science (GMD) in Bonn (St. Augustine) from 1993 to 1995 as a research fellow. During 1995-2013, he joined Department of Computer Science, City University of HK as a professor. Weijia Jia received BSc and MSc from Center South University, China in 1982 and 1984 and Master of Applied Sci. and PhD from Polytechnic Faculty of Mons, Belgium in 1992 and 1993 respectively, all in Computer Science. He is the guest Professor of Beijing University of Science and Technology of China, Beijing Jiao Tong University, Jinan University, Guangzhou, Chengdu University, China. He has served as the editors of IEEE TPDS and ComCom and PC chairs and members/keynote speakers for various prestige international conferences. He is the Senior Member of IEEE and the Member of ACM. His research interests include next generation wireless communication, protocols and heterogeneous networks; distributed systems, multicast and anycast QoS routing protocols. In these fields, he has a number of publications in the prestige international journals (IEEE Transactions, e.g., ToN, TPDS, TC, TMC etc.), books/chapters and refereed international conference proceedings (e.g. ACM CCS, WiSec, MobiHoc, SenSys, IEEE ICDCS, INFOCOM etc.). He (with W. Zhou) has published a book “Distributed Network Systems” by Springer where the book contains extensive research materials and implementation examples. He has received the best paper award in a prestige (IEEE) conference.
References
Breitgand D. , Epstein A. “Improving consolidation of virtual machines with risk-aware bandwidth oversubscription in compute clouds,” IEEE INFOCOM 2012-IEEE Conference on Computer Communications 2012 2861 - 2865
Alicherry M. , Lakshman T. “Optimizing data access latencies in cloud systems by intelligent virtual machine placement,” IEEE INFOCOM 2013-IEEE Conference on Computer Communications 2013 647 - 655
Chen Xiaojiao , Chen Shihping , Fang Fang 2014 “Virtual machine resource allocation algorithm in cloud computing,” Computer Application Research 31 (9) 2584 - 2587
Stillwell M. , Schanzenbach D. , Vivien F. , Casanova H. 2010 “Resource allocation algorithms for virtualized service hosting platforms,” Journal of Parallel and Distributed Computing 70 (9) 962 - 974    DOI : 10.1016/j.jpdc.2010.05.006
Gahlawat M. , Sharma P. “Survey of virtual machine placement in federated Clouds,” in Proc. of Advance Computing Conference (IACC), 2014 IEEE International 2014 735 - 738
Xu Bo , Zhao Chao , Zhu Yanjun , Peng Zhiping 2014 “Virtual machine resource scheduling multi-objective optimization in cloud computing,” Journal of System Simulation 26 (3) 592 - 595
Kong X. , Lin C. , Jiang Y. , Yan W. , Chu X. 2011 “Efficient dynamic task scheduling in virtualized data centers with fuzzy prediction,” Journal of Network and Computer Applications 34 (4) 1068 - 1077    DOI : 10.1016/j.jnca.2010.06.001
Hao F. , Kodialam M. , Lakshman T. , Mukherjee S. “Online allocation of virtual machines in a distributed cloud,” in Proc. of IEEE INFOCOM 2014 - IEEE Conference on Computer Communications 2014 10 - 18
Warneke D. , Kao O. 2011 “Exploiting dynamic resource allocation for efficient parallel data processing in the cloud,” IEEE Transactions on Parallel and Distributed Systems 22 (6) 985 - 997    DOI : 10.1109/TPDS.2011.65
Maguluri S. T. , Srikant R. , Ying L. “Stochastic models of load balancing and scheduling in cloud computing clusters,” in Proc. of IEEE INFOCOM 2012-IEEE Conference on Computer Communications 2012 702 - 710
Jiang J. W. , Lan T. , Ha S. , Chen M. , Chiang M. “Joint VM placement and routing for data center traffic engineering,” in Proc. of IEEE INFOCOM 2012-IEEE Conference on Computer Communications 2012 2876 - 2880
Sato K. , Samejima M. , Komoda N. “Dynamic optimization of virtual machine placement by resource usage prediction,” in Proc. of 2013 11th IEEE International Conference on Industrial Informatics 2013 86 - 91
Xu J. , Fortes J. A. “Multi-objective virtual machine placement in virtualized data center environments,” in Proc. of Green Computing and Communications (GreenCom), 2010 IEEE/ACM Intʹl Conference on & Intʹl Conference on Cyber, Physical and Social Computing (CPSCom) 2010 179 - 188
Chen M. , Zhang H. , Su Y.-Y. , Wang X. , Jiang G. , Yoshihira K. “Effective VM sizing in virtualized data centers,” in Proc. of Integrated Network Management (IM), 2011 IFIP/IEEE International Symposium on 2011 594 - 601
Li Q. , Hao Q.-F. , Xiao L.-M. , Li Z.-J. 2011 “Adaptive management and multi-objective optimization for virtual machine placement in cloud computing,” Chinese Journal of Computers 34 (12) 2253 - 2264    DOI : 10.3724/SP.J.1016.2011.02253
Ai Haojun , Gong Suwen , Yuan Yuanming 2014 “Research of cloud computing virtual machine allocated strategy on Multi-objective evolutionary algorithm,” Computer Science 41 (6) 48 - 53
Sawant S. 2011 “A genetic algorithm scheduling approach for virtual machine resources in a cloud computing environment,” Master's Projects. Paper 198
Gu J. , Hu J. , Zhao T. , Sun G. 2012 “A new resource scheduling strategy based on genetic algorithm in cloud computing environment,” Journal of Computers 7 (1) 42 - 52    DOI : 10.4304/jcp.7.1.42-52
Li MF , Bi JP , Li ZC 2014 “Resource Scheduling Waiting Aware Virtual Machine Consolidation,” Journal of Software 25 (7) 1388 - 1402
Hamilton G. 2014 “Distributed virtual machine migration for cloud data center environments,” University of Glasgow
Sun Da-wei , Chang Guiran , Li Fengyun , Wang Chuan , Wang Xingwei 2011 “Optimizing multi-dimensional QoS cloud resource scheduling by immune clonal with preference,” Acta Electronica Sinica 39 (8) 1824 - 1831
Yuan Aiping , Wan Canjun 2014 “Virtual machine deployment strategy based on improved genetic algorithm in cloud computing environment,” Journal of Computer Applications 34 (2)
Ülker Ö. , Korkmaz E. E. , Özcan E. 2008 “A grouping genetic algorithm using linear linkage encoding for bin packing,” inProc. of Parallel Problem Solving from Nature–PPSN X, ed: Springer 1140 - 1149
Yang Dongdong , Jiao Llicheng , Gong Maoguo , Yu Hang 2010 “Clonal selection algorithm for solving preference multi-objective optimization,” Journal of Software 21 (1) 14 - 33    DOI : 10.3724/SP.J.1001.2010.03551
Liu Yong , Wang Xinhua , Xing Changming , Wang Shuo 2011 “Resources scheduling strategy based on ant colony optimization algorithms in cloud computing,”
Emmerich M. , Beume N. , Naujoks B. 2005 “An EMO algorithm using the hypervolume measure as selection criterion,” Lecture Notes in Computer Science 3410 62 - 76
Beume N. , Naujoks B. , Emmerich M. 1669 “SMS-EMOA: Multi-objective selection based on dominated hypervolume,” European Journal of Operational Research 181 (3) 1653 - 1669    DOI : 10.1016/j.ejor.2006.08.008
Guo Ping , li Qi 2012 “Load balance scheduling algorithm based on the load on the server status classification,” Journal of Huazhong University of science and technology: Natural Science Edition 40 (1) 62 - 65