A Workflow Scheduling Technique Using Genetic Algorithm in Spot Instance-Based Cloud
KSII Transactions on Internet and Information Systems (TIIS). 2014. Sep, 8(9): 3126-3145
Copyright © 2014, Korean Society For Internet Information
  • Received : December 02, 2013
  • Accepted : August 19, 2014
  • Published : September 28, 2014
About the Authors
Daeyong Jung
Dept. of Computer Science Education, Korea University Seoul, Korea
Taeweon Suh
Dept. of Computer Science Education, Korea University Seoul, Korea
Heonchang Yu
Dept. of Computer Science Education, Korea University Seoul, Korea
JoonMin Gil
School of Information Technology Engineering, Catholic University of Daegu Daegu, Korea

Abstract
Cloud computing is a computing paradigm in which users can rent computing resources from service providers according to their requirements. A spot instance in cloud computing helps a user obtain resources at a lower cost. However, a crucial weakness of spot instances is that the resources can become unreliable at any time due to the fluctuation of instance prices, which increases the failure time of users' jobs. In this paper, we propose a Genetic Algorithm (GA)-based workflow scheduling scheme that can find the optimal task size for each instance in a spot instance-based cloud computing environment without increasing users' budgets. Our scheme reduces the total task execution time even if an out-of-bid situation occurs in an instance. The simulation results, based on a before-and-after GA comparison, reveal that our scheme achieves performance improvements, reducing the task execution time by 7.06% on average. Additionally, the cost in our scheme is similar to that when GA is not applied. Therefore, our scheme can achieve better performance than the existing scheme by optimizing the task size allocated to each available instance throughout the evolutionary process of GA.
1. Introduction
In recent years, due to the increased interest in cloud computing, many cloud projects and commercial systems, such as the Amazon Elastic Compute Cloud (EC2) [1] and FlexiScale [2], have been implemented. Cloud computing provides high utilization and high flexibility for managing computing resources. In addition, cloud computing services provide a high level of scalability of computing resources, combined with Internet technology, distributed among several customers [3, 4, 5]. In most cloud services, the concept of an instance unit is used to provide users with resources in a cost-efficient manner. Generally, instances are classified into three types: on-demand instances, reserved instances, and spot instances. On-demand instances allow a user to pay for computing capacity by the hour, with no long-term commitments. This frees users from the costs and complexities of planning, purchasing, and maintaining hardware, and transforms what are usually large fixed costs into much smaller variable costs [1]. Reserved instances allow a user to make a low, one-time payment to reserve instance capacity and further reduce the user's cost. While in reserved instances a user pays a yearly fee and receives a discount on hourly rates, in on-demand instances a user pays one hourly rate [6]. On the other hand, spot instances allow customers to bid on unused Amazon EC2 capacity and run those instances for as long as their bid exceeds the current spot price. The spot price changes based on the supply and demand of spot instances. Customers whose bids exceed the spot price gain access to the available spot instances. If the applications executed are time-flexible, spot instances can significantly decrease Amazon EC2 costs [7]. Therefore, spot instances may incur lower costs for performing tasks than on-demand instances.
A spot market-based cloud environment is configured with spot instances. In this environment, the successful completion or failure of tasks depends on the changing spot prices. Because spot prices have a market structure and follow the law of demand and supply, cloud services in Amazon EC2 provide a spot instance when a user's bid is higher than the current spot price. Furthermore, a running instance stops when a user's bid becomes less than or equal to the current spot price. After a running instance stops, it restarts when the user's bid again becomes greater than the current spot price [8].
Scientific applications, in particular, are now commonly structured as workflows. However, spot instance-based cloud computing has variable performance, because the available execution time of spot instances depends on the spot price. The completion time for the same amount of work varies according to the performance of an instance. In other words, the failure time of each instance differs according to the user's bid and the performance of the instance. Thus, the completion time of a task on an instance increases when a failure occurs. For efficient execution of tasks, we analyze the task and instance information from the price history data of spot instances, and estimate the task size and instance availability from the analyzed data. A workflow is created based on each available instance and the task size. However, the created workflow has a problem in that it does not consider the failure time of each instance. To solve this problem, we propose a scheme that changes the task size of each instance using a Genetic Algorithm (GA). Our proposed scheduling scheme uses workflow mechanisms and GA to handle job execution. In our scheme, a user's job is executed on the selected instances without exceeding the user's budget.
2. Related Works
Two different environments in cloud computing have been considered in recent research studies: reliable environments (with on-demand instances [9, 10]) and unreliable environments (with spot instances [8, 11, 12]). Our study falls into the latter category of cloud computing environments.
Typically, studies on spot instances have mainly focused on performing tasks at low monetary costs. The spot instances in Amazon EC2 offer cloud services to users at lower prices at the expense of reliability [1]. Cloud exchange [14] provides the actual price history of EC2 spot instances. There have been numerous studies on resource allocation [12, 13], fault tolerance [8, 11, 12, 15], workflow scheduling [16, 17], and the use of Genetic Algorithms [18, 19, 20] for spot instances.
In the field of resource allocation, Voorsluys et al. [12] provided a solution for running computation-intensive tasks in a pool of intermittent VMs. To mitigate potential unavailability periods, the study proposed a multifaceted fault-aware resource provisioning policy that employs price and runtime estimation mechanisms. The proposed strategy achieved cost savings and stricter adherence to deadlines. Zhang et al. [13] demonstrated how best to match customer demand in terms of both supply and price, and how to maximize provider revenue and customer satisfaction in terms of VM scheduling. Their model was designed to solve a discrete-time optimal control problem; it achieved higher revenues than static allocation strategies and minimized the average request waiting time. Our work differs from [12] and [13] in that we focus on reducing the rollback time after a task failure, achieving cost savings and reducing the total execution time.
In the fault tolerance category, two similar studies ([11] and [8]) proposed enforcing fault tolerance in cloud computing with spot instances. Based on the actual price history of EC2 spot instances, these studies compared several adaptive checkpointing schemes in terms of monetary costs and job execution time. Goiri et al. [10] evaluated three fault tolerance schemes (checkpointing, migration, and job duplication), assuming fixed communication costs. The migration-based scheme showed better performance than the checkpointing and job duplication-based schemes. Voorsluys et al. [12] also analyzed and evaluated the impact of checkpointing and migration on fault tolerance using spot instances. Our paper differs from [8, 10, 11, 12] in that we utilize two thresholds for fault tolerance.
A workflow is a model that represents complex problems with structures such as Directed Acyclic Graphs (DAG). Workflow scheduling is a kind of global task scheduling as it focuses on mapping and managing the execution of interdependent tasks on shared resources. The existing workflow scheduling methods have limited scalability and are based on centralized scheduling algorithms. Consequently, these methods are not suitable for spot instance-based cloud computing. In spot instances, the available time and instance costs have to be considered for job execution. Fully decentralized workflow scheduling systems use a chemistry-inspired model to determine the instance in a community cloud platform [16] . The throughput maximization strategy is designed for transaction-intensive workflow scheduling that does not support multiple workflows [17] . Our proposed scheduling scheme guarantees an equal task distribution across available instances in spot instance-based cloud computing.
A GA is a popular search technique that uses the concept of evolution in the natural world to solve optimization problems [18, 19, 20]. It has been used for cloud scheduling in various studies [21, 22, 23]. The synapsing variable-length crossover (SVLC) algorithm provides a biologically inspired method for performing meaningful crossover between variable-length genomes [18]. However, traditional GAs [19, 20] operate on a population of fixed-length genomes. In addition, fixed-length representations make it difficult to relate potential solutions of different sizes. Our GA adopts a crossover that accommodates variable-length representations. In [21], the scheduling of VM resources in a load-balanced manner was based on GA. In a private cloud environment, a Modified Genetic Algorithm (MGA) was utilized for task scheduling in combination with the Shortest Cloudlet to Fastest Processor (SCFP) and Longest Cloudlet to Fastest Processor (LCFP) methods [22]. In that study, the authors also used a meta-heuristic GA as an optimization method. Fatma et al. [23] adapted task scheduling to use two fitness functions: the first minimizes the total execution time, and the second addresses load balance. However, these existing studies did not consider the failure time of instances, which makes task execution stop. Our GA-based scheduling method considers the failure time of instances, the task execution time, and the task execution costs.
3. System Architecture
Our proposed scheme expands on our previous work [24] and includes a workflow scheduling algorithm. We construct workflow tasks based on available instances and propose how to handle them. Each available instance executes workflow tasks and uses the checkpointing scheme described in Section 3.3. Fig. 1 presents the mapping relationship between workflows and instances, and illustrates the roles of the instance information manager, the workflow manager, and the resource scheduler. The instance information manager obtains the information required for job allocation and resource management. This information includes the VM specifications for each instance and execution-related information such as the execution costs, execution completion time, and failure time. The execution-related information is calculated using the selected VM and is based on the spot history. The workflow manager and resource scheduler extract the necessary execution-related information from the instance information manager. First, the workflow manager generates the workflow for the requested job. The generated workflow determines the task size according to the VM performance, the execution time and costs, and the failure time of the selected instance. Second, the resource scheduler manages the resources and allocates tasks to handle the job. Resource and task management perform reallocation when the resource cannot obtain the mapping-related information for a task, or when a task encounters a fault during execution.
Fig. 1. Mapping between workflows and instances
In the above model, our proposed scheme uses the workflow in spot instances to minimize the job processing time within the user's budget. The task size is determined by evaluating the availability and performance of each instance in order to minimize the job processing time. The available time is estimated by calculating the execution time and cost using the price history of spot instances. This helps to improve the performance and stability of task processing. The estimated time data is used in assigning the amount of tasks to each instance. Our proposed scheme reduces out-of-bid situations and improves the job execution time. However, the total cost is higher than the cost when workflow scheduling is not used.
- 3.1 Instance types
An instance means a VM rented by a cloud user. Instances are classified into two types: on-demand instances and spot instances. In on-demand instances, users can use VM resources after paying a fixed cost on an hourly basis. On the other hand, when using spot instances, users can use VM resources only when the price of the instances is lower than their bid. The difference between the two instance types is as follows. In on-demand instances, a failure does not occur during task execution, but the cost is comparatively high. In contrast, the cost of spot instances is lower than that of on-demand instances. However, in the case of spot instances, there is a risk of task failures when the price of the instance becomes higher than the user’s bid.
Amazon allows users to bid on unused EC2 capacity that includes 42 types of spot instances [1] . Their prices, which are referred to as spot prices, change dynamically based on the supply and demand. Fig. 2 shows an example of spot price fluctuation during seven days in December 2010 for c1-xlarge (High-CPU Spot Instances - Extra Large) [14] . Our proposed system model is based on the characteristics of Amazon EC2's spot instances, which are as follows:
Fig. 2. Price history of EC2's spot instances for c1-xlarge
  • The system provides a spot instance when a user's bid is higher than the current price.
  • The system immediately stops the spot instance, without giving any notice, when a user's bid becomes less than or equal to the current price. We refer to this situation as an out-of-bid event or a failure.
  • The system does not charge for the last partial hour when the system stops the spot instance.
  • The system charges for the last partial hour when the user voluntarily terminates the spot instance.
  • The system provides the spot price history.
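The rules above can be replayed against a price history to locate out-of-bid periods. The sketch below is our own illustration (not code from the paper); the hourly (timestamp, price) pairs and the function name are hypothetical.

```python
# Sketch: replay a spot price history against a user bid to find the
# out-of-bid (failure) periods, following the rules listed above.

def out_of_bid_periods(price_history, user_bid):
    """Return (start, end) hour intervals during which the instance is
    stopped, i.e., the spot price is >= the user's bid."""
    periods = []
    start = None
    for hour, price in price_history:
        if price >= user_bid:          # out-of-bid: instance is stopped
            if start is None:
                start = hour
        elif start is not None:        # price dropped below bid: restart
            periods.append((start, hour))
            start = None
    if start is not None:              # still out-of-bid at end of history
        periods.append((start, price_history[-1][0] + 1))
    return periods

# Hypothetical hourly history: price rises above a $0.40 bid at hour 1
# and falls back below it at hour 3.
history = [(0, 0.30), (1, 0.42), (2, 0.45), (3, 0.31), (4, 0.29)]
print(out_of_bid_periods(history, user_bid=0.40))  # [(1, 3)]
```

An equality check (`price >= user_bid`) is used deliberately: per the second rule, the instance stops when the bid becomes less than or equal to the current price.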
- 3.2 Estimated Interval
Our checkpointing operation is performed by analyzing the price variations during selected time intervals in the past. We estimate the job execution time and cost from the analyzed data. These estimations are combined with the failure probability in order to calculate the thresholds for the checkpointing operation. The proper estimation of the execution time and cost is crucial for maintaining the credibility of service providers with customers.
In this paper, we introduce a term referred to as the Estimated Interval (EI). Fig. 3 shows an illustrative definition of the EI. The detailed definition is as follows:
Fig. 3. EI illustration
  • Pure task time: The time taken to execute a task on a selected instance when there are no failures.
  • Past pure task time: The sum of the time taken for task execution on the selected instance in the past, excluding failure times. This time information is extracted from the price history.
  • Past failure time: The sum of failure times for task execution in the past. A failure occurs when the current user bid goes below the past spot price.
  • Estimated interval (EI): The sum of the past pure task time and the past failure time.
  • Expected cost: The average of the costs charged for task execution in EIs.
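The EI terms above can be computed by a single walk over the past price history. This is a minimal sketch under our own assumptions (hourly price granularity, a fixed user bid); the function and variable names are not from the paper.

```python
# Sketch: accumulate past pure task time and past failure time from an
# hourly price history until the task would have finished, then form the EI.

def estimated_interval(price_history, user_bid, pure_task_hours):
    """Hours below the bid count toward the pure task time; hours at or
    above the bid count as failure time.
    Returns (past_pure_task_time, past_failure_time, EI)."""
    pure, failure = 0, 0
    for _, price in price_history:
        if price < user_bid:
            pure += 1
            if pure == pure_task_hours:
                break                  # task would have completed here
        else:
            failure += 1
    return pure, failure, pure + failure  # EI = pure + failure

# Hypothetical history: a 3-hour task with a $0.40 bid suffers 2 failure hours.
history = [(h, p) for h, p in enumerate([0.30, 0.42, 0.45, 0.31, 0.29, 0.33])]
print(estimated_interval(history, user_bid=0.40, pure_task_hours=3))  # (3, 2, 5)
```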
- 3.3 Our Proposed Checkpointing Scheme
Fig. 4 illustrates our proposed checkpointing scheme [24]. This scheme performs checkpointing using two kinds of thresholds, a price threshold and a time threshold, depending on the expected execution time based on the price history. Let tstart and tend denote the start point and the end point of the expected execution time, respectively. Based on tstart and tend, we obtain the price threshold (PriceTh) and the time threshold (TimeThPi) used in our proposed checkpointing scheme. The price threshold, PriceTh, can be calculated as
PriceTh = ( Userbid + Pmin ) / 2        (1)
Fig. 4. Our proposed checkpointing scheme
where Userbid represents the bid suggested by the user, and Pmin is the minimum available price that the function PriceMin extracts in the period between tstart and tend. This is given as follows:
Pmin = PriceMin( tstart, tend ) = min{ P(t) | tstart ≤ t ≤ tend }        (2)
The time threshold of the price Pi , TimeThPi , can be calculated as
TimeThPi = ( 1 − FPi ) × AvgTimePi( tstart, tend )        (3)
where FPi is the failure probability of price Pi, and AvgTimePi(tstart, tend) represents the average execution time at price Pi in the period between tstart and tend.
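Since the threshold equations render only as images in the source, the sketch below encodes one plausible reading of the surrounding text, not the paper's exact formulas: we assume PriceTh is the midpoint between Pmin and the user's bid, and that TimeTh discounts the average execution time by the failure probability. All function names here are our own.

```python
# Hedged sketch of the two checkpointing thresholds described above.
# The midpoint and discount forms are assumptions, not confirmed formulas.

def price_min(prices, t_start, t_end):
    """PriceMin(t_start, t_end): minimum spot price in the window."""
    return min(p for t, p in prices if t_start <= t <= t_end)

def price_threshold(user_bid, p_min):
    return (user_bid + p_min) / 2.0          # assumed midpoint form of Eq. (1)

def time_threshold(failure_prob, avg_time):
    return (1.0 - failure_prob) * avg_time   # assumed discount form of Eq. (3)

prices = [(0, 0.30), (1, 0.42), (2, 0.28), (3, 0.35)]
p_min = price_min(prices, 0, 3)
print(p_min)                                  # 0.28
print(round(price_threshold(0.40, p_min), 2))  # 0.34
print(time_threshold(0.2, 10.0))              # 8.0
```

A checkpoint would then be taken when the current spot price crosses PriceTh, or when the elapsed time at the current price exceeds TimeThPi.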
4. Genetic Algorithm Based Job Scheduling
The main goal of our scheduling method is to minimize the execution time and the cost of running applications. Our scheduling method is based on the genetic algorithm (GA) that is a well-known and robust search technique for solving large-scale optimization problems. The GA consists of the following five steps:
  • (1) Randomly generate an initial population.
  • (2) Generate offspring using genetic operators such as crossover and mutation.
  • (3) Rank chromosomes using the defined fitness function.
  • (4) Update and evaluate the best-ranked chromosomes based on the selection.
  • (5) Repeat steps 2–4 until the pre-determined termination condition (e.g., the maximum number of generations) is satisfied.
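The five steps above can be sketched as a generic GA skeleton. This is an illustrative sketch only: the encoding, operators, and fitness below are toy stand-ins (a 5-bit string maximizing its bit sum), not the paper's chromosome or fitness function.

```python
# Generic GA loop following steps (1)-(5) above, with problem-specific
# pieces passed in as functions.
import random

def genetic_algorithm(init, fitness, crossover, mutate,
                      pop_size=20, generations=50):
    population = [init() for _ in range(pop_size)]            # step (1)
    for _ in range(generations):                              # step (5)
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = random.sample(population, 2)
            offspring.append(mutate(crossover(p1, p2)))       # step (2)
        ranked = sorted(population + offspring,
                        key=fitness, reverse=True)            # step (3)
        population = ranked[:pop_size]                        # step (4)
    return max(population, key=fitness)

# Toy usage: maximize the number of 1s in a 5-bit string.
best = genetic_algorithm(
    init=lambda: [random.randint(0, 1) for _ in range(5)],
    fitness=sum,
    crossover=lambda a, b: a[:2] + b[2:],
    mutate=lambda c: [bit ^ (random.random() < 0.1) for bit in c],
)
print(sum(best))
```

Because step (4) keeps the best-ranked chromosomes, the best fitness in the population never decreases across generations.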
- 4.1 Initial Population
Initially, all chromosomes in a population are randomly constructed without any expert knowledge. Before describing the representation of a chromosome, let us consider the definitions of the instances and jobs of which each chromosome consists.
  • N: the number of instance types;
  • M: the number of jobs;
  • I = {I1, I2, ⋯, IN}: a set of instances;
  • Ii: an instance of type i (1 ≤ i ≤ N);
  • J = {J1, J2, ⋯, JM}: a set of jobs;
  • Lj: the number of tasks in the j-th job;
  • Jj = {tj,1, tj,2, ⋯, tj,Lj}: a set of tasks in the j-th job (1 ≤ j ≤ M);
  • tj,k: the k-th task in the j-th job (1 ≤ k ≤ Lj).
To avoid the complexity of task allocations to instances, we assume two constraints: first, all tasks have an identical size; second, there is no sequence for task execution on an instance. Under these assumptions, we construct a two-dimensional chromosome structure. Fig. 5 illustrates the two-dimensional chromosome represented as a grid with N rows and M columns. In this figure, a gene g(Ii, Jj) denotes the entry at the i-th row and the j-th column (1 ≤ i ≤ N, 1 ≤ j ≤ M) of the chromosome and is interpreted as follows.
Fig. 5. Chromosome representation
g(Ii, Jj) = l  ⟹  instance Ii executes tasks tj,k+1, ⋯, tj,l of job Jj        (4)
where k and l respectively represent the values of the two adjacent genes, g(Ii-1, Jj) and g(Ii, Jj) (1 ≤ k ≤ l ≤ Lj).
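Under this interpretation, the genes in one column are cumulative task indices, so each instance's share of a job can be decoded from adjacent gene values. A small decoding sketch (our own illustration; the column values are hypothetical):

```python
# Decode one column of the grid chromosome: genes[i] holds the cumulative
# last-task index l for instance I_{i+1}, so the instance runs tasks
# (k+1)..l where k is the previous gene value.

def decode_column(genes):
    """Return a (first_task, last_task) pair per instance (1-indexed),
    or None when an instance receives no tasks (equal adjacent genes)."""
    ranges = []
    prev = 0
    for g in genes:
        ranges.append(None if g == prev else (prev + 1, g))
        prev = g
    return ranges

# Hypothetical column for a job with L_j = 10 tasks over 4 instances:
print(decode_column([3, 3, 7, 10]))  # [(1, 3), None, (4, 7), (8, 10)]
```

Note the last gene always equals Lj, so the column covers every task of the job exactly once.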
To meet the requirements for task execution, n instances satisfying the condition in Eq. (5) are chosen when allocating tasks to initial instances. Our task distribution method determines the task size in order to allocate a task to a selected instance. Based on a compute-unit and an available state, the task size of an instance Ii for Jj, TS(Ii, Jj), is calculated as

TS(Ii, Jj) = TS(Jj) × ( AIi × UIi ) / Σn=1..N ( AIn × UIn )        (5)
where TS(Jj) represents the total size of tasks in Jj required for executing a user request. For an instance Ii, UIi and AIi represent the compute-unit (the product of CPU and cores) and the available state, respectively. The available state AIi can be either 0 (unavailable) or 1 (available). Ubaseline represents the compute-unit of the baseline instance selected for the user request. The task size is decided by the compute-unit rate relative to this baseline.
Each gene g (Ii, Jj) shows the number of task sequences according to the compute-unit of the selected instance and the variation of task size. The task sequences are related to the task size. Eq. (6) represents the number of task sequences in gene g (Ii, Jj) .
Seq( g(Ii, Jj) ) = g(Ii, Jj) − g(Ii-1, Jj)        (6)
The task size of each instance in the chromosome is adjusted based on Eq. (7), where the variation of task size, ΔTS(Ii, Jj), is randomly set by Eq. (8):

TS′(Ii, Jj) = TS(Ii, Jj) + ΔTS(Ii, Jj)        (7)

ΔTS(Ii, Jj) = rand( −TS(Ii, Jj), TS(Ii+1, Jj) )        (8)
When an initial population is constructed, all genes in the chromosome are randomly set to integer values that satisfy Eq. (6).
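The compute-unit-based initial distribution described above can be sketched as a proportional split. The exact Eq. (5) renders only as an image in the source, so the proportional form, the remainder handling, and all names below are our own assumptions.

```python
# Sketch: distribute a job's total task count across instances in
# proportion to availability * compute-unit, as described above.

def initial_task_sizes(total_tasks, compute_units, available):
    """available[i] is 0 or 1; unavailable instances receive no tasks.
    Integer shares are floored and the remainder goes to the last instance
    (a simplification for illustration)."""
    weights = [u * a for u, a in zip(compute_units, available)]
    total_w = sum(weights)
    sizes = [total_tasks * w // total_w for w in weights]
    sizes[-1] += total_tasks - sum(sizes)  # hand the rounding remainder over
    return sizes

# Hypothetical pool: four instances, the third currently unavailable (A = 0).
print(initial_task_sizes(100, compute_units=[1, 4, 8, 2],
                         available=[1, 1, 0, 1]))
# [14, 57, 0, 29]
```

The shares always sum to the requested total, so the column constraint (the last gene equals Lj) is preserved.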
- 4.2 Fitness Function
The fitness function defines the criterion for ranking potential hypotheses and for probabilistically selecting them for inclusion in the next generation of the population. Additionally, the fitness function is the criterion for determining the quality of chromosomes in the population. It directly reflects the performance of task distribution. In our paper, the standard deviation is used as the fitness function to evaluate the performance of chromosomes. Based on the standard deviation for each chromosome Ck , our fitness function f ( σCk ) is defined as follows.
f( σCk ) = 1 / ( 1 + σCk ),  where  σCk = sqrt( (1/N) Σi=1..N ( EETIi − AvgCk )² )        (9)
In Eq. (9), EETIi is the total estimated execution time of the instance Ii, and AvgCk is the average of the total estimated execution times over the instances of chromosome Ck:

AvgCk = (1/N) Σi=1..N EETIi        (10)
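A fitness of this kind rewards chromosomes whose instances finish at similar times. The sketch below uses an inverse form so that a lower standard deviation of EET yields a higher fitness; this inverse form is our assumption, since the exact Eq. (9) renders only as an image in the source.

```python
# Fitness sketch: balanced estimated execution times (low standard
# deviation of EET across instances) score higher.
import statistics

def fitness(eet_per_instance):
    """Higher fitness for a lower population standard deviation of EET."""
    sigma = statistics.pstdev(eet_per_instance)
    return 1.0 / (1.0 + sigma)

balanced = [10.0, 10.5, 9.5, 10.0]   # instances finish at similar times
skewed = [2.0, 25.0, 3.0, 10.0]      # one instance dominates the makespan
print(fitness(balanced) > fitness(skewed))  # True
```

Balancing EET matters because the job completes only when its slowest instance finishes, so a lower deviation tends to mean a shorter makespan.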
- 4.3 Selection
The fittest chromosomes have a higher probability of being selected for the next generation. We use two selection mechanisms. One is the elitism method, which forces the GA to retain some number of the best chromosomes at each generation. The other is the roulette wheel method. Given the population size P, P - 1 chromosomes in a new population are produced by the roulette wheel method. The elitism method is applied to produce the remaining one chromosome, which is the best of the chromosomes in the population pool.
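The combined elitism-plus-roulette-wheel step can be sketched as follows (an illustration with toy chromosomes, not the paper's implementation):

```python
# Selection sketch: keep the single best chromosome (elitism) and fill the
# remaining P - 1 slots with fitness-proportional roulette-wheel draws.
import random

def select(population, fitness):
    best = max(population, key=fitness)
    weights = [fitness(c) for c in population]
    wheel = random.choices(population, weights=weights,
                           k=len(population) - 1)
    return [best] + wheel

random.seed(0)
pop = [[1, 2], [3, 4], [5, 6]]
new_pop = select(pop, fitness=sum)       # toy fitness for illustration
print(new_pop[0])                        # [5, 6] -- elite always survives
print(len(new_pop) == len(pop))          # True  -- population size preserved
```

`random.choices` draws with replacement and proportionally to the weights, which is exactly the roulette-wheel behavior described above.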
- 4.4 Genetic Operations (Crossover, Mutation, and Reproduction)
The successive generation in GA is determined by a set of operators that recombine and mutate selected members of the current population. The crossover operator produces two new offspring from two parents, improving the fitness of chromosomes in the current population. We adapt the two-point crossover method to the spot instance environment. In the typical two-point crossover, a pair of chromosomes is selected according to the crossover probability pc, and two crossover points are randomly selected. Fig. 6 illustrates an example of the two-point crossover operation in our genetic algorithm.
Fig. 6. Example of two-point crossover
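A two-point crossover on the grid chromosome can be sketched by swapping a band of rows between two parents. This is one plausible reading of the operation (the exact cut axis is not recoverable from the source), and the parent values below are hypothetical.

```python
# Two-point crossover sketch: pick two cut indices and swap the rows
# between them across the two parent chromosomes.
import random

def two_point_crossover(parent1, parent2):
    n = len(parent1)                       # number of instance rows
    a, b = sorted(random.sample(range(n + 1), 2))
    child1 = parent1[:a] + parent2[a:b] + parent1[b:]
    child2 = parent2[:a] + parent1[a:b] + parent2[b:]
    return child1, child2

random.seed(3)
p1 = [[1], [2], [3], [4]]
p2 = [[5], [6], [7], [8]]
c1, c2 = two_point_crossover(p1, p2)
print(sorted(c1 + c2) == sorted(p1 + p2))  # True -- rows are conserved
```

Whatever cut points are drawn, every parent row appears exactly once across the two children, so no genetic material is lost or duplicated.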
The mutation operation changes the value of genes in the chromosome according to the mutation probability pm. Our mutation operation is depicted in Fig. 7. The number of mutation points is randomly selected by row; after the rows are selected, the number of column mutations is randomly selected in each row. After the mutation operation is applied, the gene g′(Ii, Jj) is calculated by
g′(Ii, Jj) = g(Ii, Jj) + r1  or  g(Ii, Jj) − r2        (11)

Fig. 7. Example of mutation
where g(Ii, Jj) is the previous gene in the chromosome Ck, and r1 and r2 represent the variation applied to the gene g(Ii, Jj). There are two cases. In the first case (r1), if the EETIi of the instance Ii is less than the average EET of the available instances, the gene g(Ii, Jj) is changed into a new gene g′(Ii, Jj) greater than the current value; r1 is extracted from the range between g(Ii, Jj) and g(Ii+1, Jj). In the second case (r2), if the EETIi of the instance Ii is greater than the average EET of the available instances, the gene g(Ii, Jj) is changed into a new gene value less than the current value; r2 is extracted from the range between g(Ii-1, Jj) and g(Ii, Jj).
The above-mentioned two cases are applied only to the instances between I1 and IN-1. The genes of the last instance (g(IN, Jj)) cannot be changed, because they hold the final task number in a job; i.e., the mutation operation is not applied to these genes.
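The two mutation cases above can be sketched for a single column. This is our own reading of the rule: a gene is nudged upward (toward its lower neighbor's range) when its instance's EET is below average, and downward otherwise, and the last row is never touched. The EET values below are hypothetical.

```python
# Mutation sketch for one chromosome column, following the two EET-based
# cases above; genes are cumulative task indices, so the new value stays
# within the adjacent genes' range, and the last gene is fixed.
import random

def mutate_column(genes, eet, i):
    """Mutate genes[i] (0-indexed, i < len(genes) - 1)."""
    lo = genes[i - 1] if i > 0 else 0
    avg = sum(eet) / len(eet)
    if eet[i] < avg:   # case r1: lightly loaded instance takes on more tasks
        new = random.randint(genes[i], genes[i + 1])
    else:              # case r2: heavily loaded instance sheds tasks
        new = random.randint(lo, genes[i])
    return genes[:i] + [new] + genes[i + 1:]

random.seed(7)
# Instance 2 (index 1) has above-average EET, so its gene can only shrink.
col = mutate_column([3, 6, 10], eet=[2.0, 9.0, 5.0], i=1)
print(col[2] == 10 and 3 <= col[1] <= 6)  # True -- last gene untouched
```

Keeping the new value between the neighboring genes preserves the cumulative ordering 1 ≤ k ≤ l ≤ Lj required by the chromosome encoding.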
The reproduction operation is used to obtain modified chromosomes for the next generation. According to the reproduction probability pr, chromosomes with low fitness rankings are selected for reproduction; each chromosome's fitness ranking is calculated based on Eq. (9). Each selected chromosome is deleted and a new chromosome is produced in its place. The new chromosome adjusts its task sizes based on the elite chromosome retained in the selection phase; the task size of each instance is determined according to the standard deviation of each instance in the elite chromosome.
- 4.5 GA-based Workflow Scheduling (GAWS) Algorithm
Our GA-based Workflow Scheduling (GAWS) algorithm is described in Fig. 8. The Task_flag represents the occurrence of a task execution, and its initial value is false. When task execution is normal (i.e., Task_flag is true), the scheduler performs a workflow operation (lines 3-27). The GA_flag represents the execution of GA, and its initial value is true. Lines 17-24 correspond to the initial population, fitness function, and genetic operations of the GA, respectively. Lines 28-35 show the workflow function.
Fig. 8. GAWS algorithm
5. Performance Evaluation
In this section, we evaluate and analyze the performance of the proposed scheme through simulations. A performance comparison for the scheme is also presented.
- 5.1 Simulation Environment
Before performing our simulations, we describe spot-instance environments and the simulation parameters used in our GA. Our simulations were conducted using the history data obtained from Amazon EC2 spot instances [14] . The history data before 10-01-2010 was used to extract the expected execution time and failure occurrence probability for our checkpointing scheme. The applicability of our scheme was tested using the history data after 10-01-2010.
In the simulations, one type of spot instance was applied to demonstrate the effect of task execution time analysis on the performance. Table 1 shows the various resource types used in Amazon EC2. In this table, resource types are comprised of different instance types. First, standard instances offer a basic resource type. Second, high-CPU instances offer more compute units than other resources, and can be used for computation-intensive applications. Finally, high-memory instances offer more memory capacity than other resources, and can be used for high-throughput applications such as databases and memory caching applications. In the simulation environments, we compared the performance of our proposed scheme with that of existing schemes that do not consider task distribution based on the task execution time.
Table 1. Resource type information
Table 1 shows the information related to resource types in each instance, and Table 2 shows the parameters and values for our simulation. Table 3 shows the parameters and values for our genetic algorithm. The spot price information was extracted from the spot history data from 11-30-2009 to 01-23-2011. The user’s bid was taken as the average of the spot prices from the spot history data. Using Eq. (5), the task size was decided by the compute-unit rate based on the baseline m1.xlarge.
Table 2. Parameters and values for simulation
Table 3. Parameters and values for GA
- 5.2 Effect of GA Analysis
In this section, we show the size variations of the requested tasks and the fitness variation according to GA. Fig. 9 shows the size variations of the requested tasks in each instance before and after using GA. In this figure, "B" and "A" denote before and after GA, respectively, and each instance type (m1.small, m1.large, etc.) indicates the task size allocated to that instance among the requested tasks of users. As shown in Fig. 9, when GA is applied, the task size varies for each instance. This variation directly relates to the failure time of each instance; i.e., more tasks are allocated to instances with low failure time through the evolutionary process of GA.
Fig. 9. Size variations in requested tasks
Fig. 10 shows the fitness curve of the best chromosome at each generation when GA is applied, for a task size of 259,200. In this figure, we can observe that the fitness value becomes higher as the generation increases. Therefore, our genetic algorithm-based scheme can effectively find a near-optimal solution.
Fig. 10. Fitness curve
Fig. 11 shows the performance comparison with and without the crossover operation. In this figure, "Without" and "With" denote the absence and presence of the crossover operation, respectively. Except for the crossover operation, the remaining operations, such as initial population, fitness function, mutation, and reproduction, are the same as in the previous simulation. As shown in this figure, the total execution time is reduced by an average of 3.08% with the crossover operation. This result shows that the crossover operation improves performance.
Fig. 11. Comparison with and without crossover operation
- 5.3 Effect of the Estimated Execution Analysis
In this simulation, we examined the performance of the estimated execution analysis according to the task size before and after GA. Figs. 12, 13, 14, and 15 show the total estimated execution time, estimated failure time, estimated rollback time, and estimated cost, respectively. TotalT is the sum of the estimated execution time in each instance, the task distribution time, and the task merge time. TotalC denotes the sum of estimated costs for task execution in each instance. Fig. 12 shows the total estimated execution time according to the task size before and after GA. After GA was applied, the total estimated execution time was reduced by an average of 33.59%. The total estimated execution times before and after GA differed because instance failures occurred at different times. Fig. 13 shows the estimated failure times before and after GA. The estimated failure time of all instances was reduced by an average of 15.45% after applying GA. In Fig. 14, the estimated rollback time after GA showed an average performance improvement of 19.61% compared to the rollback time before GA. The rollback time is calculated from a failure point back to the last checkpoint time. Fig. 15 shows the estimated cost before and after applying GA. The cost after GA decreased by an average of $0.12 compared to the cost before GA. The difference in cost before and after applying GA is small because the same total task size is handled. From the above results, we observe that the estimated execution times and failure times after applying GA were reduced compared to before GA, while the costs were similar. The actual execution times and costs were compared based on the above information. The improvement results from the task sizes reallocated by GA, because our proposed GA obtains a lower standard deviation of the estimated execution time than the scheme without GA.
PPT Slide
Lager Image
Total estimated execution time
PPT Slide
Lager Image
Estimated failure time
Fig. 14. Estimated rollback time
Fig. 15. Estimated costs
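The two quantities that drive the estimated analysis above can be illustrated with a short sketch. This is not the paper's code: the checkpoint times and per-instance execution times below are hypothetical values, used only to show how the rollback time (failure point minus last checkpoint) and the spread of execution times across instances are computed.

```python
# Illustrative sketch (not the authors' implementation). All names and
# numeric values here are hypothetical.

import statistics

def rollback_time(failure_point, checkpoint_times):
    """Rollback time: elapsed time from the last checkpoint taken
    at or before the failure point up to the failure point itself."""
    last_ckpt = max((t for t in checkpoint_times if t <= failure_point),
                    default=0.0)
    return failure_point - last_ckpt

# Hypothetical estimated execution times (hours) on four instances.
est_without_ga = [5.1, 7.8, 3.9, 9.2]   # equal task sizes -> large spread
est_with_ga    = [6.4, 6.6, 6.3, 6.7]   # GA-balanced sizes -> small spread

# A lower standard deviation means the instances finish closer together,
# which shortens the merge-bounded total execution time.
print(statistics.stdev(est_without_ga) > statistics.stdev(est_with_ga))  # True
print(rollback_time(10.0, [2.0, 5.5, 8.0]))  # 2.0
```

The second comparison is the point of the standard-deviation remark above: balancing task sizes narrows the gap between the fastest and slowest instance.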
- 5.4 Effect of the Execution Analysis
In this simulation, we examined the performance of the actual execution analysis according to the task size before and after GA. Figs. 16 , 17 , 18 , and 19 show the execution results of the actual data based on the estimated data, before and after GA. In the figures, TotalT denotes the total execution time, including the distribution and merging of tasks, and TotalC denotes the sum of the costs of task execution on each instance. Fig. 16 shows that the total execution time was reduced by an average of 7.06% after GA compared to before GA. Fig. 17 shows that the total cost after GA decreased by an average of $0.17 compared to the cost before GA. Fig. 18 shows that the failure time after GA increased on average by 13.93% compared to before GA. In Fig. 19 , the rollback time after GA shows an average improvement of 6.52% over the rollback time before GA. The different fluctuations of the spot price account for the differences between the estimated and the actual performance.
Fig. 16. Total execution time in task distribution
Fig. 17. Costs in task distribution
Fig. 18. Failure time in task distribution
Fig. 19. Rollback time in task distribution
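The before/after comparisons reported in this subsection reduce to a single relative-improvement formula. The sketch below shows that computation; the raw totals used here are made-up placeholders, not the values measured in the simulation.

```python
# Hypothetical sketch of the before/after-GA comparison; the numbers are
# illustrative, not the paper's measured values.

def improvement_pct(before, after):
    """Relative reduction (%) of a metric after applying GA."""
    return (before - after) / before * 100.0

# Hypothetical totals for one task-size configuration.
total_time_before, total_time_after = 12.0, 11.1   # hours
cost_before, cost_after = 3.50, 3.33               # dollars

print(round(improvement_pct(total_time_before, total_time_after), 2))  # 7.5
print(round(cost_before - cost_after, 2))                              # 0.17
```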
6. Conclusion
In this paper, we proposed a GA-based workflow scheduling technique for task distribution in unreliable cloud computing environments. In such environments, resources can become unreliable at any time due to the fluctuation of instance prices, which increases the failure time of users’ jobs. To solve this problem, we proposed a GA-based workflow scheduling technique that reduces both the failure time and the rollback time. The rollback time in our scheme was less than that of the existing scheme (without GA) because our scheme adaptively distributes tasks according to the estimated execution time of the available instances. The simulation results showed that the execution time in our scheme improved on average by 7.06% after GA compared to before GA. Additionally, the rollback time after applying GA was reduced on average by 6.52% compared to before GA. Therefore, our scheduling method reduces the execution time of running applications while keeping the cost almost unchanged. In the future, we plan to extend our environment with an efficient GA operation that takes the current state of available instances into consideration.
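The task-size optimization summarized above can be illustrated with a minimal GA sketch. This is not the authors' implementation: the instance speeds, population size, generation count, and mutation rate are illustrative assumptions. Each chromosome is a split of the total task count across instances, and fitness is the estimated makespan, i.e., the finish time of the slowest instance.

```python
# Minimal GA sketch of the scheduling idea: evolve the task sizes assigned
# to each instance so that the slowest instance's estimated finish time
# (the makespan) is minimized. All parameters below are hypothetical.

import random

random.seed(42)

TOTAL_TASKS = 1000                      # tasks to distribute (hypothetical)
SPEEDS = [1.0, 2.0, 4.0, 8.0]           # tasks/hour per instance (hypothetical)
POP, GENS = 30, 200                     # population size, generations
MUT = 0.2                               # mutation probability

def makespan(alloc):
    """Estimated finish time of the slowest instance."""
    return max(n / s for n, s in zip(alloc, SPEEDS))

def random_alloc():
    """Random split of TOTAL_TASKS across the instances."""
    cuts = sorted(random.randint(0, TOTAL_TASKS) for _ in range(len(SPEEDS) - 1))
    return [b - a for a, b in zip([0] + cuts, cuts + [TOTAL_TASKS])]

def repair(alloc):
    """Rescale so the chromosome assigns exactly TOTAL_TASKS tasks."""
    s = sum(alloc)
    if s == 0:
        return random_alloc()
    scaled = [n * TOTAL_TASKS // s for n in alloc]
    scaled[0] += TOTAL_TASKS - sum(scaled)   # absorb rounding remainder
    return scaled

def crossover(a, b):
    point = random.randrange(1, len(a))
    return repair(a[:point] + b[point:])

def mutate(alloc):
    """Occasionally move a random chunk of tasks between two instances."""
    if random.random() < MUT:
        i, j = random.sample(range(len(alloc)), 2)
        move = random.randint(0, alloc[i])
        alloc[i] -= move
        alloc[j] += move
    return alloc

pop = [random_alloc() for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=makespan)              # lower makespan = fitter
    survivors = pop[:POP // 2]          # elitist selection
    pop = survivors + [mutate(crossover(*random.sample(survivors, 2)))
                       for _ in range(POP - len(survivors))]

best = min(pop, key=makespan)
print("best allocation:", best)
```

Elitism keeps the best allocations across generations, and the repair step after crossover preserves the constraint that every chromosome distributes exactly the full workload; the evolved split concentrates tasks on the faster instances, mirroring the reallocation effect described above.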
Acknowledgements
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2012R1A1A4A01015777).
BIO
Daeyong Jung is an integrated Master’s and Ph.D. candidate in the Department of Computer Science Education at Korea University. He received his B.S. degree in Electronic Engineering from Hanbat National University, Daejeon, Korea, in 2007. His main research interests are cloud computing, grid computing, distributed computing, and fault-tolerance systems.
Taeweon Suh received the Ph.D. degree in Electrical and Computer Engineering from the Georgia Institute of Technology. He is currently an associate professor in the Department of Computer Science and Engineering at Korea University. His research interests include embedded systems, computer architecture, parallel computer architecture and programming, and computer science education.
Heonchang Yu received the B.S., M.S., and Ph.D. degrees in computer science and engineering from Korea University, Seoul, Korea, in 1989, 1991, and 1994, respectively. He has been a Professor of computer science and engineering with Korea University since 1998. From February 2011 to January 2012, he was a Visiting Professor of electrical and computer engineering at Virginia Tech. Since 2011, he has been a Director of the Korea Information Processing Society, Korea. Prof. Yu is the Vice President of the Korean Association of Computer Education and an Editor of the Korean Institute of Information Technology. He was awarded the Okawa Foundation Research Grant of Japan in 2008. His research interests include cloud computing, grid computing, distributed computing, and fault-tolerant systems.
JoonMin Gil received the B.S. and M.S. degrees in computer science from Korea University, Korea, in 1994 and 1996, respectively, and the Ph.D. degree in computer science and engineering from Korea University, Korea in 2000. From 2001 to 2002, he was a Visiting Research Associate with the Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA. From October 2002 to February 2006, he was a Senior Research Engineer with the Supercomputing Center, Korea Institute of Science and Technology Information (KISTI), Daejeon, Korea. In March 2006, he joined the Catholic University of Daegu, Korea, where he is currently an Associate Professor with the School of Information Technology Engineering. His recent research interests include cloud computing, distributed systems, wireless and sensor networks, and Internet computing.