Learning an Artificial Neural Network Using Dynamic Particle Swarm Optimization–Backpropagation: Empirical Evaluation and Comparison

Journal of Information and Communication Convergence Engineering.
2015.
Jun,
13(2):
123-131

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/li-censes/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

- Received : October 23, 2014
- Accepted : December 17, 2014
- Published : June 30, 2015

Download

PDF

e-PUB

PubReader

PPT

Export by style

Share

Article

Metrics

Cited by

TagCloud

Training neural networks is a complex task with great importance in the field of supervised learning. In the training process, a set of input–output patterns is repeated to an artificial neural network (ANN). From those patterns weights of all the interconnections between neurons are adjusted until the specified input yields the desired output. In this paper, a new hybrid algorithm is proposed for global optimization of connection weights in an ANN. Dynamic swarms are shown to converge rapidly during the initial stages of a global search, but around the global optimum, the search process becomes very slow. In contrast, the gradient descent method can achieve faster convergence speed around the global optimum, and at the same time, the convergence accuracy can be relatively high. Therefore, the proposed hybrid algorithm combines the dynamic particle swarm optimization (DPSO) algorithm with the backpropagation (BP) algorithm, also referred to as the DPSO-BP algorithm, to train the weights of an ANN. In this paper, we intend to show the superiority (time performance and quality of solution) of the proposed hybrid algorithm (DPSO-BP) over other more standard algorithms in neural network training. The algorithms are compared using two different datasets, and the results are simulated.
E
), such as a sum of squares function, which is a differentiable function of the network outputs, then this error function is itself a differentiable function of the weights. We can therefore evaluate the derivatives of the error with respect to the weights, and these derivatives can then be used to find the weight values that minimize the error function, by using any of the learning algorithms such as the BP, conjugate gradient, quasi-Newton, and LevenbergMarquardt (LM) algorithms
[2]
. From the perspective of mathematical programming, supervised batch training of a neural network is a classical nonlinear optimization problem: find the minimum of the error function given some set of training data. Traditionally, this is accomplished by a suitable local descent technique, such as BP. The independent variables are the weights
w
, and the objective function is usually the sum of squared errors (although other measures of error are also used). The objective function is formulated mathematically in Eq. (1).
Here,
f
denotes the transfer function;
w
_{o}
, the output weights;
w
_{h}
, the hidden layer weights;
x_{k}
, the input training data;
y_{k}
, the desired output; and
z_{k}
, the activations of hidden neurons. Despite its popularity, BP has been widely criticized for its inefficiency, and more advanced minimization techniques, such as conjugate gradient and LM methods, are available. Yet, all these techniques converge to the closest local minimum of the error function, which is very unlikely to be the global one. As a consequence, a network trained with a local algorithm may exhibit marginal performance. In this connection, the primitive BP may result in a better solution than more sophisticated methods, because its disadvantages turn into the benefits of avoiding some shallow local minima. Thee problem of many local minima has been widely addressed in the past. It was shown that training even a simple perceptron with a nonlinear transfer function may result in multiple minima. The remedies include starting the local descent from several random points, using tabu search, simulated annealing, and genetic algorithms
[3]
. The new stochastic optimization algorithms significantly outperform the local methods, yet they do not provide any guarantee that their solution is the global minimum indeed. Moreover, thee number of local minima of the error function grows exponentially with the number of neurons, and the likelihood that these stochastic methods will find the global minimum is not very high.
Artificial neural network neuron.
The BP algorithm is a classical domain-dependent technique for supervised training. It works by measuring the output error, calculating the gradient of this error, and adjusting the ANN weights
[4]
(and biases) in the descending gradient direction.
Hence, BPP is a gradient descent local search procedure (expected to stagnate in the local optima in complex landscapes). The squared error of the ANN
[5]
for a sset of patterns is calculated using Eq. (2).
The actual value of the previous expression depends on the weights of the network. The basic BP algorithm calculates the gradient of
E
(for all the patterns) and updates the weights by moving them along the direction of the gradient descent. This can be summarized with the expression △
w
= -
η
∇
E
, where the paramenter
η
> 0 is the learning rate that controls the learning speed. The pseudocode of the BP algorithm is shown in
Fig. 2
.
Pseudo-code of the backpropagation algorithm.
i
-th particle is represented as
X_{i}
= (
x_{il}
,
x_{i}
_{2}
, ...,
x_{iD}
). The best previous position (the position giving the best fitness value) of the
i
-th particle is recorded and represented as
P_{i}
= (
p_{il}
,
p_{i}
_{2}
, ...,
p_{iD}
). The index of the best particle among all the particles in the population is denoted by the symbol
g_{b}
representing the global best value. The index of the best position for each particle in the population is denoted by the symbol
i_{b}
representing the individual's best value. The rate of the position change (velocity) for particle
i
is represented as
V_{i}
in the following equation: (
v_{il}
,
v_{i}
_{2}
, ...,
v_{iD}
). The particles are manipulated according to the following equation.
The algorithm can be summarized as follows (in
Fig. 3
):
Particle swarm optimization flowchart.
where
c
_{1}
and
c
_{2}
are the acceleration constants with positive values and rand() is a random number between 0 and 1. Further, w is the inertia weight
[12]
, until a new-generation set of particles is generated; then, these new particles are used to search the global best position in the solution space. Finally, the BP algorithm is used to search around the global optimum. Thus, this hybrid algorithm may find an optimum more quickly.
The procedure for this PSO-BP algorithm can be summarized as follows:
This algorithm has a parameter called the learning rate
[13]
that controls the convergence of the algorithm to an optimal local solution; however, obtaining a good value for this parameter is difficult.
g
_{best}
position after one iteration, then stagnation occurs in the population. Then, the solution may be a local optimal solution. That is, due to its stochastic behavior, it is not possible to find a way to the global optimum. The drawback of PSO is that the swarm may converge prematurely. The underlying principle behind this problem is that for the global best PSO, particles converge to a single point, which is on the boundary between the global best and the personal best positions. This point is not guaranteed for a local optimum. Another reason of this problem is the fast rate of information flow between particles, resulting in the creation of similar particles with a loss in diversity that increases the possibility of being trapped in the local optima. A further drawback is that stochastic approaches have a problem-dependent performance. This dependency usually results from the parameter settings in each algorithm. The different parameter settings for a stochastic search algorithm result in high performance variances. In general, no single parameter setting can be applied to all problems. Increasing the inertia weight (
w
) will increase the particle speed, resulting in more exploration (global search) and less exploration (global search). On the other hand, reducing the inertia weight will decrease the particle speed, resulting in more exploitation and less exploration. Thus, finding the best value for the parameter is not an easy task and may differ form one problem to another. Therefore, from the above, it can be concluded that the PSO performance is problem dependent. The problem-dependent performance can be addressed through a hybrid mechanism that combines different approaches in order to benefit from the advantages of each approach.
To overcome such limitations, a multiple-swarm PSO algorithm called dynamic multiple swarms in PSO is proposed in which the number of swarms is adaptively adjusted throughout the search process via d dynamic swarm strategy. The strategy allocates an appropriate number of swarms required to support the convergence and diversity criteria among the swarms. The main objective of this is to develop a multiple-swarm PSO that eliminates the need to estimate an initial number of swarms to improve the computational efficiency without compromising the performance of the algorithm. Once the swarm template (
x_{new}
) is generated, the template is perturbed to generate a swarm of particles. In order for perturbation to happen, the perturbation region centered around
x_{new}
needs to be defined; in addition to
x_{new}
, a particle from every swarm is randomly selected. For example,
Fig. 4
shows eight randomly selected particles from eight different swarms in addition to
x_{new}
. The black circle in
Fig.. 4
represents the coordinate of
x_{new}
, i.e., (
x_{new}
,
j_{a}
,
x_{new}
,
j_{b}
). Since there are more than two corners around the coordinate of x
_{new}
(i.e., represented by the '
x
' symbol and labeled as Z1-Z6 in
Fig. 4
), a corner is randomly selected. In
Fig. 4
, the selected corner is Z2 and is denoted as v. The distanced between the center ans Z2, i.e.,
D
, is computed to form the perturbation region of
x_{new}
. The proposed algorithm, DPSO, involves two key strategies: a swarm growing strategy to allocate more swarms if necessary and justified, and a swarm declining strategy to eliminate swarms that fail to contribute in the search of a Pareto front. Additional designs are included to support the aforementioned two strategies. These designs include the following:
In DPSO adding and removing the swarms throughout the search process will directly affect the swarm population distribution. Instead of applying the Pareto ranking method to update the Pareto rank of the particles and applying a niching strategy to estimate the density of the particles when the swarm population progresses at every iteration.
In the DPSO-BP algorithm, to know when the search process is transited from the particle swarm search to the gradient descent search, a heuristic way was introduced.
Generation of swarmtemplated x_{new} .
That is, when the best fitness value in the history of all particles does not change for some generations (i.e., ten generations), the search process is transferred to the gradient descent search. When the best fitness does not change for some generations, all the particles may lose the ability to find a better solution; at this time, a gradient descent search can be used to obtain better results. If the rank values of a current swarm and its recorded swarm leader have the same rank value, then the pure Pareto ranking method is applied to both the swarm leaders. If the current swarm dominates the recorded swarm leader, then the current one will replace the recorded one. If both do not dominate each other, one of them is randomly chosen to update the local best archive of swarms. The new velocity and position are given in Eq. (5).
where
is the
j
-dimensional velocity of swarm member
i
of swarm
n
in iteration
t
;
the
j
-dimensional position of swarm member
i
of swarm
n
in iteration
t
;
the
j
-dimensional personal best (
p
_{best}
) position of the swarm members
i
of swarm
n
in iteration
t
;
the
j
-dimensional global best (
g
_{best}
) selected from the archive in iteration
t
;
the
j
-dimensional local best position of the swarm leader of swarm
n
in iteration
t
;
and (with a superscript
the
j
-dimensional local best position of any swarm leaders other than their own swarm leader in iteration
t
. Further,
r
_{1}
,
r
_{2}
, and
r
_{3}
are random numbers within [0,1] that are regenerated every time they occur;
w
is the inertial weight that varies between 0.1 and 0.5 at every iteration; and
c
_{1}
,
c
_{2}
, and
c
_{3}
are the acceleration constants. Each of the acceleration constants is randomly varied between 1.5 and 2 to place different emphasis on the components. When the objective space compression strategy
[14]
is applied several times in the early iterations, there is a possibility that the objective space is overly compressed and can cause the boundaries of the objective to not cover the true Pareto front.
c
_{1}
and
c
_{2}
be 2.0,
r
_{1}
and
r
_{2}
be two random numbers in the range of [0,1]. The maximum velocity was assumed as 10 and the minimum velocity as -10. The initial velocities of the initial particles were generated randomly in the range of [0,1]. After each iteration, if the calculated velocity was larger or smaller than the maximum velocity or the minimum velocity, velocity would be reset to 10 or -10. The population size was a variable that was set according to the dataset. In this example, we trained an ANN with the structure of 1–S1–1. The training algorithms used were the DPSO-BP, DPSO, and BP algorithms. Assuming that S1 = 3, 4, 5, 6, 7,
x
was obtained from the range of [0,
p
] and the training set was obtained at an identical sampling interval of 0.03 from [0,
p
]; further, the test set was obtained at an identical sampling interval of 0.1 from 0.02 to p. For every fixed hidden unit number, we ran the training algorithms for each dataset. We set the maximum number of training iterations at 7,000 times for the algorithms.
Algorithms for training ANNs were compared. Tests were conducted on gradient descent algorithms such as BP, and population-based heuristics such as PSO. Experimental results showed that DPSO–BP outperformed all other algorithms in training neural networks. In this study, the DPSO–BP algorithm, which is a new, simple, and robust optimization algorithm, was used to train the standard and e-learning datasets for classification purposes. Training procedures involved the selection of the optimal values of the parameters, such as the weights between the hidden layer and the output layer, spread parameters of the hidden layer base function, center vectors of the hidden layer, and bias parameters of the neurons of the output layer.
PSO and PSO-BP algorithms showed better performance than derivative-based methods; however, these algorithms had the disadvantage of a slow convergence rate. Trapping a local minimum was a disadvantage for these algorithms. When the learning performances were compared, experimental results showed that the performance of the proposed algorithm was better than that of the others.
The success of the classification results of the test problems was superior and correlated with the results of many research
[9
,
10
,
12]
. In real-time applications, the number of neurons might affect the time complexity of the system. The results of the e-learning classification problem were reasonable and might help training algorithms in other e-learning applications.
Fig. 5
shows the recognition rate of different algorithms while training an ANN in the thyroid dataset. Here, the red line denotes the proposed algorithm, while the blue and black lines denote the results of PSO and PSO-BP, respectively. It can be inferred from
Fig. 5
that PSO-BP has a better recognition rate than PSO. Apparently, the PSO algorithm has a very low recognition rate while learning the ANN, but when it is combined with the BP algorithm, the mean recognition rate increases, as shown in
Fig. 5
. However, the proposed algorithm again increases the rate of recognition due to its dynamic nature. This shows that the DPSO-BP algorithm is more stable, while in the training process, the DPSO-BP algorithm uses less CPU time than the PSO-BP algorithm and the PSO algorithm.
Recognition rate of PSO, PSO-BP, and DPSO-BP. PSO: particle swarm optimization, BP: backpropagation, DPSO: dynamic PSO.
Fig. 6
illustrates the curves of the training errors and the testing errors for the three training algorithms using the elearning dataset.
Fig. 6
(a), (c), and (e) show the training error curves of the PSO, PSO-BP, and DPSO-BPA algorithms, respectively.
Fig. 6
(b), (d), and (f) illustrate the testing error curves of the PSO, PSO-BP, and DPSO-BPA algorithms, respectively. When the value of
x
is smaller, there are fewer training samples and testing samples. Further, it can be seen from
Fig. 6
that the DPSO algorithm has better training and testing errors than the BP algorithm. When the value of
x
is larger, there are more training samples and testing samples. Moreover, it can be seen that the BP algorithm has better training and testing errors. The hybrid algorithm combines the advantages of the DPSO algorithm and the BP algorithm; therefore, it can be observed that the DPSO-BP algorithm has better training and testing errors.
Training and testing error curves for e-learning dataset using PSO, PSO-BP, and DPSO-BP. PSO: particle swarm optimization, BP: back-propagation, DPSO: dynamic PSO.
Swagatika Devi
received her B.Tech and M.Tech degrees in Computer Science and Engineering. She is working as an assistance professor in the department of Computer Science and Engineering in SOA University, Bhubaneswar, Orissa. She got gold medal during her M.Tech career. She contributed many international journals and presented research papers in different international conferences and also has tremendous contribution towards research book chapters. Her research interests include soft computing, data mining and bio-informatics.
Alok Kumar Jagadev
is working as an Associate Professor in the Department of Computer Science & Engineering. He has obtained his Ph.D. degree for his work in the field of Wireless Ad-hoc Networks from Siksha ‘O’ Anusandhan University in 2011. He has contributed more than 30 papers in various journals and conferences of international repute. He has contributed three book chapters. He has authored/co-authored four text books in the field of computer science. He has also edited two books for different international publications like IGI global, Springer. He has involved in organizing many international conferences. His research interest includes soft computing, data mining, bio-informatics, etc.
Srikanta Patnaik
is a Professor in the Department of Computer Science and Engineering, SOA University, Bhubaneswar, India. He has received his Ph.D. (Engineering) on Machine Intelligence from Jadavpur University in 1999 and supervised 12 Ph.D.theses and more than 30 M. Tech theses in the area of machine intelligence, soft computing applications and reengineering. He has published around 100 technical papers in international journals and conference proceedings. He is author of 2 textbooks and edited 12 books and few invited book chapters, published by leading international publisher like Springer-Verlag, Kluwer Academic, etc. He was the Principal Investigator of AICTE sponsored TAPTEC project “Building Cognition for Intelligent Robot” and UGC sponsored Major Research Project “Machine Learning and Perception using Cognition Methods”. He is the Editors-in-Chief of International Journal of Information and Communication Technology and International Journal of Computational Vision and Robotics published from Inderscience Publishing House, England and also Editors-in-chief of Book Series on “Modeling and Optimization in Science and Technology” published from Springer, Germany.

I. INTRODUCTION

Artificial neural networks (ANNs) are nonlinear mapping structures based on the function of the human brain. They are powerful tools for modeling, particularly because the underlying data relationship is unknown. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. An ANN is configured for a specific application, through a learning process. Learning in a biological system involves adjustment to the synaptic connections that exist between the neurons. ANNs can identify and learn the correlated patterns between input datasets and the corresponding target values. After training, ANNs can be used to predict the input data. ANNs imitate the learning process of the human brain and can process problems involving nonlinear and complex data even if the data are imprecise and noisy. Thus, they are ideal for the modeling of complex and often nonlinear data. ANNs have great capacity for predictive modeling; i.e., all the characters describing the unknown situation can be presented to the trained ANNs, and then, the prediction of systems is guaranteed. ANNs are capable of performing many classification, learning, and function approximation tasks, yet in practice, they sometimes deliver only marginal performance. Inappropriate topology selection and weight training are frequently blamed for this. Neural networks are adjusted, or trained, such that a particular input leads to a specific target output. The ANN weights are adjusted on the basis of a comparison of the output and the target, until the network output matches the target. Typically, many such input/target pairs are needed to train a network. Increasing the number of hidden layer neurons helps to improve the network performance, yet many problems can be solved with very few neurons if only the network takes its optimal configuration. Unfortunately, the inherent nonlinearity of an ANN
[1]
results in the existence of many suboptimal networks, and a considerable majority of training algorithms converge to these suboptimal configurations. The problem of multiple local minima in neural networks has been widely addressed. The proposed solutions include multiple starts from randomly chosen initial points, simulated annealing, random perturbation, diffusion techniques, and evolutionary computing. Most of these methods are probabilistic in nature: They can find the globally optimal solution with a certain probability, which depends on the number of iterations of the algorithm. In this study, an ANN is trained using the proposed hybrid approach. The rest of this paper is organized as follows: Section II presents a brief introduction to different existing ANN learning algorithms with their pros and cons. The proposed hybrid DPSO-BP approach is introduced in Section III. Simulation results and comparisons are provided in Section IV to demonstrate the effectiveness and potential of the proposed hybrid algorithm. Finally, several conclusions are presented in Section V.
II. NEURAL NETWORK LEARNING ALGORITHMS

Training neural networks is a complex task of great importance in the field of supervised learning. ANNs have been shown to have the potential to perform well for classification problems in many different environments, including business, science, and engineering. A majority of the studies on this topic rely on a gradient algorithm, typically a variation of backpropagation (BP), to obtain the weights of the model. Various learning techniques are introduced to optimize the weights of an ANN. Although the limitations of gradient search techniques applied to complex nonlinear optimization problems, such as the ANN, are well known, many researchers still choose to use these methods for network optimization. ANNs can identify and learn the correlated patterns between input datasets and the corresponding target values. After training, ANNs can be used to predict input data. ANNs imitate the learning process of the human brain and can process problems involving nonlinear and complex data even if the data are imprecise and noisy. Different learning algorithms are described below.
- A. BP Algorithm

In an ANN, activation functions of the output units become differentiable functions of the input variables and of the weights and biases, as shown in
Fig. 1
. If we define an error function (
PPT Slide

Lager Image

PPT Slide

Lager Image

PPT Slide

Lager Image

PPT Slide

Lager Image

- B. Particle Swarm Optimization Algorithm

The particle swarm optimization (PSO) algorithm was first introduced by Kennedy and Eberhart
[6]
. Instead of using evolutionary operators to manipulate the individuals, as in the case of other evolutionary computational algorithms, each individual in the PSO flies in the search space with a velocity that is dynamically adjusted according to its own flying experience and its companions' flying experience. Each individua is treated as a volume-less particle (a point) in the D-dimensional search space. The
PPT Slide

Lager Image

- 1. Initialize the position and velocity of all the particles randomly in the N-dimensional space.
- 2. Evaluate the fitness value of each particle, and update the global optimum position.
- 3. According to the changing gathering degree and the steady degree of the particle swarm, determine whether all the particles are re-initialized or not.
- 4. Determine the individual's best fitness value. Compare thepivalue of every individual with its current fitness value. If the current fitness value is better, assign the current fitness value topi.
- 5. Determine the current best fitness value in the entire population. If the current best fitness value is better thanpg, assign the current best fitness value topg.
- 6. For each particle, update the particle velocity.
- 7. Update the particle position.
- 8. Repeat Steps 2–7 until a stop criterion is satisfied or a predefined number of iterations are completed.

PPT Slide

Lager Image

- C. PSO-BP Algorithm

PSO-BP is an optimization algorithm combining PSO with BP
[7
,
8]
.The PSO algorithm
[9]
is a global algorithm that has a strong ability to find the global optimistic result. However, this algorithm has a disadvantage that the search around the global optimum is very slow. In contrast, the BP algorithm has a strong ability to find the local optimistic result, but its ability to find the global optimistic result is weak. The fundamental idea for this hybrid algorithm is that at the beginning stage of searching for the optimum, PSO
[10]
is employed to accelerate the training speed. When the fitness function value has not changed for some generations, or the change in value is smaller than a predefined number, the search process is switched to the gradient descent search according to this heuristic knowledge. The PSO-BP
[11]
algorithm’s search process also starts by initializing a group of random particles. First, all the particles are updated according to Eq. (4).
PPT Slide

Lager Image

- 1. Initialize the positions and velocities of a group of particles randomly in the range of [0,1].
- 2. Evaluate each initialized particle’s fitness value, and setPbas the positions of the current particles andPgas the best position of the initialized particles.
- 3. If the maximal iterative generations are arrived, go to Step 8; else, go to Step 4.
- 4. The best particle of the current particles is stored. The positions and velocities of all the particles are updated according to Eq. (4), and then, a group of new particles is generated. If a new particle flies beyond the boundary [Xmin, Xmax], the new position will be set asXminorXmax; if a new velocity is beyond the boundary [Vmin, Vmax], the new velocity will be set asVminorVmax.
- 5. Evaluate each new particle’s fitness value, and replace the worst particle by the stored best particle. If thei-th particle’s new position is better thanPib,Pibis set as the new position of thei-th particle. If the best position of all new particles is better thanPg, thenPgis updated.
- 6. Reduce the inertia weights w according to the selection strategy.
- 7. If the currentPgis unchanged for ten generations, then go to Step 8; else, go to Step 3.
- 8. Use the BP algorithm to search aroundPgfor some epochs. If the search result is better thanPg, output the current search result; else, outputPg. This is only the first type of condition; we can also use Steps 9–11 to replace the above Steps 6–8, and then, obtain the second type of condition.
- 9. Use the BP algorithm to search aroundPgfor some generations. If the search result is better thanPg,Pgis set for the current search result. Else, compare it with the worst particle of the current particles; if it is better than the best particle, use it to replace the worst particle; else, go to Step 7.
- 10. Reduce the inertia weightsw.
- 11. Output the global optimumPg. The parameterwin the PSO-BP algorithm also reduces gradually as the iterative generation increases.

III. PROPOSED ALGORITHM (DYNAMIC PSO-BP)

PSO combined with BP gives good result in learning but due to some cons of basic PSO, we modified it with some dynamic constraints where during the learning process of the ANN objective space can be compressed or expanded. In PSO, each particle should be kept in a confined space corresponding to the parameter limitations. This decreases the diversity of the particle. If the global best particle does not change its
- 1) cell-based rank density estimation scheme to keep track of the rank and density values of the particles;
- 2) objective space compression and expansion strategy to adjust the size of the objective space whenever needed to progressively search for a high-precision true parteto fornt;
- 3) PSO updating equation modified to exploit its usefulness and to accommodate the multiple swarm concept; and
- 4) local best archive of swarms updated on the basis of the progression of its swarm representatives, the swarm leader.

- 1. If the local best archive of swarms is empty or the reinitialized parameter is triggered, record rank values of all swarm leaders, including their corresponding positions and the positions of their respective swarm members.
- 2. If the local best archive of swarms is nonempty, compare the rank values of the swarm leaders in the current iteration with those recorded in the local best archive of swarms. Any current swarm leader that has a relatively low rank value is identified, and its rank value, its position, and the positions of its corresponding swarm members will replace the recorded values. For this purpose, BP is uses.

PPT Slide

Lager Image

PPT Slide

Lager Image

PPT Slide

Lager Image

PPT Slide

Lager Image

PPT Slide

Lager Image

PPT Slide

Lager Image

PPT Slide

Lager Image

PPT Slide

Lager Image

PPT Slide

Lager Image

IV. EXPERIMENTAL RESULT AND DISCUSSION

- A. Standard Datasets

We choose two different datasets of small and large dimensions for the experiment. It is assumed that the proposed hybrid learning algorithm works in both these environments. The datasets are the e-learning dataset (number of patterns = 90) and the thyroid dataset (number of patterns = 7,200).
- B. Results and Algorithm Settings

In the following experiments, two datasets are chosen for comparing the performances of the BP, PSO, PSO-BP, and DPSO-BP algorithms in evolving the weights of the ANN. Suppose that every weight in the network was initially set in the range of [-50,50], and all thresholds in the network were 0 s. Further, suppose that every initial particle was a set of weights generated at random in the range of [0,1]. Let the initial inertial weight w be 1.8, the acceleration constants, both
PPT Slide

Lager Image

PPT Slide

Lager Image

V. CONCLUSIONS

In this paper, a hybrid DPSO-BP algorithm is proposed, which combines the PSO algorithm’s strong ability of global learning and the BP algorithm’s strong ability of local learning. Hence, we can obtain better training results by using this hybrid algorithm. Some heuristic knowledge is adapted to transit from the DPSO algorithm search to the BP algorithm search. That is, when the best fitness value in the history of all particles does not change for some generations (i.e., ten generations), the search process is transferred to the gradient descent search. The heuristic way is used to avoid wasting too much CPU time on a vain search (as used in the other compared algorithms); therefore, the training efficiency of the DPSO-BP algorithm is improved considerably. A different selection strategy is introduced for updating the inertial weight w. In the initial searching stage, the searching inertial weight is reduced rapidly in order to rapidly achieve the global optima. Then, around the global optimum, we reduce the inertial weight more smoothly by using BP so that a higher accuracy can be achieved.
From the conducted experiments, we conclude that for the same goal, the DPSO-BP algorithm uses less CPU time and provides higher training accuracy than the PSO algorithm and the BP algorithm. A comparative study shows that the performance of the variant is competitive in comparison with the selected algorithms on standard benchmark problems. It is concluded that the DPSO-BP algorithm is more stable than the BP algorithm and the PSO algorithm. In future research works, we shall focus on how to apply this hybrid PSO algorithm to solve more practical problems.
BIO

Salerno J.
“Using the particle swarm optimization technique to train a recurrent neural model,”
in Proceedings of IEEE 9th International Conference on Tools with Artificial Intelligence
Newport Beach, CA
1997
45 -
49

Mangasarian O. L.
1993
“Mathematical programming in neural networks,”
ORSA Journal on Computing
5
(4)
349 -
360
** DOI : 10.1287/ijoc.5.4.349**

Kuan C. M.
,
Hornik K.
1991
“Convergence of learning algorithms with constant learning rates,”
IEEE Transactions on Neural Networks
2
(5)
484 -
489
** DOI : 10.1109/72.134285**

Ergezinger S.
,
Thomsen E.
1995
“An accelerated learning algorithm for multilayer perceptrons: optimization layer by layer,”
IEEE Transactions on Neural Networks
6
(1)
31 -
42
** DOI : 10.1109/72.363452**

Angeline P. J.
,
Saunders G. M.
,
Pollack J. B.
1994
“An evolutionary algorithm that constructs recurrent neural networks,”
IEEE Transactions on Neural Networks
5
(1)
54 -
65
** DOI : 10.1109/72.265960**

Kennedy J.
,
Eberhart R. C.
2001
Swarm Intelligence.
Morgan Kaufmann
San Francisco, CA

Gori M.
,
Tesi A.
1992
“On the problem of local minima in backpropagation,”
IEEE Transactions on Pattern Analysis and Machine Intelligence
14
(1)
76 -
86
** DOI : 10.1109/34.107014**

Weir M. K.
1991
“A method for self-determination of adaptive learning rates in back propagation,”
Neural Networks
4
(3)
371 -
379
** DOI : 10.1016/0893-6080(91)90073-E**

Van den Bergh F.
,
Engelbrecht A. P.
2004
“A cooperative approach to particle swarm optimization,”
IEEE Transactions on Evolutionary Computation
8
(3)
225 -
239
** DOI : 10.1109/TEVC.2004.826069**

Shi Y.
,
Eberhart R. C.
“A modified particle swarm optimizer,”
in Proceedings of IEEE World Congress on Computational Intelligence
Anchorage, AK
1998
69 -
73

Kennedy J.
,
Eberhart R. C.
“A discrete binary version of the particle swarm algorithm,”
in Proceedings of IEEE International Conference on Systems, Man, and Cybernetics
Orlando, FL
1997
4104 -
4108

Abraham A.
,
Nath B.
2001
“ALEC: an adaptive learning framework for optimizing artificial neural networks,” inComputational Science-ICCS 2001.
Springer
Heidelberg
171 -
180

Gupta H. V.
,
Hsu K. L.
,
Sorooshian S.
“Superior training of artificial neural networks using weight-space partitioning,”
in Proceedings of International Conference on Neural Networks
Houston, TX
1997
1919 -
1923

Tang K. S.
,
Chan C. Y.
,
Man K. F.
,
Kwong S.
“Genetic structure for NN topology and weights optimization,”
in Proceedings of the 1st International Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications (GALESIA)
Sheffield, UK
1995
250 -
255

Citing 'Learning an Artificial Neural Network Using Dynamic Particle Swarm Optimization–Backpropagation: Empirical Evaluation and Comparison
'

@article{ E1ICAW_2015_v13n2_123}
,title={Learning an Artificial Neural Network Using Dynamic Particle Swarm Optimization–Backpropagation: Empirical Evaluation and Comparison}
,volume={2}
, url={http://dx.doi.org/10.6109/jicce.2015.13.2.123}, DOI={10.6109/jicce.2015.13.2.123}
, number= {2}
, journal={Journal of Information and Communication Convergence Engineering}
, publisher={The Korean Institute of Information and Commucation Engineering}
, author={Devi, Swagatika
and
Jagadev, Alok Kumar
and
Patnaik, Srikanta}
, year={2015}
, month={Jun}