Neural fuzzy systems (NFS) are basically fuzzy systems equipped with a learning capability adapted from the learning ideas used in neural networks. Owing to their outstanding system modeling capability, NFS have been widely employed in various applications. In this article, we discuss several ideas regarding the learning of NFS for modeling systems. The first issue discussed here is structure learning; various ideas used in the literature are introduced and discussed. The second issue is the use of recurrent networks in NFS to model dynamic systems, and the performance of such systems is discussed. It is found that such a delay feedback can add only one order to the system, not all possible orders as claimed in the literature. Finally, the mechanisms and relative learning performance of the recursive least squares (RLS) algorithm in NFS are reported and discussed. The analyses focus on the effects of interactions among rules. Two kinds of systems are considered: systems with strict rules and systems with generalized rules, which have different variances for their membership functions. Based on these observations, several suggestions regarding the use of the RLS algorithm in NFS are presented.
1. Introduction
System identification is a very important issue in system engineering. In recent decades, neural networks [1–3] and fuzzy systems [4–7] have often been used to model complicated systems. In these modeling approaches, the task is to obtain a set of fuzzy rules or a neural network that, as a whole, behaves like the system to be modeled. These approaches can construct systems directly from the input–output relationship without using any domain knowledge. Thus, they are often referred to as model-free estimators [1]. A neural fuzzy system (NFS) [8–10] is basically a fuzzy system equipped with a learning capability adapted from the learning ideas used in neural networks. Owing to their outstanding system modeling capability, NFS have been widely employed in various applications. The main concern of NFS is learning: since an NFS is a model-free estimator, it is natural to regard learning as the main factor in system performance. In fact, the structure used for learning may also make a significant difference, as can be seen from the comparison of fuzzy systems and neural networks in [11, 12]. In this article, we discuss several ideas regarding the learning of NFS for modeling systems.
The first issue discussed here is structure learning. When the number of input variables or the number of fuzzy sets per variable increases, the number of fuzzy rules increases exponentially. However, meaningful data patterns usually do not spread over the whole input region, so it is not necessary to use all possible rules in the system for learning. Deciding which rules should be used can be done through a so-called self-organization process. This process is usually referred to as the structure learning stage, and it defines rules from data. This idea was first proposed in [13], and various approaches were proposed later. Those ideas are introduced in this article.
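The following example makes the exponential growth concrete. Suppose every input is partitioned into m fuzzy sets and all combinations are used. A grid partition with n inputs then needs m^n rules. The short Python sketch below only evaluates that count. The choice of 5 fuzzy sets per input is an arbitrary illustrative value.

```python
def grid_rule_count(num_inputs: int, sets_per_input: int) -> int:
    """Number of rules when every combination of fuzzy sets is used: m ** n."""
    return sets_per_input ** num_inputs


# Illustrative setting: 5 fuzzy sets per input variable.
for n in (2, 4, 6, 8):
    print(f"{n} inputs -> {grid_rule_count(n, 5)} rules")
```

Even at a modest 8 inputs, the full grid needs hundreds of thousands of rules. Data-driven structure learning tries to avoid exactly this growth.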
The second issue is the use of recurrent networks in NFS to model dynamic systems. This kind of approach is often called recurrent NFS (RNFS) in the literature. Recurrent networks are networks in which some nodes receive inputs from subsequent layers, so that the connections form loops. An RNFS has feedback links from some layers to some nodes (usually input nodes). In this approach, the feedback links include a time delay, and in this case there is no signal-chasing phenomenon as expected for recurrent networks. Thus, RNFS do not have the various problems, such as stability, that are considered for recurrent networks. To emphasize this difference, our study refers to them as delay feedback NFS instead of recurrent NFS. In this article, the performance of such systems is discussed. It is found [14] that such a delay feedback can add only one order to the system, not all possible orders as claimed in the literature.
Finally, we consider the use of recursive least squares (RLS) algorithms for learning the fuzzy rule consequences, together with the effects of rule correlation in NFS. The consequent parts of NFS are usually characterized by singletons or linear functions [13, 15, 16]. When linear functions are used, two different kinds of update rules, backpropagation (BP) and RLS algorithms, can be used to update their parameters. The BP algorithm is adapted from the learning concept of neural networks [11, 17] and is easy to implement. In BP, the current gradient, which can be viewed as local information, is used to update the parameters. Thus, the BP algorithm may suffer from slow convergence and/or being trapped in local minima. On the other hand, the adaptive neuro-fuzzy inference system (ANFIS) [15] and the self-constructing neural fuzzy inference network (SONFIN) [13] use the RLS algorithm originally proposed in [16] in the learning process. However, in practical applications, using RLS for learning in NFS raises problems, and various remedy mechanisms may be needed. We analyzed the effects of those approaches in our previous work [18, 19]. In this article, the interaction between rules in the consequent part and the RLS algorithm is analyzed. Furthermore, the operation of resetting the covariance matrix is also discussed.
2. General Description of NFS
NFS have been widely used in the last two decades. ANFIS [15] is the most often mentioned NFS. In the learning phase, ANFIS uses all possible combinations of fuzzy sets to define rules, and it can be expected that some rules may be useless. SONFIN, which is equipped with a structure learning capability, was proposed in [13]. The structure of SONFIN is created dynamically in the learning process, and the system uses only the rules that are necessary. Otherwise, there is no difference between SONFIN and ANFIS. Both realize a fuzzy model of the following form:

$$R_{i}:\ \text{IF } x_{1} \text{ is } A_{i1} \text{ AND } \cdots \text{ AND } x_{n} \text{ is } A_{in},\ \text{THEN } f_{i} = a_{i0} + a_{i1}x_{1} + \cdots + a_{in}x_{n} \qquad (1)$$

where A_{ij} is a fuzzy set for input x_j and the a_{ij}'s for j = 0, 1, ..., n are the consequent parameters of the ith rule. The overall output of the fuzzy system is computed as the weighted average of the outputs of all rules:

$$y = \frac{\sum_{i} w_{i} f_{i}}{\sum_{i} w_{i}} \qquad (2)$$

where f_i and w_i are the output and the firing strength, respectively, of the ith rule. By using a hybrid learning procedure [11], ANFIS can tune both the membership function parameters of the premise part and all a_{ij}'s of the consequent part of the fuzzy rules. As mentioned before, SONFIN and ANFIS have the same structure. Since the membership functions are tuned in the learning process, Gaussian membership functions are used, and the system tunes their means (membership centers) and variances (membership widths).
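The rule form and weighted-average output described above can be sketched as a small first-order TSK inference routine in Python. This is a minimal illustration, not the ANFIS or SONFIN code. It assumes Gaussian membership functions of the form exp(-(x - m)^2 / σ^2) and uses the product operator for AND. The function names are hypothetical.

```python
import math
from typing import Sequence


def gaussian(x: float, mean: float, sigma: float) -> float:
    """Membership grade of a Gaussian fuzzy set (assumed form exp(-(x - m)^2 / sigma^2))."""
    return math.exp(-((x - mean) ** 2) / sigma ** 2)


def tsk_output(x: Sequence[float],
               means: Sequence[Sequence[float]],
               sigmas: Sequence[Sequence[float]],
               consequents: Sequence[Sequence[float]]) -> float:
    """First-order TSK output: firing-strength-weighted average of the rule outputs f_i.

    means[i][j], sigmas[i][j] describe A_ij; consequents[i] = [a_i0, a_i1, ..., a_in].
    """
    num = 0.0
    den = 0.0
    for m_i, s_i, a_i in zip(means, sigmas, consequents):
        # Firing strength w_i: product (AND) of the membership grades of all inputs.
        w_i = 1.0
        for x_j, m_ij, s_ij in zip(x, m_i, s_i):
            w_i *= gaussian(x_j, m_ij, s_ij)
        # Linear consequent f_i = a_i0 + sum_j a_ij * x_j.
        f_i = a_i[0] + sum(a_ij * x_j for a_ij, x_j in zip(a_i[1:], x))
        num += w_i * f_i
        den += w_i
    return num / den


# Two rules on one input: "x is near 0 -> f = x" and "x is near 1 -> f = 1".
y = tsk_output([0.5], means=[[0.0], [1.0]], sigmas=[[1.0], [1.0]],
               consequents=[[0.0, 1.0], [1.0, 0.0]])
print(y)
```

At x = 0.5 both rules fire equally, so the output is the plain average of f_1 = 0.5 and f_2 = 1. Tuning the means and widths moves the point where each rule dominates.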
In the learning phase of SONFIN, rules are created dynamically as learning proceeds upon receiving training data. Three learning processes are conducted simultaneously in SONFIN to identify both the premise and the consequent structure of a fuzzy if-then rule: (A) input/output space partitioning, (B) construction of fuzzy rules, and (C) parameter identification. Processes A and B constitute structure learning, and process C belongs to the parameter learning phase. In the structure learning phase, when a new fuzzy set is needed, its width (variance) can be set to different initial values. A small value can be regarded as applying the linguistic hedge “very.” This kind of discussion about fuzzy membership functions is given in [20], in which different parameter identification methods are considered and discussed. Learning process (A) partitions the input space based on the input data distribution. When the existing rules cannot sufficiently cover a new input pattern, the system creates a new rule from this pattern. Here, “not sufficiently covered” means that no existing rule yields a sufficiently large firing strength for the current input pattern. This firing strength threshold decides the number of input and output clusters generated in SONFIN and, in turn, the number of rules used in the system. In other words, the threshold defines the complexity of the system. However, as reported in our previous work [20], different thresholds (or different variances) affect not only the complexity of the system but also the performance of the RLS algorithm. In this study, the correlation terms in the covariance matrix used in the RLS algorithm are studied. Note that the dimension of the covariance matrix in the RLS algorithm is [(input number + 1) × (rule number)]². An obvious disadvantage of RLS is therefore the computational burden when the input number and/or the rule number are large. This is why some approaches tune the consequent parameters by assuming that the rules are independent [13]. However, our study shows that ignoring the correlation terms may degrade the learning performance. We discuss this issue further in Section 5. After a new rule is generated, learning process (B) defines the fuzzy rule based on the current input data. The process is straightforward, and the details can be found in [13].
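The rule-generation criterion of process (A) can be sketched as follows. This is a deliberately simplified, hypothetical version. It clusters only the inputs, gives every new rule the same initial width `init_width` in every dimension, and skips the output clustering and consequent initialization that SONFIN also performs [13]. Its purpose is to show how a larger firing-strength threshold produces more rules.

```python
import math
from typing import List, Sequence


def firing_strength(x: Sequence[float], center: Sequence[float], width: float) -> float:
    """Product of Gaussian grades exp(-(x_j - m_j)^2 / width^2) over all inputs."""
    return math.exp(-sum((xj - mj) ** 2 for xj, mj in zip(x, center)) / width ** 2)


def generate_rules(data: Sequence[Sequence[float]], threshold: float,
                   init_width: float) -> List[List[float]]:
    """Create a new rule centered at x whenever no existing rule fires >= threshold."""
    centers: List[List[float]] = []
    for x in data:
        best = max((firing_strength(x, c, init_width) for c in centers), default=0.0)
        if best < threshold:
            centers.append(list(x))
    return centers


# One-dimensional inputs on [0, 1]; a higher threshold is a stricter coverage test.
data = [[k * 0.05] for k in range(21)]
for thr in (0.1, 0.5):
    print(thr, len(generate_rules(data, thr, init_width=0.2)))
```

Raising the threshold from 0.1 to 0.5 roughly doubles the number of rules on this toy data. This is the complexity knob described above.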
Finally, the parameter identification process is done concurrently with the structure identification process. For simplicity, a single-output case is considered here. The goal is to minimize the cost function

$$E(t) = \frac{1}{2}\left[y^{d}(t) - y(t)\right]^{2}$$

where y^d(t) is the desired output and y(t) is the current output. SONFIN tunes the parameters of the consequent part (i.e., the a_{ij}'s in Eq. (1), collected in the vector a) with the RLS algorithm [8] as

$$\mathbf{a}(t+1) = \mathbf{a}(t) + \mathbf{P}(t+1)\,\mathbf{u}(t+1)\left[y^{d}(t+1) - \mathbf{u}^{T}(t+1)\,\mathbf{a}(t)\right] \qquad (3)$$

$$\mathbf{P}(t+1) = \frac{1}{\lambda}\left[\mathbf{P}(t) - \frac{\mathbf{P}(t)\,\mathbf{u}(t+1)\,\mathbf{u}^{T}(t+1)\,\mathbf{P}(t)}{\lambda + \mathbf{u}^{T}(t+1)\,\mathbf{P}(t)\,\mathbf{u}(t+1)}\right] \qquad (4)$$

where t is the iteration number, u is the current input vector, P is referred to as the covariance of the estimate of a, and λ is a forgetting factor between 0 and 1 [21, 22]. The forgetting factor was originally employed to deal with time-varying problems. Here, however, the parameters of the premise part are also tuned, so the system matrix is no longer constant, and a forgetting factor is employed to cope with this problem [11]. It can be expected that such an approach may also introduce other problems. A usual means of avoiding being trapped in local minima is to reset the covariance matrix P after a period of training.
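Eqs. (3) and (4), together with the covariance reset just mentioned, can be sketched in plain Python. This is an illustrative implementation of standard RLS with a forgetting factor, not SONFIN's code. The class name and the value `p0` (the diagonal used to initialize and reset P) are assumptions chosen for illustration.

```python
from typing import List


class RLS:
    """RLS estimate of a in y = u^T a with forgetting factor lam, as in Eqs. (3)-(4)."""

    def __init__(self, dim: int, lam: float = 0.99, p0: float = 1e4) -> None:
        self.lam = lam
        self.p0 = p0
        self.a: List[float] = [0.0] * dim
        self.reset_covariance()

    def reset_covariance(self) -> None:
        """Reset P to p0 * I (large diagonal), so that new data dominate the estimate."""
        n = len(self.a)
        self.P = [[self.p0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    def update(self, u: List[float], y_d: float) -> float:
        """One RLS step; returns the a priori error y_d - u^T a(t)."""
        n = len(u)
        Pu = [sum(self.P[i][k] * u[k] for k in range(n)) for i in range(n)]
        denom = self.lam + sum(u[i] * Pu[i] for i in range(n))
        error = y_d - sum(u[i] * self.a[i] for i in range(n))
        # Eq. (3): P(t+1) u(t+1) equals P(t) u(t+1) / denom.
        self.a = [self.a[i] + Pu[i] / denom * error for i in range(n)]
        # Eq. (4); P is symmetric, so u^T P equals (P u)^T.
        self.P = [[(self.P[i][j] - Pu[i] * Pu[j] / denom) / self.lam
                   for j in range(n)] for i in range(n)]
        return error


# Usage: identify y = 2 + 3x with regressor u = [1, x].
rls = RLS(dim=2)
for k in range(50):
    x = k / 49
    rls.update([1.0, x], 2.0 + 3.0 * x)
print([round(v, 4) for v in rls.a])
rls.reset_covariance()  # e.g., after a period of training
```

Calling `reset_covariance()` leaves the parameter estimate unchanged. It only restores a large P, so subsequent updates weight the newest data heavily.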
As mentioned, in SONFIN, the parameters of the premise part (i.e., the means m_{ij} and variances σ_{ij} of the membership functions) are also tuned. They are tuned simply by the BP learning algorithm, and the details can be found in [13]. In this study, a small learning constant is selected for the BP algorithm to reduce the effects of changes in the premise part. The same idea is discussed in our previous work [20].
3. Structure Learning for NFS
In this section, several self-organization learning ideas used in the structure learning of NFS are introduced. As mentioned, it is not necessary to use all possible rules in the system for learning. To construct a fuzzy model, two things must be obtained: the fuzzy subspaces that define the fuzzy partitions in the premise parts, and the parameters that define the functions in the consequent parts. In the original Takagi–Sugeno–Kang (TSK) modeling approach [16], users must define the fuzzy subspaces in advance, and the parameters of the consequences are then obtained through the RLS algorithm. This simple idea is also employed in ANFIS [15]. Such approaches must use all possible rules. In the following, we discuss ways of defining rules.
As mentioned in the previous section, in the learning phase of SONFIN [13], rules are created dynamically as learning proceeds upon receiving training data. Three learning processes are conducted simultaneously to identify both the premise and the consequent structure of a fuzzy if-then rule. This kind of learning can be regarded as an online learning approach. However, even though the structure learning phase can be performed online, the subsequent learning algorithm, such as BP, is not suitable for online learning. The reason is the slow convergence of BP, which usually needs hundreds or thousands of epochs to converge. This effect can be seen in [23], where even a simple direct-generation-from-data approach [6] outperforms SONFIN in an online learning situation. Consequently, various structure learning approaches that are not online were proposed in later studies.
A simple idea is to use the original structure learning in [17]. In that approach, the input space is first divided into fuzzy subspaces through a clustering algorithm called the Kohonen self-organizing network [24]. Only the input portion of the training data is used. The obtained fuzzy subspaces may not cover the entire input space. Note that this approach is similar to that of SONFIN in that the fuzzy partition is based only on input data. In other words, rules are defined only from the distribution of the input portion of the training data. In the literature, this structure learning stage is usually called the coarse tuning stage. After the fuzzy subspaces are defined, the system is approximated in each subspace by a linear function. This is done through supervised learning algorithms, such as BP or least-squares learning algorithms [2, 6, 7, 11]. In the meantime, the fuzzy subspaces may also be tuned, usually through BP learning algorithms. This stage is called the fine tuning stage. Note that each subspace, corresponding to one fuzzy rule, is supposed to have a simple geometry in the input–output space, normally an ellipsoid [25]. Other fuzzy clustering algorithms, such as fuzzy C-means (FCM) [26, 27], are also suitable for defining fuzzy subspaces for fuzzy modeling. In general, the above approaches partition fuzzy subspaces based only on clustering in the input space of the training data. They do not consider whether the output portion of the training data supports such clustering. In other words, such approaches do not account for the interaction between input and output variables.
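The following is a minimal fuzzy C-means sketch illustrating input-space clustering. It is the generic textbook iteration, not the exact procedure of [26, 27]. Only the input portion of the data is clustered, and the number of clusters c must be supplied by the user. The fuzzifier m = 2, the random initialization, and the iteration count are illustrative choices.

```python
import math
import random
from typing import List, Sequence, Tuple


def fcm(points: Sequence[Sequence[float]], c: int, m: float = 2.0,
        iters: int = 100, seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Fuzzy C-means: returns cluster centers and the membership matrix U (n x c)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row sums to one over the c clusters.
    U = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        U.append([r / total for r in row])
    centers: List[List[float]] = []
    for _ in range(iters):
        # Centers are membership-weighted means with weights u_ik ** m.
        centers = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                            for d in range(dim)])
        # u_ik = 1 / sum_l (d_ik / d_il) ** (2 / (m - 1)); a tiny floor avoids division by zero.
        for i in range(n):
            dist = [max(math.dist(points[i], centers[k]), 1e-12) for k in range(c)]
            U[i] = [1.0 / sum((dist[k] / dist[l]) ** (2.0 / (m - 1.0)) for l in range(c))
                    for k in range(c)]
    return centers, U


pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
centers, U = fcm(pts, c=2)
print(sorted([round(v, 2) for v in ctr] for ctr in centers))
```

Each resulting center with its membership spread would seed one fuzzy subspace (one rule). No output values enter the computation, which is the limitation noted above.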
Similar to the use of the AND operation in the fuzzy reasoning process, a rule matches a data pair only if two conditions hold: the input portion matches the premise part, and the output portion matches the consequent part. Thus, both the input data and the output data must be considered when constructing a rule. Hence, the authors of [25, 27] considered the product space of input and output variables for fuzzy modeling, instead of only the input space as in classical clustering algorithms. However, these approaches and those above still define fuzzy subspaces in a clustering manner. They do not take into account the functional properties of TSK fuzzy models. In other words, in those approaches, training data that are close to each other are placed in the same fuzzy subspace, rather than data that have similar functional behavior. This is appropriate if the consequent part is a fuzzy singleton. However, if the consequent part is a linear function, as is usually the case, such structure learning behavior may not be proper. As a result, the number of fuzzy subspaces tends to be larger than necessary.
To account for the linear function property, another approach is proposed in [28]. In that approach, the fuzzy subspaces and the functions in the consequent parts are identified simultaneously through the fuzzy C-regression model (FCRM) clustering algorithm. Clustering algorithms calculate the distance to a point (the cluster center); this approach instead calculates the distance to a line given by a linear regression model. This distance is then used to define the error term in the cost function. The idea of this kind of approach is to find sets of training data whose input–output relationship is approximately linear, so that each such set can be clustered into one fuzzy subspace. As in other approaches, this structure learning does not incorporate the optimization of the overall model. Hence, supervised learning algorithms can be used in the fine tuning stage to further adjust the model.
It should be noted that in the above clustering or regression algorithms, users must assign the number of clusters, which is generally unknown. Another idea, proposed in [29], is to employ the so-called robust competitive agglomeration (RCA) clustering algorithm used in computer vision and pattern recognition [30] to form the subspaces. In this approach, the number of clusters is determined during the clustering process. Besides, the clustering process starts from the whole data set, which reduces the effect of the order in which data are presented. As a result, it can achieve fast convergence and better performance, as claimed in the literature.
Another consideration is outliers in training data. The intuitive definition of an outlier [31] is “an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism.” Outliers may occur for various reasons, such as erroneous measurements or noisy data from the tail of the noise distribution. When outliers exist, the networks may try to fit those improper data, and the obtained systems may exhibit overfitting [32–34]. The above structure learning algorithms are all based on the principle of least-squares error minimization and are easily affected by outliers. The influence of outliers should be reduced in the clustering process [28, 30, 35–37]. In the fine-tuning process, classical supervised learning algorithms, such as gradient descent approaches, are used. When training data are corrupted by large noise, such as outliers, traditional BP learning schemes usually cannot achieve acceptable performance [38, 39]. Based on the principle of robust statistics, various robust learning algorithms have been proposed in the neural network community [40–45]. Most of them replace the square term in the cost function with a so-called loss function. For the FCRM approach, it is difficult to adopt such a robust learning concept. In [29], a novel approach termed the robust fuzzy regression agglomeration (RFRA) clustering algorithm is shown to have robust learning effects against outliers. The RFRA clustering algorithm determines the parameters in both the premise and consequent parts, and the approach also employs a robust learning algorithm to fine-tune the obtained fuzzy model. The simulation results in [29] show the superior performance of the proposed algorithm.
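Replacing the square term with a loss function can be illustrated on the simplest possible model, a constant. The sketch below uses the Huber loss, one common robust loss chosen here for illustration. It solves for the constant by iteratively reweighted averaging and compares the result with the least-squares mean on data containing one outlier. It assumes a known unit noise scale and the conventional tuning constant k = 1.345. Both are assumptions, not values from the cited works.

```python
from typing import Sequence


def huber_weight(r: float, k: float = 1.345) -> float:
    """IRLS weight psi(r) / r for the Huber loss: 1 if |r| <= k, else k / |r|."""
    a = abs(r)
    return 1.0 if a <= k else k / a


def huber_location(data: Sequence[float], k: float = 1.345, iters: int = 50) -> float:
    """Robust constant fit minimizing the Huber loss (unit noise scale assumed)."""
    est = sorted(data)[len(data) // 2]  # start from a median-like value
    for _ in range(iters):
        w = [huber_weight(x - est, k) for x in data]
        est = sum(wi * x for wi, x in zip(w, data)) / sum(w)
    return est


samples = [1.0, 1.2, 0.9, 1.1, 1.0, 50.0]  # the last value is an outlier
least_squares = sum(samples) / len(samples)
print(round(least_squares, 3), round(huber_location(samples), 3))
```

The least-squares estimate is dragged far from the bulk of the data. In the Huber estimate, the outlier's influence is capped by its small weight.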
4. Delay Feedback and Dynamic Modeling
It is easy to see that an NFS can only model static systems because it has no memory for keeping previous states. When a dynamic system is the modeling target, an often-used approach is to use all necessary past inputs and outputs of the system as explicit inputs. Such a structure is referred to as the nonlinear autoregressive with exogenous inputs (NARX) model [14, 46]. Another kind of approach [14, 47–49] feeds back the outputs of internal nodes in the network with a time delay. Those methods have shown good accuracy in modeling dynamical systems in some examples. Such networks are usually called recurrent networks in the literature [47–49]. However, even though the connections of such a system indeed form loops, the feedback includes a time delay, so the system does not have an actual (instantaneous) loop. Thus, in our study, such networks are called delay feedback networks [14].
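The following sketch shows how a NARX model turns a dynamic identification problem into a static one by building the regressor vectors explicitly. In this hypothetical helper, ny past outputs and nu past inputs form each input vector, and y(t) is the target. Formulations differ on whether the current input u(t) is also included; it is omitted here by assumption.

```python
from typing import List, Sequence, Tuple


def narx_regressors(u: Sequence[float], y: Sequence[float], nu: int,
                    ny: int) -> Tuple[List[List[float]], List[float]]:
    """Return inputs [y(t-1), ..., y(t-ny), u(t-1), ..., u(t-nu)] and targets y(t)."""
    start = max(nu, ny)
    X: List[List[float]] = []
    T: List[float] = []
    for t in range(start, len(y)):
        past_y = [y[t - k] for k in range(1, ny + 1)]
        past_u = [u[t - k] for k in range(1, nu + 1)]
        X.append(past_y + past_u)
        T.append(y[t])
    return X, T


X, T = narx_regressors(u=[0, 1, 2, 3, 4], y=[10, 11, 12, 13, 14], nu=1, ny=2)
print(X)  # [[11, 10, 1], [12, 11, 2], [13, 12, 3]]
print(T)  # [12, 13, 14]
```

Each row of X is an ordinary static input pattern, so any static NFS can be trained on (X, T) without internal memory.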
Delay feedback networks use internal memories to capture internal states. Delay feedback can also be introduced to account for dynamical behavior in SONFIN. However, where to put those delay links is not straightforward, because semantics are associated with the layers. Differing from the approaches used in [47, 48], another delay feedback approach for SONFIN, termed the additive delay feedback neural fuzzy network (ADFNFN), is proposed in [14]. The basic idea is to adopt the autoregressive moving-average (ARMA) type [50] of modeling approach. In an ARMA model, the output is predicted as a linear combination of the current input and previous inputs and outputs. In other words, previous outputs are included in the prediction model in an additive manner. In fact, NARX models can generally be viewed as a nonlinear extension of ARMA models because they mix all inputs in an additive manner (linear integration functions in neural networks and linear consequent functions in SONFIN). Simulations show that the proposed ADFNFN achieves better learning performance than the approaches proposed in [47] and [48].
It is noted that, in a series-parallel identification scheme [50], the prediction of the next step of a dynamic system is always based on previous data. When the step size is small, the change over one step is small, and so is the error generated in one step. Consequently, the traditional root mean square error (RMSE) cannot truly capture how accurately the current model predicts. Another evaluation index, called the non-dimensional error index (NDEI), is considered in [51] to evaluate modeling errors. The NDEI is defined as

$$\mathrm{NDEI} = \frac{1}{\delta(T)}\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[T(i) - O(i)\right]^{2}}$$

where N is the number of data, T(i) is the desired output, O(i) is the predicted output, and δ(T) is the average change in one sampling time in the target series. In some reports, such as [47, 52], the performance shown in the applications seems good. However, their NDEI is 4.2370 in [52] (RMSE = 0.2585, i.e., MSE = 0.0668, and δ(T) = 0.0610) and 3.4426 in [47] (RMSE = 0.21 and δ(T) = 0.0610). That means their average prediction errors are about 423.7% and 344.26%, respectively, of the average change in one step, which is unacceptable. In other words, their learning in fact did not converge. As a matter of fact, in [14], the NDEI converges to 0.4344 by tuning some parameters (RMSE = 0.0265, δ(T) = 0.0610; see Table 1), which corresponds to 43.44%. Such a result is still not good enough. In [14], there is another approach whose NDEI is only 0.2148 (21.48%), which can be viewed as a good prediction.
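Judging from the quoted figures (e.g., 0.21 / 0.0610 ≈ 3.4426), the NDEI is the RMSE divided by δ(T). The sketch below assumes that δ(T) is the mean absolute one-step change of the target series, which is our reading of "averaged change in one sampling time". It then reproduces the quoted ratios from the reported RMSE (or MSE) and δ(T) values.

```python
import math
from typing import Sequence


def rmse(target: Sequence[float], output: Sequence[float]) -> float:
    """Root mean square error between the target and predicted series."""
    return math.sqrt(sum((t - o) ** 2 for t, o in zip(target, output)) / len(target))


def mean_step_change(target: Sequence[float]) -> float:
    """delta(T): mean absolute change of the target over one sampling step (assumed)."""
    return sum(abs(b - a) for a, b in zip(target, target[1:])) / (len(target) - 1)


def ndei(target: Sequence[float], output: Sequence[float]) -> float:
    """Non-dimensional error index: RMSE relative to the average one-step change."""
    return rmse(target, output) / mean_step_change(target)


# Ratios implied by the figures quoted in the text (delta(T) = 0.0610 in all cases).
reported = {"[52]": math.sqrt(0.0668), "[47]": 0.21, "[14]": 0.0265}
for source, err in reported.items():
    print(source, round(err / 0.0610, 4))
```

Computing the [52] value from the reported MSE (0.0668) rather than the rounded RMSE gives 4.2370 exactly. The rounded RMSE of 0.2585 would give about 4.2377.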
Learning errors (RMSE) for Sinc with changed algorithms (RMSE, root mean square error).
Intuitively, delay feedback could model dynamic systems of any order, because the delay elements can carry all kinds of signals, including delayed ones. However, [14] shows that those delay feedback models can only achieve the accuracy level of order-2 NARX models. In other words, those delay feedback networks do not seem able to model systems as accurately as NARX models of the proper order. Because delay feedback models use only one delay, they behave somewhat like NARX models of order 2. As shown in [14], two feedback connections can be used for each internal state, one with one delay and the other with two delays. In that case, the errors are significantly reduced for order-3 and order-4 systems. However, the modeling accuracy for the order-4 system is still not good enough. It is evident that the number of delays in a delay feedback model plays a role similar to that of the system order in a NARX model. Thus, it can be concluded that a delay feedback network is not necessary, because it is more complicated than a NARX model of proper order and cannot provide better performance.
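The claim that the number of delays plays the role of the system order can be illustrated with a linear stand-in. The example uses ordinary least squares rather than an NFS and a hypothetical second-order system y(t) = 1.5y(t−1) − 0.7y(t−2) + u(t−1). A model with one past output cannot fit the data, while a model with two past outputs fits it exactly. The system coefficients, input distribution, and data length are illustrative assumptions.

```python
import random
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_rmse(u: Sequence[float], y: Sequence[float], ny: int, nu: int = 1) -> float:
    """Least-squares fit of y(t) on ny past outputs and nu past inputs; returns RMSE."""
    start = max(ny, nu)
    X = [[y[t - k] for k in range(1, ny + 1)] + [u[t - k] for k in range(1, nu + 1)]
         for t in range(start, len(y))]
    T = [y[t] for t in range(start, len(y))]
    p = len(X[0])
    # Normal equations (X^T X) w = X^T T.
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    b = [sum(x[i] * t for x, t in zip(X, T)) for i in range(p)]
    w = solve(A, b)
    res = [t - sum(wi * xi for wi, xi in zip(w, x)) for x, t in zip(X, T)]
    return (sum(r * r for r in res) / len(res)) ** 0.5


rng = random.Random(0)
u = [rng.uniform(-1.0, 1.0) for _ in range(300)]
y = [0.0, 0.0]
for t in range(2, 300):
    y.append(1.5 * y[t - 1] - 0.7 * y[t - 2] + u[t - 1])
print("order 1:", fit_rmse(u, y, ny=1), "order 2:", fit_rmse(u, y, ny=2))
```

The order-1 fit leaves a residual that no amount of training can remove. The order-2 fit drops to round-off level, mirroring the role the delay count plays in the delay feedback models.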
5. Analysis of the Use of RLS
As mentioned, ANFIS [15] and SONFIN [13] employ the RLS algorithm originally proposed in [16] in the learning process. However, in practical applications, using RLS for learning in NFS raises problems, and various remedy mechanisms may be needed. In this article, the interaction between rules in the consequent part and the RLS algorithm is analyzed, and the operation of resetting the covariance matrix is also discussed. RLS algorithms have been widely used in adaptive filtering, self-tuning control systems, and system identification [53]. The literature shows several advantages of RLS algorithms, such as fast convergence and small estimation errors, especially when the system considered is simple and time invariant. Theoretically, the RLS estimate is the best estimate under the assumption of Gaussian noise in ideal cases. In practical applications, several remedies are possible. Their advantages can be obvious, but not always when the system considered becomes complicated. Recently, IRSFNN [49] also presented two types of parameter identification steps, which we refer to as the reduced and full covariance matrices for RLS algorithms. To explain the results in [49], an overlap coefficient is employed to define the intersection between fuzzy sets. Similarly, in our previous work [20], two types of membership functions are considered to illustrate the differences between the full and reduced covariance matrices. When there is more interaction between fuzzy sets, the off-diagonal parts of the covariance matrix should not be ignored.
In our study, SONFIN [13] is employed as the NFS, and the learning algorithms used in the original SONFIN serve as a basis. The structure of SONFIN is created dynamically in the learning process and uses only the rules that are necessary. It is easy to see that SONFIN can perform very well, especially when the number of input variables is large. In the learning process, when the firing strength of the current input feature is lower than a threshold, the system generates a new fuzzy rule for this input feature. In other words, SONFIN can generate new fuzzy rules for new input features when necessary. After the fuzzy rules are constructed, the parameters of the premise part and of the consequent part of the NFS are updated through BP and RLS, respectively. There are two kinds of interactions: interaction between input data and interaction between consequent structures. For efficient computation, the consequent parameters of SONFIN are calculated independently for each rule [13]. However, it is expected that considering the rule interactions can help discover new knowledge among rules and improve the reasoning accuracy [13]. If the membership functions overlap little, there is not much difference in learning performance, no matter which kind of RLS algorithm is used [20].
As mentioned in [54–56], predefined membership functions can be modified (or tuned) by operations such as “very” or “more-or-less”. When membership functions are tuned, linguistic variables can bring more human-like thinking through different kinds of membership functions. SONFIN also has the capability of changing the cores and shapes of membership functions. Consequently, SONFIN can perform well in data learning. However, these tuning effects may destroy the original features of the membership functions. Since SONFIN creates only rules that are necessary, the original membership functions may contain some characteristics of the corresponding data pair. When the membership functions are tuned in the later learning process, those characteristics may be altered, and more complex interactions emerge. As mentioned in [57], redundancy interaction usually cannot bring significant improvement in performance. Besides, the correlation terms in the covariance matrix required in RLS pose another problem. When the dimension of the system to be modeled becomes very large, the required computational time may become infeasible. This is because the dimension of the covariance matrix is [(number of rules) × (number of input variables + 1)]². Also, the interaction between BP (used for tuning the input fuzzy membership functions) and RLS (used for tuning the consequent parameters) may be unpredictable. In this study, the interaction between rules in the consequent part and the RLS algorithm is analyzed, and the operation of resetting the covariance matrix is also discussed.
In this section, we rearrange the analysis in [20] to compare the performance of the RLS algorithm with the full covariance matrix and with a reduced covariance matrix. The RLS algorithm with the full covariance matrix is the original approach of Eqs. (3) and (4), where P is a full matrix. Using a reduced matrix means assuming that the consequent parameters of different rules are independent, so that all correlation terms between rules are zero. Define

$$\mathbf{a} = \left[a_{10}\ a_{11}\ \cdots\ a_{1n}\ \ a_{20}\ a_{21}\ \cdots\ a_{2n}\ \ \cdots\ \ a_{J0}\ a_{J1}\ \cdots\ a_{Jn}\right]^{T}$$

as the parameter vector. Here n is the input dimension, I = n + 1 is the number of consequent parameters per rule, and J is the number of rules. For Eq. (3), u is the input vector for the whole consequent part and can be written as

$$\mathbf{u} = \left[w_{1}^{(5)}\ \ w_{1}^{(5)}x_{1}\ \cdots\ w_{1}^{(5)}x_{n}\ \ \cdots\ \ w_{J}^{(5)}\ \ w_{J}^{(5)}x_{1}\ \cdots\ w_{J}^{(5)}x_{n}\right]^{T}$$

where w_i^{(5)} is the firing strength of the ith rule. The superscript (5) means that it is on layer 5 of the whole structure [13]. For the reduced matrix case, the consequent parameters of different rules are assumed to be independent. Thus, for Eq. (3), the input vector for the ith rule is

$$\mathbf{u}_{i} = \left[w_{i}^{(5)}\ \ w_{i}^{(5)}x_{1}\ \cdots\ w_{i}^{(5)}x_{n}\right]^{T}$$

Then the covariance matrix P_i of each rule has dimension I × I, where I = n + 1 (the input dimension + 1). There are J (the rule number) such covariance matrices, so the total number of entries is J × I², whereas it is (J × I)² in the full matrix case.
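The two regressor layouts and the resulting storage cost can be sketched directly. In the hypothetical helpers below, `firing` holds the layer-5 firing strengths w_i^(5) of the J rules and `x` holds the n inputs. The reduced case corresponds to keeping only the J diagonal I × I blocks of the full (J·I) × (J·I) covariance matrix. The sizes in the example (one input, 7 or 25 rules) are illustrative.

```python
from typing import List, Sequence, Tuple


def full_regressor(firing: Sequence[float], x: Sequence[float]) -> List[float]:
    """u = [w1, w1*x1, ..., w1*xn, ..., wJ, wJ*x1, ..., wJ*xn] for the full-matrix RLS."""
    u: List[float] = []
    for w in firing:
        u.extend([w] + [w * xj for xj in x])
    return u


def rule_regressors(firing: Sequence[float], x: Sequence[float]) -> List[List[float]]:
    """One u_i = [wi, wi*x1, ..., wi*xn] per rule for the reduced (per-rule) RLS."""
    return [[w] + [w * xj for xj in x] for w in firing]


def covariance_entries(num_rules: int, num_inputs: int) -> Tuple[int, int]:
    """Entries stored in P: (J*I)**2 for the full case vs J * I**2 for the reduced case."""
    per_rule = num_inputs + 1
    return (num_rules * per_rule) ** 2, num_rules * per_rule ** 2


print(full_regressor([0.5, 0.25], [2.0, 4.0]))
print(rule_regressors([0.5, 0.25], [2.0, 4.0]))
for J in (7, 25):
    print(J, covariance_entries(J, num_inputs=1))
```

The per-rule vectors are just consecutive slices of the full vector. Their only difference from the full case is the discarded cross-rule blocks of P. With one input and 25 rules, the full matrix stores 2,500 entries versus 100 for the per-rule matrices.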
In [57], three kinds of interaction for sensory inputs are defined: redundancy/negative synergy, complementarity/positive synergy, and independency. These interaction types describe different situations of the actual value compared with the combination of individual sensors. They can also be considered for rule interactions in SONFIN when the RLS algorithm is used. In the case of redundancy/negative synergy, the learning performance is better with the independent RLS algorithm than with the full-rule RLS algorithm. In the case of complementarity/positive synergy, the full-rule RLS algorithm performs better than the independent RLS algorithm. Finally, when the rules are independent, the two RLS algorithms show no significant difference.
SONFIN updates the membership functions and the consequent parts simultaneously in the learning process, so the interaction within each rule is difficult to manipulate. Suppose that, in one step, BP changes the membership functions and the RLS algorithm changes the consequent part. The new membership functions then change the layer-5 input, so the premise part used in the previous RLS calculation has changed. Such a change can be viewed as making the system time varying. Several methods have been proposed in the literature to resolve this problem, such as using a forgetting factor in the RLS algorithm or reducing the learning constant in BP.
Another issue related to the use of a forgetting factor in the RLS algorithm is resetting the covariance matrix P. Our previous study [18] found that resetting the covariance matrix can improve training performance to some extent. However, overfitting may occur in some cases. By resetting the covariance matrix to a diagonal matrix with large values, the RLS algorithm is able to focus on the present data. This is called bootstrap [58]. In other words, the system is switched into a local learning phase to reduce the effect of previous learning.
Two functions are considered to illustrate this part of the study. The first is the simple sinc function. The second is the identification of the Mackey-Glass chaotic time series. It is known that, by using different thresholds, SONFIN can generate NFS with different rule complexities. Here we consider two types of fuzzy rules, generalized rules and strict rules, whose interactions can be viewed as heavy and slight, respectively.
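To make "heavy" versus "slight" interaction concrete, the sketch below places 7 Gaussian membership functions on a unit-spaced grid over [-3, 3] and measures how much the normalized firing strengths of different rules overlap. The centres, the domain, the mapping of variance 0.1 to strict rules and variance 1 to generalized rules, and the overlap measure (mean cosine similarity between the rules' firing-strength profiles) are all illustrative assumptions; the article does not specify how SONFIN's thresholds translate into widths.

```python
import numpy as np


def normalized_firing(x, centres, variance):
    """Normalized Gaussian firing strengths (rows: samples, columns: rules)."""
    mu = np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2.0 * variance))
    return mu / mu.sum(axis=1, keepdims=True)


def mean_rule_overlap(x, centres, variance):
    """Mean cosine similarity between the firing-strength profiles of distinct rules."""
    phi = normalized_firing(x, centres, variance)
    norms = np.linalg.norm(phi, axis=0)
    cos = (phi.T @ phi) / np.outer(norms, norms)
    r = len(centres)
    return float((cos.sum() - np.trace(cos)) / (r * (r - 1)))


x = np.linspace(-3.0, 3.0, 601)
centres = np.linspace(-3.0, 3.0, 7)              # 7 rules, unit spacing
strict = mean_rule_overlap(x, centres, 0.1)      # narrow sets (assumed "strict")
general = mean_rule_overlap(x, centres, 1.0)     # wide sets (assumed "generalized")
print(f"strict: {strict:.3f}  generalized: {general:.3f}")
```

The wide sets give a much larger mean overlap, which is one way to see why the off-diagonal blocks of the full covariance matrix matter more for generalized rules.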
We continue the simulations in [19]. For the sinc function, systems with 7 and 25 rules are considered, using two types of fuzzy-set membership functions with initial variances of 0.1 and 1. Their learning performances are given in Table 2. For the strict 7-rule system, both RLS algorithms yield the same performance, because the rules in this system interact little. In the 25-rule system, however, using the full P matrix yields better performance than using the reduced P matrix. For the generalized systems, using the full P matrix clearly performs better than using the reduced one. Thus, when the rules interact little, the two RLS algorithms (computing over the whole system or for each rule individually) have similar performance, although it could not be confirmed whether the system had reached a local minimum. Next, resetting P is taken into account, and the results are given in Table 3. Except for the strict 7-rule system, all systems perform better than those in Table 2.
Learning errors (RMSE) for sinc
RMSE, root mean square errors.
Learning errors (RMSE) for sinc with reset
RMSE, root mean square errors.
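The comparison between the full and the reduced P matrix can be reproduced in miniature. The sketch below is our own simplified setting, not the SONFIN experiments of [19]: the premise part is frozen (no BP), 7 Gaussian rules are placed on a scaled input u = x/10 for sinc(x) = sin(x)/x, and only the first-order consequents a_i u + b_i are learned by RLS with λ = 1 and P(0) = 1000I. The "reduced" variant is assumed here to be the block-diagonal approximation (one 2x2 P per rule, with every rule driven by the shared output error); the widths (standard deviations) 0.1 and 0.5 are illustrative stand-ins for strict and generalized rules, and the function name `train_rmse` is hypothetical.

```python
import numpy as np


def firing(u, centres, sigma):
    """Normalized Gaussian firing strengths, shape (n_samples, n_rules)."""
    mu = np.exp(-((u[:, None] - centres[None, :]) ** 2) / (2.0 * sigma**2))
    return mu / mu.sum(axis=1, keepdims=True)


def train_rmse(u, y, centres, sigma, full, epochs=5, delta=1000.0):
    """Learn first-order TSK consequents a_i*u + b_i by RLS (lambda = 1), premises fixed.

    full=True : one (2R x 2R) covariance matrix over all consequent parameters.
    full=False: block-diagonal "reduced" P, one 2x2 block per rule, every rule
                driven by the shared output error (our assumed reduced variant).
    Returns the training RMSE.
    """
    R = len(centres)
    W = firing(u, centres, sigma)
    Phi = W[:, :, None] * np.stack([u, np.ones_like(u)], axis=1)[:, None, :]  # (N, R, 2)
    theta = np.zeros((R, 2))
    P = delta * np.eye(2 * R) if full else np.stack([delta * np.eye(2)] * R)
    for _ in range(epochs):
        for phi, yt in zip(Phi, y):
            err = yt - np.sum(phi * theta)          # shared a priori output error
            if full:
                f = phi.ravel()
                Pf = P @ f
                k = Pf / (1.0 + f @ Pf)
                P -= np.outer(k, Pf)
                theta += (k * err).reshape(R, 2)
            else:
                for i in range(R):
                    Pf = P[i] @ phi[i]
                    k = Pf / (1.0 + phi[i] @ Pf)
                    P[i] -= np.outer(k, Pf)
                    theta[i] += k * err
    pred = np.einsum("nrk,rk->n", Phi, theta)
    return float(np.sqrt(np.mean((y - pred) ** 2)))


u = np.linspace(-1.0, 1.0, 201)          # scaled input u = x / 10
y = np.sinc(10.0 * u / np.pi)            # np.sinc(t) = sin(pi t)/(pi t), so this is sin(x)/x
centres = np.linspace(-1.0, 1.0, 7)      # 7 evenly spaced rules
results = {}
for name, sigma in [("strict", 0.1), ("generalized", 0.5)]:
    results[name] = (train_rmse(u, y, centres, sigma, full=True),
                     train_rmse(u, y, centres, sigma, full=False))
    print(f"{name:12s} full P: {results[name][0]:.4f}  reduced P: {results[name][1]:.4f}")
```

The printed RMSEs let one check whether the full P gives the larger advantage for the wide (generalized) sets, as reported in Table 2; the exact numbers depend on the chosen widths and epoch count.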
When the system consists of generalized rules and the interactions among rules are significant, the original (full) covariance matrix must be used. However, when the full matrix is used in the RLS algorithm, the large dimension of the covariance matrix can be expected to impose a heavy computational burden. For better computational efficiency, learning can switch between the two RLS algorithms after resetting the P matrix. Two such simulations are shown in Table 1. Except for the strict 7-rule system, when the algorithm changes from the full to the reduced P matrix after resetting, the performance in Table 1 lies between those of the full-P-matrix systems with and without reset in Tables 2 and 3. When the algorithm changes from the reduced to the full P matrix after resetting, however, the performance is very good in the generalized cases.
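The computational argument can be quantified by counting covariance entries per RLS update. For R rules with n inputs, a first-order TSK consequent has m = n + 1 parameters per rule, so the full P is Rm x Rm, while the reduced version keeps R blocks of size m x m. The multiply-add figure below is only the dominant O(size²) term of one RLS step (the P·φ product plus the rank-one update), an approximation rather than an exact operation count; the function name `rls_cost` and the rule/input counts are illustrative.

```python
def rls_cost(n_rules, n_inputs):
    """Covariance storage and approximate multiply-adds per RLS step."""
    m = n_inputs + 1                     # consequent parameters per rule (first-order TSK)
    full_dim = n_rules * m
    full = {"entries": full_dim**2, "madds": 2 * full_dim**2}
    reduced = {"entries": n_rules * m**2, "madds": 2 * n_rules * m**2}
    return full, reduced


for rules, inputs in [(7, 1), (25, 1), (25, 4)]:
    full, reduced = rls_cost(rules, inputs)
    ratio = full["entries"] / reduced["entries"]
    print(f"R={rules:2d}, n={inputs}: full {full['entries']:6d} entries, "
          f"reduced {reduced['entries']:4d} entries (ratio {ratio:.0f})")
```

Since (Rm)² / (Rm²) = R, the storage saving of the reduced matrix grows linearly with the number of rules, which is why the full matrix becomes costly for large rule bases.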
The second simulation is the identification of the Mackey-Glass chaotic time series, using 200 two-dimensional input points. The same cases as above are considered. Tables 4 to 6 show the learning performances after 1100 epochs, and the same conclusions can be drawn for this example. Thus, changing from the reduced to the full P matrix after resetting can achieve good learning performance in terms of error while also retaining computational efficiency.
Learning errors (RMSE) for Mackey-Glass
RMSE, root mean square errors.
Learning errors (RMSE) for Mackey-Glass with reset
RMSE, root mean square errors.
Learning errors (RMSE) for Mackey-Glass with change of algorithm
RMSE, root mean square errors.
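The switching schedule (reduced P first, then a reset and the full P) can be sketched in a simplified setting of our own: frozen Gaussian premises with heavily overlapping widths, first-order consequents, and sinc data on a scaled input, with 5 reduced-P epochs followed by one full-P pass. All of these settings, and the helper names `rls_epoch` and `rmse`, are assumptions chosen only to illustrate the mechanism; they are not the Mackey-Glass experiments of Tables 4 to 6.

```python
import numpy as np


def firing(u, centres, sigma):
    """Normalized Gaussian firing strengths, shape (n_samples, n_rules)."""
    mu = np.exp(-((u[:, None] - centres[None, :]) ** 2) / (2.0 * sigma**2))
    return mu / mu.sum(axis=1, keepdims=True)


def rls_epoch(Phi, y, theta, P, full):
    """One RLS pass (lambda = 1) over rule-wise regressors Phi (N, R, m); in place."""
    R = theta.shape[0]
    for phi, yt in zip(Phi, y):
        err = yt - np.sum(phi * theta)
        if full:                                  # one matrix over all R*m parameters
            f = phi.ravel()
            Pf = P @ f
            k = Pf / (1.0 + f @ Pf)
            P -= np.outer(k, Pf)
            theta += (k * err).reshape(theta.shape)
        else:                                     # one m x m block per rule
            for i in range(R):
                Pf = P[i] @ phi[i]
                k = Pf / (1.0 + phi[i] @ Pf)
                P[i] -= np.outer(k, Pf)
                theta[i] += k * err


def rmse(Phi, y, theta):
    return float(np.sqrt(np.mean((y - np.einsum("nrk,rk->n", Phi, theta)) ** 2)))


u = np.linspace(-1.0, 1.0, 201)
y = np.sinc(10.0 * u / np.pi)                            # sin(10u)/(10u)
W = firing(u, np.linspace(-1.0, 1.0, 7), sigma=0.5)      # heavily overlapping rules
Phi = W[:, :, None] * np.stack([u, np.ones_like(u)], axis=1)[:, None, :]
R, m, delta = W.shape[1], 2, 1000.0

theta = np.zeros((R, m))
P_blocks = np.stack([delta * np.eye(m)] * R)
for _ in range(5):                                       # phase 1: cheap reduced-P learning
    rls_epoch(Phi, y, theta, P_blocks, full=False)
before = rmse(Phi, y, theta)

P_full = delta * np.eye(R * m)                           # reset P, switch to the full matrix
rls_epoch(Phi, y, theta, P_full, full=True)              # phase 2: one full-P pass
after = rmse(Phi, y, theta)
print(f"reduced P only: {before:.4f}  after reset + full P: {after:.4f}")
```

Because one RLS pass started from θ₀ with P = δI minimizes ‖y - Φθ‖² + δ⁻¹‖θ - θ₀‖², the training error after the full-P pass cannot exceed the error reached with the reduced matrix (in exact arithmetic), while the cost of the full matrix is paid only in the final pass.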
Next, a more complex Mackey-Glass problem is considered. As in [49], four past values are used as inputs for training. In this study, 1000 patterns are generated; the first 500 patterns are used for training and the other 500 are reserved for testing. After 100 epochs, the error of the generalized system using the full matrix is significantly lower than those of the other systems. In other words, although most researchers recommend using the reduced covariance matrix to save computation, the resulting increase in error is not negligible. In practice, switching between covariance matrices after resetting can be employed to improve learning performance. In this study, the covariance matrix is reset to its initial status at 10 epochs, and the results are shown in Table 7. Compared with the non-reset results, two cases become worse: the strict-rule system with the full covariance matrix and the generalized-rule system with the reduced covariance matrix. Two remedies are considered here. First, the initial values of the diagonal of the covariance matrix are reduced to increase the effect of previous learning; the simulation results are shown in Table 8. Second, the system is reset with the full covariance matrix; the simulation results are shown in Table 9.
Learning errors (RMSE) for 4-input Mackey-Glass with reset after 10 epochs
RMSE, root mean square errors.
Learning errors (RMSE) for 4-input Mackey-Glass with reset and the covariance matrix diagonal values reduced from 1000 to 1
RMSE, root mean square errors.
Learning errors (RMSE) for 4-input Mackey-Glass with full covariance matrix reset
RMSE, root mean square errors.
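The article does not list its data-generation details, so the sketch below follows a common convention for this benchmark (an assumption, not necessarily the setup of [49]): integrate dx/dt = 0.2x(t-τ)/(1 + x(t-τ)¹⁰) - 0.1x(t) with τ = 17, form inputs [x(t-18), x(t-12), x(t-6), x(t)] to predict x(t+6), and split 1000 patterns into 500 for training and 500 for testing. The Euler scheme with step 0.1, the constant history x = 1.2, the dropped initial transient, and the function names `mackey_glass` and `make_patterns` are also our choices.

```python
import numpy as np


def mackey_glass(n_samples, tau=17.0, dt=0.1, x0=1.2, beta=0.2, gamma=0.1, power=10):
    """Euler integration of the Mackey-Glass delay equation, sampled at integer times."""
    lag = int(round(tau / dt))
    steps_per_unit = int(round(1.0 / dt))
    n_steps = n_samples * steps_per_unit
    x = np.full(n_steps + lag, x0)                 # constant history for t <= 0
    for k in range(lag, n_steps + lag - 1):
        x_tau = x[k - lag]                         # x(t - tau)
        x[k + 1] = x[k] + dt * (beta * x_tau / (1.0 + x_tau**power) - gamma * x[k])
    return x[lag::steps_per_unit][:n_samples]


def make_patterns(series, lags=(18, 12, 6, 0), horizon=6):
    """Rows [x(t-18), x(t-12), x(t-6), x(t)] with target x(t+6)."""
    start, stop = max(lags), len(series) - horizon
    X = np.array([[series[t - l] for l in lags] for t in range(start, stop)])
    y = series[start + horizon: stop + horizon]
    return X, y


series = mackey_glass(1200)
X, y = make_patterns(series[100:])        # drop an initial transient
X, y = X[:1000], y[:1000]                 # 1000 patterns
X_train, y_train = X[:500], y[:500]
X_test, y_test = X[500:], y[500:]
print(X_train.shape, X_test.shape)
```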
These results show that, in this case, resetting the covariance matrix alone is not enough to improve learning performance. In Table 8, the learning error at 100 epochs (0.004281) is better than that in Table 7. Also, for the generalized-rule system with the reduced covariance matrix, reducing the diagonal values of the covariance matrix does prevent the system from becoming unstable, so that it catches up with the results of the generalized-rule system with the reduced covariance matrix in Table 10.
Learning errors (RMSE) for 4-input Mackey-Glass
RMSE, root mean square errors.
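Why a smaller initial diagonal keeps more of the previous learning can be seen from a standard property of RLS with λ = 1: starting from θ₀ and P₀ = δI, one pass over the data yields exactly the regularized least-squares solution θ = (ΦᵀΦ + δ⁻¹I)⁻¹(Φᵀy + δ⁻¹θ₀). The sketch below checks this numerically on random data (a toy example of our own, with hypothetical helper names `rls_pass` and `regularized_ls`) and shows that δ = 1 keeps the estimate closer to θ₀ than δ = 1000.

```python
import numpy as np


def rls_pass(Phi, y, theta0, delta):
    """One RLS pass (lambda = 1) started from theta0 with P = delta * I."""
    theta, P = np.array(theta0, dtype=float), delta * np.eye(len(theta0))
    for phi, yt in zip(Phi, y):
        Pf = P @ phi
        k = Pf / (1.0 + phi @ Pf)
        theta += k * (yt - phi @ theta)
        P -= np.outer(k, Pf)
    return theta


def regularized_ls(Phi, y, theta0, delta):
    """Closed form: argmin ||y - Phi theta||^2 + ||theta - theta0||^2 / delta."""
    A = Phi.T @ Phi + np.eye(Phi.shape[1]) / delta
    return np.linalg.solve(A, Phi.T @ y + theta0 / delta)


rng = np.random.default_rng(1)
Phi = rng.normal(size=(20, 3))
y = Phi @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
theta0 = np.array([0.8, -1.5, 0.0])      # stands in for the estimate before the reset

for delta in (1000.0, 1.0):
    rec = rls_pass(Phi, y, theta0, delta)
    ref = regularized_ls(Phi, y, theta0, delta)
    print(delta, np.round(rec, 4), np.allclose(rec, ref))
```

A smaller δ means a heavier penalty on moving away from θ₀, which is the sense in which reducing the diagonal from 1000 to 1 increases the effect of previous learning.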
6. Conclusions
NFS is an effective modeling technique. In this article, we have briefly surveyed several aspects of its use. For structure learning, approaches ranging from the basic idea used in the original method to several alternatives were introduced; we hope readers can understand these ideas and select a suitable approach for their applications. For dynamic system modeling, the idea of recurrent networks, which has been widely used in the literature, was discussed. The results reported in [14] show that this methodology is not a good approach, even though many researchers have adopted it. A simple approach that uses a sufficient order in a traditional NARX model gives the best results. Finally, the effects of rule interaction when using RLS were reported. Using the reduced matrix reduces the computational burden significantly, but the error may be large, especially for complicated systems. To retain computational efficiency while achieving good learning performance, we propose adding one final step to SONFIN: resetting the P matrix and then performing the RLS algorithm with the full covariance matrix.
Conflict of Interest: No potential conflict of interest relevant to this article was reported.
BIO
Shun-Feng Su received the B.S. degree in electrical engineering from National Taiwan University, Taiwan, R.O.C., in 1983, and the M.S. and Ph.D. degrees in electrical engineering from Purdue University, West Lafayette, IN, in 1989 and 1991, respectively.
He is now a chair professor in the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taiwan, R.O.C. He is an IEEE Fellow and a CACS Fellow. He has published more than 190 refereed journal and conference papers in the areas of robotics, intelligent control, fuzzy systems, neural networks, and non-derivative optimization. His current research interests include computational intelligence, machine learning, virtual reality simulation, intelligent transportation systems, smart home, robotics, and intelligent control.
Dr. Su is very active in various international and domestic professional societies. He is now the president of the Taiwan Association of System Science and Engineering and the president-elect of the International Fuzzy Systems Association. He also serves on the boards of governors of the Chinese Automatic Control Society, the Taiwan Society of Robotics, and the Taiwan Fuzzy System Association. Dr. Su has also served as program chair, program co-chair, or program committee member for various international and domestic conferences. He currently serves as an associate editor of IEEE Transactions on Cybernetics, IEEE Transactions on Fuzzy Systems, and IEEE Access. From 2015, he will be the editor-in-chief of the International Journal of Fuzzy Systems.
Jen-Wei Yeh was born in Taipei, Taiwan, R.O.C., in 1986. He received the B.S. degree from National Taipei University of Technology, Taipei, Taiwan, R.O.C. He is currently working toward the Ph.D. degree at National Taiwan University of Science and Technology, Taipei, Taiwan. His research interests include fuzzy logic systems, adaptive control, and intelligent control.
[1] Kosko B. (1992). Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence. Englewood Cliffs, NJ: Prentice Hall.
[2] Lin C. T., Lee C. S. G. (1996). Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Upper Saddle River, NJ: Prentice Hall PTR.
[3] Hornik K., Stinchcombe M., White H. (1989). “Multilayer feedforward networks are universal approximators,” Neural Networks, 2(5), pp. 359-366. DOI: 10.1016/0893-6080(89)90020-8
[4] Zadeh L. A. (1973). “Outline of a new approach to the analysis of complex systems and decision processes,” IEEE Transactions on Systems, Man and Cybernetics, SMC-3(1), pp. 28-44. DOI: 10.1109/TSMC.1973.5408575
[5] Pedrycz W. (1985). “Structured fuzzy models,” Cybernetics and Systems, 16(1), pp. 103-117. DOI: 10.1080/01969728508927757
[6] Wang L. X., Mendel J. M. (1992). “Generating fuzzy rules by learning from examples,” IEEE Transactions on Systems, Man and Cybernetics, 22(6), pp. 1414-1427. DOI: 10.1109/21.199466
[7] Yager R. R., Filev D. P. (1994). Essentials of Fuzzy Modeling and Control. New York, NY: Wiley.
[8] Horikawa S., Furuhashi T., Uchikawa Y. (1992). “On fuzzy modeling using fuzzy neural networks with the backpropagation algorithm,” IEEE Transactions on Neural Networks, 3(5), pp. 801-806. DOI: 10.1109/72.159069
[9] Tanaka K., Sano M., Watanabe H. (1995). “Modeling and control of carbon monoxide concentration using a neuro-fuzzy technique,” IEEE Transactions on Fuzzy Systems, 3(3), pp. 271-279. DOI: 10.1109/91.413233
[10] Lin Y., Cunningham G. A. (1995). “A new approach to fuzzy-neural system modeling,” IEEE Transactions on Fuzzy Systems, 3(2), pp. 190-198. DOI: 10.1109/91.388173
[11] Jang J. S. R., Sun C. T., Mizutani E. (1997). Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Upper Saddle River, NJ: Prentice Hall.
[12] Su S. F., Chen K. Y. (2005). “Conceptual discussions and benchmark comparison for neural networks and fuzzy systems,” Differential Equations and Dynamical Systems, 13(1), pp. 35-61.
[13] Juang C. F., Lin C. T. (1998). “An on-line self-constructing neural fuzzy inference network and its applications,” IEEE Transactions on Fuzzy Systems, 6(1), pp. 12-32. DOI: 10.1109/91.660805
[14] Su S. F., Yang F. Y. P. (2002). “On the dynamical modeling with neural fuzzy networks,” IEEE Transactions on Neural Networks, 13(6), pp. 1548-1553. DOI: 10.1109/TNN.2002.804313
[15] Jang J. S. R. (1993). “ANFIS: adaptive-network-based fuzzy inference system,” IEEE Transactions on Systems, Man and Cybernetics, 23(3), pp. 665-685. DOI: 10.1109/21.256541
[16] Takagi T., Sugeno M. (1985). “Fuzzy identification of systems and its applications to modeling and control,” IEEE Transactions on Systems, Man and Cybernetics, SMC-15(1), pp. 116-132. DOI: 10.1109/TSMC.1985.6313399
[17] Lin C. T., Lee C. S. G. (1991). “Neural-network-based fuzzy logic control and decision system,” IEEE Transactions on Computers, 40(12), pp. 1320-1336. DOI: 10.1109/12.106218
[18] Yeh J. W., Su S. F., Jeng J. T., Chen B. S. (2010). “On learning analysis of neural fuzzy systems,” Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ), Barcelona, Spain, July 18-23, 2010, pp. 1-6. DOI: 10.1109/FUZZY.2010.5584389
[19] Yeh J. W., Su S. F., Rudas I. (2011). “Analysis of using RLS in neural fuzzy systems,” Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Anchorage, AK, October 9-12, 2011, pp. 1831-1836. DOI: 10.1109/ICSMC.2011.6083937
[20] Yeh J. W., Su S. F. (2012). “Learning analysis for correlation of fuzzy rules in applying RLS for neural fuzzy systems,” Proceedings of the IEEE International Conference on Granular Computing, Hangzhou, China, August 11-13, 2012, pp. 609-613. DOI: 10.1109/GrC.2012.6468690
[21] Zhang Y., Li X. R. (1999). “A fast UD factorization-based learning algorithm with applications to nonlinear system modeling and identification,” IEEE Transactions on Neural Networks, 10(4), pp. 930-938. DOI: 10.1109/72.774266
[22] Cho Y. S., Kim S. B., Powers E. J. (1991). “Time-varying spectral estimation using AR models with variable forgetting factors,” IEEE Transactions on Signal Processing, 39(6), pp. 1422-1426. DOI: 10.1109/78.136549
[23] Huang S. R. (1999). “Analysis of model-free estimators: applications to stock market with the use of technical indices,” M.S. thesis, National Taiwan University of Science and Technology, Taipei, Taiwan.
[24] Kohonen T. (1989). Self-Organization and Associative Memory, 3rd ed. New York, NY: Springer-Verlag.
[25] Dickerson J. A., Kosko B. (1996). “Fuzzy function approximation with ellipsoidal rules,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 26(4), pp. 542-560. DOI: 10.1109/3477.517030
[26] Kroll A. (1996). “Identification of functional fuzzy models using multidimensional reference fuzzy sets,” Fuzzy Sets and Systems, 80(2), pp. 149-158. DOI: 10.1016/0165-0114(95)00140-9
[27] Klawonn F., Kruse R. (1997). “Constructing a fuzzy controller from data,” Fuzzy Sets and Systems, 85(2), pp. 177-193. DOI: 10.1016/0165-0114(95)00350-9
[28] Kim E., Park M., Ji S., Park M. (1997). “A new approach to fuzzy modeling,” IEEE Transactions on Fuzzy Systems, 5(3), pp. 328-337. DOI: 10.1109/91.618271
[29] Chuang C. C., Su S. F., Chen S. S. (2001). “Robust TSK fuzzy modeling for function approximation with outliers,” IEEE Transactions on Fuzzy Systems, 9(6), pp. 810-821. DOI: 10.1109/91.971730
[30] Frigui H., Krishnapuram R. (1999). “A robust competitive clustering algorithm with applications in computer vision,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(5), pp. 450-465. DOI: 10.1109/34.765656
[31] Hawkins D. M. (1980). Identification of Outliers. New York, NY: Chapman and Hall.
[32] Smith M. (1993). Neural Networks for Statistical Modeling. New York, NY: Van Nostrand Reinhold.
[33] Sarle W. S. (1995). “Stopped training and other remedies for overfitting,” Proceedings of the 27th Symposium on the Interface of Computing Science and Statistics, Pittsburgh, PA, June 21-24, 1995, pp. 352-360.
[34] Bartlett P. L. (1996). “For valid generalization, the size of the weights is more important than the size of the network,” Advances in Neural Information Processing Systems, 9, pp. 134-140.
[35] Bezdek J. C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms. New York, NY: Plenum Press.
[36] Jain A. K., Dubes R. C. (1988). Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice Hall.
[37] Dave R. N., Krishnapuram R. (1997). “Robust clustering methods: a unified view,” IEEE Transactions on Fuzzy Systems, 5(2), pp. 270-293. DOI: 10.1109/91.580801
[38] Cichocki A., Unbehauen R. (1993). Neural Networks for Optimization and Signal Processing. New York, NY: J. Wiley.
[39] Rousseeuw P. J., Leroy A. M. (1987). Robust Regression and Outlier Detection. New York, NY: Wiley.
[40] Chen D. S., Jain R. C. (1994). “A robust backpropagation learning algorithm for function approximation,” IEEE Transactions on Neural Networks, 5(3), pp. 467-479. DOI: 10.1109/72.286917
[41] Connor J. T., Martin R. D., Atlas L. E. (1994). “Recurrent neural networks and robust time series prediction,” IEEE Transactions on Neural Networks, 5(2), pp. 240-254. DOI: 10.1109/72.279188
[42] Liano K. (1996). “Robust error measure for supervised neural network learning with outliers,” IEEE Transactions on Neural Networks, 7(1), pp. 246-250. DOI: 10.1109/72.478411
[43] Sánchez A. V. D. (1995). “Robustization of a learning method for RBF networks,” Neurocomputing, 9(1), pp. 85-94. DOI: 10.1016/0925-2312(95)00000-V
[44] Huang L., Zhang B. L., Huang Q. (1998). “Robust interval regression analysis using neural networks,” Fuzzy Sets and Systems, 97(3), pp. 337-347. DOI: 10.1016/S0165-0114(96)00325-9
[45] Chuang C. C., Su S. F., Hsiao C. C. (2000). “The annealing robust backpropagation (ARBP) learning algorithm,” IEEE Transactions on Neural Networks, 11(5), pp. 1067-1077. DOI: 10.1109/72.870040
[46] Haykin S. S. (1999). Neural Networks: A Comprehensive Foundation, 2nd ed. Upper Saddle River, NJ: Prentice Hall.
[47] Juang C. F., Lin C. T. (1999). “A recurrent self-organizing neural fuzzy inference network,” IEEE Transactions on Neural Networks, 10(4), pp. 828-845. DOI: 10.1109/72.774232
[48] Lee C. H., Teng C. C. (2000). “Identification and control of dynamic systems using recurrent fuzzy neural networks,” IEEE Transactions on Fuzzy Systems, 8(4), pp. 349-366. DOI: 10.1109/91.868943
[49] Lin Y. Y., Chang J. Y., Lin C. T. (2013). “Identification and prediction of dynamic systems using an interactively recurrent self-evolving fuzzy neural network,” IEEE Transactions on Neural Networks and Learning Systems, 24(2), pp. 310-321. DOI: 10.1109/TNNLS.2012.2231436
[50] Narendra K. S., Parthasarathy K. (1990). “Identification and control of dynamical systems using neural networks,” IEEE Transactions on Neural Networks, 1(1), pp. 4-27. DOI: 10.1109/72.80202
[51] Espinosa J., Vandewalle J. (2000). “Constructing fuzzy models with linguistic integrity from numerical data-AFRELI algorithm,” IEEE Transactions on Fuzzy Systems, 8(5), pp. 591-600. DOI: 10.1109/91.873582
[52] Sastry P. S., Santharam G., Unnikrishnan K. P. (1994). “Memory neuron networks for identification and control of dynamical systems,” IEEE Transactions on Neural Networks, 5(2), pp. 306-319. DOI: 10.1109/72.279193
[53] Haykin S. S. (1991). Adaptive Filter Theory, 2nd ed. Englewood Cliffs, NJ: Prentice Hall.
[54] Zadeh L. A. (1975). “The concept of a linguistic variable and its application to approximate reasoning-I,” Information Sciences, 8(3), pp. 199-249. DOI: 10.1016/0020-0255(75)90036-5
[55] Zadeh L. A. (1975). “The concept of a linguistic variable and its application to approximate reasoning-II,” Information Sciences, 8(4), pp. 301-357. DOI: 10.1016/0020-0255(75)90046-8
[56] Zadeh L. A. (1975). “The concept of a linguistic variable and its application to approximate reasoning-III,” Information Sciences, 9(1), pp. 43-80. DOI: 10.1016/0020-0255(75)90017-1
[57] Grabisch M. (1996). “The representation of importance and interaction of features by fuzzy measures,” Pattern Recognition Letters, 17(6), pp. 567-575. DOI: 10.1016/0167-8655(96)00020-7
[58] Jones R. D., Lee Y. C., Barnes C. W., Flake G. W., Lee K., Lewis P. S., Qian S. (1990). “Function approximation and time series prediction with neural networks,” Proceedings of the IJCNN International Joint Conference on Neural Networks, San Diego, CA, June 17-21, 1990, pp. 649-665. DOI: 10.1109/IJCNN.1990.137644