A Clustering-Based Fault Detection Method for Steam Boiler Tube in Thermal Power Plant
Journal of Electrical Engineering and Technology. 2016. Jul, 11(4): 848-859
Copyright © 2016, The Korean Institute of Electrical Engineers
This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • Received : March 23, 2016
  • Accepted : April 26, 2016
  • Published : July 01, 2016
About the Authors
Jungwon Yu
Dept. of Electrical and Computer Engineering, Pusan National University, Busan, Korea.
Jaeyel Jang
Technology & Information Department, Technical Solution Center, Korea East-West Power Co., Ltd., Dangjin, Korea.
Jaeyeong Yoo
CTO, XEONET Co., Ltd, Seongnam, Korea
June Ho Park
Dept. of Electrical and Computer Engineering, Pusan National University, Busan, Korea.
Sungshin Kim
Corresponding Author: Dept. of Electrical and Computer Engineering, Pusan National University, Busan, Korea.

Abstract
System failures in thermal power plants (TPPs) can lead to serious losses because the equipment is operated under very high pressure and temperature. Therefore, it is indispensable for alarm systems to inform field workers in advance of any abnormal operating conditions in the equipment. In this paper, we propose a clustering-based fault detection method for steam boiler tubes in TPPs. For data clustering, the k -means algorithm is employed and the number of clusters is systematically determined by the slope statistic. The clustering-based method assumes that normal data samples lie close to the cluster centers and abnormal samples lie far from them. After partitioning training samples collected from normal target systems, fault scores (FSs) are assigned to unseen samples according to the distances between the samples and their closest cluster centroids. Alarm signals are generated if the FSs exceed predefined threshold values. The validity of the exponentially weighted moving average for reducing false alarms is also investigated. To verify the performance, the proposed method is applied to failure cases due to boiler tube leakage. The experimental results show that the proposed method can successfully detect the abnormal conditions of the target system.
1. Introduction
The importance of condition monitoring and fault detection (FD) techniques has been growing for the effective operation and performance improvement of various industrial systems such as aircraft, trains, automobiles, chemical plants and power plants. A fault is defined as an unpermitted deviation of at least one characteristic property or variable of a system from acceptable/usual/standard behavior [1] . Faults can result in system malfunctions and failure. In particular, system failures in thermal power plant (TPP) equipment with very high operating pressure and temperature can cause severe loss of life and materials. Monitoring and FD systems that can detect in advance the abnormal conditions of power plant units are essential for ensuring the safety, reliability and availability of power plants. The main objective of FD systems is to detect the abnormal operating conditions of power plants by analyzing complex and nonstationary behaviors of process parameters, and to help field workers take proper actions at the initial stage of faults.
Recently, as distributed control systems (DCSs) have been built in power plants, massive operation data can be collected and managed efficiently. In a DCS, historical operation data composed of various process variables is stored at discrete time intervals. The explosive growth of historical data has boosted efforts to extract useful knowledge from the data related to equipment health and maintenance information.
As described in Fig. 1 , process monitoring procedures are basically performed in four steps [2] : FD, fault identification, fault diagnosis and system recovery. FD determines whether a fault has occurred. Fault identification confirms process variables in connection with the fault. Fault diagnosis identifies the type of fault. Finally, after removing the cause of the fault, the monitoring loop is closed. In this paper, the focus is on FD only.
Fig. 1. General monitoring scheme for industrial processes [2]
The following summarizes several previous studies on condition monitoring and FD methods for TPPs using data mining techniques. Ajami and Daneshvar [2] used multivariate statistical signal processing techniques, such as principal component analysis (PCA) and independent component analysis (ICA), for FD and diagnosis of TPP turbine systems. Hsu and Su [3] developed a method that combines ICA and exponentially weighted moving average (EWMA) for early detection of TPP malfunctions at Taiwan Power Company. Cai et al. [4] introduced an on-line performance monitoring method for coal-fired power units, such as boilers and turbines, using a support vector machine (SVM). Chen et al. [5] proposed an SVM-based method with dimension reduction schemes based on correlation analysis and a decision tree to analyze turbine failures in thermal power facilities. Shashoa et al. [6] presented a data-driven FD and isolation approach for a steam separator at TEKO B1 Kostolac TPP using robust process identification procedures and the Neyman-Pearson hypothesis test. Li et al. [7] presented a monitoring and fault diagnosis method for leak detection of feedwater heaters in coal-fired plants using the group method of data handling based on ridge regression. Prasad et al. [8] proposed a performance monitoring strategy based on a neural network and histogram plots to economize the operation of a 200 MW oil/gas-fired TPP. Guo et al. [9] reported a condition monitoring and FD method for tube-ball mills of coal-fired plants using a multi-segment mathematical model whose parameters are identified by genetic algorithms.
Although various statistical and machine learning techniques have been applied to condition monitoring and FD of TPP components, few attempts have been made to use clustering-based FD methods for TPPs. In this paper, we propose a clustering-based FD method for tube leakage of steam boilers in TPPs. In classification-based approaches (e.g., SVM), labeled learning samples must be prepared to train binary classifiers. To prepare the labeled samples, experts determine whether each sample is normal or faulty after checking historical operation data. When there are many monitored variables, this labeling procedure is difficult and time-consuming. On the other hand, clustering-based methods can find the hidden structure of unlabeled learning samples and perform FD in unsupervised mode. Clustering-based methods that do not need pre-labeled samples have been widely applied in various fields, such as the financial domain [10] , network intrusion detection [11 , 12] , anomaly detection in surveillance videos [13] and the steel industry [14] . In the proposed method, it is assumed that normal samples are close to cluster centroids and abnormal samples are far from the centroids. For data clustering, the k -means algorithm with Euclidean distance is employed and the slope statistic [15] proposed by Fujita et al. is used to systematically determine the number of clusters. The slope statistic can handle situations in which there is a dominant cluster in the training samples, the samples are not a mixture of Gaussian distributions, and the dimension of the samples is high.
After partitioning training samples gathered from normal target systems into several groups, fault scores (FSs) are assigned to unseen samples based on the distances between the samples and their closest centers. Using the 95th, 97th and 99th percentiles, threshold values of the FSs are calculated, and alarm signals are generated when the FSs of unseen samples are larger than the threshold values. The validity of EWMA for reducing false alarms is also investigated. To evaluate the performance, the proposed method is applied to a dataset collected from the DCS of a 200 MW TPP. The dataset corresponds to two failure cases due to boiler tube leakage. The simulation results show that the proposed method can detect the tube leakage in the early stages.
The remainder of this paper is organized as follows. Section 2 briefly summarizes k -means clustering algorithm and silhouette and slope statistics. In Section 3, FS assignments, their threshold settings and EWMA that consider the trends of FSs are explained. Section 4 describes the target system, a 200 MW coal-fired TPP, and the tube leakage in steam boiler. Section 5 shows the simulation results of the two failure cases and finally, Section 6 presents concluding remarks.
2. Data Clustering Algorithm
Data clustering techniques classify similar training samples into several groups or clusters and can find the hidden structure of unlabeled training samples.
- 2.1 k-Means clustering algorithm
The k -means algorithm [16 , 17] partitions n given vectors x j , j = 1,..., n , into c groups (also called as clusters) Gi , i = 1,..., c , and finds cluster centers c i , i = 1,..., c , that minimize the objective function defined as follows:
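$$J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \left( \sum_{j,\, \mathbf{x}_j \in G_i} \lVert \mathbf{x}_j - \mathbf{c}_i \rVert^2 \right) \tag{1}$$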
where ||·|| denotes Euclidean distance and Ji is an objective function value of the i th cluster, which depends on its geometrical data structure and center position. The partitioned samples are described by c by n binary membership matrix U whose i th row and j th column, uij , is 1 if the j th sample, x j , belongs to the i th cluster, and 0 otherwise. Matrix U satisfies the following properties:
$$\sum_{i=1}^{c} u_{ij} = 1, \quad \forall j = 1, \ldots, n \tag{2}$$
and
$$\sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij} = n \tag{3}$$
After cluster centers c i , i = 1,..., c , are fixed, uij is defined as
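$$u_{ij} = \begin{cases} 1, & \text{if } \lVert \mathbf{x}_j - \mathbf{c}_i \rVert^2 \le \lVert \mathbf{x}_j - \mathbf{c}_k \rVert^2 \text{ for each } k \ne i \\ 0, & \text{otherwise} \end{cases} \tag{4}$$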
In other words, if the i th center is the closest center of the j th sample, the latter is included in the i th group. After determining uij , optimal centers c i that minimize the objective function are calculated as
$$\mathbf{c}_i = \frac{1}{|G_i|} \sum_{j,\, \mathbf{x}_j \in G_i} \mathbf{x}_j \tag{5}$$
where |·| denotes the size of a set. As explained above, in k -means algorithm, cluster centers c i and membership matrix U are determined through iterative procedures (see [16] for more on this).
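The alternation between the membership update and the center update can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `kmeans`, the random initialization from the training samples, the iteration cap and the handling of empty groups are all choices made here for the example.

```python
import math
import random


def kmeans(samples, c, n_iter=100, seed=0):
    """Basic k-means: alternate the membership rule and the center update.

    samples: list of equal-length numeric tuples; c: number of clusters.
    Initial centers are c randomly chosen samples (an assumption of this sketch).
    """
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(samples, c)]
    labels = None
    for _ in range(n_iter):
        # Membership step: each sample joins the group of its closest center.
        new_labels = [
            min(range(c), key=lambda i: math.dist(x, centers[i]))
            for x in samples
        ]
        if new_labels == labels:  # memberships unchanged: converged
            break
        labels = new_labels
        # Center step: each center becomes the mean of its group members.
        for i in range(c):
            members = [x for x, lab in zip(samples, labels) if lab == i]
            if members:  # an empty group keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

Because the result depends on the initial centers, practical use typically repeats the procedure from several initializations and keeps the partition with the smallest objective value.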
- 2.2 Silhouette statistic
The concept of silhouette [18] proposed by Rousseeuw is a useful tool for verifying how well the training samples are grouped. The silhouette plot not only provides validity to the clustering results but also outlines the target data structure. Silhouette statistic, an averaged value of each sample’s silhouette value, can be employed to determine the proper number of clusters.
Suppose that n training samples x j , j = 1,..., n , are grouped into c clusters Gi , i = 1,..., c , and the j th sample belongs to the ith cluster. In order to calculate silhouette value s ( j ) for the j th sample, let us define average dissimilarity a ( j ) (i.e., inner dissimilarity) between the j th sample and all elements of the i th cluster, with the exception of the j th sample, as
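$$a(j) = \frac{1}{|G_i| - 1} \sum_{\mathbf{x} \in G_i} \lVert \mathbf{x}_j - \mathbf{x} \rVert \tag{6}$$
where x ≠ x j . The average dissimilarity between the j th sample and all elements of the k th cluster, with the exception of the i th cluster, d ( x j , Gk ), for k = 1,..., c , k ≠ i , is also defined as
$$d(\mathbf{x}_j, G_k) = \frac{1}{|G_k|} \sum_{\mathbf{x} \in G_k} \lVert \mathbf{x}_j - \mathbf{x} \rVert \tag{7}$$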
After calculating d ( x j , Gk ), their minimum value (i.e., inter dissimilarity) is denoted by
$$b(j) = \min_{k = 1, \ldots, c,\; k \ne i} d(\mathbf{x}_j, G_k) \tag{8}$$
where Gk , whose d ( x j , Gk ) is minimum, is called as the second-best choice cluster. The silhouette value s ( j ) of the j th sample is calculated as
$$s(j) = \frac{b(j) - a(j)}{\max\{a(j),\, b(j)\}} \tag{9}$$
Silhouette values s ( j ), j = 1,..., n , are in the range of [−1, 1] and they can be combined into a silhouette plot that graphically represents clustering results. Three extreme situations illustrate the meaning of a silhouette value. In the first case, s ( j ) is close to 1. This implies that inner dissimilarity is much smaller than inter dissimilarity, i.e., $a(j) \ll b(j)$. In this case, we can conclude that the j th sample is included in a proper cluster. In the second case, s ( j ) is approximately 0, i.e., $a(j) \approx b(j)$, and thus it is uncertain whether the j th sample should be assigned to the i th or the second-best choice cluster. Lastly, the third case is the worst case, where s ( j ) is close to −1. In this situation, the j th sample is better assigned to the second-best choice cluster than to the i th cluster.
The silhouette statistic used to determine the proper number of clusters is defined as
$$\mathrm{Sil}(c) = \frac{1}{n} \sum_{j=1}^{n} s(j) \tag{10}$$
After calculating the silhouette statistics for c = 2, 3,..., the number of clusters that maximizes them is finally selected.
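The silhouette computation can be sketched as below, assuming Euclidean dissimilarity and at least two clusters. The function names are illustrative, and returning 0 for a sample in a singleton cluster is a convention adopted here rather than something stated in the text.

```python
import math


def silhouette_values(samples, labels):
    """Return s(j) for every sample, given its cluster label."""
    clusters = {}
    for x, lab in zip(samples, labels):
        clusters.setdefault(lab, []).append(x)
    values = []
    for x, lab in zip(samples, labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # convention for singleton clusters (assumption)
            continue
        # Inner dissimilarity a(j): the sample itself contributes distance 0,
        # so dividing by |G_i| - 1 averages over the other members only.
        a = sum(math.dist(x, y) for y in own) / (len(own) - 1)
        # Inter dissimilarity b(j): smallest average distance to another cluster.
        b = min(
            sum(math.dist(x, y) for y in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def silhouette_statistic(samples, labels):
    """Average silhouette value over all samples."""
    s = silhouette_values(samples, labels)
    return sum(s) / len(s)
```

For four points on a line at 0, 1, 10 and 11, grouping {0, 1} and {10, 11} gives a statistic close to 0.9, while the mixed grouping {0, 10} and {1, 11} gives a negative value.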
- 2.3 Slope statistic
Slope statistic [15] , proposed by Fujita et al., is based on the silhouette statistic described in the previous subsection. The basic idea of slope statistic is that the optimal cluster number has the maximum silhouette statistic and if the cluster number is larger than the optimum, the silhouette statistic decreases sharply. Based on this idea, slope statistic is defined as
$$\mathrm{Slope}(c) = -\left[ \mathrm{Sil}(c+1) - \mathrm{Sil}(c) \right] \mathrm{Sil}(c)^{p} \tag{11}$$
where p is a positive constant that controls the relative weight of the two terms, the silhouette statistic $\mathrm{Sil}(c)$ and its decrease $-[\mathrm{Sil}(c+1) - \mathrm{Sil}(c)]$. The reason for employing the silhouette approach in the construction of slope statistic is that the former considers both the inner and inter dissimilarity of each target sample. Using slope statistic, the proper number of clusters is determined as
$$\hat{c} = \arg\max_{c} \, \mathrm{Slope}(c) \tag{12}$$
In this paper, the cluster number that maximizes slope statistic is determined for k -means clustering.
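A short sketch of this selection rule follows. It assumes the silhouette statistics have already been computed for consecutive cluster numbers and are passed in as a dictionary; the function names and the numbers in the usage note are made up for illustration and are not taken from the paper.

```python
def slope_statistic(sil, p=2):
    """Slope statistic for every c whose successor c + 1 is also available.

    sil: dict mapping a cluster number c to its silhouette statistic.
    p: positive constant weighting the silhouette term (use an integer p
       if silhouette statistics may be negative).
    """
    return {
        c: -(sil[c + 1] - sil[c]) * sil[c] ** p
        for c in sil
        if c + 1 in sil
    }


def select_cluster_count(sil, p=2):
    """Return the cluster number that maximizes the slope statistic."""
    slope = slope_statistic(sil, p)
    return max(slope, key=slope.get)
```

With hypothetical silhouette statistics {2: 0.4, 3: 0.7, 4: 0.3, 5: 0.28}, the slope statistic is largest at c = 3, where the silhouette is high and drops sharply at c = 4.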
3. Clustering-Based Fault Detection
In the clustering-based FD method, after applying k -means algorithm to normal samples, unseen samples that do not match with the normal samples are regarded as fault samples. The advantages of clustering-based techniques are that FD can be performed in unsupervised mode and the computation time in the test phase is fairly short [19] .
- 3.1 Fault score
In order to detect fault samples, FSs are assigned to unseen samples according to the distances between the samples and their closest cluster centers [20] . The following describes the procedures for imposing FS on a new sample x new . First, among c cluster centers, the nearest center c k to x new is found by
$$k = \arg\min_{i = 1, \ldots, c} \lVert \mathbf{x}_{new} - \mathbf{c}_i \rVert \tag{13}$$
Subsequently, the average distance lk between c k and the training samples included in Gk is calculated as
$$l_k = \frac{1}{|G_k|} \sum_{\mathbf{x} \in G_k} \lVert \mathbf{x} - \mathbf{c}_k \rVert \tag{14}$$
Finally, the FS of x new is defined as
$$FS = \frac{\lVert \mathbf{x}_{new} - \mathbf{c}_k \rVert}{l_k} \tag{15}$$
FS measures the ratio of the dissimilarity between x new and its nearest center c k to the average distance lk . The larger the FS, the farther x new is from its closest center relative to the typical spread of that cluster. As explained in the next subsection, if the FS exceeds the predefined threshold values, the corresponding samples are classified as fault samples and alarm signals are generated.
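The three steps above (nearest center, average in-cluster distance, ratio) can be sketched as follows. The function names are illustrative, and the sketch assumes every cluster contains at least one training sample.

```python
import math


def mean_center_distances(samples, labels, centers):
    """Average distance l_k between each center and its training members."""
    totals = [0.0] * len(centers)
    counts = [0] * len(centers)
    for x, lab in zip(samples, labels):
        totals[lab] += math.dist(x, centers[lab])
        counts[lab] += 1
    # Assumes no empty cluster; an empty one would divide by zero here.
    return [t / n for t, n in zip(totals, counts)]


def fault_score(x_new, centers, mean_dists):
    """Distance to the nearest center, divided by that cluster's l_k."""
    k = min(range(len(centers)), key=lambda i: math.dist(x_new, centers[i]))
    return math.dist(x_new, centers[k]) / mean_dists[k]
```

A score near 1 means the sample is about as far from its center as a typical training sample of that cluster; larger values indicate increasingly unusual samples.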
- 3.2 Threshold setup for fault score
This subsection provides the procedures to set up threshold value T for alarm signal generation. First, as presented in (14), mean distances lk , k = 1,..., c , are calculated. Subsequently, FS is imposed on each training sample using (15). In other words, FSs, FSj , j = 1,..., n , of the training samples are calculated by substituting the j th training sample x j into (15) instead of x new . Finally, threshold values of FS are determined using FSj .
In clustering-based FD, only the upper threshold value is considered because the possibility of fault increases when FS is large. The upper threshold T = m + ζσ is generally used when the FSj follow a Gaussian distribution, where m and σ are the mean and standard deviation of FSj , respectively, and ζ is a positive integer [1] . Under this assumption, the probability that the FS of an arbitrary sample exceeds the upper threshold is approximately 0.1587, 0.0228 and 0.00135 for ζ = 1, 2, and 3, respectively. As shown in Section 5, however, the FSj follow an asymmetric distribution whose right tail is longer than the left, so a Gaussian-based threshold is not appropriate. In this paper, the 95th, 97th and 99th percentiles of FSj are therefore employed as threshold values for FD.
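The percentile-based threshold setup can be sketched as below. The text does not state which percentile definition is used, so linear interpolation between order statistics is an assumption of this sketch, as are the function names.

```python
def percentile(values, q):
    """q-th percentile (0 <= q <= 100) with linear interpolation (assumption)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def fs_thresholds(train_fs):
    """Thresholds from the fault scores of the (normal) training samples."""
    return {q: percentile(train_fs, q) for q in (95, 97, 99)}
```

Because the thresholds are read directly from the empirical distribution of the training fault scores, no assumption about the shape of that distribution is required.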
- 3.3 Exponentially weighted moving average
In the test phase, generating an alarm signal based only on the current FS implicitly assumes that the current and previous FSs are independent. In this case, the false alarm rate could increase because alarm signals occur regardless of the historical trend. In this paper, EWMA, which is widely used to smooth time-series data, is employed to consider the trend of FSs. EWMA gives more weight to the most recent values, and the weights decrease exponentially for older values. Using EWMA, the smoothed version of FSs at time t , EWMAw ( t ), is calculated as
$$EWMA_w(t) = \alpha \, FS(t) + (1 - \alpha) \, EWMA_w(t-1) \tag{16}$$
where FS ( t ) is FS at time t , α is a smoothing factor commonly calculated by
$$\alpha = \frac{2}{w + 1} \tag{17}$$
and w is window size. After calculating EWMA, an alarm signal is generated if it exceeds predefined threshold values. Alarm signal generation using EWMA could reduce false alarms because historical trends of FSs are considered.
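A minimal sketch of the smoothing step follows, assuming the common choice of a smoothing factor equal to 2 / (w + 1) and initializing the recursion with the first fault score. The initialization is not specified in the text and is an assumption here, as is the function name.

```python
def ewma(fs_series, w):
    """Exponentially weighted moving average of a fault-score series.

    w: window size; the smoothing factor is taken as 2 / (w + 1).
    The first smoothed value is set to the first FS (assumption).
    """
    alpha = 2.0 / (w + 1)
    smoothed = []
    prev = None
    for fs in fs_series:
        prev = fs if prev is None else alpha * fs + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed
```

For example, with w = 3 the series 1, 1, 8, 1 is smoothed to 1, 1, 4.5, 2.75: the isolated spike is damped, which is why a single outlying FS is less likely to trigger an alarm.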
- 3.4 Overview of the proposed approach
Table 1 summarizes the procedure for the proposed clustering-based FD approach. The procedure is divided into “training” and “test” phases. In the “training” phase, after determining the proper number of clusters, the training samples collected from the normal target system are partitioned using k -means algorithm. Subsequently, the FS of each training sample is calculated and three threshold values (i.e., 95th, 97th and 99th percentiles) are set up. Depending on the strength of FS, three different alarm signals (i.e., “Caution”, “Alert” and “Critical”) are generated. In the “test” phase, after assigning FS to each test sample, alarm signals occur sequentially. If EWMA is not used, an alarm signal for each test sample is generated independently according to the strength of FS. If EWMA is employed, after calculating EWMAw ( t ) using (16), three different alarms are generated.
Table 1. Clustering-based FD algorithm
4. Description of Target System: 200 MW Coal-Fired Power Plant
The target system of this study is a 200 MW coal-fired TPP. Fig. 2 shows an example of the DCS screen in the plant. To verify the performance, the proposed method is applied to two failure cases collected from the target DCS.
Fig. 2. Example of DCS screen in 200 MW coal-fired power plant
- 4.1 Coal-fired thermal power plant
In the coal-fired power plant, after transforming feedwater into steam through the thermal energy produced from the combustion of bituminous coal, electricity is generated by driving the steam turbine and generator. Fig. 3 shows a simplified schematic diagram of the target TPP. The steam boiler raises steam by heating feedwater using thermal energy converted from fossil fuel. The steam boiler follows the thermodynamic steam cycle, i.e., Rankine cycle, which is a practical implementation of the ideal Carnot cycle [21] . Steam, an important medium for producing mechanical energy, can be generated from abundant water, does not react much with the materials of the power plant equipment and is stable at the required operation temperature in the power plant [22] .
Fig. 3. Simplified schematic diagram for target TPP
Bituminous coal pulverized in advance is transformed into thermal energy in the steam boiler furnace. Before flowing into the drum, feedwater is preheated by passing through a series of low- and high-pressure heaters and an economizer. The heaters and the economizer heat the feedwater using extraction steam from the turbine and high-temperature flue gas, respectively. These preheating steps improve the efficiency of the entire cycle. The drum supplies feedwater that will be converted to steam and temporarily stores the steam produced by the evaporator. The saturated steam produced by the evaporator contains a small amount of moisture. A superheater converts this steam into high-purity superheated steam at high pressure and temperature, which is supplied to the turbine.
In the turbine, the superheated steam expands, the turbine blades are rotated and thermal energy is transformed into mechanical energy. The rotating turbine blades drive the electric generator and three-phase electric power is generated. After performing mechanical work at the high-pressure turbine, the steam is reheated by a reheater and supplied to the intermediate-pressure turbine. The steam that exits from the low-pressure turbine is condensed into condensate water and stored in the condenser’s hotwell. The condensate water is boosted by a condensate pump and passes through a low-pressure feedwater heater. Subsequently, the water is deaerated by a deaerator and boosted by the feedwater pump. The boosted water passes through a high-pressure heater and the economizer and is fed into the boiler again.
- 4.2 Boiler tube leakage
Failure from one or more tubes in the boiler can be detected by sound and either by an increase in the make-up water requirement (indicating a failure of the water-carrying tubes) or by an increased draft in the superheater or reheater areas (due to failure of the superheater or reheater tubes) [23] . The boiler tubes can be influenced by several damage processes such as inside scaling, waterside corrosion and cracking, fireside corrosion and/or erosion, stress rupture due to overheat and creep, vibration-induced and thermal fatigue cracking, and defective welds [24] .
Tube leakage from a pin-hole might be tolerated because of an adequate margin of feedwater and the leakage can be corrected after suitable scheduled maintenance. However, if the boiler is continuously operated with the leakage, much of the pressurized fluid will eventually leak and cause severe damage to neighboring tubes. Tube leakage of boiler, superheater and reheater could result in a serious efficiency decline. In the short term, tube leakage of superheater and reheater is more serious than that of boiler. When severe tube leakage occurs, maintaining the boiler drum level properly is difficult. If leaking water is spilled onto the furnace, coal combustion is disturbed. In these cases, the plant should be shut down immediately.
In this paper, two unplanned shutdown cases due to boiler tube leakage are employed to demonstrate the validity of the proposed method.
5. Experiment Results
This section provides the results of applying the proposed clustering-based FD approach to the two unscheduled shutdown cases.
- 5.1 Data preparation
Table 2 lists the number of training and test samples and the number of monitored variables for the two failure cases. In Table 2 , each sample is recorded at 5-minute intervals and the training samples are gathered from a normally operating target system. After applying the “training” phase in Table 1 to the training samples, the performance of the proposed approach is evaluated on the test samples. Among hundreds of variables, 13 monitored variables are selected based on expert knowledge to detect boiler tube leakage at an early stage. The same variables are selected in Cases 1 and 2. Table 3 summarizes the 13 monitored variables selected by human experts.
Table 2. Summary of two unplanned shutdown cases due to boiler tube leakage
Table 3. Summary of monitored variables for boiler tube leakage in 200 MW TPP
Before performing data clustering, z -score standardization is applied to each variable as
$$X^{*} = \frac{X - E[X]}{STD[X]} \tag{18}$$
where X and X * are the original and standardized values, respectively, and E [·] and STD [·] are the expectation and standard deviation operators, respectively. After standardization, the mean and variance of each variable are equal to 0 and 1, respectively. Standardization is applied because the monitored variables have different means and variances; without it, variables with large numerical ranges would dominate the Euclidean distances used in clustering.
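The standardization step can be sketched as follows. Two points are assumptions of this sketch rather than statements from the text: the population standard deviation is used, and the means and standard deviations estimated on the training samples are reused for the test samples. The function names are illustrative.

```python
import statistics


def fit_zscore(train_rows):
    """Per-variable mean and standard deviation from the training samples."""
    cols = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in cols]
    stds = [statistics.pstdev(col) for col in cols]  # population std (assumption)
    return means, stds


def apply_zscore(rows, means, stds):
    """Standardize rows with previously fitted statistics.

    A constant variable (std of 0) would cause a division by zero and
    should be removed beforehand.
    """
    return [
        [(v - m) / s for v, m, s in zip(row, means, stds)]
        for row in rows
    ]
```

For two training samples (1, 10) and (3, 30), both variables map to −1 and +1 after standardization, so neither dominates the distance computation.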
- 5.2 Determining proper number of clusters
In this study, as described in subsection 2.3, slope statistic is employed to determine the proper number of clusters. After partitioning the training samples and calculating the silhouette statistic with an increase in the number of clusters from c = 2,..., C max , the slope statistic is computed, where C max and positive constant p in (11) are set to 10 and 2, respectively. Figs. 4 and 5 show the plot of the number of clusters versus the silhouette and slope statistics in Cases 1 and 2, respectively. As shown in Figs. 4 and 5 , the numbers of clusters that maximize the slope statistic in Cases 1 and 2 are 3 and 2, respectively. We can confirm that the slope statistic drops sharply in Cases 1 and 2 when the numbers of clusters increase from 3 to 4 and 2 to 3, respectively. In Cases 1 and 2, the training samples are respectively grouped into 3 and 2 clusters using k -means algorithm. Fig. 6 shows the silhouette plots for the results of clustering in Cases 1 and 2. In Fig. 6 , the samples whose silhouette values are negative are indicated by dotted red lines. It is appropriate for these samples to be classified into second-best choice clusters. Fig. 7 shows the results of applying k -means clustering to the training samples of Cases 1 and 2 in a three-dimensional space (i.e., X 5 , X 10 and X 11 ).
Fig. 4. Silhouette and slope statistics for Case 1: (a) Silhouette statistic; (b) Slope statistic
Fig. 5. Silhouette and slope statistics for Case 2: (a) Silhouette statistic; (b) Slope statistic
Fig. 6. Silhouette plots: (a) Case 1; (b) Case 2
Fig. 7. Results of applying k-means algorithm to training samples of: (a) Case 1; (b) Case 2
- 5.3 Threshold values of fault score
As described in subsection 3.2, after performing k -means clustering, the FS of each training sample is calculated and the threshold values are also determined. The distribution of the FSs is asymmetric, where the right tail is longer than the left. In this paper, 95th, 97th and 99th percentiles are employed for setting the threshold values. Fig. 8 shows the histograms of FSs of the training samples and their percentiles in Cases 1 and 2. In Fig. 8 , solid red lines indicate nonparametric kernel smoothing of the histograms of FSs and vertical dotted yellow-green, blue and red lines correspond to the 95th, 97th and 99th percentiles of FSs, respectively. In Cases 1 and 2, the calculated 95th, 97th and 99th percentiles are 1.6986, 1.8499 and 2.2570, and 1.6499, 1.8821 and 2.2766, respectively. The reason for calculating three different percentiles is to generate diverse alarms based on the strength of FSs. For example, for an unseen sample, if its FS exceeds the 95th, 97th or 99th percentile, the “Caution”, “Alert” or “Critical” alarm occurs.
Fig. 8. Histograms of FS and threshold values: (a) Case 1; (b) Case 2
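The mapping from a fault score (or its EWMA value) to one of the three alarm levels can be sketched as below, using the Case 1 thresholds reported above as default values. The function name is illustrative, and treating a score exactly equal to a threshold as exceeding it is an assumption.

```python
def alarm_level(score, t95=1.6986, t97=1.8499, t99=2.2570):
    """Return the alarm level for a score, or None if no alarm is raised.

    Default thresholds are the Case 1 percentiles quoted in the text.
    """
    if score >= t99:
        return "Critical"
    if score >= t97:
        return "Alert"
    if score >= t95:
        return "Caution"
    return None
```

Checking the highest threshold first ensures that each sample receives only the most severe alarm it qualifies for.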
- 5.4 Results of fault detection
In Case 1, EWMA is not employed because the false alarm rate is already low. As explained in the previous subsection, after setting the threshold values, the FSs of unseen samples are calculated and alarm signals are generated when the FSs exceed the threshold values. The FSs of normal samples are expected to stay below the threshold values. Fig. 9 shows the FSs of test samples in Case 1 and their alarm signals. In Fig. 9 , the unscheduled shutdown time due to boiler tube leakage is indicated by vertical red lines. The horizontal dotted yellow-green, blue and red lines shown in Fig. 9 (a) represent the 95th, 97th and 99th percentiles, respectively, and “Caution”, “Alert” and “Critical” alarms are indicated by yellow-green circles, blue triangles and red points, respectively. In Fig. 9 (b) , there are several implausible false alarms, which are ignored in the real DCS. Fault regions where alarm signals occur intensively are indicated by shaded red regions, and enlargements of these regions and their neighborhoods are presented in Figs. 10 and 11 .
Fig. 9. FD results for Case 1: (a) FSs; (b) alarm signals
Fig. 10. Enlargement of “fault region 1” in Fig. 9: (a) FSs; (b) alarm signals
Fig. 11. Enlargement of “fault region 2” in Fig. 9: (a) FSs; (b) alarm signals
As shown in Figs. 9 (a) and 10 , for a period that lasts approximately 3 hours, alarm signals occur intensively approximately 66 hours before the unplanned shutdown due to tube leakage. In Fig. 11 , because of the dramatic increases of FSs, alarm signals are generated for 30 minutes immediately before the unscheduled shutdown. Fig. 12 shows fault samples that correspond to “Critical” alarms of the fault regions 1 and 2 in a three-dimensional space (i.e., X 5 , X 10 and X 11 ). As shown in Fig. 12 , the behavior of the fault samples with “Critical” alarms is extremely inconsistent with that of the normal samples. In the fault region 1, intensive “Critical” alarms occur because of abnormal patterns of the reheater pressure and feedwater flow. The fault region 2 is an early warning region where the condenser make-up flow increases enormously.
Fig. 12. Fault samples with “Critical” alarms in Case 1: (a) fault region 1; (b) fault region 2
In Case 2, EWMA is employed for alarm signal generation. The window size w of EWMA is set to 6, i.e., the six most recent FSs from past to present are considered for calculating the present EWMA value for FD. Fig. 13 shows the FSs and their EWMA values for the test samples in Case 2, together with their alarm signals. In Fig. 13 , vertical red lines indicate the unplanned shutdown time caused by tube leakage. In Fig. 13 (a) , the 95th, 97th and 99th percentiles are denoted by yellow-green, blue and red dotted horizontal lines, respectively, and the EWMA values of the FSs are indicated by a solid purple line. Figs. 13 (b) and (c) correspond to alarm signals without and with EWMA, respectively. Compared with Fig. 13 (b) , numerous implausible false alarms are removed in Fig. 13 (c) . The main reason is that the trend of FSs is considered in the EWMA-based FD. The shaded red regions in Fig. 13 (a) designate two fault regions where many alarm signals occur, and magnifications of these regions and their vicinity are shown in Figs. 14 and 15 .
Fig. 13. FD results for Case 2: (a) FSs; (b) alarm signals without EWMA; (c) alarm signals with EWMA
Fig. 14. Enlargement of “fault region 1” in Fig. 13: (a) FSs; (b) alarm signals without EWMA; (c) alarm signals with EWMA
Fig. 15. Enlargement of “fault region 2” in Fig. 13: (a) FSs; (b) alarm signals without EWMA; (c) alarm signals with EWMA
As illustrated in Figs. 13 (a) and 14 , for a period that lasts approximately 40 hours, alarm signals occur intensively approximately 4 days before the unscheduled shutdown. In Fig. 15 , “Caution” and “Critical” alarms occur frequently from approximately 3 hours and 40 minutes before the unplanned shutdown, respectively. Fig. 16 illustrates fault samples with “Critical” alarms of the fault regions in a three-dimensional space. As indicated in Fig. 16 , the geometric patterns of the fault samples are completely dissimilar from those of normal samples. In the fault region 1, many “Critical” alarms occur because the reheater pressure and feedwater flow decline rapidly and the condenser make-up flow increases sharply. The fault region 2 corresponds to an early warning region where the condenser make-up flow increases gradually.
Fig. 16. Fault samples with the “Critical” alarms in Case 2: (a) fault region 1; (b) fault region 2
- 5.5 Performance evaluation and comparison
In this subsection, we present the results of the performance comparison between the proposed method and PCA-based fault detection method using four evaluation measures. The closer the evaluation measures are to 1, the better the results are. The PCA-based method has been successfully applied to technical processes such as centrifugal chiller [25] , thermal power plant [2] , helical coil steam generator [26] , continuously stirred tank reactor [27] and self-powered neutron detectors [28] . For the test samples, the four evaluation measures, i.e., accuracy (ACC), sensitivity (SEN), specificity (SPE), and precision (PRE), are calculated as follows [20] :
$$\mathrm{ACC} = \frac{TP + TN}{P + N}$$
$$\mathrm{SEN} = \frac{TP}{P}$$
$$\mathrm{SPE} = \frac{TN}{N}$$
$$\mathrm{PRE} = \frac{TP}{TP + FP}$$
where P is the number of fault samples; N is the number of normal samples; TP is the number of samples correctly detected as fault samples; TN is the number of samples correctly determined as normal samples; FP is the number of samples incorrectly detected as fault samples; and FN is the number of samples incorrectly determined as normal samples.
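The four measures can be computed directly from the counts defined above. The following is a minimal sketch, assuming binary labels with 1 for a fault sample and 0 for a normal sample; the function name `evaluation_measures` and the toy label lists are hypothetical and are not data from the experiments.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluation_measures(y_true, y_pred):
    """Compute ACC, SEN, SPE, and PRE for binary labels (1 = fault, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # faults detected as faults
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normals kept as normal
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed faults
    p_total = tp + fn  # P: number of fault samples
    n_total = tn + fp  # N: number of normal samples
    return {
        "ACC": _ratio(tp + tn, p_total + n_total),
        "SEN": _ratio(tp, p_total),
        "SPE": _ratio(tn, n_total),
        "PRE": _ratio(tp, tp + fp),
    }


# Toy labels (illustrative only): 3 fault samples and 5 normal samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
measures = evaluation_measures(y_true, y_pred)
print(measures)
```

In this toy example TP = 2, FN = 1, FP = 1, and TN = 4, so one missed fault lowers SEN while one false alarm lowers both SPE and PRE.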
In PCA, the cumulative percent variance technique is used to decide the proper number of principal components, and Hotelling’s $T^2$ statistic is employed as the fault detection index. If the $T^2$ statistic of a test sample is larger than or equal to the control limit at the significance level α = 1%, the sample is detected as a fault sample. In the proposed method, samples that satisfy the condition $FS(t) \ge T_{99th}$ (or $EWMA_w(t) \ge T_{99th}$) are decided as fault samples. Tables 4 and 5 summarize the results of the performance comparison in Cases 1 and 2, respectively. As listed in Tables 4 and 5, with the exception of sensitivity, the proposed method exhibits superior performance compared with the PCA-based method.
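The decision rule of the proposed method can be sketched as follows. This is an illustrative sketch only: it assumes the common EWMA recursion EWMA_w(t) = w·FS(t) + (1 − w)·EWMA_w(t − 1), initialized with the first FS, and the weight `w`, the threshold value, and the FS sequence are made-up numbers rather than values from the experiments.

```python
def ewma(scores, w=0.3):
    """Smooth a sequence of fault scores with an exponentially weighted moving average.

    Assumed recursion: s(t) = w * FS(t) + (1 - w) * s(t - 1), with s(0) = FS(0).
    """
    smoothed = []
    previous = None
    for score in scores:
        previous = score if previous is None else w * score + (1.0 - w) * previous
        smoothed.append(previous)
    return smoothed


def detect(scores, threshold):
    """Flag a sample as a fault (1) when its score is at or above the threshold."""
    return [1 if score >= threshold else 0 for score in scores]


# Hypothetical FS sequence: one isolated spike followed by a sustained rise.
fs = [1.0, 1.2, 5.0, 1.1, 0.9, 4.0, 4.5, 5.0, 5.5]
t_99th = 3.0  # stand-in for the 99th-percentile threshold

raw_decisions = detect(fs, t_99th)
ewma_decisions = detect(ewma(fs, w=0.3), t_99th)
print(raw_decisions)
print(ewma_decisions)
```

With these numbers the isolated spike no longer triggers a detection after smoothing, while the sustained rise is still detected, although a few samples later. A smaller `w` suppresses more isolated spikes at the cost of a longer detection delay.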
Performance comparison for Case 1
Performance comparison for Case 2
6. Conclusion
In this paper, a clustering-based FD method was proposed for the steam boiler in a 200 MW TPP. Failure cases due to boiler tube leakage were collected from the target DCS, and the main monitored variables for leakage detection were selected based on expert empirical knowledge. In the proposed method, after the k-means algorithm is applied to training samples, FSs are assigned to test samples based on the distances between the samples and their closest cluster centers. To determine the proper number of clusters, the slope statistic, an advanced version of the silhouette statistic, is employed. The 95th, 97th, and 99th percentiles of the FSs of the training samples were used as threshold settings, and three different alarm signals for unseen samples were generated according to the strength of their FSs. In the second failure case, EWMA was used to consider the trend of FSs.
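The scoring and threshold steps summarized above can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the cluster centroids have already been obtained by k-means, uses the Euclidean distance as the FS, computes percentiles by linear interpolation, and encodes the three alarm signals as numeric levels 1 to 3. The function names, the centroids, and the training FSs are all hypothetical.

```python
import math


def fault_score(sample, centroids):
    """FS of a sample: Euclidean distance to its closest cluster centroid."""
    return min(math.dist(sample, centroid) for centroid in centroids)


def percentile(values, q):
    """Percentile of a non-empty list (q in [0, 100]) using linear interpolation."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    low = math.floor(position)
    high = math.ceil(position)
    return ordered[low] + (ordered[high] - ordered[low]) * (position - low)


def alarm_level(score, thresholds):
    """Return 0 for no alarm, or 1, 2, 3 for a score at or above the
    95th, 97th, or 99th percentile threshold (level numbering is an assumption)."""
    return sum(1 for threshold in thresholds if score >= threshold)


# Hypothetical centroids, as if produced by k-means on normal training data.
centroids = [(0.0, 0.0), (10.0, 10.0)]

# Synthetic stand-in for the FSs of the training samples.
train_scores = [float(i) for i in range(1, 102)]
thresholds = [percentile(train_scores, q) for q in (95, 97, 99)]

test_sample = (3.0, 4.0)
score = fault_score(test_sample, centroids)
print(score, thresholds, alarm_level(score, thresholds))
```

Because only the distances to a small number of centroids are computed for each unseen sample, the test phase stays cheap regardless of the size of the training set.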
The main advantages of the proposed method are summarized as follows. First, the proposed method does not require labeled training samples because unsupervised learning is employed. Second, the computation time in the test phase is fairly short because only the distances between unseen samples and their nearest cluster centers need to be calculated. In addition, more flexible FD is possible based on the strength of FSs because three different threshold values are set using the 95th, 97th, and 99th percentiles. Lastly, by using EWMA to consider the trend of FSs, false alarms can be easily reduced.
To demonstrate its effectiveness, the proposed method was applied to the collected failure cases. The experimental results showed that the proposed method can detect fault samples whose features are markedly different from those of normal samples. In addition, early detection of faults immediately before an unplanned shutdown was achieved successfully.
In this work, we focused only on FD, which determines whether a fault has occurred. In future research, we will combine a fault identification step with the proposed method to identify the monitored variables relevant to the fault.
Acknowledgements
This work was supported by the Energy Efficiency & Resources Core Technology Program of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), granted financial resources from the Ministry of Trade, Industry & Energy, Republic of Korea (No. 20151110200040).
BIO
Jungwon Yu He received the B.S. and M.S. degrees from the Department of Electrical and Computer Engineering at Pusan National University (PNU), Busan, Korea, in 2012 and 2014, respectively, and is currently pursuing the Ph.D. degree in the Department of Electrical and Computer Engineering at PNU. His research interests include time series analysis, data mining, and fault detection and diagnosis.
Jaeyel Jang He received the M.S. degree from the Graduate School of Information Security at Korea University, Seoul, Korea, in 2013. He is currently a deputy general manager at the Technology & Information Department, Technical Solution Center, Korea East-West Power Co., Ltd. His research interests include multi-variable control, data mining, and fault diagnosis.
Jaeyeong Yoo He received the B.S. degree in Electrical Engineering from Yonsei University, Korea, in 1982. He was a senior researcher at LSIS Co., Ltd. from 1983 to 1991. He is currently a chief technology officer (CTO) at XEONET Co., Ltd. His research interests include process controller design and fault diagnosis and prognosis.
June Ho Park He received the B.S., M.S. and Ph.D. degrees from Seoul National University, Seoul, Korea in 1978, 1980, and 1987, respectively, all in electrical engineering. He is currently a Professor at the School of Electrical Engineering, Pusan National University, Busan, Korea. His research interests include intelligent systems applications to power systems. Dr. Park has been a member of the IEEE Power Engineering Society.
Sungshin Kim He received his B.S. and M.S. degrees in Electrical Engineering from Yonsei University, Korea, in 1984 and 1986, respectively, and his Ph.D. degree in Electrical Engineering from the Georgia Institute of Technology, USA, in 1996. He is currently a professor at the Electrical Engineering Department, Pusan National University. His research interests include fuzzy logic controls, neuro fuzzy systems, neural networks, robotics, signal analysis, and intelligent systems.
References
[1] Patan K. 2008 Artificial neural networks for the modelling and fault diagnosis of technical processes Springer
[2] Ajami A., Daneshvar M. 2012 “Data driven approach for fault detection and diagnosis of turbine in thermal power plant using Independent Component Analysis (ICA),” Int. J. of Elect. Power & Energy Syst. 43 (1) 728-735 DOI: 10.1016/j.ijepes.2012.06.022
[3] Hsu C. C., Su C. T. 2010 “An adaptive forecast-based chart for non-Gaussian processes monitoring: with application to equipment malfunctions detection in a thermal power plant,” IEEE Trans. Control Syst. Technol. 19 (5) 1245-1250
[4] Cai J., Ma X., Li Q. 2009 “On-line monitoring the performance of coal-fired power unit: A method based on support vector machine,” Appl. Thermal Eng. 29 (11-12) 2308-2319 DOI: 10.1016/j.applthermaleng.2008.11.012
[5] Chen K. Y., Chen L. S., Chen M. C., Lee C. L. 2011 “Using SVM based method for equipment fault detection in a thermal power plant,” Comput. in Ind. 62 (1) 42-50 DOI: 10.1016/j.compind.2010.05.013
[6] Shashoa N. A. A., Kvaščev G., Marjanović A., Djurović Ž. 2013 “Sensor fault detection and isolation in a thermal power plant steam separator,” Control Eng. Practice 21 (7) 908-916 DOI: 10.1016/j.conengprac.2013.02.012
[7] Li F., Upadhyaya B. R., Coffey L. A. 2009 “Model-based monitoring and fault diagnosis of fossil power plant process units using group method of data handling,” ISA Trans. 48 (2) 213-219 DOI: 10.1016/j.isatra.2008.10.014
[8] Prasad G., Swidenbank E., Hogg B. W. 1999 “A novel performance monitoring strategy for economical thermal power plant operation,” IEEE Trans. Energy Convers. 14 (3) 802-809 DOI: 10.1109/60.790955
[9] Guo S., Wang J., Wei J., Zachariades P. 2014 “A new model-based approach for power plant Tube-ball mill condition monitoring and fault detection,” Energy Conversion and Manage. 80 10-19 DOI: 10.1016/j.enconman.2013.12.046
[10] Ahmed M., Mahmood A. N., Islam M. R. 2016 “A survey of anomaly detection techniques in financial domain,” Future Generation Comput. Syst. 55 278-288 DOI: 10.1016/j.future.2015.01.001
[11] Ahmed M., Mahmood A. N., Hu J. 2016 “A survey of network anomaly detection techniques,” J. of Network and Comput. Applicat. 60 19-31 DOI: 10.1016/j.jnca.2015.11.016
[12] Costa K. A. P., Pereira L. A. M., Nakamura R. Y. M., Pereira C. R., Papa J. P., Falcão A. X. 2015 “A nature-inspired approach to speed up optimum-path forest clustering and its application to intrusion detection in computer networks,” Inform. Sci. 294 95-108 DOI: 10.1016/j.ins.2014.09.025
[13] Li H., Achim A., Bull D. 2012 “Unsupervised video anomaly detection using feature clustering,” IET Signal Process. 6 (5) 521-533 DOI: 10.1049/iet-spr.2011.0074
[14] Zhao J., Liu K., Wang W., Liu Y. 2014 “Adaptive fuzzy clustering based anomaly data detection in energy system of steel industry,” Inform. Sci. 259 335-345 DOI: 10.1016/j.ins.2013.05.018
[15] Fujita A., Takahashi D. Y., Patriota A. G. 2014 “A nonparametric method to estimate the number of clusters,” Computational Stat. & Data Anal. 73 27-39 DOI: 10.1016/j.csda.2013.11.012
[16] Jang J. S. R., Sun C. T., Mizutani E. 1997 Neuro-fuzzy and soft computing: a computational approach to learning and machine intelligence Prentice Hall
[17] Kim K. B., Song D. H. 2015 “Automatic Intelligent Asymmetry Detection Using Digital Infrared Imaging with K-Means Clustering,” Int. J. of Fuzzy Logic and Intelligent Syst. 15 (3) 180-185 DOI: 10.5391/IJFIS.2015.15.3.180
[18] Rousseeuw P. J. 1987 “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis,” J. of Computational and Appl. Math. 20 53-65 DOI: 10.1016/0377-0427(87)90125-7
[19] Chandola V., Banerjee A., Kumar V. 2009 “Anomaly detection: a survey,” ACM Computing Surveys (CSUR) 41 (3)
[20] Han J., Kamber M., Pei J. 2011 Data mining: concepts and techniques Elsevier
[21] Flynn D. 2003 Thermal power plant simulation and control IET
[22] Raja A. K. 2006 Power plant engineering New Age Int.
[23] Sarkar D. 2015 Thermal power plant design and operation Elsevier
[24] Oakey J. E. 2011 Power plant life management and performance improvement Elsevier
[25] Wang S., Cui J. 2005 “Sensor-fault detection, diagnosis and estimation for centrifugal chiller systems using principal-component analysis method,” Appl. Energy 82 (3) 197-213 DOI: 10.1016/j.apenergy.2004.11.002
[26] Zhao K., Upadhyaya B. R. 2006 “Model based approach for fault detection and isolation of helical coil steam generator systems using principal component analysis,” IEEE Trans. Nucl. Sci. 53 (4) 2343-2352 DOI: 10.1109/TNS.2006.876049
[27] Harrou F., Nounou M. N., Nounou H. N., Madakyaru M. 2013 “Statistical fault detection using PCA-based GLR hypothesis testing,” J. of Loss Prevention in the Process Ind. 26 (1) 129-139 DOI: 10.1016/j.jlp.2012.10.003
[28] Peng X., Li Q., Wang K. 2015 “Fault detection and isolation for self powered neutron detectors based on Principal Component Analysis,” Ann. of Nucl. Energy 85 213-219