In this paper, we describe a novel method for document layout analysis based on a Fuzzy Energy Matrix (FEM). A FEM is a two-dimensional matrix that contains the likelihood of text and non-text and is generated using fuzzy theory. The key idea is to define an energy map over the document in order to categorize text and non-text. The proposed mechanism is designed to run on a low-resolution document image, and hence our method has a fast processing speed. The proposed method has been tested on the public ICDAR 2009 dataset for comparison against other state-of-the-art methods, and it was also tested on Korean documents. The experimental results indicate that this scheme achieves superior segmentation accuracy, in terms of both precision and recall, and also requires less computation time than other state-of-the-art document image analysis methods.
1. INTRODUCTION
In recent years, high-resolution cameras have been developed and are widely used in smartphones to capture better images. Document recognition on smartphones is therefore an attractive application for smartphone users, and new applications must pay attention to computation time in order to run in a smartphone environment. In the document recognition field, document layout analysis is a necessary preprocessing step. A typical document recognition scenario consists of three tasks. First, a layout analysis module segments text and non-text. Second, text lines and words are extracted. Finally, an OCR engine recognizes the word information. The layout analysis algorithm should therefore be both fast and precise for the subsequent stages. However, traditional document analysis approaches [6], [10] tend to undervalue processing time and rely on heuristics to increase performance. Our algorithm therefore considers not only accuracy but also processing time.
Over the past few decades, various document layout analysis methods have been proposed using different paradigms. There are generally two models for document analysis: bottom-up and top-down. Bottom-up approaches rely on low-level features such as pixels and connected components; categorized components are successively merged into groups of blocks that represent paragraphs or figures [6], [9], [11]. Top-down approaches, in contrast, divide the page into groups of blocks, which are iteratively split into sub-blocks to obtain the text and figures [10]. Top-down methods are usually faster than bottom-up methods but less accurate.
Among bottom-up approaches, Kise et al. [9] proposed a document analysis algorithm using the area Voronoi diagram. To reduce processing time, they focus on text segmentation, because the quality of segmented non-text is not important for the OCR recognition stage; for the same reason, our method also does not spend much time on accurately detecting non-text. Caponetti et al. [8] used a neuro-fuzzy methodology to segment documents. Their strategy is capable of describing the physical structure of a document: pixel-level features are extracted with fuzzy sets, and after training, an adaptive neuro-fuzzy network recognizes text and non-text. The method in [11] is a top-down approach based on connected component extraction; the proposed document analysis model preserves top-down generation, so that a document is logically represented for interactive editing, storage, retrieval, transfer, and logical analysis, and the authors demonstrated good results. The authors of [10] extract the background region of a document image using computational-geometry algorithms for offline enumeration of maximal white rectangles and online rectangle unification. This algorithm is simple and fast, but also strongly heuristic: on a limited dataset a heuristic algorithm can reach high performance, but it cannot be expected to perform well in different environments.
In our work, fuzzy theory with a rectangular membership function is used simply to convert continuous input values into several grade values in order to build the FEM, because continuous input values are not suitable for generating the FEM directly.
The remainder of the paper is organized as follows. Section 2 introduces fuzzy set theory. Section 3 presents the method for extracting component-level features from the connected component results. Section 4 details the generation of the Fuzzy Energy Matrix. Section 5 briefly describes the post-processing algorithm. Section 6 reports experimental results on public datasets, and Section 7 concludes the paper.
2. FUZZY THEORY
A fuzzy set is a set whose elements have degrees of membership represented by a real number in the interval [0, 1]. Fuzzy sets are a powerful tool for representing imprecise quantities, using degrees of membership rather than crisp values. For example, a weather condition labeled '1: good' or '0: bad' is not clearly defined, because weather takes many intermediate states; a fuzzy set can turn such binary conditions into meaningful graded classifications [1]. A fuzzy set can be defined as follows:
Definition. Let U be the universe of discourse, where U = {u_1, u_2, ..., u_n}. A fuzzy set is determined by the universe of discourse and its sub-intervals. A fuzzy set A on the universe of discourse U can be defined as

A = f_A(x, u_1)/u_1 + f_A(x, u_2)/u_2 + ... + f_A(x, u_n)/u_n  (1)
where f_A denotes the membership function of the fuzzy set A, f_A : U → [0, 1], and f_A(x, u_i) denotes the degree of membership of the crisp interval u_i for the input linguistic value x, with f_A(x, u_i) ∈ [0, 1] and 1 ≤ i ≤ n; a linguistic value x must be assigned to each fuzzy set [2]. Here f_{A_i}(x, u_j) is a rectangular membership degree, as shown in equation (2):

f_{A_i}(x, u_j) = 1 if x ∈ u_j, and 0 otherwise.  (2)

Through equation (2), each observed feature is converted into the fuzzy set in which it has the highest degree of membership. In our study, the quantitative features of text and non-text are changed into graded variables using these fuzzy set rules.
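To make the conversion concrete, the rectangular membership of equation (2) can be sketched in a few lines of Python; the interval boundaries used here are hypothetical values for illustration only.

```python
def rectangular_membership(x, intervals):
    """Return the membership degree of x for each crisp interval.

    With rectangular membership functions (equation (2)), the degree
    is 1 for the interval containing x and 0 for all the others, so
    an observation is assigned to exactly one fuzzy set.
    """
    return [1 if lo <= x < hi else 0 for (lo, hi) in intervals]

# Hypothetical universe of discourse split into three sub-intervals.
intervals = [(0, 10), (10, 20), (20, 30)]
print(rectangular_membership(12, intervals))  # degree 1 only in u_2
```

The fuzzified representation of an observation is thus simply the index of the interval whose degree is 1.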
3. FEATURE EXTRACTION
The aim of this section is to extract features that represent text and non-text. Before extraction, the input image is down-sampled to 1/5 of its size, since a low-resolution document image is sufficient for segmentation and reduces the processing cost. We combine two binarization results, from Otsu's method [3] and Sauvola's method [4]. Each technique has its own advantage: Otsu's method clearly preserves figure regions, while Sauvola's method preserves tables and separators. We therefore use their linear combination, followed by a morphological filling operation, as the input image.
Fig. 1 illustrates this process. Given the binary input image, the connected component algorithm is applied, and to categorize text and non-text we propose a feature extraction function (3) based on the spatial properties of the connected components.
Initialization of the input image (a) Otsu (b) Sauvola (c) linear combination of Otsu and Sauvola
where H and W are the height and width of the component, respectively, and α and β are weight values. The function F is designed to produce higher energy when a component is non-text: non-text usually varies less uniformly than text, and the gap between H and W is larger than for text. F depends mainly on the H value, because the input image is low-resolution, so the width feature of horizontally touching small text is often distorted, whereas the height feature largely keeps its property even in the low-resolution setting, as shown in Fig. 2. At this stage we only extract elementary geometric features, so the function F alone is not sufficient to categorize components into text and non-text: several non-text components have features similar to those of large text. This problem is a main challenge in the document analysis field, and we address it with the Fuzzy Energy Matrix in the next section.
Binarization with a low-resolution image (a) original (b) binary result
4. FUZZY ENERGY MATRIX (FEM)
In this section, the ultimate goal is to generate the Fuzzy Energy Matrix for classifying text and non-text. The FEM is a two-dimensional matrix representing the grade of membership of F and of the location in the document. As mentioned in the previous section, the function F alone cannot distinguish large text from figures, so we include a location feature. The combined features tell us how many components with similar F values are located at a similar vertical level. The location of components is an efficient feature for distinguishing large text from figures, because most large text is arranged in a row. We therefore generate the FEM based on these properties.
 4.1 Fuzzification
After the connected component algorithm, we extract two features, the height coordinate of each component and F, and the extracted quantitative features are transformed into grades of membership using the universe of discourse and its sub-intervals. In fuzzy set theory, this process is called "fuzzification". The lengths of the intervals used in partitioning the universe of discourse are determined subjectively. The fuzzification algorithm is as follows:

Step 1. Define the two universes of discourse and their sub-intervals based on the document height h, where U and V are the universes of discourse for the function F and for the image location, respectively.
Step 2. The number of intervals is set to n = 15, and each interval is defined as
u_i, v_i = [L_bound_i, R_bound_i]
where, for each universe of discourse U_d, L_bound_i and R_bound_i are the left and right boundaries of the i-th interval.
Step 3. Fuzzify the input features: each feature value is assigned the grade of the sub-interval that contains it.
The fuzzified vectors (Fv_1, Fv_2) based on these intervals are used to generate the Fuzzy Energy Matrix in the next section.
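The fuzzification above can be sketched as follows, assuming each universe of discourse is split into n = 15 equal-length intervals (the paper states that interval lengths are determined subjectively, so equal lengths are our simplification).

```python
def fuzzify(value, lo, hi, n=15):
    """Map a quantitative value to the index (fuzzy grade) of the
    sub-interval containing it, assuming n equal-length intervals
    partition the universe of discourse [lo, hi]."""
    if value <= lo:
        return 0
    if value >= hi:
        return n - 1
    return int((value - lo) / (hi - lo) * n)

def fuzzify_components(features, f_range, loc_range, n=15):
    """Build the fuzzified vectors (Fv_1, Fv_2): for each component,
    the grade of its F value and the grade of its vertical location."""
    return [(fuzzify(f, *f_range, n), fuzzify(y, *loc_range, n))
            for f, y in features]
```

Here f_range and loc_range stand in for the two universes of discourse; loc_range would be (0, h) for a document of height h.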
 4.2 Matrix Generation
The Fuzzy Energy Matrix is a two-dimensional probability matrix for classifying text and non-text based on the fuzzified vectors from the previous section. To generate the FEM, we first build an n × n matrix representing the fuzzy grades of F and of the location. All fuzzified vectors are then assigned to cells of this matrix, and the frequency of each grade is counted. In Table 1 we can observe how many components fall into each fuzzy grade; we assume that cells holding large energy correspond to text, whereas isolated cells with small energy correspond to non-text, because many components with similar F values at a similar location is a strong indication of text. The accumulated values in Table 2 are normalized to the range 0-1 using a sigmoid function [5]:

n_FEM(i, j) = 1 / (1 + e^(-γ(FEM(i, j) - X)))
Fuzzy Energy Matrix for the document in Fig. 3
Normalized Fuzzy Energy Matrix using the sigmoid, for the document in Fig. 3
where γ is the gradient of the curve and X is a threshold value reflecting the text feature. In this study we used X = 4: when more than X = 4 components with the same F value are detected at the same location, the cell indicates a high probability of text.
In Fig. 3 we can see the segmented non-text energy, with green coordinates, from Table 1 and Table 2, and Fig. 3(c) shows the document energy corresponding to n_FEM. The text and non-text results are determined from the document energy with T = 0.5 (energy > 0.5: text; energy < 0.5: non-text).
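The FEM generation and sigmoid normalization described above can be sketched as follows; X = 4 and the 0.5 decision threshold follow the text, while γ = 1 is an assumed default since the paper does not state its value.

```python
import math

def build_fem(fuzzy_pairs, n=15):
    """Accumulate the frequencies of (F-grade, location-grade) pairs
    into an n x n Fuzzy Energy Matrix."""
    fem = [[0] * n for _ in range(n)]
    for fi, li in fuzzy_pairs:
        fem[fi][li] += 1
    return fem

def normalize_fem(fem, gamma=1.0, X=4):
    """Sigmoid normalization into (0, 1): cells counting more than X
    components map above 0.5 (text), sparse cells map below 0.5."""
    return [[1.0 / (1.0 + math.exp(-gamma * (v - X))) for v in row]
            for row in fem]
```

With T = 0.5, components falling in a cell whose normalized energy exceeds 0.5 are labeled text, and the rest non-text, matching the decision rule above.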
Classification of text using the FEM (a) original (b) binary result (c) 3-D energy map (d) text segmentation result (e) non-text segmentation result (green coordinates are the fuzzified values; the energies can be checked in Table 1 and Table 2)
5. POSTPROCESSING
In this section, we introduce the post-processing tasks: table detection, noise removal, separator detection, and the generation of text boundaries.
 5.1 Text boundary
In order to generate a boundary region for text, we propose a background extraction algorithm based on a projection technique. The process is as follows:
Step 1. A morphological horizontal dilation is applied to the segmented text result (Fig. 4(a)).
Step 2. On the dilation result, horizontal and vertical projections are used to extract the background. A projection stops when it meets a text pixel, and if the length of a projected run exceeds half the image height (or width), the run is considered background. Fig. 4 shows the estimated background.
Step 3. Finally, we examine the black region of Fig. 4(c) for non-text pixels. If there are none, we draw a rectangle around the text in that region; otherwise, for the case of observed non-text, the morphology result is used as-is.
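Steps 1 and 2 can be sketched on 0/1 grids as below; for brevity this simplified version projects only from the left edge of each row, whereas the method projects both horizontally and vertically.

```python
def horizontal_dilate(binary, k=2):
    """Morphological horizontal dilation on a 0/1 grid: every text
    pixel is widened by k pixels to its left and right."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if binary[r][c]:
                for cc in range(max(0, c - k), min(w, c + k + 1)):
                    out[r][cc] = 1
    return out

def background_runs(binary):
    """Project from the left edge of each row, stopping at the first
    text pixel; runs longer than half the image width are treated as
    background (a simplified reading of Step 2)."""
    w = len(binary[0])
    runs = []
    for row in binary:
        run = 0
        for v in row:
            if v:
                break
            run += 1
        runs.append(run if run > w // 2 else 0)
    return runs
```

The full method would repeat the projection vertically and from both sides before intersecting the runs into the background mask of Fig. 4(c).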
Text boundary generation (a) segmented text (b) morphology result (c) background extraction (d) final result
 5.2 Heuristic filter for Nontext
First, to detect tables, the algorithm examines whether components with fuzzy grade u_1 exist in a non-text region, because most tables contain a number of similar components arranged uniformly inside the region. In this study, when more than 10 components with grade u_1 are counted inside a non-text region, the region is considered a table. When a table has no ruling lines, this approach has a strong advantage over traditional methods, because the geometric information of the words arranged in the table is readily available from the fuzzified values. Second, separators come in two types, horizontal and vertical; the ratio between the two side lengths is used to detect them, and a component with a ratio below 0.2 is considered a separator. Finally, noise components are eliminated according to their size.
6. EXPERIMENTAL RESULTS
We implemented the proposed algorithm in MATLAB 2012 on an Intel(R) Core(TM) i5-4670 CPU at 3.40 GHz. The performance of the proposed method is evaluated on the public ICDAR 2009 dataset, which contains 55 document images and has been used in most comparative works [12]. We also evaluated our own 100 document images, 50 English and 50 Korean, with ground truth masks.
The DICE system comprises several steps: each pixel is first categorized as machine-printed text or not, isolated vectors are eliminated, open and close operations are used to remove small regions, and finally interior pixels are removed and the contours of polygons are extracted [12]. The Fraunhofer system is a hybrid method combining separator detection, page segmentation, and text line and region extraction [11]. Tesseract is a bottom-up method submitted by Ray Smith of Google Inc.; it uses binary morphology and connected component analysis to estimate the region type (text, image, separator, or unknown), with two of its key techniques being neighbourhood stroke-width measurement and the appropriateness of overlap between adjacent connected components [13].
Fig. 5 reports the F-measure of the proposed algorithm against other state-of-the-art methods. As mentioned previously, our algorithm focuses on processing time, yet its performance is also competitive with traditional methods (Fig. 5, Fig. 6), and the average processing time of the proposed algorithm is 1.4 seconds in the MATLAB environment. Unfortunately, the other state-of-the-art methods do not report their processing times. However, they operate on full-resolution images, so we can estimate that their processing times exceed 3 seconds: connected component labeling and binarization alone require at least 2 seconds per full-resolution document on the ICDAR 2009 (PRIMA) dataset. The proposed algorithm instead uses a low-resolution document image (scale 1/5) to reduce the time cost, with competitive performance.
F-measure results on the ICDAR 2009 dataset
PRIMA measure on the ICDAR 2009 dataset
Fig. 6 shows the PRIMA measure for the different region types for the four methods. The PRIMA Lab (University of Salford, Manchester, United Kingdom) provides the evaluation tool for the PRIMA measure.
F-measure on our 100 Korean and English documents
Fig. 8 illustrates visualization results of the proposed method with 3-D maps.
Visualization results of the proposed method with 3-D maps (a) ICDAR 2009 (b) our dataset
7. CONCLUSION
We have presented a document image analysis method using a FEM on low-resolution images. The accuracy and processing time of our algorithm compare well with state-of-the-art methods, but several drawbacks remain. First, the FEM can only separate text and non-text; detecting tables, separators, and noise relies on a heuristic post-processing algorithm, and as mentioned previously, heuristics are not ideal. We plan to address this drawback by improving the F equation. Second, the performance of the proposed algorithm depends on the initial binarization result: when characters touch each other in the binarization result, this can critically affect the proposed mechanism. To overcome this second problem, we need to extract a reasonable feature for recognizing touched characters in future work.
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2015018993).
BIO
Kang Han Oh
He received his B.S. in Computer Science from Honam University in 2010 and his M.S. in Electronic & Computer Engineering from Chonnam National University in 2013. He is currently pursuing a Ph.D. in Electronics & Computer Engineering at Chonnam National University, Korea. His research interests are pattern recognition, machine learning, and image processing.
Soo Hyung Kim
He received his B.S. degree in Computer Engineering from Seoul National University in 1986, and his M.S. and Ph.D. degrees in Computer Science from the Korea Advanced Institute of Science and Technology in 1988 and 1993, respectively. From 1990 to 1996, he was a senior member of the research staff at the Multimedia Research Center of Samsung Electronics Co., Korea. Since 1997, he has been a professor in the Department of Computer Science, Chonnam National University, Korea. His research interests are pattern recognition, document image processing, medical image processing, and ubiquitous computing.
REFERENCES
Bhupendra Kumar P. and Sanjay S., "Integrated Fuzzy-HMM for project uncertainties in time-cost tradeoff problem," J. Applied Soft Computing, vol. 21, pp. 320-329, 2014. DOI: 10.1016/j.asoc.2014.03.035
Otsu N., "A Threshold Selection Method from Gray-Level Histograms," IEEE Transactions on Systems, Man and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979.
Fletcher L. A. and Kasturi R., "A robust algorithm for text string separation from mixed text/graphics images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 10, no. 6, pp. 910-918, 1988.
Jain A. K. and Yu B., "Document representation and its application to page decomposition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 3, pp. 294-308, 1998. DOI: 10.1109/34.667886
O'Gorman L., "The document spectrum for page layout analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 11, pp. 1162-1173, 1993. DOI: 10.1109/34.244677
Caponetti L., Castiello C., and Górecki P., "Document page segmentation using neuro-fuzzy approach," Applied Soft Computing, vol. 8, pp. 118-126, 2008. DOI: 10.1016/j.asoc.2006.11.008
Kise K., Sato A., and Iwata M., "Segmentation of page images using the area Voronoi diagram," Computer Vision and Image Understanding, vol. 70, no. 3, pp. 370-382, 1998. DOI: 10.1006/cviu.1998.0684
Antonacopoulos A., Pletschacher S., Bridson D., and Papadopoulos C., "ICDAR2009 Page Segmentation Competition," Proc. ICDAR, 2009, pp. 1370-1374.
Smith R., "Hybrid Page Layout Analysis via Tab-Stop Detection," Proc. ICDAR, Barcelona, Spain, 2009, pp. 241-245.