Body Segmentation using Gradient Background and Intra-Frame Collision Responses for Markerless Camera-Based Games
Journal of Electrical Engineering and Technology. 2016. Jan, 11(1): 234-240
Copyright © 2016, The Korean Institute of Electrical Engineers
This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • Received : February 28, 2015
  • Accepted : September 30, 2015
  • Published : January 01, 2016
About the Authors
Jun-Geon Kim
Department of Electronics and Radio Engineering, Kyung Hee University, Yongin-si, Gyeonggi-do, Korea. (joongoen@khu.ac.kr)
Daeho Lee
Corresponding Author: Humanitas College, Kyung Hee University, Yongin-si, Gyeonggi-do, Korea. (nize@khu.ac.kr)

Abstract
We propose a novel framework for markerless camera-based games. Using only a visual camera, our method can yield robust human body segmentation with performance comparable to segmentation using depth cameras. The edges of human bodies are detected by subtracting gradient backgrounds, and human body regions are segmented by operations based on mathematical morphology. Collisions between the detected regions and virtual objects are resolved by finding the colliding time from the intra-frame positions of the virtual objects. Experimental results show that the proposed method produces robust segmentation of human bodies and that its collision responses are more accurate than those of previous methods. Therefore, the proposed framework can be widely used in camera-based games requiring high performance.
1. Introduction
Human activity analysis has been a challenging problem in computer vision research [1]. The ability to analyze human activities has led to the development of various applications such as surveillance systems, remote control systems and computer games [2-14]. In camera-based games in particular, players' body movements are used as input gestures.
Human body segmentation is a classically difficult problem [9, 10]. Background subtraction is commonly used to segment human bodies [2]; the pixel intensities of the background are modeled with a Mixture of Gaussians (MoG) distribution and recursively updated [15, 16]. Unfortunately, intensity backgrounds are sensitive to small intensity variations. Color segmentation [11, 17] may be an alternative for human detection; however, it can detect only skin or other specifically colored regions, and color variations may result in poor segmentation. To improve movement detection, motion capture devices (e.g., data gloves) or special markers may be used [18-20]; however, these cumbersome devices or markers are then always required to play the games.
In recent years, human activity analysis using depth cameras [21, 22] has been widely studied [24-26]. Depth information is retrieved using the time-of-flight (TOF) principle [23] or structured light. As shown in Fig. 1, depth cameras can yield more accurate segmentation than visual cameras; however, they require an expensive depth camera and work only within a limited range.
Fig. 1. Visual image (a) and its depth image (b).
This paper aims to develop a novel framework for markerless camera-based games based on human body segmentation whose performance is comparable to that of depth cameras. To this end, we apply a novel background subtraction method using gradients and an intra-frame collision response. The proposed framework is well suited to markerless camera-based games that require accurate collision responses.
2. Organization of Proposed Framework
The proposed framework uses only a visual camera connected to a PC, as shown in Fig. 2(a); the background of the play area is assumed not to be severely cluttered. For a player to interact with virtual objects in the game screen of Fig. 2(a), the human body must first be segmented in every frame. This is done by gradient background subtraction, which consists of several steps: computing the gradient map of the frame, background subtraction, and morphological operations. The process is explained in detail in Sections 3 and 4.
Fig. 2. The proposed framework.
Virtual objects are generated according to the game scenario, and each object moves with its own velocity. When a virtual object collides with a segmented human body region on the screen, it responds through the intra-frame collision response described in detail in Section 5. Fig. 2(b) shows the overall logical flow of the proposed framework.
3. Gradient Background Subtraction
Image gradients are easily calculated with gradient operators such as the Sobel, Roberts cross and Prewitt operators. In this paper, we use the Sobel operator because of its noise-suppression capability. Edge detection commonly uses the gradient magnitude, but we use the gradient vector $\nabla I$, so the gradient background $B$ is a 2-D array of vectors. The gradient vector of a gray image $I_t$ at time $t$ is calculated by
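$$\nabla I_t = \begin{bmatrix} G_x * I_t \\ G_y * I_t \end{bmatrix}, \qquad G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad G_y = G_x^{\mathsf{T}} \qquad (1)$$
where $*$ denotes 2-D filtering with the 3 × 3 Sobel kernels $G_x$ and $G_y$.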
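As a minimal sketch of (1), the gradient vector field can be computed with NumPy alone. The helper names `filter3x3` and `gradient_vector` are hypothetical, edge replication at the image border is an assumption, and the kernels are applied by cross-correlation (as most image libraries do), so the sign convention may differ from the original implementation.

```python
import numpy as np

# Sobel kernels: G_X responds to horizontal intensity changes, G_Y to vertical ones.
G_X = np.array([[-1, 0, 1],
                [-2, 0, 2],
                [-1, 0, 1]], dtype=np.float64)
G_Y = G_X.T


def filter3x3(image, kernel):
    """Apply a 3x3 kernel by cross-correlation, replicating edge pixels (assumed border rule)."""
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def gradient_vector(image):
    """Return an (H, W, 2) array holding the gradient vector (I_x, I_y) of every pixel."""
    return np.stack([filter3x3(image, G_X), filter3x3(image, G_Y)], axis=-1)
```

Keeping both components, rather than only the magnitude, is what lets the background in this framework be stored as a 2-D array of vectors.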
In the first frame, the gradient background is initialized as $B_0 = \nabla I_0$, and it is then updated over time as follows:
[Equation (2)]
where $H_{t-1}$ denotes the binary human mask at time $t-1$, obtained from the human body segmentation, and $\alpha$ denotes the updating gain.
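The text specifies only the initialization $B_0 = \nabla I_0$, the mask $H_{t-1}$ and the gain $\alpha$, so the following is a sketch of one plausible masked update, not necessarily the authors' (2). It assumes a common masked running average, $B_t = (1-\alpha)B_{t-1} + \alpha\nabla I_t$ on pixels where $H_{t-1} = 0$, with body pixels left unchanged; the function name `update_background` is hypothetical.

```python
import numpy as np


def update_background(prev_bg, grad, human_mask, alpha):
    """One step of an assumed masked running-average update of the gradient background.

    prev_bg, grad : (H, W, 2) arrays holding B_{t-1} and the current gradient vectors.
    human_mask    : (H, W) boolean array, True where H_{t-1} marks a body pixel.
    alpha         : updating gain in [0, 1].
    """
    # Assumption: non-body pixels are blended towards the current gradient with gain alpha.
    blended = (1.0 - alpha) * prev_bg + alpha * grad
    # Body pixels keep their previous background vector, so players are not absorbed.
    return np.where(np.asarray(human_mask, dtype=bool)[..., None], prev_bg, blended)
```

In use, the background would start as the first gradient field and then be passed through `update_background` once per frame with the mask from the previous frame's segmentation.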
To detect evidence of human bodies, the gradient background is subtracted from the gradient of the current frame as follows:
[Equation (3)]
To determine the edges of human bodies, $S_t$ is binarized into $S_{b1,t}$ as follows:
[Equation (4)]
where $\tau$ denotes a threshold value. The second condition of (4) is very important: because the binary human mask $H$ is used when updating the background in (2), falsely estimated background pixels could never be corrected if this condition were omitted. Fig. 3 shows an example of gradient background subtraction.
Fig. 3. An example of gradient background subtraction.
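A sketch of the subtraction and thresholding steps follows, assuming that $S_t$ is the Euclidean norm of the per-pixel vector difference between the current gradient and the background. Only the $S_t > \tau$ condition of (4) is reproduced; the second condition, which the text calls very important, is not specified in the surrounding text and is deliberately omitted. The function name is hypothetical.

```python
import numpy as np


def edge_evidence(grad, background, tau):
    """Return (S_t, binary edge map) using only the tau condition of (4).

    grad, background : (H, W, 2) arrays of gradient vectors.
    Assumption: S_t is the Euclidean norm of the vector difference per pixel.
    """
    diff = grad - background
    s = np.linalg.norm(diff, axis=-1)
    return s, s > tau
```

Working on vector differences means an edge whose direction changes is detected even if its magnitude matches the background.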
4. Human Body Segmentation
To detect human body regions, closed edges must be filled; however, $S_{b1,t}$ contains many unlinked edges. Thus, we apply several operations based on mathematical morphology, as shown in Fig. 4. We first apply dilation to $S_{b1,t}$ as follows:
Fig. 4. An example of human body segmentation.
$$S_{b2,t} = S_{b1,t} \oplus A_1 \qquad (5)$$
where $\oplus$ denotes the dilation operator and $A_1$ is an $n \times m$ cross-shaped structuring element.
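The cross-shaped structuring element and the binary dilation of (5), together with the erosion used afterwards, can be sketched in plain NumPy as follows. Treating $n$ as the number of rows and $m$ as the number of columns is an assumption, as are all function names; the shift-based implementation is exact only for symmetric, odd-sized elements such as the crosses and squares used in this paper.

```python
import numpy as np


def cross_se(rows, cols):
    """Cross-shaped structuring element: full middle row plus full middle column."""
    se = np.zeros((rows, cols), dtype=bool)
    se[rows // 2, :] = True
    se[:, cols // 2] = True
    return se


def _shifted_views(mask, se):
    """Yield the mask shifted by every 'on' offset of a centred, odd-sized element."""
    h, w = mask.shape
    ry, rx = se.shape[0] // 2, se.shape[1] // 2
    padded = np.pad(mask.astype(bool), ((ry, ry), (rx, rx)), constant_values=False)
    for dy, dx in zip(*np.nonzero(se)):
        yield padded[dy:dy + h, dx:dx + w]


def dilate(mask, se):
    """Binary dilation: a pixel is set if any element offset lands on foreground."""
    out = np.zeros(mask.shape, dtype=bool)
    for view in _shifted_views(mask, se):
        out |= view
    return out


def erode(mask, se):
    """Binary erosion: a pixel survives only if every element offset is foreground."""
    out = np.ones(mask.shape, dtype=bool)
    for view in _shifted_views(mask, se):
        out &= view
    return out
```

Under the row/column assumption above, the 7 × 15 cross reported in the experiments would be `cross_se(7, 15)`.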
Before the regions of $S_{b2,t}$ are filled, small connected regions are removed using connected component analysis; $S_{b3,t}$ denotes the binary image after removing small regions, and $S_{b4,t}$ denotes the binary image after filling $S_{b3,t}$. Because $S_{b4,t}$ has been dilated, we apply erosion to it as follows:
$$S_{b5,t} = S_{b4,t} \ominus A_1 \qquad (6)$$
where $\ominus$ denotes the erosion operator. Finally, an opening operation is applied to $S_{b5,t}$ to separate non-body regions:
$$S_{b6,t} = S_{b5,t} \circ A_2 = (S_{b5,t} \ominus A_2) \oplus A_2 \qquad (7)$$
where $A_2$ is an $l \times l$ square structuring element. Through these morphological operations, the human body regions $S_{b6,t}$ are segmented, and this mask is used as $H$ in the background update of (2).
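The remaining steps (small-region removal by connected component analysis, region filling, and the opening with an $l \times l$ square) might be sketched as below. The 4-connectivity, the minimum-area parameter `min_area` and all function names are illustrative assumptions; the text does not state the connectivity or the area threshold.

```python
from collections import deque

import numpy as np

# Assumption: 4-connectivity for both component labelling and hole filling.
OFFSETS4 = ((1, 0), (-1, 0), (0, 1), (0, -1))


def remove_small_regions(mask, min_area):
    """Keep only 4-connected foreground components with at least min_area pixels."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    out = np.zeros((h, w), dtype=bool)
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        seen[sy, sx] = True
        component, queue = [], deque([(sy, sx)])
        while queue:  # breadth-first traversal of one component
            y, x = queue.popleft()
            component.append((y, x))
            for dy, dx in OFFSETS4:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(component) >= min_area:
            ys, xs = zip(*component)
            out[list(ys), list(xs)] = True
    return out


def fill_holes(mask):
    """Fill background pockets that are not 4-connected to the image border."""
    h, w = mask.shape
    outside = np.zeros((h, w), dtype=bool)
    queue = deque((y, x) for y in range(h) for x in range(w)
                  if (y in (0, h - 1) or x in (0, w - 1)) and not mask[y, x])
    for y, x in queue:
        outside[y, x] = True
    while queue:  # flood the background from the border inwards
        y, x = queue.popleft()
        for dy, dx in OFFSETS4:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] and not outside[ny, nx]:
                outside[ny, nx] = True
                queue.append((ny, nx))
    return ~outside  # foreground plus every enclosed hole


def open_square(mask, size):
    """Opening (erosion, then dilation) with a size x size square; size must be odd."""
    r = size // 2
    h, w = mask.shape

    def windows(img):
        padded = np.pad(img.astype(bool), r, constant_values=False)
        return [padded[dy:dy + h, dx:dx + w] for dy in range(size) for dx in range(size)]

    eroded = np.logical_and.reduce(windows(mask))
    return np.logical_or.reduce(windows(eroded))
```

The opening removes structures thinner than the square (for example, a one-pixel-wide line disappears with `size=3`) while leaving larger blobs intact.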
5. Collision Detection and Response
To detect and respond to collisions between human bodies and virtual objects, we compute the boundaries of $S_{b6,t}$ and smooth them with a Gaussian kernel, as shown in the last image of Fig. 4.
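The text does not specify how the boundary is traced or how the Gaussian kernel is applied. One plausible reading, sketched below as an assumption, is to smooth an ordered, closed list of boundary points (obtained from any contour tracer) with a circular 1-D Gaussian so that the curve stays closed. The function name and the 3σ kernel radius are illustrative choices.

```python
import numpy as np


def smooth_closed_contour(points, sigma):
    """Smooth an ordered, closed contour with a circular 1-D Gaussian kernel.

    points : (N, 2) array of boundary coordinates in traversal order (assumed given).
    sigma  : standard deviation of the Gaussian, measured in contour samples.
    """
    pts = np.asarray(points, dtype=np.float64)
    radius = max(1, int(np.ceil(3 * sigma)))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()  # weights sum to 1, so the contour's centroid is preserved
    smoothed = np.zeros_like(pts)
    for weight, off in zip(kernel, offsets):
        # np.roll wraps around, so the first and last points are treated as neighbours.
        smoothed += weight * np.roll(pts, -off, axis=0)
    return smoothed
```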
In [10, 13] and [14], collisions are detected by an overlap ratio, and collision responses are estimated using collision planes. However, these methods do not consider intra-frame positions, as shown in Fig. 5; that is, they consider the positions of virtual objects only at discrete frame times.
Fig. 5. Collision response of [10], [13] and [14].
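The overlap ratio is not defined explicitly in the text. A minimal sketch follows, assuming a disc-shaped virtual object and defining the ratio as the fraction of the object's pixels that lie on body pixels of the segmented mask; the function name is hypothetical.

```python
import numpy as np


def overlap_ratio(body_mask, center, radius):
    """Fraction of a disc-shaped virtual object's pixels lying on body pixels.

    body_mask : (H, W) boolean array, e.g. the segmented mask S_b6,t.
    center    : (x, y) position of the object's centre in pixels.
    radius    : object radius in pixels.
    """
    h, w = body_mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    disc = (xs - center[0]) ** 2 + (ys - center[1]) ** 2 <= radius ** 2
    area = disc.sum()
    if area == 0:
        return 0.0  # object lies entirely outside the image
    return float((disc & body_mask).sum() / area)
```

A collision would then be flagged whenever the returned ratio is greater than 0.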
In this paper, we also use overlap ratios for collision detection, but intra-frame positions are considered for collision responses, as shown in Fig. 6. When the overlap ratio of the collision region between the bodies and a virtual object is greater than 0, the virtual object is determined to be a collided object, and its colliding time is then found. To find the precise colliding time, we shift the position of the virtual object from time $t$ at intervals of $-\Delta$ ($0 < \Delta < 1$) and find the maximum $n\Delta$ at which it still collides with the bodies; the colliding time is then $t - n\Delta$. The motion vector of a collided object at time $t+1$ is calculated by
Fig. 6. The proposed collision response method.
[Equation (8)]
where $v_{t+1}$ and $v_t$ are the motion vectors at times $t+1$ and $t$, respectively, and $n_{t-n\Delta}$ is the normal vector of the colliding plane, estimated as the vector from the centroid of the collision region to the center of the collided object. Finally, the position $(x, y)$ of the collided object at time $t+1$ is calculated by
[Equation (9)]
where $v^x_{t+1}$ and $v^y_{t+1}$ are the x and y components of $v_{t+1}$.
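The intra-frame search and response might be sketched as follows. Several details are assumptions not stated in the text: the object is a disc that moves linearly within a frame, so its position at $t - k\Delta$ is $p_t - k\Delta v_t$; back-tracking stops at the first step with no overlap and never goes back further than one frame; and the response uses mirror reflection about the unit normal, $v_{t+1} = v_t - 2(v_t \cdot \hat{n})\hat{n}$, which may differ from the authors' (8). The function names are hypothetical.

```python
import numpy as np


def disc_mask(shape, center, radius):
    """Boolean mask of a disc with centre (x, y) and the given radius."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return (xs - center[0]) ** 2 + (ys - center[1]) ** 2 <= radius ** 2


def intra_frame_response(body_mask, pos_t, vel_t, radius, delta=0.1):
    """Back-track a colliding disc to its colliding time t - n*delta, then reflect it.

    Returns None if there is no collision at time t, otherwise
    (n*delta, unit normal or None, new velocity).
    """
    pos_t = np.asarray(pos_t, dtype=np.float64)
    vel_t = np.asarray(vel_t, dtype=np.float64)
    if not (disc_mask(body_mask.shape, pos_t, radius) & body_mask).any():
        return None  # overlap ratio is 0: no collision at time t
    n = 0
    max_steps = int(round(1.0 / delta))  # assumption: search at most one frame back
    while n < max_steps:
        earlier = pos_t - (n + 1) * delta * vel_t  # assumption: linear intra-frame motion
        if not (disc_mask(body_mask.shape, earlier, radius) & body_mask).any():
            break
        n += 1
    pos_c = pos_t - n * delta * vel_t  # object position at the colliding time
    hit = disc_mask(body_mask.shape, pos_c, radius) & body_mask
    ys, xs = np.nonzero(hit)
    centroid = np.array([xs.mean(), ys.mean()])
    normal = pos_c - centroid  # from collision-region centroid to object centre
    length = np.linalg.norm(normal)
    if length == 0:
        return n * delta, None, -vel_t  # degenerate case: simply reverse the motion
    normal /= length
    # Assumption: elastic mirror reflection about the colliding plane.
    v_next = vel_t - 2.0 * np.dot(vel_t, normal) * normal
    return n * delta, normal, v_next
```

For an object moving right into a vertical body edge, the normal points back to the left and the reflected velocity reverses its x component.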
6. Experimental Results
The proposed method was tested on a Pentium PC (Core™ 2 Duo, 2.40 GHz). The test images were acquired from a camera at a frame interval of 33 ms, with a resolution of 320 × 240 (8-bit grayscale).
In the experiments, we used τ = 45 for the background subtraction, a 7 × 15 cross as the structuring element for the dilation and erosion operations that produce $S_{b2,t}$ and $S_{b5,t}$, and a 5 × 5 square for the opening operation that produces $S_{b6,t}$. When different cameras are used, these parameters may need to be adjusted according to the thresholding method and the image resolution. Human bodies were correctly segmented in various environments, as shown in Fig. 7, and Fig. 8 shows body detection under different lighting conditions. False segmentations were rarely observed; examples are shown in Fig. 9.
Fig. 7. Results of human body segmentation (1).
Fig. 8. Results of human body segmentation (2).
Fig. 9. Examples of false segmentations.
False segmentations did not persist over consecutive frames, so they are acceptable for games; moreover, players could hardly notice these errors because the games run fast, at 30 frames/s. To evaluate the proposed background updating, we tested an image sequence starting from a frame in which a player is already present, as shown in Fig. 10. After the 60th frame, human bodies were correctly segmented, and the gradient backgrounds were correctly updated after the 150th frame. Thus, the background can be stably estimated within 5 seconds (150 frames at 30 frames/s) even when the player moves dynamically.
Fig. 10. Human body detection and background estimation.
To compare our detection method with methods using intensity backgrounds, we generated background images updated by MoG, as shown in Fig. 11. Because players act within a limited space in game environments, many foreground regions appear in the backgrounds updated by MoG. A static background is also unsuitable for body segmentation, because the segmentation results are very sensitive to luminance changes and an initial manual setup is required; consequently, false segmentations gradually increase over time, as shown in Fig. 12. In the results segmented with low threshold levels, regions with values similar to the background pixels are not segmented, whereas false positive regions increase in the results with high threshold levels.
Fig. 11. Background images updated by MoG.
Fig. 12. Body segmentation results by static background subtraction (the left and right results are segmented with low and high threshold levels, respectively).
For an objective comparison, we used manually created foreground ground-truth images, as shown in Fig. 13(a), and compared them with the results of two background subtraction methods (static background and MoG background) by evaluating the mean squared error (MSE). As shown in Fig. 13, the result of the proposed method is the most similar to the ground truth. Fig. 14 and Table 1 show the comparison results for several ground-truth images and the average MSEs, respectively; the proposed method yields much smaller MSEs than the other methods.
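For reference, the MSE between a segmentation result and a ground-truth mask can be computed as below. Scaling both masks to {0, 1} is an assumption; with that scaling the MSE equals the fraction of mismatched pixels, whereas masks stored as 0/255 would scale it by 255². The function name is hypothetical.

```python
import numpy as np


def mask_mse(result, ground_truth):
    """Mean squared error between two binary masks whose values are 0 or 1 (assumed scale)."""
    r = np.asarray(result, dtype=np.float64)
    g = np.asarray(ground_truth, dtype=np.float64)
    if r.shape != g.shape:
        raise ValueError("masks must have the same shape")
    return float(np.mean((r - g) ** 2))
```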
Fig. 13. A binary foreground ground-truth image (a), the result of MoG background subtraction (b), the result of static background subtraction (c), and the result of the proposed method (d).
Fig. 14. MSE comparison of the proposed method and two background subtraction methods.
Table 1. Average MSE comparison with ground truth images.
Fig. 15 shows the directions in which virtual objects bounce over the discrete times $t-1$, $t$ and $t+1$. As shown in Fig. 15, where Δ = 0.1 was used, the collision responses were very accurate. With the methods of [10, 13] and [14], the object in the bottom-right image of Fig. 15 would move toward the top right, which is physically incorrect. The proposed collision response, which considers intra-frame positions, corrects this motion to the top-left direction. That is, the proposed method reflects real physical collisions more precisely than the other methods.
Fig. 15. Results of collision detection and response.
The processing time of the proposed game, including rendering, is about 26.79 ms, which is shorter than the 33 ms frame interval and therefore acceptable for real-time games.
7. Conclusions
In this paper, we proposed a novel framework for camera-based games based on human body segmentation and collision responses. Human bodies are segmented by subtracting gradient backgrounds and applying operations based on mathematical morphology, so foreground detection is not sensitive to color variations. Because the collisions between human bodies and virtual objects are estimated by intra-frame analysis, the collision responses are very accurate. Therefore, our framework can be used for markerless camera-based games requiring high performance. Our future work includes human activity analysis using body skeletons and tracking to improve the proposed framework.
Acknowledgements
This work was supported by a grant from the Kyung Hee University in 2012 (KHU-20120577).
BIO
Jun-Geon Kim received two B.S. degrees, in Electronics & Radio Engineering and in Computer Engineering, and is currently pursuing an M.S. degree in Electronics & Radio Engineering at Kyung Hee University. His research interests include computer vision, pattern recognition, image processing, Electrical Impedance Tomography (EIT), 3D reconstruction and frame rate up-conversion.
Daeho Lee received the M.S. and Ph.D. degrees in Electronics Engineering from Kyung Hee University, Seoul, Korea, in 2001 and 2005, respectively. He has been an Associate Professor in the Humanitas College at Kyung Hee University, Korea, since 2005. His research interests include computer vision, pattern recognition, image processing, computer games, ITS (intelligent transportation systems), HCI (human-computer interaction), EIT (electrical impedance tomography) analysis, image fusion, and digital signal processing.
References
[1] Aggarwal J. K., Ryoo M. S. 2011 “Human activity analysis: A review,” ACM Comput. Surv. 43 (3) 16:1-16:43
[2] Barnich O., Van Droogenbroeck M. 2011 “ViBe: A universal background subtraction algorithm for video sequences,” IEEE Trans. Image Process. 20 (6) 1709-1724 DOI: 10.1109/TIP.2010.2101613
[3] Lee D., Lee S. G. 2011 “Vision-based finger action recognition by angle detection and contour analysis,” ETRI J. 33 (3) 415-422 DOI: 10.4218/etrij.11.0110.0313
[4] Zhang D. G., Zhu Y. 2012 “A new constructing approach for a weighted topology of wireless sensor networks based on local-world theory for the Internet of Things (IOT),” Comput. Math. Appl. 64 (5) 1044-1055 DOI: 10.1016/j.camwa.2012.03.023
[5] Zhang D., Liang Y. 2013 “A kind of novel method of service-aware computing for uncertain mobile applications,” Math. Comput. Model. 57 (3-4) 344-356 DOI: 10.1016/j.mcm.2012.06.012
[6] Zhang D., Zheng K., Zhang T. 2015 “A novel multicast routing method with minimum transmission for WSN of cloud computing service,” Soft Comput. 19 (7) 1817-1827 DOI: 10.1007/s00500-014-1366-x
[7] Zhang D., Zhang X. 2012 “Design and implementation of embedded un-interruptible power supply system (EUPSS) for web based mobile application,” Enterprise Inf. Syst. 6 (4) 473-489 DOI: 10.1080/17517575.2011.626872
[8] Lee D., Park Y. 2009 “Vision-based remote control system by motion detection and open finger counting,” IEEE Trans. Consumer Electron. 55 (4) 2308-2313 DOI: 10.1109/TCE.2009.5373803
[9] Lee D., Park K., Park Y. 2010 “Collision detection and response method for markerless camera-based games using motion boundary estimation,” IEEE Trans. Consumer Electron. 56 (4) 2178-2184 DOI: 10.1109/TCE.2010.5681088
[10] Lee D., Lee Y. J. 2010 “Framework for vision-based sensory games using motion estimation and collision responses,” IEEE Trans. Consumer Electron. 56 (3) 1356-1362 DOI: 10.1109/TCE.2010.5606270
[11] Lee Y. J., Lee D. H. 2008 “Research on detecting face and hands for motion-based game using Web camera,” Proc. 2008 Int. Conf. Security Technology 7-12
[12] Lee D. H., Lee Y. J. 2008 “Sensing and motion control of virtual objects for Web camera-based game,” Proc. Second Int. Conf. Future Generation Communication and Networking 28-33
[13] Lee D., Lee Y. 2008 “Estimation of collision response of virtual objects to arbitrary-shaped real objects,” IEICE Electron. Express 5 (17) 678-682 DOI: 10.1587/elex.5.678
[14] Lee D., Lee S. G., Kim W. M., Lee Y. J. 2010 “Sphere-to-sphere collision estimation of virtual objects to arbitrarily-shaped real objects for augmented reality,” Electron. Lett. 46 (13) 915-916 DOI: 10.1049/el.2010.0471
[15] Tang Z., Miao Z. 2007 “Fast background subtraction and shadow elimination using improved Gaussian mixture model,” IEEE Workshop on Haptic Audio Visual Environments and the Applications 541-544
[16] Zivkovic Z., van der Heijden F. 2006 “Efficient adaptive density estimation per image pixel for the task of background subtraction,” Patt. Recogn. Lett. 27 (7) 773-780 DOI: 10.1016/j.patrec.2005.11.005
[17] Park J., Yi J. 2006 “Gesture recognition based interactive boxing game,” Int. J. Inf. Technol. 12 (7) 36-44
[18] Oda O., Lister L. J., White S., Feiner S. 2008 “Developing an augmented reality racing game,” Proc. Int. Conf. Intelligent Technologies for Interactive Entertainment
[19] De Paolis L. T., Aloisio G., Pulimeno M. 2009 “A simulation of billiards game based on marker detection,” Proc. Int. Conf. Advances in Computer-Human Interactions 148-151
[20] Rohs M. 2007 “Marker-based embodied interaction for handheld augmented reality games,” J. Virt. Real. Broadc. 4 (5)
[21] Kolb A., Barth E., Koch R. 2008 “ToF-sensors: New dimensions for realism and interactivity,” Proc. IEEE Conf. Computer Vision Pattern Recognition
[22] Leyvand T., Meekhof C., Wei Y. C., Sun J., Guo B. 2011 “Kinect identity: Technology and experience,” Computer 44 (4) 94-96
[23] Zhu J., Wang L., Yang R., Davis J. 2008 “Fusion of time-of-flight depth and stereo for high accuracy depth maps,” Proc. IEEE Conf. Computer Vision Pattern Recognition 1-8
[24] Crabb R., Tracey C., Puranik A., Davis J. 2008 “Real-time foreground segmentation via range and color imaging,” Proc. IEEE CS Conf. Computer Vision and Pattern Recognition Worksh. 1-5
[25] Parvizi E., Wu Q. M. J. 2008 “Multiple object tracking based on adaptive depth segmentation,” Proc. Canadian Conf. Computer and Robot Vision 273-277
[26] Gonzalez-Sanchez T., Puig D. 2011 “Real-time body gesture recognition using depth camera,” Electron. Lett. 47 (12) 697-698 DOI: 10.1049/el.2011.0967
[27] Zhang D. G., Kang X. J. 2012 “A novel image denoising method based on spherical coordinates system,” EURASIP J. Adv. Signal Process. 1 110
[28] Zhang D., Li G., Zheng X. 2014 “An energy-balanced routing method based on forward-aware factor for wireless sensor network,” IEEE Trans. Ind. Informat. 10 (1) 766-773 DOI: 10.1109/TII.2013.2250910
[29] Zhang D., Wang X., Song X., Zhao D. 2014 “A novel approach to mapped correlation of ID for RFID anticollision,” IEEE Trans. Serv. Comput. 7 (4) 741-748
[30] Zhang D. G. 2012 “A new approach and system for attentive mobile learning based on seamless migration,” Appl. Intell. 36 (1) 75-89 DOI: 10.1007/s10489-010-0245-0