Laser Spot Detection Using Robust Dictionary Construction and Update
Journal of information and communication convergence engineering. 2015. Mar, 13(1): 42-49
Copyright © 2015, The Korean Institute of Information and Communication Engineering
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • Received : November 29, 2014
  • Accepted : January 20, 2015
  • Published : March 31, 2015
Zhihua, Wang
Yongri, Piao
Minglu, Jin

In laser pointer interaction systems, laser spot detection is one of the most important technologies, and most of the challenges in this area relate to varying backgrounds and the real-time performance of the interaction system. In this paper, we present a robust dictionary construction and update algorithm based on a sparse model of background subtraction. To handle dynamic backgrounds, we first determine whether the background has changed; if it has, the new background is added directly to the dictionary configurations; otherwise, we run an online cumulative average on the backgrounds to update the dictionary. The proposed dictionary construction and update algorithm for laser spot detection is robust to varying backgrounds and noise and can be implemented in real time. Extensive experimental results confirm the superior performance of the proposed method in terms of detection error and real-time implementation.
Recently, we have witnessed a growing interest in laser pointer interaction (LPI), which allows users to interact directly from a distance through a laser pointer. In laser pointer-based interaction systems, the captured laser spot is recognized and used for interactions by using various image processing techniques. The advantage of ensuring movement flexibility for users has led to the widespread use of this method for multimedia presentations [1-4], robot navigation [5-7], medical purposes [8], virtual reality systems [9, 10], and smart houses [11].
Recently, Kim et al. [2] summarized three fundamental problems with LPI: laser spot detection, interaction function, and coordinate mapping. In [11-13], the researchers focused on the development of a laser spot detection algorithm that directly influences the performance of LPI systems. The most difficult challenges of laser spot detection are strong light environments, real-time implementation, and dynamic backgrounds. For example, the background information always changes when the speaker turns the slides in practical presentation cases.
To overcome the abovementioned problems, two types of algorithms, namely target search (TS) and background subtraction (BGS), have been developed to detect a laser spot. The TS method directly searches for the laser spot without considering the background. Shin et al. [12] simply searched for pixels with maximum intensities to detect the location of the laser spot. Chávez et al. [11] used a combination of template matching and fuzzy rule-based systems to improve the success rate of laser spot detection. Geys and Van Gool [13] determined the laser spot by using clusters along with the fact that a group effect is caused on laser spots by hand jitters. However, the TS method tends to fail in strong light environments and when the appearance of the moving laser spot changes. On the other hand, BGS covers a set of methods that aim to distinguish between the foreground and the background areas by utilizing a background model. The traditional models used to represent the background include statistical models, neural networks, and estimation models; more recent models include fuzzy models, subspace models, transform domain models, and sparse models [14]. Among them, sparse models have been successfully applied in compressive sensing [15]. Cevher et al. [16] considered background subtraction as a sparse approximation problem and provided different solutions based on convex optimization. Hence, the background is learned and adapted in a low-dimensional compressed representation, which is sufficient to determine spatial innovations. Huang et al. [17] proposed a new learning algorithm called dynamic group sparsity (DGS). The idea is that the nonzero coefficients in the sparse data are often not random but tend to be clustered, as in the case of foreground detection. However, the dictionary of backgrounds is constructed simply from video frames, which makes this model sensitive to noise and background changes.
In order to solve the problem of background changes and outliers in training samples, Zhao et al. [18] formulated background modeling as a dictionary learning problem. However, the learning process is time consuming and needs all the background information, which makes it difficult to apply in practice. Therefore, to solve the problem discussed in [18], we propose a novel robust algorithm for the construction and update of a dictionary for laser spot detection. The proposed model can thus handle varying backgrounds while meeting real-time requirements.
The remainder of this paper is organized as follows: Section II briefly explains the proposed method of background modeling and foreground detection. In Section III, we show the experimental results in comparison with those of the existing methods, and some conclusions of the proposed method are presented in Section IV.
Suppose that we have an image $Y$ of size $n_1 \times n_2$ and we vectorize it into a column vector $y$ of size $n \times 1$ ($n = n_1 \times n_2$) by concatenating the individual columns of $Y$ in order from first to last. We formulate background subtraction as a linear decomposition problem, i.e., finding a background component $y_B$ and a foreground component $y_F$ that together constitute a given frame $y$:
$$y = y_B + y_F \tag{1}$$
where $y_B$ and $y_F$ denote the column vectors of the background and foreground, respectively.
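As a small illustration of this notation, the following sketch vectorizes a toy 2×3 image column by column and checks the decomposition of a frame into background and foreground. The helper name `vectorize` and the pixel values are assumptions made for illustration only.

```python
def vectorize(image):
    """Stack the columns of a row-major 2-D list into one flat list."""
    n_rows, n_cols = len(image), len(image[0])
    return [image[r][c] for c in range(n_cols) for r in range(n_rows)]

# Toy 2x3 background and a frame with one bright "laser" pixel (row 1, column 2).
background = [[10, 20, 30],
              [40, 50, 60]]
frame = [[10, 20, 30],
         [40, 50, 255]]

y = vectorize(frame)            # frame as a column vector
y_b = vectorize(background)     # background component
y_f = [a - b for a, b in zip(y, y_b)]  # foreground component, so y = y_b + y_f
```

Only the last entry of `y_f` is nonzero, which is the sparse-foreground situation the model relies on.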
- A. Sparse Representation
Suppose that we have $K$ different backgrounds $y_{B1}, y_{B2}, \ldots, y_{BK} \in \mathbb{R}^n$; then, we can build $K$ configurations for dynamic backgrounds, with each configuration standing for one background. Therefore, at a specific frame, the background $y_B$ can choose from one of these configurations. We define a new matrix $D = [d_1, d_2, \ldots, d_K]$ as the concatenation of all the configurations; here, $d_i$ denotes the $i$th configuration. Then, we say that background $y_B$ has the linear representation $y_B = d_i x_i$, where $x_i$ denotes a coefficient representing the relationship between $y_B$ and $d_i$. Thus, the background can be modeled as a sparse linear combination of atoms from a dictionary $D$, each atom of which characterizes one of the configurations. Next, we rewrite $y_B$ in terms of $D$ as follows:
$$y_B = Dx \tag{2}$$
where $x = [0, \ldots, 0, x_i, 0, \ldots, 0]^T$ denotes a sparse coefficient vector whose entries are ideally zeros except at positions associated with $x_i$.
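To make the sparse representation concrete, the sketch below stores a toy dictionary as a list of atoms and forms the background from a coefficient vector with a single nonzero entry. The function name `mat_vec` and all numbers are hypothetical.

```python
def mat_vec(D, x):
    """Compute D x, where D is a list of atoms (columns) and x a coefficient list."""
    n = len(D[0])
    return [sum(atom[i] * coef for atom, coef in zip(D, x)) for i in range(n)]

# Three background configurations (atoms), each a vectorized 4-pixel image.
D = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 3.0, 1.0, 0.0],
     [2.0, 2.0, 0.0, 1.0]]

# Only the second configuration is active, so x has one nonzero entry.
x = [0.0, 1.5, 0.0]
y_b = mat_vec(D, x)  # equals 1.5 times the second atom
```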
Zhao et al. [18] summarized two assumptions for this sparse model:
Assumption 1 . Background yB of a specific frame y has a sparse representation over a dictionary D .
Assumption 2 . The candidate foreground yF of a frame is sparse after background subtraction.
On the basis of these two assumptions, the BGS problem can be interpreted as follows: given a frame $y$, find a decomposition that has the sparse coded background $y_B = Dx$ and the sparse foreground $y_F = y - Dx$:
$$\hat{x} = \arg\min_{x} \|y - Dx\|_0 + \lambda \|x\|_0 \tag{3}$$
where $\|x\|_0$ denotes the $\ell_0$-norm counting the number of nonzero elements of $x$, $D$ indicates the dictionary capturing all the background configurations, and $\lambda$ represents the weighting parameter balancing the two terms.
Since Eq. (3) is an NP-hard problem because of the non-convexity of the $\ell_0$-norm, Zhao et al. [18] replaced the $\ell_0$-norm with the $\ell_1$-norm and obtained the $\ell_1$-measured and $\ell_1$-regularized convex optimization problem:
$$\hat{x} = \arg\min_{x} \|y - Dx\|_1 + \lambda \|x\|_1 \tag{4}$$
Considering the LPI application, the foreground (laser spot) generally occupies a far smaller spatial area than the background. Therefore, we can simply treat the foreground as noise and obtain a Lasso problem:
$$\hat{x} = \arg\min_{x} \|y - Dx\|_2^2 + \lambda \|x\|_1 \tag{5}$$
This problem can be easily and rapidly solved using least angle regression (LARS) [19], and then, we can obtain the foreground using
$$y_F = y - D\hat{x} \tag{6}$$
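The two steps of Eqs. (5) and (6) can be sketched in a few lines. Note that this is not the LARS solver used in the paper: the sketch swaps in plain cyclic coordinate descent, a different solver for the same kind of Lasso objective, here assumed to be $\|y - Dx\|_2^2 + \lambda\|x\|_1$. The function names, the toy dictionary, and the value of `lam` are illustrative assumptions.

```python
def soft_threshold(v, t):
    """Soft-thresholding operator: sign(v) * max(|v| - t, 0)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def lasso_cd(D, y, lam, n_iter=100):
    """Minimize ||y - D x||_2^2 + lam * ||x||_1 by cyclic coordinate descent.

    D is a list of K atoms (each a list of n floats); y is a list of n floats.
    Returns the coefficients x and the residual y - D x (the foreground estimate).
    """
    K, n = len(D), len(y)
    x = [0.0] * K
    residual = list(y)  # equals y - D x because x starts at zero
    sq_norms = [sum(v * v for v in atom) for atom in D]
    for _ in range(n_iter):
        for j in range(K):
            if sq_norms[j] == 0.0:
                continue
            # Correlation of atom j with the residual that excludes atom j itself.
            rho = sum(D[j][i] * residual[i] for i in range(n)) + sq_norms[j] * x[j]
            new_xj = soft_threshold(rho, lam / 2.0) / sq_norms[j]
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * D[j][i]
                x[j] = new_xj
    return x, residual

# Two toy background configurations and a frame showing the second one plus a spot.
D = [[1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]]
y = [0.0, 0.0, 0.0, 0.0, 3.0, 7.0, 3.0, 3.0]  # bright spot at index 5

x, y_f = lasso_cd(D, y, lam=1.0)
spot = max(range(len(y_f)), key=lambda i: abs(y_f[i]))  # pixel with largest foreground
```

In this toy case only the second coefficient is nonzero, and the largest foreground magnitude falls on the spot pixel. Because the spot is treated as noise in a least-squares fit, the residual at the other pixels is small but not exactly zero, so in practice a magnitude threshold on `y_f` would be needed to isolate the spot.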
- B. Dictionary Construction
To make the sparse model robust against dynamic backgrounds, the dictionary must be able to represent all the backgrounds. Huang et al. [17] assumed that background subtraction has already been performed on the first $K$ frames of the video sequences and let $D = [y_1, y_2, \ldots, y_K] \in \mathbb{R}^{n \times K}$. It is noteworthy that this method is sensitive to noise and cannot be used in practice. Zhao et al. [18] collected all background training samples and developed a robust dictionary learning approach to construct the dictionary:
[Equation (7): image not recovered]
However, in LPI applications, we are unable to collect a sufficient number of training samples. For example, we are unable to capture a large number of backgrounds in a presentation application since we do not know the content of the next slide until the user gives the ‘PageDown’ or ‘PageUp’ command. Besides, solving this optimization problem is time consuming, and the solution is difficult to implement in real time.
Since the use of video sequences as a dictionary is sensitive to noise, we use information from multiple frames for ensuring robustness. Therefore, the strategy is to apply an exponentially decaying weight to run an online cumulative average on the backgrounds:
[Equation (8): image not recovered]
where α denotes the decay rate, often chosen as a tradeoff between stability and quick update, and K represents the parameter that controls the number of backgrounds. The advantage of this approach, apart from its simplicity, is that it can suppress noise and, to some extent, solve the problem of low-frequency background changes. We assume that the background changes at a high frequency at the dictionary update stage but not at the dictionary construction stage, which is often true in an LPI application.
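Since Eq. (8) is not reproduced here, the following is only a sketch of a common form of exponentially weighted running average, not necessarily the exact expression used by the authors. It assumes that α weights the previous estimate and (1 − α) weights the incoming frame, a convention chosen to agree with the later remark that a small α is sensitive to noise. The function name and the data are hypothetical.

```python
def running_average(atom, frame, alpha):
    """Blend a stored background atom with a new frame.

    alpha weights the stored atom and (1 - alpha) weights the new frame
    (an assumed convention, see the lead-in).
    """
    return [alpha * a + (1.0 - alpha) * f for a, f in zip(atom, frame)]

atom = [100.0, 100.0, 100.0]
noisy_frames = [[104.0, 96.0, 100.0],
                [96.0, 104.0, 100.0],
                [100.0, 100.0, 108.0]]
for frame in noisy_frames:
    atom = running_average(atom, frame, alpha=0.5)
```

After the three updates the atom is `[99.5, 100.5, 104.0]`: older noise has decayed, while the most recent frame still carries half the weight, which is the stability versus quick-update tradeoff controlled by α.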
- C. Dictionary Update
The dictionary needs to be updated quickly in order to handle the occurrence of a new background. Huang et al. [17] set a time window to update the dictionary. For frame $t$, the dictionary is updated by $D = [y_{t-K}, \ldots, y_{t-2}, y_{t-1}]$. However, this method is still sensitive to noise, which makes the model unstable. Zhao et al. [18] updated the dictionary $D$ by solving the following optimization problem with the updated coefficients held constant:
[Equation (9): image not recovered]
Zhao et al. [18] assumed that the atoms in D are independent of each other and thus, updated each of them separately. However, solving this optimization problem is still time consuming.
When a new background occurs, the foreground $y_F$ obtained from Eqs. (5) and (6) will not be sparse. We can therefore determine whether a new background has occurred by setting a threshold for the $\ell_0$-norm of $y_F$. Whenever a new background occurs, we add the new background configuration to the dictionary; otherwise, we directly update the dictionary by using Eq. (8). This method can be formulated as follows:
$$D \leftarrow \begin{cases} \text{add the new background configuration to } D, & \|y_F\|_0 > Th \\ \text{update } D \text{ by Eq. (8)}, & \text{otherwise} \end{cases} \tag{10}$$
where $Th$ can be set as the size of the laser spot.
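The decision rule described above can be sketched as follows. Several details are assumptions for illustration and are not stated in the text: the new frame itself is appended as the new configuration, the atom with the largest coefficient is the one that gets averaged, the average uses α as the weight of the stored atom, and a tolerance `tol` is used when counting nonzero foreground entries because real-valued residuals are rarely exactly zero.

```python
def l0_norm(v, tol=1e-6):
    """Count entries whose magnitude exceeds a small tolerance."""
    return sum(1 for e in v if abs(e) > tol)

def update_dictionary(D, frame, foreground, coeffs, th, alpha=0.5, tol=1e-6):
    """Return an updated copy of the dictionary D (a list of atoms).

    If the foreground is not sparse (more than th active pixels), the frame is
    appended as a new atom. Otherwise the atom with the largest coefficient is
    blended with the frame by a running average (an assumed choice of atom).
    """
    if l0_norm(foreground, tol) > th:
        return D + [list(frame)]
    j = max(range(len(D)), key=lambda k: abs(coeffs[k]))
    blended = [alpha * a + (1.0 - alpha) * f for a, f in zip(D[j], frame)]
    return D[:j] + [blended] + D[j + 1:]

D0 = [[10.0, 10.0, 10.0, 10.0]]

# Same slide with a one-pixel spot: the foreground is sparse, so the atom is averaged.
D1 = update_dictionary(D0, [10.0, 10.0, 10.0, 60.0], [0.0, 0.0, 0.0, 50.0],
                       coeffs=[1.0], th=1, tol=1.0)

# New slide: many pixels differ, so the frame is added as a new configuration.
D2 = update_dictionary(D1, [200.0, 180.0, 10.0, 10.0], [190.0, 170.0, 0.0, -25.0],
                       coeffs=[1.0], th=1, tol=1.0)
```

In the experiments reported later, Th is set to 50 pixels; the toy value `th=1` here only matches the one-pixel spot of the example.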
The proposed strategy is made sensitive to changing backgrounds by adding new background configurations, and robust against noise by using the online cumulative average of the backgrounds. The proposed dictionary construction and update algorithm is summarized in Table 1 .
Table 1. Description of the proposed dictionary construction and update algorithm
To validate the ability of the proposed algorithm to handle the abovementioned high-frequency background changes and to evaluate its real-time performance, we discuss two LPI experiments in this section. Through these experiments, we evaluate the performance of the proposed algorithm under different parameter values, measure the detection error under dynamic backgrounds, and compare its running time with that of other algorithms.
- A. Laser Pointer-Operated Windows
A typical example of LPI in practice is the interactive demonstration of software with a computer whose screen content is sent to a video beamer by using a common laser pointer tracked by a video camera as an input device. Algorithms use the behavior of the laser spot to realize the functions of Button Press, Button Release, and Mouse Move. When Button Press is recognized, the corresponding file or dialog may show up, which leads to a background change immediately. We record three videos of the size 160×120, 320×240, and 640×480, respectively, to simulate this process on such a system.
In LPI, the laser spot cannot be static because of hand jitter. Thus, instead of measuring the detection error against the ground truth, we evaluate it using the proportion of falsely detected frames as follows:
$$\text{detection error} = \frac{\text{number of falsely detected frames}}{\text{total number of frames}} \tag{11}$$
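As a minimal sketch of this frame-level metric, the function below computes the fraction of frames flagged as falsely detected. How a frame is judged to be a false detection is not specified here, so the per-frame flags are simply assumed to be given; the function name and the sample flags are hypothetical.

```python
def detection_error(false_flags):
    """Fraction of frames flagged as falsely detected (0.0 for an empty sequence)."""
    flags = list(false_flags)
    return sum(flags) / len(flags) if flags else 0.0

# Ten frames, two of which were judged to be false detections.
flags = [False, False, True, False, False, False, True, False, False, False]
error = detection_error(flags)
```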
The performance of the proposed algorithm is compared with that of two algorithms representing state-of-the-art sparse model approaches [17 , 18] . Notice that we use LARS [19] to solve Eq. (5) for all these methods in order to evaluate the dictionary construction and update approach. Fig. 1 illustrates some results of the abovementioned algorithms.
Fig. 1. Results on laser pointer-operated Windows. (a) Original image (size: 320×240). (b) Using video images as dictionary [17]. (c) Dictionary learning method [18]. (d) Proposed method.
Image sequences of size 320×240 are used to test how the parameters λ and α affect the detection performance. The detection errors for different parameter values are shown in Fig. 2. As we can see from Fig. 2, a larger weighting parameter λ is helpful for the detection since the sparsity of the background is the key assumption of the proposed algorithm. However, an excessively large λ value increases the reconstruction error, which leads to relatively low performance. Thus, the value of λ can be chosen from 5 to 10 in order to obtain good performance. The decay rate α is used to suppress noise; a small α value is sensitive to noise, and a large one cannot adapt to low-frequency background changes.
Fig. 2. Detection error with different parameters λ and α.
As can be observed in Fig. 2 , a moderate α value of 0.5 can lead to better performance. In our experiments, the weighting parameter was set at λ = 5 and the decay rate at α = 0.5 .
For the other parameters used in these tests, we select K = 20 to build the dictionary and Th = 50 to control the sparsity of the laser spot. A standard PC with a 2.0-GHz Intel CPU and 3 GB of memory is used in our experiments. As can be seen from Fig. 1, our algorithm can handle dynamic backgrounds and is robust against noise. The detection error defined by Eq. (11) and the running time per frame are shown in Fig. 3. As can be observed, our algorithm achieves detection errors as low as those of the dictionary learning approach while taking as little time as the method that uses video images as the dictionary. Note that the detection error of the video-image dictionary method [17] is considerably higher than that of our algorithm, and that dictionary learning [18] takes considerably more time and thus cannot be implemented in real time.
Fig. 3. Results on laser pointer-operated Windows. (a) Detection errors. (b) Running time.
- B. Multimedia Presentation
In a presentation application, we can use the laser pointer to change slides and draw lines. It should be noted that high-frequency changes are caused when the user changes the slides. Further, each slide may be totally different from the others. For this application, we manually change the slides to obtain dynamic backgrounds and use the above mentioned algorithms for the detection of the laser spot. The final results are shown in Figs. 4 and 5 .
Fig. 4. Results of multimedia presentation. (a) Original image (size: 320×240). (b) Using video images as dictionary [17]. (c) Dictionary learning method [18]. (d) Proposed method.
Fig. 5. Results of multimedia presentation. (a) Detection errors. (b) Running time.
From Figs. 4 and 5, we can see that the proposed algorithm achieves a lower detection error at a low time cost, which is similar to the results of the laser pointer-operated Windows experiment. Thus, the proposed algorithm is robust across different scenarios with dynamic backgrounds. From Table 2, we can see that the detection error is highest at the 160×120 resolution, while similarly low detection errors are obtained at the 320×240 and 640×480 resolutions. However, the time cost at 640×480 is considerably higher than that at 320×240. Thus, we recommend the use of the 320×240 resolution in practice.
Table 2. Performance comparison of different image resolutions
In this paper, we focus on the laser spot detection algorithm and model it as a background subtraction problem. Further, we propose a robust dictionary construction and update algorithm based on the sparse model for laser spot detection. To test the performance of the proposed method, a large number of experiments are conducted from the perspectives of detection error and real-time performance. The experimental results confirm that the proposed method outperforms the existing methods with a lower detection error and better real-time performance when the background exhibits a high frequency of changes.
Finally, the proposed robust algorithm can also be applied to solve other practical problems, such as traffic monitoring [18] where the background switches among several configurations controlled by the status of traffic lights.
This work was supported by the Natural Science Foundation of China under Grant 61405022.
Zhihua Wang
received his B.S. in Electronic and Information Engineering from Dalian Maritime University, Dalian, China, in 2012. He is currently working toward his M.Sc. in Communication and Information System at Dalian University of Technology (DUT), Dalian, China. His research interests include computer vision and human–computer interactions.
Yongri Piao
received his B.S. in Automation Engineering from Jilin University, China, in 2003, and his M.S. and Ph.D. in Information and Communication Engineering from Pukyong National University, Republic of Korea, in 2005 and 2008, respectively. From September 2008 to December 2011, he was Research Professor at the 3D Display Research Center of Kwangwoon University. Since March 2012, he has been an Associate Professor at the School of Information and Communication Engineering, Dalian University of Technology, Dalian, China. His research interests include optical imaging and 3D display, optical and digital encryptions, 3D pattern recognition and tracking, and 2D/3D image processing. He has more than 40 publications, including 20+ peer reviewed journal articles and 20+ conference proceedings.
Minglu Jin
is a professor at the School of Information and Communication Engineering, Dalian University of Technology, Dalian, China. He received his Ph.D. and M.Sc. degrees from Beihang University, Beijing, China, and his B.Eng. degree from University of Science & Technology, Hefei, China. He was a visiting scholar at the Arimoto Lab, Osaka University, Osaka, Japan from 1987 to 1988. He was Research Fellow at the Radio & Broadcasting Research Lab, Electronics Telecommunications Research Institute (ETRI), Korea from 2001 to 2004. Professor Jin’s research interests are in the general areas of signal processing and communications systems. His specific current interests are cognitive radio, multiple-input and multiple-output (MIMO) radio antenna design, and wireless sensor networks.
[1] Kirstein C., Muller H., “Interaction with a projection screen using a camera-tracked laser pointer,” in Proceedings of Multimedia Modeling (MMM'98), Lausanne, Switzerland, 1998, pp. 191-192.
[2] Kim N. W., Lee S. J., Lee B. G., Lee J. J., “Vision based laser pointer interaction for flexible screens,” in Proceedings of the 12th International Conference on Human-Computer Interaction, Beijing, China, 2007, pp. 845-853.
[3] Zhang L., Shi Y., Chen B., “NALP: navigating assistant for large display presentation using laser pointer,” in Proceedings of the 1st International Conference on Advances in Computer-Human Interaction, Sainte Luce, Martinique, 2008, pp. 39-44.
[4] Widodo R. B., Chen W., Matsumaru T., “Interaction using the projector screen and spot-light from a laser pointer: handling some fundamentals requirements,” in Proceedings of SICE Annual Conference (SICE), Akita, Japan, 2012, pp. 1392-1397.
[5] Shojaeipour S., Haris S. M., Shojaeipour A., Shirvan R. K., Zakaria M. K., “Robot path obstacle locator using webcam and laser emitter,” Physics Procedia, vol. 5, pp. 187-192, 2010. DOI: 10.1016/j.phpro.2010.08.136
[6] Minato Y., Tsujimura T., Izumi K., “Sign-at-ease: robot navigation system operated by connoted shapes drawn with laser beam,” in Proceedings of SICE Annual Conference (SICE), Tokyo, Japan, 2011, pp. 2158-2163.
[7] Shibata S., Yamamoto T., Jindai M., “Human-robot interface with instruction of neck movement using laser pointer,” in Proceedings of IEEE/SICE International Symposium on System Integration (SII), Kyoto, Japan, 2011, pp. 1226-1231.
[8] Fukuda Y., Kurihara Y., Kobayashi K., Watanabe K., “Development of electric wheelchair interface based on laser pointer,” in ICCAS-SICE International Joint Conference, Fukuoka, Japan, 2009, pp. 1148-1151.
[9] Kim N. W., Lee H., “Developing of vision-based virtual combat simulator,” in Proceedings of International Conference on IT Convergence and Security (ICITCS), Macao, China, 2013, pp. 1-4.
[10] Kim S. J., Jang M. S., Kuc T. Y., “An interactive user interface for computer-based education: the laser shot system,” in World Conference on Educational Multimedia, Hypermedia and Telecommunications, Lugano, Switzerland, 2004, pp. 4174-4178.
[11] Chávez F., Fernández F., Alcalá R., Alcalá-Fdez J., Olague G., Herrera F., “Hybrid laser pointer detection algorithm based on template matching and fuzzy rule-based systems for domotic control in real home environments,” Applied Intelligence, vol. 36, no. 2, pp. 407-423, 2012. DOI: 10.1007/s10489-010-0268-6
[12] Shin J., Kim S., Yi S., “Development of multi-functional laser pointer mouse through image processing,” in Proceedings of International Conference on Multimedia, Computer Graphics and Broadcasting (MulGraB), Jeju, Korea, 2011, pp. 290-298.
[13] Geys I., Van Gool L., “Virtual post-its: visual label extraction, attachment, and tracking for teleconferencing,” in Proceedings of the 3rd International Conference on Computer Vision Systems (ICVS), Graz, Austria, 2003, pp. 121-130.
[14] Bouwmans T., “Traditional and recent approaches in background modeling for foreground detection: an overview,” Computer Science Review, vol. 11-12, pp. 31-66, 2014. DOI: 10.1016/j.cosrev.2014.04.001
[15] Candès E. J., Wakin M. B., “An introduction to compressive sampling,” IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21-30, 2008. DOI: 10.1109/MSP.2007.914731
[16] Cevher V., Sankaranarayanan A., Duarte M. F., Reddy D., Baraniuk R. G., Chellappa R., “Compressive sensing for background subtraction,” in Proceedings of the 10th European Conference on Computer Vision (ECCV), Marseille, France, 2008, pp. 155-168.
[17] Huang J., Huang X., Metaxas D., “Learning with dynamic group sparsity,” in Proceedings of IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 2009, pp. 64-71.
[18] Zhao C., Wang X., Cham W. K., “Background subtraction via robust dictionary learning,” EURASIP Journal on Image and Video Processing, vol. 2011, pp. 1-12, 2011. DOI: 10.1155/2011/972961
[19] Efron B., Hastie T., Johnstone I., Tibshirani R., “Least angle regression,” The Annals of Statistics, vol. 32, no. 2, pp. 407-499, 2004. DOI: 10.1214/009053604000000067