Image processing and computer vision algorithms are gaining larger concern in a variety of application areas such as robotics and man-machine interaction. Vision allows the development of flexible, intelligent, and less intrusive approaches than most of the other sensor systems. In this work, we determine the location and orientation of a mobile robot which is crucial for performing its tasks. In order to be able to operate in real time there is a need to speed up different vision routines. Therefore, we present and evaluate a method for introducing parallelism into the multiple non-overlapping camera pose estimation algorithm proposed in
. In this algorithm the problem has been solved in real time using multiple non-overlapping cameras and the Extended Kalman Filter (EKF). Four cameras arranged in two back-to-back pairs are put on the platform of a moving robot. An important benefit of using multiple cameras for robot pose estimation is the capability of resolving vision uncertainties such as the bas-relief ambiguity. The proposed method is based on algorithmic skeletons for low, medium and high levels of parallelization. The analysis shows that the use of a multiprocessor system enhances the system performance by about 87%. In addition, the proposed design is scalable, which is necaccery in this application where the number of features changes repeatedly.
omputer vision is gaining a larger importance in a variety of applications such as: robotics, man-machine interaction, and 3D surgical operations. It not only allows a great deal of flexibility but also the lowest degree of invasiveness compared to the most of other sensor systems. In order to be able to operate in real time there is a need to speed up different vision routines
. Computer vision is considered a good candidate for the application of parallel processing because of the large volumes of data and the complex algorithms commonly encountered. Parallel image processing systems can be classified into three categories: computer-based dedicated systems, computer-based general systems and digital signal processing (DSP)-based systems. According to this classification, the analysis and comparison of many kinds of realization technologies and structure characteristics are carried out in
. For an autonomous mobile robot, the first step to do any worthwhile task is to determine its location and direction. This is the pose estimation which is a crucial problem lasting-for-decades in computer vision as well as in various other fields. Obtaining the robot ego-motion is one of its many facets. The application range extends from mixed reality in movies to activity recognition
, and guidance of bronchoscopic tracking
.The EKF has been used to solve the pose estimation problem in different settings. The two main aspects of this use are: firstly, the number and arrangement of cameras, and secondly, the number and usage of filters. For example, a single camera and one EKF for both pose and structure are used in
, and in
. Additionally, a single camera, one EKF for pose, and many EKFs for structure are used in
. Moreover, four cameras arranged in two back-to-back stereo pairs, and one EKF for pose are used in
while the structure is obtained on-demand using triangulation. On the other hand, two cameras arranged as a one stereo pair, and one EKF for pose are used in
without solving for structure. An important benefit of using multiple cameras for robot pose estimation is resolving the bas-relief ambiguity
, the multiple cameras are dealt with as a single generalized camera. Multiple cameras are used in
mainly as fixed cameras to estimate the pose of an object with a known CAD model. To sum up, our motivation is to make use of the parallel processing to estimate the pose of a moving robot within an unknown indoor scene in real time. In particular, we speed up the multiple non-overlapping camera implementation in
. The analysis shows that the use of a multiprocessor system enhances the system performance by about 87%. In addition, the proposed design is scalable (that is to say, increasing the number of cameras and the number of processors enhances the system performance), which is an important factor in this application where the number of features changes repeatedly.This paper is organized as follows: in section 2, a background of pose estimation problem and its implementation in multiprocessor systems is given. Then, the proposed method is detailed in section 3. In section 4, a performance evaluation of the proposed method is given. Finally, the paper is concluded in section 5.
2. Background and Related Work
- 2.1 Robot Navigation and Path Planning
For any mobile platform (a robot or a vehicle), the ability to navigate in its environment is one of the most important capabilities at all. Robot navigation means its ability to determine its own position with respect to a frame of reference and then to plan a path towards some goal location. Mobile robot path planning in an unknown environment is a major problem in robot navigation. This process is crucial because it offers the robot the flexibility of reaching a point securely without any assumed prior knowledge about the scene. In the literature, mobile robot path planning problems are solved in various ways. For example, ultrasonic sensors are used in
, and laser range finders and CMOS coupled devices (CCD) arrays are used in
. These schemes allow a robot to use the sensors to map out its surrounding environment as a grid of empty (no obstacle), occupied (obstacle) or unknown locations (out of sensor's area). When a robot is introduced into a new environment, it takes the readings of all the available sensors. Through accumulating these readings, several points on the grid can be labeled as empty or occupied. Accordingly, the robot travels through the empty space. This process is repeated at each motion step of the mobile robot until it arrives at its destination.
- 2.2 Pose Estimation of a Multiple Camera System
Multiple cameras are increasingly prevalent on robots and vehicles. These cameras come in a variety of models such as: wide-angle, fish-eye, and catadioptric. Furthermore, although odometry is generally available on the vehicles, it may be inaccurate due to wheel slipping. However, vision applications may use wheel odometry as a strong prior for camera pose estimation, and in this case, an accurate camera calibration is required
. Many researchers try to solve the pose estimation problem of a calibrated multi-camera system
. Heng et. al., proposed an easy-to-use automated pipeline that handles both intrinsic and extrinsic calibration
. They run an automatic intrinsic calibration for each generic camera using a checkerboard. Afterward, they run an extrinsic calibration to obtain all camera-odometry transforms. The extrinsic calibration is unsupervised, using natural features, and only requiring the vehicle to be driven around for a short time. The intrinsic parameters are optimized in a final bundle adjustment step while performing the extrinsic calibration. In addition, the pipeline produces a globally-consistent sparse map of landmarks which can be used for visual localization. Lee et. al., in
suggest a model for solving the pose estimation problem of a calibrated multi-camera system. They assume that the non-central rays passing through the 3D world points and the multi-camera system can be represented as Plucker lines. They show that the minimal solution for the depth of the points along the Plucker lines is an eight degree polynomial that gives up to eight real solutions. The coordinates of the 3D world points in the multi-camera frame are computed from the known depths. In addition, the pose of the multi-camera system can be obtained from absolute orientation. Moreover, the solution can be refined by including all the inlier correspondences in a non-linear refinement step. In order to construct a real-time system, it is essential to aim for high-performance by minimizing the computational latency. Unfortunately, tracking objects is quite complex and requires a vast amount of computation. However, various techniques have been proposed in an effort to reduce the computation. One such technique verifies and updates information about the positions of the selected small windows, or features, of an object. Selection of such windows is generally determined using a priori basis for interesting features. It is necessary to carefully select the points where motion information can be extracted. The basis for the selection of a point could be tracking of corners, windows with high spatial frequency content, or regions with particular brightness patterns
. Tomasi and Kanade in their work
derive a criterion for feature selection based on large contrasts of intensity. The Kanade-Lucas-Tomasi (KLT) feature selection and tracking algorithm, commonly accepted within the vision community, is chosen for this work compared to other feature detectors, it verifies an adequate quality without too much computational demand. Ragab et. al., in
solve the problem of pose estimation in real time using multiple non-overlapping cameras and the Extended Kalman Filter (EKF). Four cameras arranged in two back-to-back pairs are put on the platform of a moving robot. The axes passing through the camera centers of each pair are perpendicular. Because each camera has its individual EKF for the pose estimation, this model is suitable for parallel processing.
- 2.3 Parallel Feature Selection and Tracking
Feature selection and tracking are two of the essential problems in computer vision. The selection of features can be based on intuitive descriptions of feature quality. The feature tracker presented by Kanade, Lucas, and Tomasi in
has approached the selection of features in a way that is optimal by construction with respect to the accompanying tracking algorithm. That is to say, the KLT method has been successfully used for feature tracking as long as the baselines between successive frames are short. Feature selection involves evaluating every window in the image frame for a being textured by computing the eigenvalues of the 2×2 gradient matrix. Because the computation is completely independent for each window being examined, this is a natural candidate for acceleration using parallelism. Moreover, tracking in the KLT algorithm is accomplished by minimizing a dissimilarity measure between feature windows which is another source of potential parallelism
. Klein and Murray in
develop a system for Visual Simultaneous Localization and Mapping (VSLAM) called Parallel Tracking and Mapping (PTAM). In PTAM, online, real-time camera pose estimation (tracking) and structure estimation (mapping) are separated into two threads. There has been also much effort to implement the KLT on Graphic Processing Units (GPUs) to increase the speed especially with large frames and more features to track. Kim et. al., in
implement the KLT using an affine-photometric model on GPUs, albeit with a hight computational demand. Fassold et. al., in
reporte work done on carrying out the KLT using one GPU for each feature point. Their GPU implementations achieves real-time performance (> 25 frames per second) for High Definition (HD) video sequences and successfully track several thousands of points. Sinha et. al., in
combined the implementations of the KLT and Scale Invariant Feature Tracking (SIFT) feature extraction algorithms on GPUs which is suitable for video analysis in real-time systems. The GPU-based SIFT implementation extracts about 800 features from 640 × 480 video at ten frames per second which is approximately ten times faster than an optimized CPU implementation. In this work, we focus on using CPUs. The reason for this is the need for their general programing capabilities to deal with the problem at hand.
3. The Parallel Implementation of Multiple Non-overlapping Camera Pose Estimation
In this section we propose a new method for the parallel implementation of the pose estimation algorithm proposed in
. The algorithm therein presents the main problems of an autonomous mobile robot platform, which uses digital images for extracting information from the environment for visual navigation. In this algorithm, the pose estimation problem has been solved in real time using multiple non-overlapping cameras and the EKF. Four cameras arranged in two back-to-back pairs are put on the platform of a moving robot without an overlapping field of view. The two axes passing through the camera centers of each pair are perpendicular as shown in
The multiple non-overlapping camera model 
Each camera has its individual EKF for the pose estimation, therefore this model is suitable for the parallel processing on both levels of feature tracking and pose estimation. To solve this problem in a parallel environment, initially each camera (which represented by a task T
) is assigned to one processor. During run-time, each processor can estimate the time needed for its tasks, and hence load sharing is possible. Since each task has a sequence of calculations to be performed, these calculations can be broken into sets of computations (subtasks) that can be executed by more than one processor. At the end of each motion step, processors may need to communicate the accumulated cameras' fillter outputs to make the motion decision. This decision must be done by only one processor. The amount of data to be transferred between processors, which increases the communication overhead, is changing based on the configuration area (image size
), the number of processors (
) and the length of motion sequence. The amount of data to be transferred will increase with the size of the configuration area. This process is repeated at each frame of the sequence until the robot reaches its destination. From the parallel processing point of view, this application is considered to be a divisible workload consisting of independent tasks of different sizes. The tasks' execution times are evaluated during run-time.
represents the steps of the algorithm, where each vertex represents an individual task, while the edges are the dependencies among them (the communication between two adjacent tasks). The task T
represents the initialization process, tasks Ti's (1≤ i ≤4) represent the denoted camera pose estimation, and task T
represents the process of optimizing decision making based on the median filter and the invariant physical dimensions of the baselines between the cameras.
represents the details of task T
which describes the process of each camera (cam k). Based on the time needed to calculate these tasks they are assigned priorities and aranged in the in descending order (the camera which needs the largest execution time takes the highest priority). For "
" processors there are three cases: (
≤ 4), (4 <
≤ 16) and (
> 16) which will be explained below.
System steps (these steps must be done for each camera, cam k) where (1≤k≤4)), NF: number of features, and Min is the minimum number of features
- 3.1 First Case (M ≤ 4)
When the number of processors "
" equals four (the number of cameras), each processor is sequentially assigned a camera from the list. In case of (
< 4), each processor is assigned
task(s) from the list (where ⎿ ⏌ denotes the floor function), and the remaining
tasks are assigned to the lightly loaded processor. As shown in
must be executed sequentially.
- 3.2 Second Case (4 < M ≤ 16)
In this case, we assume that the configuration area (image size
) can be divided into four quarters (Q
, and Q
), and each task T
can be divided into four subtasks as shown in
. That is to say, each task T
is assigned to
processor(s) and the remaining
processors help the overloaded ones. These subtasks can be computed in parallel. To calculate the number of subtasks assigned to each processor, the following assumptions are made: First the number of processors assigned to the
" is "
". Secondly, each subtask has its different execution time.
Four quarter representation. Tkl 1≤ (k,l )≤4, where k:denotes the camera, l: denotes the quarter, Tk5 is the arbiter of cam k, and T5 is the general arbiter
< 16), and
is less than four, each processor is assigned
subtask(s) and the remaining
subtasks are assigned to the lightly loaded processors. Finally, when (
= 16), each processor is assigned one subtask.
- 3.3 Third Case 3: M >16
Further enhancement is obtained when the number of processors is greater than 16. In this case, more than one processor can cooperate to execute task T
for each quarter. As shown in
each task T
can be divided into small tasks called subtasks T
, …., and T
. Each quarter can behave as a single camera as shown in
. This figures show that the most consuming time functions are: the feature extraction and the feature tracking, That is to say, more than one processor can cooperate to execute each quarter of each camera. In this case, the feature extraction and the feature tracking take the major amount of computation and can be executed in parallel after being divided into small subtasks. Finally, only one processor collects the data and computes the last task sequentially. This sequence must be repeated for all quarters belonging to each camera.
Detailed four quarters representation. Each of the four branches has the same subtasks of Figure 3, albeit on the quarter level
Execution time, which is refers to the total running time of the program, is the most obvious way of describing the performance of parallel programs. In fact, reducing this time is the aim of parallel processing. In the next sub-section, the execution time is calculated for different levels. Initially, for a frame within quqrter, then for a quarter, furthermore for a camera, and finally for the whole system.
- 3.4 Determining Parallel Execution Time
Assuming that T
is the parallel execution time, "T
" is the time needed to compute frame number "
" in parallel and 1 ≤
≤ length of sequence (LoS). Assume that for frame number "
" the total number of features is "
", the minimum number of features for quarter number "
" is "MinF
", and the number of processors need to execute each frame in quarter number "
" is "
". Then, the execution time of quarter "
" of camera "cam k" is given by:
There are two cases:
Using equations 2 and 3, we can calculate the time needed to compute frame "
" inside quarter "
", and T
is the communication time.
▪ Case 1: mod= 0 , NPl=(where mod is the mod function)
▪ Case 2: mod≠ 0 , NPl=(where ⎾ ⏋ denotes the ceil function)
Furthermore, using equations (1) and (2), the total time needed to execute camera "cam k" is given by:
Finally the total execution time needed for the whole system is given by:
is the communication time, and T
is the time needed for general arbiter. In the next section, we will evaluate the proposed method and show the experimental results.
4. Experimental Results and Evaluation
In order to evaluate the system, a publicly available library written in "C"
is used. In particular, the library is used to assess the feature selection and tracking algorithm, KLT. In this work, we assume that each processor's speed is 2.5 Ghz, its memory "
" equals 1Gbits, and the bandwidth "
" is 4.2Gbps.
shows the analysis of the proposed technique with different number of processors (from 2 to 48).
(a) illustrates the total parallel execution time. In addition,
(b) represents the speedup "
", which is the ratio between the sequential time and its corresponding parallel time. Moreover,
(c) illustrates the system efficiency "
" which equals
(d) represents the degree of system improvement with respect to the sequential version. This is the percentage of improvement in the system performance with respect to the sequential execution is defined by (
The system performance: Execution time, speedup, efficiency and the improvement degree
From the above figures, we note that:
▪Figure 6(a) summarizes the total parallel execution time for different number of processors (from 2 to 48). It is shown that as the number of processors increases, the execution time decreases. The reduction is approximately 50% as the number of processors increases from one to two, while the total execution time decreases by about 85% when the number of processors equals 16. When increasing the number of processors from 16 to 24 the execution time decreases to 13% of the sequential time.
▪ Upon increasing the number of processors, the speedup increases, and consequently the efficiency decreases as illustrated inFigure 6(b), (c). On the other hand,Figure 6(d) shows the degree of improvement, compared to the sequential performance forM=2,4,6,8,10,14,16,24 and 48 the degree of improvement is 48.7%, 71.6%, 74.8%, 80.7%, 80.9%, 84.86, 85%,, 87.5% and 90.5% respectively.
▪ Increasing the number of processors reduces the total execution time but correspondingly increases the communication overhead and reduces the system efficiency. Using 24 processors leads to a reduction of about 87.5% of the processing time with efficiency equals to 33%. When increasing the number of processors more than 24, they need to exchange more data between them which could lead to the increase of the communication overhead and decrease the system efficiency. Therefore, using 24 processors are sufficient to compromise between system performance and utilization.
▪ The analysis shows that the use of a multiprocessors system enhances the system performance. In addition, the proposed design is scalable (increasing the number of cameras and the number of processors improves the system performance), this is an important factor in this application where the number of features seen by each camera changes repeatedly.
In the present paper, we have solved the robot pose estimation problem using parallel implementation on three levels. The first level of parallelization is the coarse grained level where the parallelization is done in the camera level. On the other hand, in the second level (medium level) of parallelization more than one processor cooperate to compute the work of each camera. Moreover, in the third level (fine grained level) the parallelization is done in the loop/instruction level, where feature extraction and feature tracking tasks are divided into small subtasks and executed in parallel. The analysis shows that the use of a multiprocessor system enhances the system performance. It is obvious that parallel implementation of the pose estimation approach proposed in
reduces its computation time by "87%" compared to the non-parallel implementations. In addition, the proposed design is scalable, which is neccesary for this application where the number of features changes repeatedly. The aforementioned results are a strong stimuls for our future hardware implementation of the system.
Mohammad Ehab Ragab is an Assistant Professor at the Electronics Research Institute, Cairo, Egypt. He received the B.Sc., M.Sc., degrees from Ain Shams University, and Cairo University, Cairo, Egypt, respectively. Then, he obtained his Ph.D. from the Chinese University of Hong Kong . His research interests include: Computer vision, Robotics and Image Processing.
Ghada Farouk ElKabbany is an Assistant Professor at the Electronics Research Institute, Cairo- Egypt. She received her B.Sc. degree, M.Sc. degree and Ph.D. degree in Electronics and Communications Engineering from the Faculty of Engineering, Cairo University, Egypt. Her research interests include High Performance Computing (HPC), Robotics, and Network Security.
Ragab M. E.
Wong K. H.
“Multiple Non-overlapping Camera Pose Estimation,”
in Proc. of the 17th IEEE Int. Conf. on Image Processing (ICIP)
Sept. 26-29, 2010
“Tutorial in Data Parallel Image Processing,”
Australian Journal of Intelligent Information Processing Systems (AJIIPS)
“Research on the Architectures of Parallel Image Processing Systems,”
in Proc. of the 2nd Int. Symposium on Intelligent Information Technology Application (IITA '08)
20-22 Dec., 2008
Bachelor of Science Summa Cum Laude thesis
"Parallel Image processing and Computer Vision Architecture"
Univ. of Florida
Bachelor of Science Summa Cum Laude thesis
“Dynamical motion vocabularies for kinematic tracking and activity recognition,”
in Proc. of CVPRW’06. IEEE, vol. I
Merritt S. A.
“Real-time image-based guidance method for lung-cancer assessment,”
CVPR’06. IEEE, vol. II
Broida T. J.
“Recursive 3-D Motion Estimation from a Monocular Image Sequence,”
IEEE Trans. Aerospace and Electronic Systems
DOI : 10.1109/7.55557
“Recursive estimation of motion, structure, and focal length,”
IEEE Trans. on PAMI
DOI : 10.1109/34.387503
“Structure from Motion Causally Integrated Over Time,”
DOI : 10.1109/34.993559
Yu Y. K.
“Recursive Three-Dimensional Model Reconstruction Based on Kalman Filtering,”
IEEE Trans. SMC-B
"EKF Based Pose Estimation using Two Back-to-Back Stereo Pairs,”
in Proc. of ICIP’07. IEEE, vol. VI
“Recursive recovery of position and orientation from stereo image sequences without three-dimentional structure,”
in Proc. of CVPR'06, vol. 1
“A spherical eye from multiple cameras (makes better models of the world),”
in Proc. of CVPR, vol.1
“Pose Estimation for Multiple Camera Systems,”
in Proc.of ICPR, vol. 3
“A general imaging model and a method for finding its parameters,”
in Proc.of ICCV, vol. 2
“Using many cameras as one,”
in Proc. of CVPR , vol. 2
Schurman D. C.
Capson D. W.
“Direct Visual Servoing Using Network-synchronized Cameras and Kalman Filter,”
in Proc. of ICRA, vol. 4
“Position and Orientation Estimation Based on Kalman Filtering of Stereo Images,”
in Proc. of CCA
“Resource Scheduling and Load Balancing in Distributed Robotic Control Systems,”
Robotics and Autonomous Systems
DOI : 10.1016/S0921-8890(03)00075-7
“On-line Navigation of Mobile Robot Among Moving Obstacles Using Ultrasonic Sensors,”
Lecture Notes in Computer Science, Robot Soccer World Cup V
“Calibration of Non-Overlapping Cameras- Application to Vision-Based Robotics,”
“Computer Vision Based Mobile Robot Navigation in Unknown Environments,”
Bulletin of the Transilvania University of Brasov, Series I: Engineering Sciences
“Autonomous Vehicle Navigation Using Vision and Mapless Strategies: A Survey,”
Hindawi Publishing Corporation Advances in Mechanical Engineering
DOI : 10.1155/2013/697415
“Visual Navigation for Mobile Robots: a Survey,”
Journal of Intelligent and Robotic Systems
DOI : 10.1007/s10846-008-9235-4
DeSouza Guilherme N.
Kak Avinash C.
“Vision for Mobile Robot Navigation: A Survey,”
IEEE Transaction on Pattern Analysis and Machine Intelligence
DOI : 10.1109/34.982903
“CamOdoCal: Automatic Intrinsic and Extrinsic Calibration of a Rig with Multiple Generic Cameras and Odometry,”
in Proc. of IEEE/RSJ Int. Conf.of Intelligent Robots and Systems
July 18, 2013
Lee Gim H.
“Minimal Solutions for Pose Estimation of a Multi-Camera System,”
in Proc. of Int. Symposium on Robotics Research (ISRR)
“A Multiple-Camera System Calibration Toolbox Using a Feature Descriptor-Based Calibration Pattern,”
in Proc. of EEE/RSJ Int. Conf. on Intelligent Robots and Systems
“Profiling Accuracy-Latency Characteristics of Collaborative Object Tracking Applications,”
Journal of VLSI Signal Processing Systems
DOI : 10.1007/s11265-005-4162-0
“An iterative image registration technique with an application to stereo vision,”
in Proc. of DARPA Image Understanding Workshop
“Good features to track,”
in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition CVPR’94
McCracken Michael O.
“Evaluating Performance of Two Implementations of the Shi & Tomasi Feature Tracker,”
Department of Computer Science and Engineering, University of California
“Parallel tracking and mapping for small AR workspaces,”
in Proc. of IEEE and ACM Int. Symposium on Mixed and Augmented Reality
Washington, DC, USA
“Realtime Affine-photometric KLT Feature Tracker on GPU in CUDA Framework,”
in Proc. of 12th Int. IEEE Conf./ Workshops (ICCV Workshops), on Computer Vision
Sept. 27 - Oct. 4 2009
“Realtime KLT Feature Point Tracking for High Definition Video,”VaclavSkala and Dietmar Hildebrand, editors
GraVisMa 2009- Computer Graphics, Vision and Mathematics for Scientific Computing
“Feature tracking and matching in video using programmable graphics hardware,”
Machine Vision and Applications
DOI : 10.1007/s00138-007-0105-z