High Speed SD-OCT System Using GPU Accelerated Mode for in vivo Human Eye Imaging
Journal of the Optical Society of Korea. 2013. Feb, 17(1): 68-72
Copyright ©2013, Optical Society of Korea
  • Received : November 01, 2012
  • Accepted : January 28, 2013
  • Published : February 25, 2013
About the Authors
Nam Hyun Cho
School of Electrical Engineering and Computer Science, Kyungpook National University, 1370 Sangyeok 3-dong, Buk-gu, Daegu 702-701, Korea
Unsang Jung
School of Electrical Engineering and Computer Science, Kyungpook National University, 1370 Sangyeok 3-dong, Buk-gu, Daegu 702-701, Korea
Suhwan Kim
School of Electrical Engineering and Computer Science, Kyungpook National University, 1370 Sangyeok 3-dong, Buk-gu, Daegu 702-701, Korea
Woonggyu Jung
School of Nano-Bioscience and Chemical Engineering, Ulsan National Institute of Science and Technology, 100 Banyeon-ri, Eonyang-eup, Ulju-gun, Ulsan 689-798, Korea
Junghwan Oh
Department of Biomedical Engineering, Pukyong National University, Pusan 608-737, Korea
Hyun Wook Kang
Department of Biomedical Engineering, Pukyong National University, Pusan 608-737, Korea
Jeehyun Kim
School of Electrical Engineering and Computer Science, Kyungpook National University, 1370 Sangyeok 3-dong, Buk-gu, Daegu 702-701, Korea

We developed a spectral-domain optical coherence tomography (SD-OCT) system that uses a graphics processing unit (GPU) for signal processing. The system produces 1024 × 512 pixel images at 110 frames/sec in real time. K-domain linearization, the fast Fourier transform (FFT), and log scaling are all performed on the GPU. Signal processing took about 62 ms per frame on a central processing unit (CPU) and 1.6 ms on the GPU, a speedup of roughly 39 times. We performed an in vivo retinal scan and reconstructed a 3D visualization from C-scan images. The resulting images showed minimal motion artifacts, and blood vessels, the optic nerve, and the optic disk were clearly identifiable in the tomograms. These results suggest that this SD-OCT system can be applied to real-time 3D display, particularly as an auxiliary instrument for eye surgery in ophthalmology.
Keywords: OCT, GPU, CUDA, Retina, Optic disk
Optical coherence tomography (OCT) is an imaging technology based on a light source with a short coherence length. OCT can acquire high resolution (1 ~ 15 μm) cross-sectional images of living samples noninvasively and in real time [1-3]. It has been applied to instruments for the early diagnosis of cancer and various other diseases, and a great deal of research is applying OCT to ophthalmology, dermatology, internal medicine, dentistry, and gynecology [4-9]. With advances in GPU technology, much research is now under way on applying OCT to auxiliary instruments for surgery. A GPU has many more arithmetic logic units (ALUs) than a central processing unit (CPU), so it offers high processing speed for parallel workloads at comparatively low cost. Because of these merits and the need for more computation power, there are many studies on GPU-based OCT in the imaging field [10, 11].
In this study, with a view to real-time auxiliary instruments for ophthalmologic surgery, we first compared the processing speed of the GPU and the CPU at each processing step. Secondly, to measure the processing performance of the GPU, we compared the processing speeds for different image sizes. Thirdly, we measured and compared the data acquisition time for different image sizes. Lastly, we compared the real-time display speeds of each processing unit using a 1024 × 512 image. Because the eye moves considerably during ophthalmologic surgery, we acquired 2D images of the retina to confirm that the system is suitable as a surgical auxiliary instrument. We also examined the effect of motion artifacts by reconstructing a 3D image of the optic disk from the 2D images. In the results, each cross section, the blood vessels, and the optic nerves are clearly observed.
- 2.1. SD-OCT Hardware System
A schematic diagram of the GPU accelerated mode real-time display SD-OCT system is shown in Fig. 1(a). A 12-bit CMOS line scan camera (Sprint spL2048-140k, Basler AG) with an effective line rate of 70,000 lines/sec in 2,048-pixel mode was used as the detector. A transmission-type diffraction grating (spatial frequency 1,800 lpmm, nominal AOI/AOD 46.05 degrees, Wasatch Photonics) was adopted to enhance light efficiency in the detection path. A fiber-based interferometer was implemented with a superluminescent diode (SLED) (λ0 = 850 nm, Δλ = 55 nm, Exalos AG) as the light source. The light from the source was split into a sample arm and a reference arm, the latter terminated by a stationary mirror. The reference arm for retinal imaging also contained a dispersion compensation unit (prism pair) to account for the dispersion within the optics of the human eye and the sample arm. A probe at the end of the sample arm delivered light to the sample and collected backscattered light from different depths in the sample. B-mode scanning was performed using a galvanometer scanner (GVS002, Thorlabs) at the back focal plane of the objective lens in the sample arm. The developed system has axial and lateral resolutions of 6 μm and 15 μm, respectively. The detected OCT signals were transferred to the host memory of a computer with a quad-core CPU (Core 2 Quad Q8200, 2.33 GHz clock rate, Intel) through a frame grabber (PCIe-1433, 850 MB/s bandwidth, National Instruments). The galvanometer scanner was driven by the computer through a data acquisition board (PCIe-6321, National Instruments). The computer also contained a graphics card (GeForce GTX 480, 700 MHz clock rate, 480 CUDA processors, NVIDIA).
Fig. 1. GPU accelerated mode real-time display SD-OCT system. (a) Schematic diagram of the GPU accelerated mode real-time display SD-OCT system: FC, Fiber Coupler; PC, Polarization Controllers; CL, Collimator; FL, Focusing Lens; M, Mirror; DC, Dispersion Compensation (Prism pair); GS, X-Y Galvanometer Scanner; SL, Scan Lens; OL, Ocular Lens; DG, Diffraction Grating; LSC, Line Scan Camera; CPU, Central Processing Unit; GPU, Graphics Processing Unit. (b) 3D CAD design of the GPU accelerated mode real-time display SD-OCT system.
Fig. 1(b) is a rendering of the developed system made with a 3D CAD (Computer Aided Design) tool (SolidWorks 2012). The model is drawn at 1:1 scale to the real optics, so designing the system in the CAD program first shortens the fabrication time of the OCT system. ① is the spectrometer part of the system, and it shows the beam path. To enlarge the beam for better resolution, we used a fiber adaptor lens (LA1608, Thorlabs) to produce the collimated beam; with a beam size of 15 mm we obtained the highest resolution. ② shows a flip mirror that makes it easy to adjust the path length when switching between imaging the retina and the cornea, which shortened the experimental time. ③ is the light source, which is mounted on a breadboard to miniaturize the system.
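As a rough plausibility check on the hardware figures quoted in this section, the sketch below estimates the axial resolution from the source parameters and the sustained camera data rate. It rests on assumptions that are not stated in the paper: that the source spectrum is Gaussian with Δλ as its full width at half maximum, that the resolution is quoted in air, and that each 12-bit pixel is stored in 2 bytes. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Coherence-limited axial resolution in air for an assumed Gaussian spectrum:
//   dz = (2 ln 2 / pi) * lambda0^2 / dlambda
// Inputs are in nanometres; the result is in micrometres.
double axial_resolution_um(double lambda0_nm, double dlambda_nm) {
    const double pi = std::acos(-1.0);
    const double dz_nm =
        (2.0 * std::log(2.0) / pi) * lambda0_nm * lambda0_nm / dlambda_nm;
    return dz_nm / 1000.0;
}

// Sustained camera data rate in MB/s (1 MB = 1e6 bytes).
// bytes_per_pixel = 2 is an assumption for 12-bit samples.
double camera_rate_MBps(int pixels_per_line, int bytes_per_pixel,
                        double lines_per_sec) {
    return pixels_per_line * bytes_per_pixel * lines_per_sec / 1e6;
}
```

With λ0 = 850 nm and Δλ = 55 nm the first function gives about 5.8 μm, in line with the stated 6 μm axial resolution. With 2,048 pixels per line at 70,000 lines/sec the second gives about 287 MB/s, which under the 2-byte assumption stays below the 850 MB/s bandwidth of the frame grabber.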
- 2.2. GPU Accelerated Mode SD-OCT Software System and Performance
To apply GPU technology to OCT signal processing, we designed an optimized combination of CPU and GPU [12]. Two separate buffers are allocated in the host memory of the CPU and are served by a single CPU thread. These buffers are dedicated mainly to temporary data storage immediately after data acquisition. Because the data transfer channel between the CPU and the GPU is limited, careful data handling was necessary to minimize bottlenecks during the transfer. Allocating the acquired data in host memory significantly decreased the processing time: the conventional method, which uses the memory in the frame grabber, requires 16 ms of processing for 2,048 × 512 pixels of data, compared to 8.3 ms with the proposed two-buffer method. Once the data are in the device memory of the GPU, the signal processing job is divided among the 480 CUDA processors.
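The two-buffer scheme described above can be pictured with a minimal sketch. This is not the authors' implementation: the class name `PingPongBuffer` and its methods are hypothetical, the sample type (16-bit) is an assumption, and the sketch is single-threaded. A real acquisition/processing pair of threads would also need synchronization around the swap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical two-buffer ("ping-pong") frame store in host memory.
// One buffer is filled by acquisition while the other, holding the last
// completed frame, is read by the processing side.
class PingPongBuffer {
public:
    explicit PingPongBuffer(std::size_t samples_per_frame) {
        buffers_[0].resize(samples_per_frame);
        buffers_[1].resize(samples_per_frame);
    }

    // Buffer the acquisition side may currently fill.
    std::vector<std::uint16_t>& write_buffer() {
        return buffers_[write_index_];
    }

    // Buffer holding the most recently published frame.
    const std::vector<std::uint16_t>& read_buffer() const {
        return buffers_[1 - write_index_];
    }

    // Called when a frame is complete: the two buffers swap roles.
    void publish() { write_index_ = 1 - write_index_; }

private:
    std::vector<std::uint16_t> buffers_[2];
    int write_index_ = 0;
};
```

In use, the acquisition side fills `write_buffer()` and calls `publish()`; the processing side then copies `read_buffer()` to the device while the next frame is written into the other buffer, so neither side waits on a single shared frame.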
Fig. 2. Architecture of the SD-OCT system with the signal processing part implemented in a GPU.
Fig. 3. SD-OCT software based on Visual C++ 2008. (a) System control panel. (b) FFT spectrum part. (c) Raw data part. (d) OCT image display part.
Fig. 4. Frame rate comparison of different SD-OCT signal processing methods. (a) Performance comparison between the methods that use the GPU and the CPU. (b) Comparison by image size.
Figure 2 displays the data flow chart for the GPU accelerated mode SD-OCT system, including the data path, thread events, and the buffer ring. First, the data acquisition thread stores the incoming two-dimensional signals in the first buffer allocated in the host memory and calls a signal processing thread. The self-iterating acquisition thread then continues to transfer the incoming signals to the second buffer without any delay between acquisition events. The signal processing thread copies the frame data stored in the host memory buffers into the device memory through the PCI Express 2.0 ×16 interface. The processing is then divided among the 480 CUDA processors. K-domain linearization is performed using the full-range k-domain linearization method [13, 14]. Removing the background noise minimizes the speckle of the source and yields images with high resolution [2, 15]. The background noise removal method compares the B-scan data acquired from the existing OCT system signal with the B-scan data of the changed OCT system and subtracts the difference. The reconstructed OCT images are transferred back to the host memory for real-time display.
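To make the background removal and k-domain linearization steps more concrete, here is a minimal CPU-side sketch for a single A-line. It is a generic stand-in, not the full-range k-domain linearization method of [13, 14]: it uses plain linear interpolation, and the calibration table `positions` (the fractional camera pixel corresponding to each uniformly spaced k sample) is a hypothetical input that a real system would obtain from spectrometer calibration. Both functions assume non-empty inputs of matching length.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Subtract a reference (background) spectrum from one acquired spectrum.
std::vector<double> subtract_background(const std::vector<double>& spectrum,
                                        const std::vector<double>& background) {
    std::vector<double> out(spectrum.size());
    for (std::size_t i = 0; i < spectrum.size(); ++i) {
        out[i] = spectrum[i] - background[i];
    }
    return out;
}

// Resample a spectrum onto a uniform k grid by linear interpolation.
// positions[i] is the (hypothetical) fractional source pixel for output i.
std::vector<double> resample_linear(const std::vector<double>& spectrum,
                                    const std::vector<double>& positions) {
    std::vector<double> out(positions.size());
    const std::size_t last = spectrum.size() - 1;
    for (std::size_t i = 0; i < positions.size(); ++i) {
        // Clamp to the valid pixel range so edge samples stay defined.
        const double p = std::min(std::max(positions[i], 0.0),
                                  static_cast<double>(last));
        const std::size_t lo = static_cast<std::size_t>(p);
        const std::size_t hi = std::min(lo + 1, last);
        const double frac = p - static_cast<double>(lo);
        out[i] = (1.0 - frac) * spectrum[lo] + frac * spectrum[hi];
    }
    return out;
}
```

Each output sample depends only on two neighbouring input pixels, so every sample of every A-line can be computed independently. That independence is what makes this step a natural fit for distribution across many GPU processors.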
Figure 3 shows the user interface (UI) of the developed software. We wrote the control program using the Microsoft Foundation Classes (MFC) of Visual C++ 2008 and version 3.2 of the CUDA toolkit. ① is the system configuration part, which controls the program run state (on or off), the data saving mode, brightness, contrast, scan mode, scan range, image size, and the spectrum graph output. ② is the FFT spectrum output for analyzing the A-scan profile, and ③ is the raw spectrum for checking the system status. Finally, ④ is the real-time OCT image part, which shows the output image to users.
- 2.3. Results
Figure 4 shows the measured performance as a function of image size. Fig. 4(a) compares the performance of the system when it uses the CPU and the GPU for signal processing. In GPU accelerated mode, the image output speed was 110 frames/sec. With the multi-core CPU, by contrast, the image output speed was 16 frames/sec, so the image output speed improved by 6.9 times. The video output speed is lower than the signal processing speed because the data acquisition time is fixed at 8.3 ms. This time is determined by the performance of the line scan camera, so the limitation can be eased by using a faster camera. Fig. 4(b) compares the speed of GPU processing for different image sizes. When the image size is 1024 × 128, the output speed reaches 336 frames/sec. The output speed increases as the image size decreases. Both the image size and the imaging speed depend mainly on the acquisition rate of the line scan camera, so the video output speed can be improved further with a faster line scan camera.
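The frame rates quoted above can be cross-checked with simple arithmetic. The sketch below reads the 8.3 ms acquisition time as a per-frame figure and uses the CPU processing time of about 62 ms reported in this paper; both readings are our interpretation rather than statements from the authors.

```latex
\begin{align*}
f_{\text{acq}} &\le \frac{1}{8.3\ \text{ms}} \approx 120\ \text{frames/sec}
  && \text{(acquisition-limited ceiling; measured: 110)} \\
\frac{f_{\text{GPU}}}{f_{\text{CPU}}} &= \frac{110}{16} \approx 6.9
  && \text{(display-rate improvement)} \\
f_{\text{CPU}} &\approx \frac{1}{62\ \text{ms}} \approx 16\ \text{frames/sec}
  && \text{(processing-limited CPU rate)}
\end{align*}
```

Under these readings the GPU mode is limited by acquisition, while the CPU mode is limited by processing, which is consistent with the observation that the display speedup (6.9 times) is much smaller than the processing speedup.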
Table 1. Comparison of processing time between CPU and GPU processing.
Fig. 5. Real-time display OCT image of a human eye. (a) Real-time 2D movie of the human retina. (b) 3D reconstruction of the human optic disk.
Table 1 compares the processing times with those of a multi-core CPU-only implementation, in order to confirm the performance of the GPU accelerated mode real-time display SD-OCT system. We implemented the CPU processing with the same iteration structure on an Intel Core 2 Quad Q8200 CPU, using version 3.3 of the FFTW library for the FFT. The FFT in GPU processing used version 3.2 of the CUFFT library from NVIDIA. The times are averages over 500 images of 1024 × 512 pixels. In CPU processing, the data are not transferred from the processing unit to a separate memory, so no data transfer step is needed. Nevertheless, GPU accelerated mode improved performance by 160 times in the k-domain linearization process, 15 times in the FFT process, and 400 times in the log scaling process. The total processing time was 1.6 ms with GPU accelerated mode processing and 62 ms with CPU-only processing, an improvement of about 38 times, as shown in Table 1.
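For readers unfamiliar with the FFT and log scaling steps timed above, the following sketch shows what they compute for a single A-line. It is illustrative only: a naive O(N²) discrete Fourier transform stands in for the FFTW and CUFFT libraries the authors used, the function names are hypothetical, and keeping only the first N/2 bins is an assumption (it is consistent with 2,048 camera pixels yielding 1,024 depth samples, but the paper does not spell this out).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive DFT magnitude of a real-valued spectrum (stand-in for FFTW/CUFFT).
// Returns the first N/2 bins; for real input the remaining bins mirror them.
std::vector<double> dft_magnitude_half(const std::vector<double>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<double> mag(n / 2);
    for (std::size_t k = 0; k < n / 2; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle = -2.0 * pi * static_cast<double>(k) *
                                 static_cast<double>(j) / static_cast<double>(n);
            sum += x[j] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        mag[k] = std::abs(sum);
    }
    return mag;
}

// Log scaling in decibels, with a small floor to avoid log10(0).
std::vector<double> log_scale_db(const std::vector<double>& mag,
                                 double floor_value = 1e-12) {
    std::vector<double> out(mag.size());
    for (std::size_t i = 0; i < mag.size(); ++i) {
        out[i] = 20.0 * std::log10(std::max(mag[i], floor_value));
    }
    return out;
}
```

Log scaling is a purely element-wise operation with no dependence between pixels, which may help explain why it shows the largest reported GPU speedup of the three steps.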
Figure 5 shows images acquired with the GPU accelerated real-time display SD-OCT system. Fig. 5(a) is a real-time image of an in vivo human retina at 1024 × 512 pixels. The image output speed was 110 frames/sec, and the movement of the eye can be seen in the movie. Fig. 5(b) is a 3D image of the in vivo human optic disk reconstructed from the 2D images acquired by the C-scan. Blood vessels and optic nerves were clearly observed, and we confirmed that the imaging speed is fast enough for motion artifacts to be negligible.
In this paper, we first compared the signal processing speed of a conventional CPU and the GPU accelerated mode, with a view to real-time display auxiliary instruments for ophthalmologic surgery. Total signal processing was about 38 times faster with GPU accelerated processing than with CPU processing. Secondly, we compared the image output speed for different image sizes to measure the arithmetic performance of the GPU. The speed was about 110 frames per second when the image size was 1024 × 512 pixels. Thirdly, we measured the data acquisition speed for different image sizes. Finally, we measured and compared the real-time display speed at 1024 × 512 pixels, a size well suited to OCT images; the GPU accelerated mode displayed images 6.9 times faster than CPU processing. As a result, real-time imaging was performed reliably without the motion artifacts caused by eye movement, and the blood vessels and the optic nerve were clearly observed as a connected structure. Based on these results, the developed real-time display OCT system can be applied to auxiliary instruments for ophthalmologic surgery. The ability to check the status of the patient and the surgery, particularly during lens implantation, glaucoma surgery, and corneal incision, supports the reliability of the OCT system.
This study was financially supported by a grant from the Korea Healthcare Technology R&D Project, Ministry of Health & Welfare, Republic of Korea (A102024-1011-0000200), the BK21 project, the Technology Innovation Program (10040121, Development of high-speed OCT assisted femtosecond laser disruption system for ocular anterior segment surgery), the National Institutes of Health (201225940000), and a National Research Foundation of Korea (NRF) grant funded by the Korean government (MEST) (2010-0014461).
[1] Huang D., Swanson E. A., Lin C. P., Schuman J. S., Stinson W. G., Chang W., Hee M., Flotte T., Gregory K., Puliafito C. A., Fujimoto J. G. (1991) “Optical coherence tomography” Science 254 1178-1181 DOI: 10.1126/science.1957169
[2] Bouma B. E., Tearney G. J. (2002) Handbook of Optical Coherence Tomography, Marcel Dekker Inc., New York, USA
[3] Fercher A. F., Drexler W., Hitzenberger C. K., Lasser T. (2003) “Optical coherence tomography-principles and applications” Rep. Prog. Phys. 66 239-303 DOI: 10.1088/0034-4885/66/2/204
[4] Schmitt J. M. (1999) “Optical coherence tomography (OCT): a review” IEEE J. Select. Topics Quantum Electron. 5 1205-1215
[5] Kim J., Sohn B.-S., Milner T. E. (2007) “Real-time retinal imaging with a parallel optical coherence tomography using a CMOS smart array detector” J. Korean Phys. Soc. 51 1787-1791 DOI: 10.3938/jkps.51.1787
[6] Tao Y. K., Zhao M., Izatt J. A. (2007) “High-speed complex conjugate resolved retinal spectral domain optical coherence tomography using sinusoidal phase modulation” Opt. Lett. 32 2918-2920 DOI: 10.1364/OL.32.002918
[7] Yasuno Y., Madjarova V. D., Makita S., Akiba M., Morosawa A., Chong C., Sakai T., Chan K.-P., Itoh M., Yatagai T. (2005) “Three-dimensional and high-speed swept-source optical coherence tomography for in vivo investigation of human anterior eye segments” Opt. Express 13 10652-10664 DOI: 10.1364/OPEX.13.010652
[8] Jeon M., Jung U., Song J.-W., Kim J., Oh J., Eom J., Kim C.-S., Park Y.-H. (2009) “Frequency swept laser at 1300 nm using a wavelength scanning filter based on a rotating slit disk” J. Opt. Soc. Korea 13 330-334 DOI: 10.3807/JOSK.2009.13.3.330
[9] Zhang K., Kang J. U. (2011) “Real-time intraoperative 4D full-range FD-OCT based on the dual graphics processing units architecture for microsurgery guidance” Biomed. Opt. Express 2 764-770 DOI: 10.1364/BOE.2.000764
[10] Zhang K., Kang J. U. (2010) “Real-time 4D signal processing and visualization using graphics processing unit on a regular nonlinear-k Fourier-domain OCT system” Opt. Express 18 11772-11784 DOI: 10.1364/OE.18.011772
[11] Zhang K., Kang J. U. (2010) “Graphics processing unit accelerated non-uniform fast Fourier transform for ultrahigh-speed, real-time Fourier-domain OCT” Opt. Express 18 23472-23487 DOI: 10.1364/OE.18.023472
[12] Jeong H., Cho N. H., Jung U., Lee C., Kim J.-Y., Kim J. (2012) “Ultra-fast displaying spectral domain optical Doppler tomography system using a graphics processing unit” Sensors 12 6920-6929 DOI: 10.3390/s120606920
[13] Jeon M., Kim J., Jung U., Lee C., Jung W., Boppart S. A. (2011) “Full-range k-domain linearization in spectral-domain optical coherence tomography” Appl. Opt. 50 1158-1163 DOI: 10.1364/AO.50.001158
[14] Jung U., Cho N. H., Kim S., Jeong H., Kim J., Ahn Y. C. (2011) “Simple spectral calibration method and its application using an index array for swept source optical coherence tomography” J. Opt. Soc. Korea 15 386-393 DOI: 10.3807/JOSK.2011.15.4.386
[15] Tearney G. J., Yabushita H., Houser S. L., Aretz H. T., Jang I. K., Schlendorf K. H., Kauffman C. R., Shishkov M., Halpern E. F., Bouma B. E. (2003) “Quantification of macrophage content in atherosclerotic plaques by optical coherence tomography” Circulation 107 113-119 DOI: 10.1161/01.CIR.0000044384.41037.43