A Real-Time Seamless Immersive Display System: Video-Based Panorama

A Real-Time Seamless Immersive Display System Video Based Panorama A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 分类号____________ 密级____________ U D C ___________ 编号____________ Central South University 论 文 快递公司问题件快递公司问题件货款处理关于圆的周长面积重点题型关于解方程组的题及答案关于南海问题 目: A Real-Time Seamlessly Immersive Display System, Video-Based Panorama. 学 科、专 业:计算机应用技术 研 究 生 姓 名: KONDELA EMMANUEL 学院: 信息科学与工程学院 导 师 姓 名: 教授 黄东军 2011 年 04 月12 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 Central South University School of Information Science and Engineering Department of Computer Science Application and Technology Master’s Thesis Title: A Real-Time Seamlessly Immersive Display System Video Based-Panorama. Specialty/Major: Computer Science Application and Technology Student Name: Kondela Emmanuel Student Identity ID: 084618006 Supervisor Name: Prof. Huang Dong Jun (PHD) 2011 年 4 月12 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 原创性声明(declaration) 本人声明,所呈交的学位论文是本人在导师指导下进行的研究工作及取得的研究成果。尽我所知,除了论文中特别加以标注和致谢的地方外,论文中不包含其他人已经发表或撰写过的研究成果,也不包含为获得中南大学或其他单位的学位或证 关于书的成语关于读书的排比句社区图书漂流公约怎么写关于读书的小报汉书pdf 而使用过的材料。与我共同工作的同志对本研究所作的贡献均已在论文中作了明确的说明。 作者签名: Chinese Name 导 师 姓 名:教授 ………………….. ………………….. 日期:……..年……月……天 日期:……..年……月……天 学位论文版权使用授权书 本人了解中南大学有关保留、使用学位论文的规定,即:学校有权保留学位论文并根据国家或湖南省有关部门规定送交学位论文,允许学位论文被查阅和借阅;学校可以公布学位论文的全部或部分内容,可以采用复印、缩印或其它手段保存学位论文。同时授权中国科学技术信息研究所将本学位论文收录到《中国学位论文全文数据库》,并通过网络向社会公众提供信息服务。 作者签名: 导师签名 日期: 年 月 日 3 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 Declaration The thesis studies represent original work by the author and have not otherwise been submitted in any form for any degree or diploma to any University. Where use has been made of the work of others it is duly acknowledged in the bibliography. Author: Kondela Emmanuel Instructor: Prof. Huang Dong Jun Sign: ………………….. Sign: ………………….. Date: ………………….. Date: ………………….. Copyright Authorization I understand that the reservation of Central South University, using the provisions of the thesis, namely: the right to retain the school thesis Hunan Province, according to the relevant departments under the State or sent to theses, dissertations are allowed to access and borrow; Schools can publish dissertation in whole or in part, can copy, prints or other means to save thesis. Also authorize the Institute of Scientific and Technical Information of China will be included in this thesis to the "degree of China Full text database ", and through the network to provide information services to the public. Author: Kondela Emmanuel Instructor: Prof. Huang Dong Jun Sign: ………………….. Sign: ………………….. Date: ………………….. Date: ………………….. 4 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 Abstract Wide field-of-view (FOV) is necessary for many industrial applications, such as air traffic control, large vehicle driving and navigation. In this thesis, I introduce a real time immersive display system. It captures live videos from the cameras and recreates an immersive environment. It immerses the viewer with a panoramic view of the environment. The design goals of our system are real-time, live, low-cost, and scalable. The system consists of three stages: initialization, real-time and projection stages. In the first stage, I detect automatically robust features in the initial frame of each camera and find the corresponding points between them. 
Then the matched point pairs are employed to compute the perspective matrix that describes the geometric relationship between adjacent views. In the real-time stage, to reduce computation, the stitching parameters are determined only once, during system initialization. I register the frame sequences of the different cameras on the same plane using the perspective matrix and synthesize the overlapped region with a nonlinear blending method in real time. I then stitch the multiple video streams captured from ordinary charge-coupled device (CCD) cameras to generate a panoramic video. To avoid occlusion by the supporting frame, the cameras may be placed flexibly; this approach trades the accuracy of the generated panoramic image for a larger FOV. The panoramic video is presented on an immersive display that covers the viewer's FOV, so the narrow fields of the individual cameras are displayed together as one wide scene. Because seam artifacts, exposure differences and other misalignments remain along the overlapped region, a least-squares fitting technique is used to achieve a wide, seamless immersive image mosaic. In the final stage, the stitched images are projected simultaneously from multiple channels over IP multicast, a bandwidth-conserving technology specifically designed to reduce traffic by delivering a single stream of information simultaneously to potentially thousands of nodes.

Acknowledgements

Through a difficult year of thesis preparation, I thank my supervisor, Prof. Huang Dong Jun, for his guidance and encouragement, for introducing me to many of the topics in this thesis, and for his support and patience. I acknowledge the National Natural Science Foundation of China (No. 60873188) for supporting this research. I thank my wife Esther for her motivating love, spirit and guidance. I also thank my committee members for many wonderful discussions and suggestions. Many thanks to the CSU multimedia group members, who worked closely with me to build many research prototypes, for providing an excellent research environment. Thanks to the staff of the CSU computer science department for helping me in innumerable ways and for being a pleasure to work with. Thanks to my parents for their love, for being supportive of whatever I wanted to do in life, and for their selfless dedication and emphasis on education.

TABLE OF CONTENTS

LIST OF FIGURES
CHAPTER 1: Introduction
1.1 A Brief History
1.2 Problem and Approach
1.3 Thesis Statement
CHAPTER 2: Background
2.1 Acquisition
2.2 Image Registration
2.3 Immersive Display
2.4 Panoramic Mosaic of Photographs
2.4.1 Re-Projection
2.4.2 Blending
2.5 Projector-Based Environment
2.5.1 Panoramic Environments
2.5.2 Tiled Planar Displays
2.5.3 Illumination
CHAPTER 3: Video Acquisition
3.1 Camera Placement
3.2 Video Capturing
3.3 Video Streaming
CHAPTER 4: Panorama Construction
4.1 Calibration of Camera
4.1.1 3D Projection
4.1.2 Photogrammetry
4.1.3 Pinhole Camera Model
4.1.4 Perspective Matrix
4.1.5 Camera Intrinsics
4.1.6 Camera Matrix
4.1.7 Camera Calibration
4.1.8 Homography
4.1.9 Image Mapping
4.2 Finding Correspondences
4.2.1 Feature Detection
4.2.2 Point Correspondences
4.2.3 Frame Registration
4.2.4 Image Blending
4.2.5 LSF Technique
CHAPTER 5: System Architecture and Implementation
5.1 Scene from Camera
5.2 Performance
5.3 Scalability
5.4 Limitations
CHAPTER 6: Conclusion and Future Work
APPENDIX A: Camera Placement
APPENDIX B: Software Development
APPENDIX C: List of Publications and Achievements
BIBLIOGRAPHY

LIST OF FIGURES

Figure 1: Air traffic control tower. (a) FOV is limited by the supporting frame. An ideal case is to have a wide FOV as illustrated by the shaded region in (b).
Figure 2: Panoramic photo-mosaic from multiple images.
Figure 3: HMD
Figure 4: CAVE and ImmersaDesk
Figure 5: Layout for the three-projector Reality Room by Trimensions Inc.
Figure 6: Illumination of large architecture, Son et Lumiere light show.
Figure 7: (a) Basic capture unit: 2 CCD cameras connected with an angle-adjustable frame. (b) The views of the cameras should overlap to a certain degree.
Figure 8: (a) Camera placement for an indoor scene. (b) Camera placement for an outdoor scene.
Figure 9: The USB hub merger combines 4 video streams.
Figure 10: Steps of capturing and generating panoramic vision.
Figure 11: Planes.
Figure 12: (a) Pinhole camera model (left) and (b) camera lens system (right).
Figure 13: (a) Image of a scene from a simple camera (left) and (b) from a more accurate camera (right).
Figure 14: Pinhole camera model.
Figure 15: A point projected into two images: relationship between the 3D point coordinate x = (X, Y, Z, 1) and the 2D projected point (x, y, 1, d).
Figure 16: Stereo camera capture of a 3D scene object.
Figure 17: (a) Original image and (b) calibrated image.
Figure 18: Original image (top) and the result of Sobel line detection (bottom).
Figure 19: Input images (cameras A and B) with Harris corners (green points).
Figure 20: Input images (cameras A and B) with Harris corners (green points).
Figure 21: The result of correspondence matching using NCC (cameras A and B).
Figure 22: The result of correspondence matching using NCC (cameras A and B).
Figure 23: The RANSAC results: green lines represent inliers and red ones are outliers.
Figure 24: The RANSAC results: green lines represent inliers and red ones are outliers.
Figure 25: Panoramic frame composed by texture mapping and blending.
Figure 26: (a) 2D pixel values within rows and columns. (b) 1D pixel values within a row.
Figure 27: Approximation using a polynomial LSF.
Figure 28: Experiment 1: final result with visible artifacts.
Figure 29: Experiment 1: final result with the LSF technique.
Figure 30: Experiment 2: result with visible artifacts.
Figure 31: Experiment 2: final result with the LSF technique.
Figure 32: System architecture and configuration.
Figure 33: Parallax of two cameras.
Figure 34: CFLTK Image Processing System 2010.
Figure 35: Image gradients.
Figure 36: Sobel edge detector.
Figure 37: Harris corner detection.

CHAPTER 1: Introduction

A large, seamless immersive display is an emerging technology for constructing high-resolution immersive vision environments capable of presenting high-resolution images from scientific simulations, large-format imagery, and surrounding environments for instruction. Immersion increases the perceived level of reality and aids interactivity in many multimedia applications, ranging from video conferencing to the entertainment industry. A wide field of view (FOV) is needed to immerse the user in the virtual environment, and for a high level of fidelity the immersive video must be provided in real time. Real-time video streaming is possible, but without a specialized camera it is limited to a restricted field of view.

A wide FOV is necessary in many applications. For instance, an air traffic control tower with an almost 360° FOV (Figure 1(b)) is ideal for the traffic coordinators inside to respond quickly during an emergency; the wide FOV therefore improves safety. However, the FOV is often limited by the supporting frame of the building, as in Figure 1(a).

Figure 1: Air traffic control tower. (a) FOV is limited by the supporting frame.
An ideal case is to have a wide FOV as illustrated by the shaded region in (b).

This thesis presents a system built on an image stitching approach, in which two or more live videos captured from different portions of the same scene are stitched, and the partial images are projected and warped together to produce a large-scale panoramic vision. It is still challenging to develop an immersive application that acquires and presents panoramic video in a real-time, live, low-cost and scalable fashion. My goal is to develop a low-cost, workable, and scalable immersion system in order to demonstrate a system framework that allows the generation of a panoramic video stream in real time. I use groups of low-cost CCD video cameras (without expensive specialized cameras), each pointing in a different direction, to capture the scene. Their views overlap slightly in order to facilitate the stitching of these videos into the panoramic video. Live video streams from the cameras are fed into our mosaicing engine to create live panoramic video. Unlike previous systems [23], [47], [21], [40], which restrict the placement of cameras to a small region in order to minimize the parallax between camera centers, I allow a flexible placement of cameras to avoid occlusion by the supporting frame. This approach may introduce error into the composed panoramic video, but it enlarges the FOV; in other words, I trade the accuracy of the panorama for a larger FOV. The panoramic video is created in real time and projected onto a large hemispherical display. Because of the large FOV coverage, users are immersed in the reconstructed environment as if situated in the actual environment.

The development of this system poses challenges in various fields, ranging from computer vision, image processing, video compression and networking to high-performance computing and computer graphics. The system is built partially from off-the-shelf components; my contribution is mainly the integration of technologies stemming from these different research fields. The remainder of this thesis presents how I handle the different challenges. It is organized as follows. Chapter 2 describes the background and some related work. Chapter 3 describes how the system acquires the video streams of the scene. In Chapter 4, I explain how the collection of video streams is stitched to form a panoramic FOV video. Chapter 5 gives an architectural overview of the system. Finally, conclusions are drawn in Chapter 6.

1.1 A Brief History

Seamless 3D immersive vision has a long history. A 3D device called the stereoscope, invented in the early 1830s by Sir Charles Wheatstone [16], made it possible to view a different image with each eye. Photography allowed people to capture images with cameras separated by the same distance as human eyes, and the stereoscope made it possible to view such image pairs so that the brain would create a 3D image. The invention of the digital camera made it possible to capture images with remarkable precision. Additionally, such cameras made it easier to extract the 3D information coded in flat 2D images.

1.2 Problem and Approach

In the last few years, we have seen a number of ideas for creating seamless immersive displays on planar screens using electro-optic approaches such as vignetting, or using a camera in the loop to determine the registration and blending parameters. Traditionally, a digital projector is treated like any other 2D device, such as an LCD or CRT monitor, to create flat and usually rectangular images. Note, however, that a digital projector can be treated as the dual of a camera, i.e. a projection device relating 3D space and an image. For a camera, an analytical model such as the thin-lens model or the pinhole projection model is a valuable abstraction used in various areas of computer vision, e.g. projective geometry and calibration. In image-based rendering, the chosen camera model is the key aspect in the creation of panoramic vision and in the generation of novel views from a given source of images. If we treat a projector as the dual of a camera, we can address and solve the following problems.

Figure 2: Panoramic photo-mosaic from multiple images.

For example, consider the following questions:

- It is possible to create a panoramic photo-mosaic by stitching a set of images captured by a camera. Similarly, can we create an immersive panoramic display in a room by roughly positioning a set of projectors (Figure 2)?
- In view-dependent image-based rendering, we can take a few sample images of an object and recreate an image of that object from any given viewpoint with correct shading and highlights; each pixel in the new image is derived from a weighted combination of appropriate pixels in the source images. Similarly, by illuminating an object from a few projectors, can we reproduce a new appearance that is valid from any given viewpoint?
- Is it possible, using computer-vision techniques, to detect the geometric misalignments and color imbalances that affect the contributions of multiple images?
- Can we stabilize the seam artifacts that appear within the overlapped camera views due to the inevitable noise in digital images and difficult illumination conditions?

The general problem translates into how to compute the necessary images for each projector to attain a seamless immersive display system. This problem has traditionally been addressed by devising a different rendering and calibration technique for each individual display configuration, resulting in a myriad of approaches.

Approach

In this thesis, I propose a single unified application that allows these problems to be addressed in a consistent fashion. I introduce a new rendering strategy based on a geometric framework that represents the relationship between the various display components. This approach provides a new conceptual structure for comprehending the underlying geometric problems in applications of projectors. These problems can be recast and re-thought using
Thus, projectors can be casually placed and the resulting inaccuracies in geometries and color can be corrected automatically by the camera-based calibration techniques in minutes, greatly simplifying the deployment of projector-based large format displays. Recently, techniques have been developed that use one or more cameras to observe a given scene and display a relaxed alignment, where projectors are only casually aligned. I minimize ghosting pixels without making seam visible; this is to adjust with LSF technique value to each pixel near along seam artifacts to make it less visible. My algorithm is linearly smoothing seam artifacts by doping approximated value over seam region. 1.3 Thesis Statement The central thesis of this research is: A single unified application for seamless immersive display system can greatly improve and widen the applications of projectors. The framework incorporates the system of three stages: initialization, real-time and projection stages. In the first stage, I detect automatically robust features in the initial frame of each camera and find the corresponding points between them. Then the matched point’s pairs are employed to compute the perspective matrix which describes the geometric relationship of the adjacent views. In real time stage, to reduce the computation, parameters for stitching are determined once during the system initialization. I register the frame sequence of different cameras on the same plane using the perspective matrix and synthesize the overlapped region using a nonlinear blending method in real time. Then, I stitch multiple video streams captured from ordinary charged couple device (CCD) to generate a panoramic video. To 14 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 avoid being blocked by the supporting frame, we allow a flexible placement of cameras. This approach trades the accuracy of the generated panoramic image for a larger FOV. The panoramic video is presented on an immersive display which covers the FOV of the viewer. In this way, the narrow fields of each camera are displayed together as one wide scene. Due to the existence of seam artifacts, exposure differences and other misalignments along overlapped region, least square fitting technique being used to achieve wide seamlessly immersive image mosaic. In the final stage, the stitched images are projected simultaneously from multi-channels under IP multicast system, since it is bandwidth-conserving technology specifically designed to reduce traffic by simultaneously delivering a single stream of information to potentially thousands of nodes. 15 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 CHAPTER 2: Background This chapter reviews the background and some earlier work related to this thesis. Creating virtual environment is not new. Most of the existing systems fall into one of the three categories: either geometry-based, image-based and hybrid systems. Geometry-based virtual reality system uses geometrical objects to represent the scenes. Flight simulator for training air pilot is a typical example. Since real scene can be arbitrarily complex, modeling real scene may result in huge amount of data which cannot be rendered in real-time. Besides, the modeling cost is prohibitively high. Instead, scenes can be represented using the image-based approach [18], [29]. The reconstructed environment is realistic and the modeling process can be avoided. Hybrid systems make use of both image and geometric model. 
Augmented reality systems belong to this category. The proposed Immersive Display System belongs to category of image-based system. Three main challenges arise in this area: image acquisition, image registration and display. 2.1 Acquisition In acquiring a scene, ordinary video cameras can only capture a limited FOV [37]. By using a cluster of cameras [22], [39], [60], [59], wide FOV image can be obtained. These cameras introduce further problems such as the requirement of a common optical center. Instead of using ordinary ones, cameras with fish-eye lens can be used [46]. Furthermore, specially designed cameras are developed. Nayar [23] developed an Omni-directional camera which captures at video rate, a hemispherical field of view as seen from a single point. Majumder et al. [20] developed a camera cluster with a single camera center. Baldwin et al. [3] used a single camera to capture the image reflected from a conic mirror. Most previous acquisition systems require a cluster of cameras being mounted on a framework and/or special devices (such as fish-eye lens or conic mirror) being installed. In my system, multiple ordinary cameras are adopted and no special device is needed. The placement of cameras is flexible and not mounted on any specially designed framework in order to avoid the occlusion due to the supporting frame. However the relative position and orientation among the cameras should be fixed during the capture. 16 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 2.2 Image Registration Since I do not fix the cameras on any specially designed framework, I rely on the image registration techniques to generate the panorama. Most existing techniques fall into three main categories. The phase correlation approach bases on the image properties in frequency domain. Firstly proposed by Kuglin and Hines [19], the algorithm uses 2D Fourier transform and computes the displacement between two images from the phase of their cross power spectrum. This algorithm is scene-independent and accurate to within one pixel for image differing by a pure 2D translation. Instead of finding the displacement, other methods [33], [55] determine the rotation and scaling for image registration. These algorithms, however, are quite sensitive to noise and also require the overlap extent to occupy a significant portion of the images (e.g. at least 50%). The second approach is transformation recovery [7], [21], [40], [42]. It is a more intuitive approach. The method makes use of the fact that two spatially neighboring images are related to each other by a homograph transformation. By recovering the homograph transformation, images can be stitched together. To recover the homograph transformation, feature based methods [47], [21] rely on the detection of image features, with which the camera motion can be computed. In the absence of distinctive features, automatic feature extraction methods usually fail. Another way to recover the homograph transformation [15] is to iteratively adjust the camera motion parameters. Szeliski [40] and Gonzalez et al. [14] proposed the use of Levenberg- Marquardt minimization to find the transformation matrix. Szeliski and Shum [38], [42] further formulated the 2D mosaicing problem by assuming the camera rotates around a fixed point. In our system, we applied their method to stitch video frames from different cameras. The third approach is the video sweeping [28], [61], [34]. 
It creates panoramic mosaics by combining the scene information from the frames in a video sequence. The motion of the camera is recovered and thus the information of the video frames can be properly combined together. Peleg and Herman [28] used a manifold projection which simulates the sweeping of the scene with a plane using a one dimensional sensor array. Their method works fine for a single video strip only but usually fails for the case of multiple 17 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 video strips. Sawhney et al. [34] improved the method by using topology inference and local-to-global alignment. 2.3 Immersive Display For immersive presentation of data, it comes mainly in two forms. The first is the head mount display (HMD). The second is the very large screen [31], [35], [30], typically projected onto a wall and referred to as a spatially immersive display [43], [53]. HMDs (Figure 3) have a number of deficiencies such as separating the users from their familiar environment, low resolution, bulky, and heavy. Since Cruz-Neira et al. developed CAVE [53] (Figure 4), the focus moves to the construction of display system with multiple screens and projectors. The problem becomes finding the relationship between screens and projectors so that the whole set of devices can be considered as a single logical display unit [53], [51], [48], [44], [56], [25]. With multiple randomly placed projectors, Raskar et al. [30] tackled the problem of aligning the projected images so that they appear as a seamless one. Screens are commonly planar in shape. Raskar et al. [32] projected the image onto non-planar screens. This approach allows the use of virtually any wall or structure of a room to act as the display screen. In our system, we adopt a hemispherical display screen which covers about 180? FOV of the viewer. Figure 3: HMD 18 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 Figure 4: CAVE and ImmersaDesk 2.4 Panoramic Mosaic of Photographs The problem of achieving geometric registration between overlapping images displayed by projectors has not been explored in great detail. However, many authors have addressed the problem of geometric registration in stitching together multiple camera images to create panoramic photo-mosaics. Given the duality between a projector and a camera, the two problems have many similarities. Typically, the images are taken with a camera mounted on a rotating tripod. If there is no strong motion parallax, the images are ―stitched‖ and smoothly blended to create a single panoramic image. Earlier stitching methods required pure (horizontal) panning motion of the camera [Chen95]. This is analogous to current multi-projector systems that allow only side-by-side overlaps and align two projectors at a time. Newer panoramic image mosaicing techniques allow uncontrolled 3D camera rotations [Szeliski96, Sawhney97] by representing each image with a 3-parameter rotational model or sometimes with more parameters. This allows mosaicing of images taken with even a hand-held camera. The change in position of the center of projection of cameras for the different views is assumed to be negligible compared to the distance to the points in the scene allowing representation of the 3D scene with a projection on a 2D manifold. The panoramic imagery is created using arbitrary projection overlaps. Most of the camera image mosaicing techniques deal with the difficult problem of computing image feature correspondences. 
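As a concrete illustration of the homography-based stitching that these photo-mosaicing techniques rely on, the sketch below registers two overlapping views and composites them on a common plane. It is only a minimal example using OpenCV: the ORB detector, the RANSAC threshold and the file names are my own illustrative assumptions, not the method used later in this thesis (which detects Harris corners and matches them with NCC).

```python
import cv2
import numpy as np

# Load two overlapping views (hypothetical file names).
left = cv2.imread("camera_a.jpg")
right = cv2.imread("camera_b.jpg")

# Detect and describe features in both images (ORB chosen only for illustration).
orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(left, None)
kp2, des2 = orb.detectAndCompute(right, None)

# Match descriptors and keep the strongest correspondences.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des2, des1), key=lambda m: m.distance)[:200]
src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# Estimate the 3x3 homography relating the right view to the left view,
# rejecting outlier matches with RANSAC.
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

# Warp the right view onto the left view's plane and paste the left view in.
h, w = left.shape[:2]
panorama = cv2.warpPerspective(right, H, (w * 2, h))
panorama[:h, :w] = left
cv2.imwrite("panorama.jpg", panorama)
```

The overlap region is simply overwritten here; the blending strategies discussed in Section 2.4.2 would normally smooth that transition.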
19 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 2.4.1 Re-Projection Pixel re-projection methods are also known as transfer methods in photogrammetric literature. They use a relatively small number of images and geometric constraints to re-project image pixels appropriately at a given virtual camera viewpoint. The geometric constraints, recovered at some stage or known a priori, can be of the form of known depth values at each pixel, epipolar constraints between pairs of images, or trilinear tensors that link correspondences between triplets of images. If the depth value at each pixel is known, then the change in location of that pixel is constrained in a predictable way. The new views can be synthesized either from rectilinear reference images [Chen93] or cylindrical panoramic images [McMillan95]. The geometric and rendering framework presented in the next chapter borrows many concepts in pixel re-projection. The image-based modeling and rendering techniques, however, did not address the problem of how to combine appearance information from multiple images to optimally produce novel views. View-Dependent Texture Mapping (VDTM) was presented in [Debevec96] as a method of rendering interactively constructed 3D architectural scenes using images taken from multiple locations. 2.4.2 Blending Some image-based modeling and rendering work has addressed the problem of blending between available views of the scene in order to produce new renderings. The techniques attempt to achieve smooth weight transition between multiple source images as well as smooth weight changes as the viewpoint moves. The goal is to hide seams where neighboring pixels are rendered with very different combinations of images. The problem is most likely to be noticeable near the frame boundaries of the original images, or near a shadow boundary inside an image. Most techniques available are applicable to panoramic mosaics. They include feathering using linear ramps, gray-level shifts and multiresolution spline techniques. 20 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 2.5 Projector-Based Environment 2.5.1 Panoramic Environments In panoramic display environments, the user is surrounded by high resolution images projected by multiple projectors. The images could be front-projected or rear-projected. The majority of the systems, such as those from Panoram Technologies and Trimension Systems, create images for a single ideal viewer location. Specifically, Trimension uses three overlapping projectors to project images on a rigid spherical screen (Figure 5). The light projectors are aligned symmetrically so that each overlap region is a well-defined rectangle. Flight simulators have been using a similar technique for a long time [Lyon85]. Omnimax [Max82] and ARC domes [Bennett00] immerse the user in high resolution wide-field images using a single projector and dome shaped surfaces. Using rear-projection and head-tracking, the CAVE [Cruz93, Pyramid] enables interactive and rich panoramic visualizations. The setup is a precise and well designed cube-like structure. The CAVE assumes that the display surface and projector geometries are known and are fixed a priori in a specific cube-like configuration (Figure 5). Geometric registration is obtained by carefully ensuring that the physical configuration matches the design. 2.5.2 Tiled Planar Displays Some projector display systems use a purely planar display surface. 
For example in immersive workbenches a real-projection system illuminates a flat table top (Figure 5). More recently tiled arrays, i.e. two-dimensional arrangement of m× n projectors have become popular. Some examples are PowerWall, InfoMural and Princeton Wall [PowerWall, Humphreys99 and Li00]. 21 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 Figure 5: Layout for three projector Reality Room by Trimensions Inc. All the existing large format displays systems are defined by a very specific configuration. The actual projection environment then attempts to be a precise implementation of the design blueprint. However, this leads to the need of constant electro-mechanical alignment and calibration of the projectors, screens and the supporting structure. Figure 6: Illumination of large architecture, Son et Lumiere light show. 2.5.3 Illumination When we illuminate a physical object with a white light, the surface reflects particular wavelengths of light, and we perceive the respective surface attributes. Our perception of the surface attributes is dependent only on the spectrum of light that eventually reaches our eyes. This concept been effectively used in many theater and entertainment settings to create interesting visual effects. A limited but compelling example of this idea is the use of projectors to animate artificial human heads in the Walt Disney World’s ―Haunted Mansion‖ attraction. Projected imagery animates four neutral busts of singing men, and a patented projector and fiber-optic setup animates the head of the fictional fortune teller ―Madame Leota‖ inside a real crystal ball [Liljegren90]. On a more physically grand 22 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 scale, projectors have recently been used to render a variety of lighting and projected imagery on a very large architectural scale. For example, in 1952 Paul Robert-Houdin used sounds and colored lights on a building for nighttime entertainment. The most well-known modern realization of this idea is the Son et Lumiere (light show) at/on the Blois castle in the Loire Valley (France) (Figure 6). I consider this process as a subset of image-based illumination techniques I cover later in this dissertation. 23 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 CHAPTER 3: Video Acquisition The video capture module captures the scene with as large FOV as possible. Capturing devices such as digital camcorder or CCD camera should be used. Instead of using sophisticated cameras as in [39], [20], I use off-the-shelf components in developing this system. This has numerous advantages including higher flexibility and greater system scalability. 3.1 Camera Placement Unless using special equipment [50], [46], it is not possible to capture the whole 360? scene (spherical, not just cylindrical) with a single camera. To capture video streams of 360? FOV, this system uses multiple ordinary CCD cameras. The video cameras are placed so as to enlarge the FOV and avoid the occlusion due to the supporting frame. One requirement of our system is scalability, i.e. when more cameras are used, we get image with larger FOV. Since our display unit is of hemispherical shape, capturing 180? is sufficient for my need. This system can be easily extended to capture 360? FOV by adding more cameras. Theoretically, these cameras should be placed close together so that their centers of projection coincide (Figure 7(a)). 
This arrangement of cameras facilitates the mosaicing process. However, in practice, this coincidence requirement can be relaxed depending on the actual applications. Outdoor scene capture (with no nearby object) relaxes this coincidence constraint. Even in the indoor scene capture, relaxation is possible if the user accepts small error in the final panoramic images. Moreover, I want to avoid the occlusion due to the supporting frame, a setup with all camera centers being coincident cannot avoid occlusion due to the nearby obstacle. 24 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 Figure 7(a) and Figure 7(b) Figure 7: (a) Basic capture unit: 2 CCD cameras connected with an angle-adjustable frame. (b) The views of the cameras should overlap in certain degrees. Instead, my system requires all camera centers should be loosely coincident, i.e. all cameras should be pointing roughly from the same point. Interested readers are referred to Appendix A for the details of this parallax problem. The Appendix A also serves as a guideline for camera placement. Another requirement is that there should be about 20% of overlap between the views of two adjacent cameras in order to facilitate the image mosaicing (Figure 7(b)). The basic capture unit is a group of two cameras mounted on an angle-adjustable framework as shown in Figure 7(a). The upper camera points downward while the lower one points upward in order to reduce the distance between the two cameras (Figure 7(a)). Note that we mount two cameras on a framework, not because our system requires so, but solely due to convenience. For indoor scene where objects are near, the distance between centers of projection should be as small as possible. (Figure 8.a) shows the camera placement for indoor scene capture. For outdoor scene, cameras can be placed farther apart without causing serious problems in the stitching process. Figure 8.b shows the floor-plan of our camera placement for outdoor scene capture (more details in Section VII). With eight cameras, my system covers approximately a view with horizontal FOV of 150?? and vertical FOV of 110?. To enlarge the coverage, more cameras can be added. 25 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 Figure 8(a) Figure 8(b) Figure 8: (a) Cameras Placement for Indoor Scene. (b) Camera Placement for Outdoor Scene 3.2 Video Capturing During capture, the output of 4 cameras (2 basic capture units) is connected to a USB merger which combines 4 videos into 1 video (Figure 9). The combined video is then fed to the video grabber installed in a PC. This combination degrades the video quality. The reason of using a video merger is because normally a PC can handle only one input video stream without relying on some expensive specialized video capture boards. For 4 camera cluster, this would require a farm of 4 PCs. Using a video merger can greatly reduce to using 1 PCs only, though sacrificing the video quality. With the combined video stream, 26 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 video compression and streaming can be carried out in a more cost-effective manner. Figure 6 shows an example. Four 320×240 video streams are combined into a single 640×480 video stream. Figure 9: The USB HUB merger combines 4 video streams 3.3 Video Streaming The captured video is compressed in real-time and streamed across the network to the target display module. 
Since I transmit multiple video streams to the stitching and rendering module, it is important to synchronize the input video streams. Otherwise, the stitching and rendering module may wrongly use the frames of different time instance to compose the panoramic frame and introduce discontinuity into the resultant panoramic frame. I synchronize the clock of each video capture PC with the network time protocol (NTP). NTP is a protocol which can accurately distribute network time to the accuracy of millisecond. Video usually has a frame rate of 25 to 30 fps, which is in the order of ten of milliseconds. Hence NTP offers sufficient accuracy for our need. During capture, we add a time-stamp to each frame so that the stitching and rendering module can determine the right set of frames at any particular time instance. My delivery model for the video streams follows the simple peer-to-peer design. This consists of the following steps: 27 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 1. Before the streaming starts, messages are sent to the capturing PCs to notify the beginning of the capture session. 2. Video are grabbed and encoded by the software video encoder. 3. The encoded data is time-stamped and sent across the network to the receiver side. 4. Upon receiving the encoded frames, they are decompressed. Time-stamps are used for recovering the frame sequence and synchronizing frames from different video streams. 5. The synchronized video frames are passed to the stitching and rendering module to generate the panoramic video. Different video streaming models [27], [58], [24], [36], [49] suit for different applications. Although I currently use the peer-to-peer model, the system can be easily extended to the broadcast basis which allows multiple users to experience the same virtual environment simultaneously. Figure 10: Steps of Capturing and Generating Panoramic Vision 28 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 CHAPTER 4: Panorama Construction Image mosaic is the task of combining a collection of images with small FOV to obtain an image of larger FOV. In my system, I combine a collection of continuous video streams (fixed viewing direction) with small FOV to generate a continuous panoramic video stream of large FOV. I called this process continuous video mosaicing. Note that I do not use the terminology video mosaic as it has been used by other researchers [14], [41] to refer the construction of a static panoramic image from a video stream. Instead of running the image mosaic algorithm once per video frame, I apply the mosaicing algorithm once and re-use the mosaic parameters for all the subsequent frames. In this way, the image mosaicing can be done in the offline phase. The processing pipeline can be divided into offline and online phases. In the offline phase, all necessary pre-computations are done. This includes the computation of lens distortion parameters, intrinsic parameters, extrinsic parameters, and warping parameters due to sweet spot relocation (explained shortly). All these parameters are then implicitly stored as texture coordinates attached to the vertices of a triangular mesh representing the panorama. In the online phase, minimal work is carried out to create and display the panoramic video stream. The subsequent video frames from multiple CCD cameras are simply treated as textures and mapped onto the triangular mesh using standard graphics hardware to synthesize the panoramic video frames. 
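The split between the offline phase (estimating all warping parameters once) and the online phase (reusing them for every subsequent frame) can be sketched as follows. This is an illustrative CPU analogue using precomputed remap tables rather than the hardware texture-mapped triangular mesh the system actually uses, and a plain average stands in for the nonlinear blending; the function names and homography-based warp are my own assumptions for the example.

```python
import cv2
import numpy as np

def build_remap_tables(H, pano_size):
    """Offline: precompute, for every panorama pixel, where to sample the
    source camera frame, given a camera-to-panorama homography H."""
    w, h = pano_size
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    pano_pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    src_pts = np.linalg.inv(H) @ pano_pts          # panorama -> camera plane
    src_pts /= src_pts[2]                          # perspective division
    map_x = src_pts[0].reshape(h, w).astype(np.float32)
    map_y = src_pts[1].reshape(h, w).astype(np.float32)
    return map_x, map_y

def compose_online(frames, tables, pano_size):
    """Online: per video frame, only cheap lookups and blending remain."""
    w, h = pano_size
    panorama = np.zeros((h, w, 3), np.float32)
    weight = np.zeros((h, w, 1), np.float32)
    for frame, (map_x, map_y) in zip(frames, tables):
        warped = cv2.remap(frame.astype(np.float32), map_x, map_y,
                           cv2.INTER_LINEAR)
        mask = (warped.sum(axis=2, keepdims=True) > 0).astype(np.float32)
        panorama += warped * mask                  # accumulate contributions
        weight += mask
    return (panorama / np.maximum(weight, 1)).astype(np.uint8)
```

The expensive work (building the tables) happens once; `compose_online` is then called for every set of synchronized frames.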
Hence, no computation of those parameters is necessary in the online phase.

4.1 Calibration of Camera

Each camera is calibrated separately to compute its intrinsic parameters and lens distortion parameters.

4.1.1 3D Projection

I consider the projection of a 3D object captured by multiple cameras with image planes p1, p2 and p3 (Figure 11). The homography h maps every point m2(x2, y2) to its corresponding point m1(x1, y1), and similarly every point m2(x2, y2) to its corresponding point m3(x3, y3); in effect, the homography maps the points where the same light ray intersects p2 and p1, and p2 and p3, respectively. I can do this using a linear 3D-to-2D projection matrix. The simplest model is orthographic projection, which can be used to show profiles, details or precise measurements and requires no division to obtain the final result. The more commonly used model is the perspective matrix, since it more accurately models the behavior of real cameras. This work uses a 3 × 3 homography with 8 degrees of freedom, since corresponding points are defined only up to scale. We compute the homography from at least four point pairs between p1 and p2, and between p2 and p3.

Figure 11: Planes.

As illustrated in Figure 11, consider a 3D point P = [X, Y, Z]^T and its image point m = [x, y, z]^T, with z = f on the image plane. An orthographic projection simply drops the z component of the three-dimensional coordinate p to obtain the 2D point x (in this section, I use p to denote 3D points and x to denote 2D points). This can be written as

\[
x = \begin{bmatrix} \mathbf{I}_{2\times2} & \mathbf{0} \end{bmatrix} p \tag{1}
\]

If I am using projective (homogeneous) coordinates, I can write

\[
\tilde{x} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tilde{p} \tag{2}
\]

i.e. I drop the z component but keep the w component. Orthography is an approximate model for long-focal-length lenses and for objects whose depth is shallow relative to their distance from the camera. In practice, world coordinates need to be scaled to fit onto an image sensor, so scaled orthography is actually more commonly used:

\[
x = \begin{bmatrix} s\,\mathbf{I}_{2\times2} & \mathbf{0} \end{bmatrix} p \tag{3}
\]

This model is equivalent to first projecting the world points onto a local front-parallel image plane and then scaling this image using regular perspective projection. The scaling can either be the same for all parts of the scene, or it can differ for objects that are being modeled independently. More importantly, the scaling can vary from frame to frame when estimating structure from motion, which better models the scale change that occurs as an object approaches the camera. Scaled orthography is a popular model for reconstructing the 3D shape of objects far from the camera, since it greatly simplifies certain computations; for example, pose (camera orientation) can be estimated using simple least squares.
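To make equations (1)–(3) concrete, the short sketch below applies orthographic, scaled-orthographic and perspective projection to the same 3D points. It is only a numerical illustration of the models above, with arbitrary example values.

```python
import numpy as np

# Homogeneous 3D points p = (X, Y, Z, 1): one near the camera, one far away.
p = np.array([[1.0, 0.5, 2.0, 1.0],
              [1.0, 0.5, 20.0, 1.0]]).T

# Orthography, eq. (1): keep X and Y, drop Z.
ortho = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], float) @ p

# Scaled orthography, eq. (3): a single global scale s (here 0.1).
scaled = 0.1 * ortho

# Perspective: divide by Z, so distant points shrink toward the image centre.
persp = p[:2] / p[2]

print(ortho.T)    # same image position regardless of depth
print(scaled.T)
print(persp.T)    # the ten-times-deeper point projects ten times closer to the origin
```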
The combination of these two projections is therefore aaaa,,affine and can be written as ,, x,aaaap00010203,,~~ (4) ,,000110111213,, 31 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 Para-perspective provides a more accurate projection model than scaled orthography, without incurring the added complexity of per-pixel perspective division, which invalidates traditional factorization methods. 4.1.2 Photogrammetric The problem of extracting 3D information from images has its roots in photogrammetric. Photogrammetric was developed and mainly used as a method for measuring real-life objects based on images containing objects of interest. Originally, photogrammetric did not use either computers or digital cameras. The main task for photogrammetric is to recover real dimensions of objects being photographed. The problem is the deformation of an object in an image caused by the way light traverses through the camera elements. This section contains a description of important topics from photogrammetric that apply to this project. The process of image creation assumed in this project is that described by the pinhole camera model (see Section 4.1.3 for more details). In practice though, due to the camera construction, the assumptions about the pinhole camera model are violated. A pinhole camera model is a simplified mathematical model of how images are created. In practice, the aperture of a pinhole camera would have to have infinitely small diameter. This makes the exposure of images impossible. Instead, glass lenses are used that are able to catch a bundle of rays and focus them on one point in an image plane. This makes it possible to form an image on light-sensitive film inside a camera. The disadvantage of using glass lenses is that a ray of light coming from an object refracts several times. The angle at which a ray of light enters a lens is not the same as the angle at which the ray of light leaves the lens. As a result, one cannot select one point and consider it the centre of the projection and points in the image are recorded with different focal lengths (see Figure 12). The actual focal length f depends on the angle between a principal ray and a given object. The greater the angle θ, the bigger the distortion of an image, as shown in the left part of Figure 4.3, the image produced by an inexpensive camera is shown. The lens distortion causes straight lines to be recorded as non-straight lines. This effect becomes more severe as the distance from a principal point increases. The right part of Figure 12 shows an 32 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 image taken with a digital camera equipped with Sony Mavica Carl-Zeiss lens systems. The accuracy of the Mavica image is much better than the camera image in Figure 4. The superiority of the Mavica image is due to the fact that the Mavica camera has a bigger focal length. The resulting Mavica image has geometry similar to that obtained when using the pinhole camera model. Nevertheless, even images produced by high quality cameras contain some distortions and these distortions need to be removed in order to obtain reliable results in 2D to 3D conversion. Figure 12: (a) Pinhole camera model (left) and Figure 4.2(b): Camera lens system (right). The distortions due to glass lens are corrected using photogrammetric techniques. This process is called camera calibration. To produce an accurate image, a camera is calibrated and an image is corrected. 
The literature is rich in camera calibration techniques that do not impose conditions on how the images are to be taken.

Figure 13: (a) Image of a scene from a simple camera (left) and (b) from a more accurate camera (right).

For this project, full camera calibration is described (see Section 4.1.7). The reason is that although the most common use of the system is to process images taken with more accurate digital cameras, even such cameras produce slightly distorted images; the precision required for this project is a crucial factor, so the small distortions produced by high-quality off-the-shelf digital cameras are still a concern.

4.1.3 Pinhole Camera Model

The geometric equations used to convert an object in 3D space into flat images assume that the images are obtained with a camera that conforms to the pinhole camera model. In practice, digital camera construction violates this assumption and the resulting images contain distortions, whose most common effect is that straight lines are mapped into curves. Section 4.1.7 describes a standard camera calibration method used to remove these distortions.

Figure 14 shows the pinhole camera model (also called central projection). The image plane is shown on the left-hand side of Figure 14. In digital cameras, the image plane is a CCD (charge-coupled device) matrix used to record the image; in classical cameras, images are recorded on light-sensitive film. Regardless of the medium, the image plane is considered to be the actual image containing a view of the observed scene. The straight line connecting each point Pi in three-dimensional space with its corresponding image point pi is called the projecting ray. The point where the projecting ray crosses the image plane is described by the coordinates of the pixel representing the given 3D point Pi. Each projecting ray passes through a special point called the perspective centre, the lens centre or the pinhole.

Figure 14: Pinhole camera model.

The projecting ray that is perpendicular to the image plane is called the principal ray; it crosses the image plane at a point called the principal point. The distance f between the image plane and the perspective centre is called the focal length. The focal length controls the width of the camera view: the shorter the focal length, the wider the camera angle, but also the bigger the distortion of the image due to the perspective transformation. The perspective transformation converts the coordinates of a 3D point Pi into image plane coordinates pi. For simplicity, assume that the world Z axis is aligned with the principal ray and that the x and y coordinates of the image plane correspond to the world X and Y coordinates. Assuming the origin of the world coordinate system is placed at the perspective centre, the focal length equals f and the coordinates of point Pi are (Xi, Yi, Zi), the coordinates of the image point pi = (xi, yi) are given by

\[
x_i = \frac{f X_i}{Z_i}, \qquad y_i = \frac{f Y_i}{Z_i}
\]

which constitutes the foundation of the pinhole camera model.
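A direct numerical reading of the pinhole relation above, with example values chosen only for illustration:

```python
import numpy as np

def pinhole_project(points_3d, f):
    """Project 3D points (camera coordinates, Z along the principal ray)
    onto the image plane at distance f from the perspective centre."""
    X, Y, Z = points_3d.T
    return np.stack([f * X / Z, f * Y / Z], axis=1)

# Two points at the same (X, Y) but different depths.
P = np.array([[0.5, 0.2, 2.0],
              [0.5, 0.2, 4.0]])
print(pinhole_project(P, f=0.035))   # 35 mm focal length, coordinates in metres
# The deeper point projects closer to the principal point, as expected.
```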
P = R (P_w − T)   (5)

R = [ R1 ; R2 ; R3 ] = [ R11  R12  R13 ; R21  R22  R23 ; R31  R32  R33 ]  (3×3)   (6)

T = O_w − O   (7)

From the above equations, the extrinsic parameters of a perspective camera are the geometric parameters that allow a change from the camera coordinate system to the external coordinate system and vice versa. Among the intrinsic parameters, the parameter of the projective transformation itself is the focal length f.

4.1.4 Perspective Matrix

The 3D perspective projection is the projection most commonly used in computer graphics and computer vision. Points are projected onto the image plane by dividing them by their z component. Using inhomogeneous coordinates, this can be written as

x̄ = P(p) = ( x/z , y/z , 1 )^T   (8)

Figure 15: A point projected into two images: the relationship between the 3D point coordinate (X, Y, Z, 1) and the 2D projected point (x, y, 1, d).

In homogeneous coordinates the projection has a simple linear form. After projection it is not possible to recover the distance of the 3D point from the image, which makes sense for a 2D imaging sensor. The perspective projection can also be represented by a full-rank 4×4 matrix, where z_f and z_n are the far and near z clipping planes:

x̃_s = [ 1 0 0 0 ; 0 1 0 0 ; 0 0 −z_f/(z_f − z_n)  z_n z_f/(z_f − z_n) ; 0 0 1 0 ] p̃   (9)

If we set z_n = 1, let z_f → ∞ and switch the sign of the third row, the third element of the normalized screen vector becomes the inverse depth. This can be quite convenient in many cases, since for cameras moving around outdoors the inverse depth to the camera is often a better-conditioned parameterization than the direct 3D distance. While a regular 2D image sensor has no way of measuring the distance to a surface point, range sensors or stereo matching algorithms can compute such values, and it is then convenient to map a sensor-based depth or disparity value d directly back to a 3D location using the inverse of the 4×4 matrix.
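To tie Sections 4.1.3 and 4.1.4 together, the following is a minimal C++ sketch (not taken from the thesis implementation) of the pinhole projection chain: a world point is moved into the camera frame with (R, T), divided by its depth, and mapped to pixel coordinates with focal lengths and a principal point. All numeric values (R, T, fx, fy, cx, cy and the test point) are illustrative assumptions.

#include <cstdio>

struct Vec3 { double x, y, z; };

// World -> camera: P_c = R * (P_w - T), with R given row by row (eqs. (5)-(7)).
Vec3 worldToCamera(const double R[3][3], const Vec3& T, const Vec3& Pw) {
    Vec3 d = { Pw.x - T.x, Pw.y - T.y, Pw.z - T.z };
    Vec3 Pc;
    Pc.x = R[0][0]*d.x + R[0][1]*d.y + R[0][2]*d.z;
    Pc.y = R[1][0]*d.x + R[1][1]*d.y + R[1][2]*d.z;
    Pc.z = R[2][0]*d.x + R[2][1]*d.y + R[2][2]*d.z;
    return Pc;
}

// Camera -> pixel: perspective division followed by the intrinsic parameters.
void cameraToPixel(double fx, double fy, double cx, double cy,
                   const Vec3& Pc, double& u, double& v) {
    double xn = Pc.x / Pc.z;           // normalized image coordinates
    double yn = Pc.y / Pc.z;
    u = fx * xn + cx;                  // pixel coordinates
    v = fy * yn + cy;
}

int main() {
    // Illustrative extrinsics: identity rotation, camera 0.5 m behind the world origin.
    double R[3][3] = { {1,0,0}, {0,1,0}, {0,0,1} };
    Vec3 T = { 0.0, 0.0, -0.5 };
    // Illustrative intrinsics for a 640x480 sensor.
    double fx = 800.0, fy = 800.0, cx = 320.0, cy = 240.0;

    Vec3 Pw = { 0.1, 0.05, 2.0 };      // a world point 2 m in front of the origin
    Vec3 Pc = worldToCamera(R, T, Pw);
    double u, v;
    cameraToPixel(fx, fy, cx, cy, Pc, u, v);
    printf("pixel = (%.1f, %.1f)\n", u, v);
    return 0;
}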
4.1.5 Camera Intrinsic

Once a 3D point has been projected through an ideal pinhole using a projection matrix, the resulting coordinates must still be transformed according to the pixel sensor spacing and the position of the sensor plane relative to the origin. Image sensors return pixel values indexed by integer pixel coordinates (x_s, y_s), often with the coordinates starting at the upper-left corner of the image and moving down and to the right. To map pixel centres to 3D coordinates, I first scale the (x_s, y_s) values by the pixel spacings (s_x, s_y) (sometimes expressed in microns for solid-state sensors), and then describe the orientation of the sensor array relative to the camera projection centre O_c with an origin c_s and a three-dimensional rotation R_s. The combined projection can then be written as

p = [ R_s | c_s ] [ s_x 0 0 ; 0 s_y 0 ; 0 0 0 ; 0 0 1 ] (x_s, y_s, 1)^T = M_s x̄_s   (10)

The first two columns of the 3×3 matrix M_s are the three-dimensional vectors corresponding to unit steps in the image pixel array along the x_s and y_s directions, while the third column is the 3D image array origin c_s. The matrix M_s is parameterized by eight unknowns: the rotation R_s and translation c_s describing the sensor orientation, and the two scale factors (s_x, s_y).

However, estimating a camera model M_s with the required seven degrees of freedom is impractical, so most practitioners assume a general 3×3 homogeneous matrix form. The relationship between the 3D pixel centre p and the 3D camera-centred point p_c is given by an unknown scaling s, p = s p_c. I can therefore write the complete projection between p_c and a homogeneous version of the pixel address x̃_s as

x̃_s = α M_s^{-1} p_c = K p_c   (11)

The 3×3 matrix K is called the calibration matrix and describes the camera intrinsic parameters. When calibrating a camera based on external 3D points or other measurements, we end up estimating the intrinsic (K) and extrinsic (R, t) camera parameters simultaneously from a series of measurements, where p_w are known 3D world coordinates and P is known as the camera matrix:

x̃_s = K [R | t] p_w = P p_w   (12)

P = K [R | t]   (13)

Inspecting these equations, we see that we can post-multiply K by a rotation R_1 and pre-multiply [R | t] by R_1^T and still end up with a valid calibration; thus it is impossible, from image measurements alone, to know the true orientation of the sensor and the true camera intrinsics. The choice of an upper-triangular form for K is conventional. Given a full 3×4 camera matrix P = K [R | t], an upper-triangular K can be computed using QR (RQ) factorization. There are several ways to write the upper-triangular form of K; one possibility is

K = [ f_x  s  c_x ; 0  f_y  c_y ; 0  0  1 ]   (14)

which uses independent focal lengths f_x and f_y for the sensor x and y dimensions. The entry s encodes any possible skew between the sensor axes due to the sensor not being mounted perpendicular to the optical axis, and (c_x, c_y) denotes the optical centre expressed in pixel coordinates.

4.1.6 Camera Matrix

Having shown how to parameterize the calibration matrix K, I can put the camera intrinsics and extrinsics together to obtain a single 3×4 camera matrix

P = K [R | t]   (15)

It is sometimes preferable to use an invertible 4×4 matrix, obtained by not dropping the last row of the P matrix, where E is a 3D Euclidean transformation and K̃ is the full-rank calibration matrix:

P̃ = [ K 0 ; 0^T 1 ] [ R t ; 0^T 1 ] = K̃ E   (16)

The 4×4 camera matrix P̃ can be used to map directly from 3D world coordinates p̄_w = (x_w, y_w, z_w, 1) to screen coordinates plus disparity, x_s = (x_s, y_s, 1, d): x̃_s ∼ P̃ p̄_w, where ∼ indicates equality up to scale.
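As a small illustration of the RQ-based recovery of K and [R | t] mentioned above, the sketch below uses the projection-matrix decomposition provided by OpenCV, the library used elsewhere in this thesis (Appendix B). The 3×4 matrix is an arbitrary assumed example, not data from the system, and the exact name and argument order of the decomposition routine should be verified against the installed OpenCV version.

#include <cv.h>
#include <cstdio>

int main() {
    // Illustrative 3x4 camera matrix P = K [R | t] (values are assumptions).
    double p[12] = { 800, 0, 320, 100,
                       0, 800, 240,  50,
                       0,   0,   1,   1 };
    CvMat P = cvMat(3, 4, CV_64FC1, p);

    double k[9], r[9], c[4];
    CvMat K = cvMat(3, 3, CV_64FC1, k);   // upper-triangular calibration matrix
    CvMat R = cvMat(3, 3, CV_64FC1, r);   // rotation
    CvMat C = cvMat(4, 1, CV_64FC1, c);   // homogeneous camera centre

    // RQ-based decomposition of P into intrinsic and extrinsic parts.
    cvDecomposeProjectionMatrix(&P, &K, &R, &C);

    double s = k[8];                      // normalize so that K(2,2) = 1
    printf("fx=%.1f fy=%.1f cx=%.1f cy=%.1f\n",
           k[0]/s, k[4]/s, k[2]/s, k[5]/s);
    return 0;
}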
4.1.7 Camera Calibration

Initially I need to set up the cameras to capture the scene and develop the mosaic. All the cameras used are adjusted so that each pair of them has an overlapped area; in this way the camera calibration process can be made automatic. Currently a 2D model of the scene is used, and for the panoramic image an affine transformation is suitable. This approach works correctly only for objects at the same depth; in other words, all objects in the scene are assumed to be at almost the same distance from the camera.

Figure 16: Stereo cameras capturing the 3D scene object.

Because digital cameras do not conform to the pinhole camera model, it is necessary to transform the digital images. This process (called camera calibration) consists of two steps. First, the internal camera parameters describing the actual camera model are identified. Then each digital image is transformed to the required undistorted form.

The first step of the camera calibration process requires several images of a shape with known geometry. To calibrate the cameras, I employ the camera calibration tools of the Open Source Computer Vision Library (OpenCV). I place a checkerboard in front of each camera and capture several sample images, each taken from a different angle. All the corners of the checkerboard are identified and the internal parameters of the camera are calculated. The model used in the calculations takes into account the focal length of the camera, the principal point and the skew coefficient, as well as the radial and tangential distortion coefficients k1, k2, ..., k5 (following the convention of the calibration tools, k1, k2 and k5 are radial and k3, k4 are tangential coefficients). An image point with coordinates (x, y) is mapped to a point (x_un, y_un) in the undistorted image using the transformation

x_un = x (1 + k1 r^2 + k2 r^4 + k5 r^6) + 2 k3 x y + k4 (r^2 + 2 x^2)
y_un = y (1 + k1 r^2 + k2 r^4 + k5 r^6) + k3 (r^2 + 2 y^2) + 2 k4 x y   (17)

where r^2 = x^2 + y^2. After all parameters have been identified, an inverse mapping from a distorted to an undistorted image is found. Having found this transformation, all the images taken by the camera are transformed to the undistorted form; the undistorted image then conforms to the pinhole camera model. For the experiments in this dissertation, only undistorted versions of the images have been used.

Figure 17 shows the result of calibrating an image taken by the digital camera with the focal length set to 6 mm. The distortion of the original image is visible in the corners of Figure 17(a): straight lines in 3D space should be mapped to straight lines in the image plane, but the lines forming the checkerboard are bent. In Figure 17(b) the image from 17(a) has been undistorted using the estimated internal camera parameters; this image conforms to the pinhole camera model.

Figure 17: (a) Original image; (b) calibrated (undistorted) image.
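The following is a minimal sketch of the checkerboard calibration step using the OpenCV C API (the same API used in Appendix B). The board size (9×6 inner corners), the number of views and the file names view0.jpg ... view9.jpg are assumptions for illustration; the thesis implementation may organize this differently.

#include <cv.h>
#include <highgui.h>
#include <cstdio>

int main() {
    const int nViews = 10, bw = 9, bh = 6, nPts = bw * bh;
    CvMat* objPts = cvCreateMat(nViews * nPts, 3, CV_32FC1);
    CvMat* imgPts = cvCreateMat(nViews * nPts, 2, CV_32FC1);
    CvMat* counts = cvCreateMat(nViews, 1, CV_32SC1);
    CvSize imgSize = cvSize(0, 0);
    int good = 0;

    for (int v = 0; v < nViews; v++) {
        char name[64];
        sprintf(name, "view%d.jpg", v);
        IplImage* gray = cvLoadImage(name, CV_LOAD_IMAGE_GRAYSCALE);
        if (!gray) continue;
        imgSize = cvGetSize(gray);

        CvPoint2D32f corners[nPts];
        int cnt = 0;
        int found = cvFindChessboardCorners(gray, cvSize(bw, bh), corners, &cnt,
                                            CV_CALIB_CB_ADAPTIVE_THRESH);
        if (found && cnt == nPts) {
            // Store image corners and their known board coordinates (Z = 0).
            for (int i = 0; i < nPts; i++) {
                CV_MAT_ELEM(*imgPts, float, good * nPts + i, 0) = corners[i].x;
                CV_MAT_ELEM(*imgPts, float, good * nPts + i, 1) = corners[i].y;
                CV_MAT_ELEM(*objPts, float, good * nPts + i, 0) = (float)(i % bw);
                CV_MAT_ELEM(*objPts, float, good * nPts + i, 1) = (float)(i / bw);
                CV_MAT_ELEM(*objPts, float, good * nPts + i, 2) = 0.f;
            }
            CV_MAT_ELEM(*counts, int, good, 0) = nPts;
            good++;
        }
        cvReleaseImage(&gray);
    }
    if (good == 0) return 1;

    // Estimate the intrinsic matrix and distortion coefficients (k1, k2, p1, p2).
    CvMat objSub = cvMat(good * nPts, 3, CV_32FC1, objPts->data.fl);
    CvMat imgSub = cvMat(good * nPts, 2, CV_32FC1, imgPts->data.fl);
    CvMat cntSub = cvMat(good, 1, CV_32SC1, counts->data.i);
    double k[9] = {0}, dist[4] = {0};
    CvMat K = cvMat(3, 3, CV_64FC1, k);
    CvMat D = cvMat(4, 1, CV_64FC1, dist);
    cvCalibrateCamera2(&objSub, &imgSub, &cntSub, imgSize, &K, &D, NULL, NULL, 0);
    printf("fx=%.1f fy=%.1f cx=%.1f cy=%.1f\n", k[0], k[4], k[2], k[5]);

    // Undistort one frame so that it conforms to the pinhole model.
    IplImage* src = cvLoadImage("view0.jpg", CV_LOAD_IMAGE_COLOR);
    if (src) {
        IplImage* dst = cvCloneImage(src);
        cvUndistort2(src, dst, &K, &D);
        cvSaveImage("view0_undistorted.jpg", dst);
    }
    return 0;
}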
4.1.8 Homography

For any two cameras with a common optical centre, their images are related by a 2D projective transformation, or homography. Several approaches to estimating it have been proposed; interested readers are referred to [40], [38] for in-depth derivations. For completeness, the key idea is summarized here. The homography maps a pixel m1 = (x1, y1) of image p1 to a pixel m2 = (x2, y2) of image p2. The 3×3 homography matrix H has nine entries but only eight degrees of freedom:

H = [ h11  h12  h13 ; h21  h22  h23 ; h31  h32  h33 ]   (18)

Writing the mapping out for a correspondence between a point (X, Y) in the first image and a point (x, y) in the second image, and fixing h33 = 1, gives one pair of linear equations per correspondence:

x = X h11 + Y h12 + h13 − x X h31 − x Y h32   (19)
y = X h21 + Y h22 + h23 − y X h31 − y Y h32   (20)

These eight unknowns can be solved without using any 3D information from the scene. With a minimum of four pairs of image correspondences, all unknowns can be determined [40], [38]. In practice, more correspondence pairs are identified to minimize the error. With n correspondence points, I form a system of 2n equations: for each correspondence k, the two rows

[ X_k  Y_k  1  0  0  0  −x_k X_k  −x_k Y_k ]
[ 0  0  0  X_k  Y_k  1  −y_k X_k  −y_k Y_k ]

are stacked into a 2n×8 matrix A, and the unknown vector (h11, h12, h13, h21, h22, h23, h31, h32)^T satisfies A h = b with b = (x_1, y_1, ..., x_n, y_n)^T.   (21)

Solving this system yields the homography. When the system is over-determined, I find the solution that introduces the least error, using singular value decomposition (SVD). This registers images reasonably well, provided that the image correspondences are correct; the method depends heavily on their accuracy, and when errors exist the computed homography may not register the images well. Therefore, a fine adjustment step is carried out which minimizes the following sum of squared intensity errors over the image pair:

E = Σ [ I2(x2, y2) − I1(x1, y1) ]^2   (22)

where (x1, y1) and (x2, y2) are corresponding pixel pairs in images p1 and p2 respectively, and the sum runs over all pixels within the overlapping region of p1 and p2. I employ the Levenberg-Marquardt method to perform this minimization, which further improves the computed homography.
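The sketch below shows the homography estimation of Section 4.1.8 in a few lines, using OpenCV's cvFindHomography (which applies the DLT with internal normalization). The four correspondences and the test pixel are illustrative assumptions, not measurements from the system.

#include <cv.h>
#include <cstdio>

int main() {
    // Four assumed correspondences between image 1 and image 2.
    float src[4][2] = { {10, 12}, {300, 15}, {305, 220}, {12, 230} };   // image 1
    float dst[4][2] = { {42, 30}, {330, 28}, {338, 240}, {45, 248} };   // image 2
    CvMat srcMat = cvMat(4, 2, CV_32FC1, src);
    CvMat dstMat = cvMat(4, 2, CV_32FC1, dst);

    double h[9];
    CvMat H = cvMat(3, 3, CV_64FC1, h);
    cvFindHomography(&srcMat, &dstMat, &H);   // DLT (+ normalization) inside

    // Apply H to a pixel of image 1 (homogeneous multiplication and division).
    double x = 150, y = 100;
    double w = h[6]*x + h[7]*y + h[8];
    printf("(%.1f, %.1f) -> (%.1f, %.1f)\n", x, y,
           (h[0]*x + h[1]*y + h[2]) / w,
           (h[3]*x + h[4]*y + h[5]) / w);
    return 0;
}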
4.1.9 Image Mapping

Image rectification (epipolar rectification) is the process of determining a mapping for two given images such that corresponding epipolar lines become collinear and parallel to the horizontal image axis [41]. After rectification, all corresponding epipolar lines are placed at the same level, i.e. two corresponding epipolar lines have the same vertical position. As a result, the images are transformed to a stereo-vision configuration in which the images from the two cameras are displaced only laterally. This transformation is very important because it simplifies the problem of dense matching: after finding the epipolar lines, the problem of matching pixels is reduced from 2D to 1D, since each matching point lies on an epipolar line, a 1D subspace of the 2D image. The parameterization of a line is 1D, but in general, when moving along a line, both coordinates of a point on the line change. After rectifying the images, the vertical coordinates of matched points do not change, and the search for the matching point in the second image can be performed by varying the horizontal coordinate only. Additionally, there are no special requirements on the cameras' locations and orientations for rectification to be possible: Fusiello et al. [41] show that rectification can always be performed, except when the camera motion is parallel to the principal ray of the camera.

4.2 Finding Correspondences

Since the overlapping region between different views is limited (I try to minimize the number of cameras used), automatic correspondence determination methods usually cannot function reliably. Instead, correspondences are identified manually, and an automatic refinement is then performed to obtain better matches.

4.2.1 Feature Detection

Given an image I(X) = I(x, y), where I denotes the intensity and X = (x, y) is the pixel position, corner detection and NCC matching are carried out over a window W of size 5×5 with a threshold T_R.

The x and y image derivatives (gradients) are computed as

I_x = (∂/∂x) * I   (23)
I_y = (∂/∂y) * I   (24)

Edge point detection aims at identifying pixels with high variability. Usually, images containing an edge of a 3D shape show areas of different colour on both sides of the edge, in most cases because of different lighting conditions on the two sides. For similar reasons, flat areas appear in an image as uniform patches; these patches do not carry much information about the shape of a particular object. A Sobel filter is a gradient-based edge detector consisting of two masks that are convolved with the image. Applying the Sobel filter produces high values for pixels with large colour changes in their neighbourhood and small values where the colour is approximately constant. However, a digital image is hardly ever ideally uniform, so some edges are always present even in areas representing flat surfaces. The Sobel filter works on grey-scale images, so before detecting edges the image is converted to grey-scale using

I = 0.299 R + 0.587 G + 0.114 B   (25)

which converts the RGB colours into grey levels in a way that is more natural for the human visual system than the (R + G + B)/3 formula.

Figure 18: Original image (top) and the result of Sobel edge detection (bottom).

Edges detected by the Sobel filter are drawn in a dark colour. As can be seen, the edge detector also selects many points on flat surfaces that should not be detected. One solution could be to threshold the image with a low threshold, but that would also remove important parts of the image; what is needed is a way of removing small details from areas of low variability while keeping the sharp edges, for which a nonlinear diffusion filter can be used. The image gradient is ∇I = (I_x, I_y) = (g_x * I, g_y * I), computed with the Sobel operator.

The products of the derivatives at every pixel are

I_x2 = I_x · I_x   (26)
I_xy = I_x · I_y   (27)
I_y2 = I_y · I_y   (28)

and the sums of these products over the 5×5 window W centred at (x, y) are

g11 = G_w * I_x2   (29)
g12 = G_w * I_xy   (30)
g22 = G_w * I_y2   (31)

where G_w denotes the weighted summation over all pixels within the window. From these, the 2×2 matrix describing the overall gradient structure at (x, y) is

G = [ g11  g12 ; g12  g22 ]   (32)

To compute the corner response of the detector at each pixel, one of the following methods is used:

1. The Harris method: R = Det(G) − k (Trace(G))^2, with k = 0.04.
2. The smallest-eigenvalue method: if the smallest singular value min λ_i(G), i ∈ {1, 2}, exceeds a threshold, the pixel (x, y) is regarded as a corner point candidate.

After that, a uniqueness condition is checked: the pixel (x, y) is kept as a corner point only if it has the greatest min λ_i(G) in its neighbourhood; the other falsely flagged candidates in the neighbourhood are eliminated.
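Appendix B lists the full corner detector used in the system; as a compact illustration of the smallest-eigenvalue criterion described above, the sketch below relies on OpenCV's built-in routine. The file name, quality level and minimum distance are assumptions chosen to roughly mirror the 5×5 window and uniqueness check of Section 4.2.1.

#include <cv.h>
#include <highgui.h>
#include <cstdio>

int main() {
    IplImage* gray = cvLoadImage("frameA.jpg", CV_LOAD_IMAGE_GRAYSCALE);
    if (!gray) return 1;

    IplImage* eig = cvCreateImage(cvGetSize(gray), IPL_DEPTH_32F, 1);
    IplImage* tmp = cvCreateImage(cvGetSize(gray), IPL_DEPTH_32F, 1);
    const int maxCorners = 500;
    CvPoint2D32f corners[maxCorners];
    int count = maxCorners;

    // Smallest-eigenvalue corner detection with a 5-pixel minimum spacing.
    cvGoodFeaturesToTrack(gray, eig, tmp, corners, &count, 0.01, 5.0);

    printf("detected %d corners\n", count);
    for (int i = 0; i < count && i < 10; i++)
        printf("  (%.1f, %.1f)\n", corners[i].x, corners[i].y);
    return 0;
}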
4.2.2 Point Correspondences

At the start of the program I use NCC (normalized cross-correlation) to match features (Harris corners) in one image with similar features in another image of the same scene that is not yet registered to the first. The corner detection and feature matching methods are used to determine an initial set of point correspondences (Figures 19 and 20 below).

Figure 19: Input images (cameras A and B) with Harris corners (green points).

Figure 20: Input images (cameras A and B) with Harris corners (green points).

The cost of searching for matches between image frames is reduced by identifying "key points" based on the computed image-to-image overlap. Key points are matched to all other key points, but intermediate image frames are only matched to temporally neighbouring key points and neighbouring intermediate frames to construct a "match structure". Matches between image pairs may be compressed to reduce computational overhead by replacing groups of feature points with representative measurements. Note that the putative correspondences produced by the NCC criterion are not always correct; in the results, some of the correspondences are in fact false (Figure 21).

Randomly selected putative correspondences are checked for collinearity: if the correspondence pixels are non-collinear the procedure continues, otherwise a new set of putative correspondences is selected. The selected non-collinear correspondences are then normalized. The reason for this step is that the DLT used to compute the homography from the set of correspondences (A_{2n×9} h_{9×1} = 0, n ≥ 4) is not invariant under similarity transformations; in the presence of noise the computed solution would therefore diverge from the correct result. To ensure better results from the matrix A (which should have rank 8 even with noise), the data are normalized, which is widely known as Hartley's algorithm (a small code sketch of this normalization follows below). Data normalization involves the following:

1. The points are translated so that their centroid is at the origin (0, 0).
2. The points are then scaled so that the average distance of a point (corner) from the origin equals √2, which means an "average" point is (1, 1, 1)^T.
3. This transformation (T, T') is applied to each of the two images independently.

The homography H_norm between normalized pixels is computed first; the homography between the original pixels is then H = T'^{-1} H_norm T.

For NCC matching, for each corner point X_i (i = 1, 2, ..., N1) in the first camera picture, the NCC value with every feature point X'_j (j = 1, 2, ..., N2) in the second camera picture is computed by the formula

NCC(i, j) = Σ_{x∈W} (I1(x) − Ī1)(I2(x') − Ī2) / sqrt( Σ_{x∈W} (I1(x) − Ī1)^2 · Σ_{x∈W} (I2(x') − Ī2)^2 )   (33)

where x = W(X_i) and x' = W(X'_j) run over the window of size W around the two points, and Ī1 and Ī2 are the mean intensities of all the pixels within the window. After that, the matched point X'_{j*} is determined by

j* = argmax_j NCC(i, j),  j ∈ {1, 2, ..., N2}   (34)

If another correspondence (i', j*) has been determined before, the matched point i* for j* is chosen by the following equation:

i* = argmax_{i ∈ {i, i'}} NCC(i, j*)   (35)
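The following is a minimal C++ sketch of the Hartley normalization step referred to above: the points are translated so that their centroid is the origin and scaled so that their average distance from the origin is √2. The point coordinates in the driver are illustrative assumptions only.

#include <cmath>
#include <cstdio>
#include <vector>

struct Pt { double x, y; };

// Build the 3x3 similarity T (row major) and overwrite pts with T * pts.
void normalizePoints(std::vector<Pt>& pts, double T[9]) {
    double mx = 0, my = 0;
    for (size_t i = 0; i < pts.size(); i++) { mx += pts[i].x; my += pts[i].y; }
    mx /= pts.size(); my /= pts.size();

    double meanDist = 0;
    for (size_t i = 0; i < pts.size(); i++)
        meanDist += std::sqrt((pts[i].x - mx)*(pts[i].x - mx) +
                              (pts[i].y - my)*(pts[i].y - my));
    meanDist /= pts.size();

    double s = std::sqrt(2.0) / meanDist;      // scale giving mean distance sqrt(2)
    T[0] = s; T[1] = 0; T[2] = -s * mx;
    T[3] = 0; T[4] = s; T[5] = -s * my;
    T[6] = 0; T[7] = 0; T[8] = 1;

    for (size_t i = 0; i < pts.size(); i++) {
        pts[i].x = s * (pts[i].x - mx);
        pts[i].y = s * (pts[i].y - my);
    }
}

int main() {
    // Illustrative corner positions only.
    std::vector<Pt> pts;
    Pt a = {100, 50}, b = {200, 80}, c = {150, 300}, d = {40, 220};
    pts.push_back(a); pts.push_back(b); pts.push_back(c); pts.push_back(d);
    double T[9];
    normalizePoints(pts, T);
    printf("s = %.4f, tx = %.2f, ty = %.2f\n", T[0], T[2], T[5]);
    return 0;
}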
Figure 21: The result of correspondence matching using NCC (cameras A and B).

Figure 22: The result of correspondence matching using NCC (cameras A and B).

The RANSAC algorithm applied afterwards is used to eliminate the mismatches. Once a homography has been obtained, RANSAC is used to make sure that the estimated homography is as correct as possible, by ensuring that no outlier (mismatched) points participated in its computation. To do so, the estimated homography H is applied to both images: the left-image pixels are transformed by multiplication with H into the right image, and the right-image pixels are transformed by multiplication with H^{-1} into the left image (as shown in Figure 23). This is the key to judging the correctness of the estimated homography. Simply, we compute the distance (sum of squared Euclidean distances) between every correspondence and its transform, and check whether it is below a certain threshold t. If the corresponding pair is below t, the correspondence is added to the consensus set S, which holds only inlier correspondences; otherwise the pair is ignored. The process is repeated for all correspondences.

Figure 23: The RANSAC results: green lines represent inliers and red ones are outliers.

Figure 24: The RANSAC results: green lines represent inliers and red ones are outliers.

Now the size of the consensus set is checked. If the number of inliers in the consensus set is greater than a certain threshold, defined adaptively as the maximum number of inliers found so far (initially it can be one), then the homography is re-estimated using all points in S and the procedure terminates by applying the obtained homography to the images. If it is below the threshold, all the previous steps are restarted from the beginning (as shown in Figure 22). This process is repeated for N trials, after which the largest consensus set S_i is selected and the homography is re-estimated using all points in the subset S_i.
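The sketch below shows the structure of this RANSAC loop: sample four putative matches, estimate H (here via cvFindHomography, which applies the DLT), count the matches whose squared transfer error is below t, and keep the largest consensus set. The synthetic matches in the driver, the threshold of 9.0 and the 200 trials are illustrative assumptions; the collinearity check and the final re-estimation from all inliers described in the text are omitted here for brevity.

#include <cv.h>
#include <cstdlib>
#include <vector>
#include <cstdio>

struct Match { double x1, y1, x2, y2; };

// Squared transfer error |x2 - H x1|^2 for one correspondence.
static double transferError(const double h[9], const Match& m) {
    double w  = h[6]*m.x1 + h[7]*m.y1 + h[8];
    double dx = (h[0]*m.x1 + h[1]*m.y1 + h[2]) / w - m.x2;
    double dy = (h[3]*m.x1 + h[4]*m.y1 + h[5]) / w - m.y2;
    return dx*dx + dy*dy;
}

static int ransacHomography(const std::vector<Match>& matches, double t,
                            int trials, double hBest[9]) {
    int bestInliers = 0;
    for (int n = 0; n < trials; n++) {
        // Random minimal sample of four putative correspondences.
        float s1[4][2], s2[4][2];
        for (int k = 0; k < 4; k++) {
            const Match& m = matches[rand() % matches.size()];
            s1[k][0] = (float)m.x1; s1[k][1] = (float)m.y1;
            s2[k][0] = (float)m.x2; s2[k][1] = (float)m.y2;
        }
        CvMat p1 = cvMat(4, 2, CV_32FC1, s1), p2 = cvMat(4, 2, CV_32FC1, s2);
        double h[9]; CvMat H = cvMat(3, 3, CV_64FC1, h);
        cvFindHomography(&p1, &p2, &H);

        // Count the consensus set for this hypothesis.
        int inliers = 0;
        for (size_t i = 0; i < matches.size(); i++)
            if (transferError(h, matches[i]) < t) inliers++;
        if (inliers > bestInliers) {
            bestInliers = inliers;
            for (int i = 0; i < 9; i++) hBest[i] = h[i];
        }
    }
    return bestInliers;
}

int main() {
    // Synthetic putative matches: mostly a pure translation (+20, +10),
    // with a few gross outliers injected (illustrative data only).
    std::vector<Match> m;
    for (int i = 0; i < 40; i++) {
        double x = (i % 8) * 40.0, y = (i / 8) * 50.0;
        Match a = { x, y, x + 20.0, y + 10.0 };
        if (i % 10 == 0) { a.x2 += 80.0; a.y2 -= 60.0; }
        m.push_back(a);
    }
    double H[9];
    int inl = ransacHomography(m, 9.0, 200, H);
    printf("largest consensus set: %d of %d matches\n", inl, (int)m.size());
    return 0;
}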
RANSAC matching with DLT:

Given a scene point in homogeneous representation X = (x, y, w)^T and a projective transformation H, the corresponding camera-image point is X' = H X = (x', y', w')^T. If we write H = (h1; h2; h3), where h_i^T is the i-th row of H, each correspondence gives two independent linear equations in the entries of H:

[ 0^T      −w' X^T    y' X^T ]   ( h1 ; h2 ; h3 ) = 0   (36)
[ w' X^T    0^T      −x' X^T ]

This can be written as A_i h = 0, where A_i is a 2×9 matrix and h is 9×1. Therefore n (n ≥ 4) correspondences provide a matrix A of dimension 2n×9, and the solution can be written as min ||A h|| subject to ||h|| = 1, which is equivalent to minimizing h^T A^T A h.

Because of numerical conditioning issues in the DLT algorithm, a normalization step should be applied; this step is very important for less well-conditioned problems such as the DLT. Given n ≥ 4 point correspondences X_i ↔ X'_i and a similarity transformation

T1 = [ s  0  t_x ; 0  s  t_y ; 0  0  1 ]   (37)

consisting of a translation and a scaling, T1 takes the points X_i to a new set of points X̂_i = T1 X_i such that the centroid of the new points is at (0, 0) and their average distance from the origin is √2. Since X_i = (x_i, y_i, 1)^T, we have

X̂_i = T1 X_i = ( s x_i + t_x , s y_i + t_y , 1 )^T

The centroid of the X̂_i is

( s x̄ + t_x , s ȳ + t_y , 1 )^T = ( 0 , 0 , 1 )^T   (38)

where x̄ denotes the mean of the x_i, so t_x = −s x̄ and t_y = −s ȳ. The average distance between the X̂_i and the origin is

(1/n) Σ_i sqrt( (s x_i − s x̄)^2 + (s y_i − s ȳ)^2 ) = (s/n) Σ_i sqrt( (x_i − x̄)^2 + (y_i − ȳ)^2 ) = √2

so s can be computed as

s = √2 / ( (1/n) Σ_i sqrt( (x_i − x̄)^2 + (y_i − ȳ)^2 ) )   (39)

Similarly, a transformation T2 takes the points X'_i to a new set of points X̂'_i = T2 X'_i. Applying the DLT to the normalized correspondences X̂_i ↔ X̂'_i gives a homography Ĥ. Since X'_i = H X_i and X̂'_i = Ĥ X̂_i, we have T2 X'_i = Ĥ T1 X_i, and the desired homography is

H = T2^{-1} Ĥ T1   (40)

4.2.3 Frame Registration

The first choice to be made is how to represent the final panoramic image. If only a few frames are stitched together, a natural approach is to select one of the images as the reference and then warp all the other images into the reference coordinate system. The resulting composite is sometimes called a flat panorama, since the projection onto the final surface is still a perspective projection, and hence lines remain straight (which is often a desirable attribute). After the global alignment has been run, there may still be localized mis-registration in the image mosaic, due to deviations from the idealized parallax-free camera model. Such deviations include camera translation (especially for hand-held cameras), radial distortion, dislocation of the optical centre (which can be significant for scanned photographs or Photo CDs), and moving objects.

In the real-time stage, to reduce computation, the parameters for stitching are determined once during system initialization. I register the frame sequences of the different cameras on the same plane using the perspective matrix, and synthesize the overlapped region using a nonlinear blending method in real time. Using the obtained parameters and the geometry of the display screen, we compute a triangulated panorama and the texture coordinates of its vertices. These texture coordinates tell where on the screen the pixels from the video frames should be drawn. Once the texture coordinates are computed, they are reused for stitching all subsequent video frames. During video capture very little computing resource is available, and the major task in the online phase is the real-time composition of panoramic frames. To speed up the panoramic composition process, I take advantage of commodity graphics accelerators, which render millions of texture-mapped triangles in real time. Upon reception, the video frames are used as texture maps: using the pre-computed triangulated panorama and its texture coordinates, the video frames are mapped onto the triangular mesh and blended together to form a panoramic frame in real time. Each camera contributes part of the panorama; to visualize the contributions, they are colour-coded in Figure 25(b).

Figure 25: A panoramic frame is composed by texture mapping and blending.
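The thesis composes the panorama with texture-mapped triangles on the graphics card; as a simpler CPU-side illustration of registering one frame onto another with a precomputed homography, the sketch below warps a camera-B frame onto a panorama canvas and pastes the camera-A frame at the origin. The file names, canvas size and the numeric H entries are assumptions for illustration only.

#include <cv.h>
#include <highgui.h>

int main() {
    IplImage* a = cvLoadImage("camA.jpg", CV_LOAD_IMAGE_COLOR);
    IplImage* b = cvLoadImage("camB.jpg", CV_LOAD_IMAGE_COLOR);
    if (!a || !b) return 1;

    // Panorama canvas roughly twice as wide as one frame.
    IplImage* pano = cvCreateImage(cvSize(a->width * 2, a->height),
                                   IPL_DEPTH_8U, 3);
    cvZero(pano);

    // Assumed homography mapping camera-B pixels onto the panorama plane.
    double h[9] = { 1.0,  0.02, 310.0,
                    0.01, 1.0,    6.0,
                    0.0,  0.0,    1.0 };
    CvMat H = cvMat(3, 3, CV_64FC1, h);

    // Warp frame B onto the canvas, then paste frame A at the origin.
    cvWarpPerspective(b, pano, &H, CV_INTER_LINEAR + CV_WARP_FILL_OUTLIERS,
                      cvScalarAll(0));
    cvSetImageROI(pano, cvRect(0, 0, a->width, a->height));
    cvCopy(a, pano);
    cvResetImageROI(pano);

    cvSaveImage("panorama.jpg", pano);
    return 0;
}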
4.2.4 Image Blending

Finally, we decide how to produce the stitched image (Figure 25). This involves selecting a final composition surface (flat, cylindrical, spherical, etc.) and a view (reference image). It also involves selecting which pixels contribute to the final composite and how to blend these pixels so as to minimize visible seams, blur, and ghosting.

Once the source pixels have been mapped onto the final composite surface, we must still decide how to blend them in order to create an attractive-looking panorama. The cost of mapping between image frames from different cameras is reduced based on the computed image-to-image overlap. Key points are matched to all other key points, but intermediate image frames are only matched to temporally neighbouring key points and neighbouring intermediate frames to construct a "match structure". Image orientations are then estimated from this match structure and used to construct the video mosaic. Matches between image pairs are compressed to reduce computational overhead by replacing groups of feature points with representative measurements.

If all the frames were in perfect registration and identically exposed, blending would be an easy problem: any pixel or combination of pixels would do. For real images, however, visible seams (due to exposure differences), blurring (due to mis-registration), or ghosting (due to moving objects) can occur. As described in Section 4.2.5 (Figure 26), I use the least square fitting (LSF) method, which yields a seamless and applicable result. Creating clean, pleasing-looking panoramas involves both deciding which pixels to use and how to weight or blend them; the distinction between these two stages is a little fluid, since per-pixel weighting can be thought of as a combination of selection and blending.
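For comparison with the LSF correction developed in Section 4.2.5, the sketch below shows one of the simplest per-pixel weighting schemes mentioned above: a linear ramp ("feathering") across the overlap region of two warped frames. It is only a baseline illustration, not the thesis method; the file names, the assumption that both images lie on the same canvas, and the overlap columns x0/x1 are placeholders.

#include <cv.h>
#include <highgui.h>

int main() {
    IplImage* left  = cvLoadImage("warpedA.jpg", CV_LOAD_IMAGE_COLOR);
    IplImage* right = cvLoadImage("warpedB.jpg", CV_LOAD_IMAGE_COLOR);
    if (!left || !right) return 1;

    int x0 = 280, x1 = 340;                       // assumed overlap columns
    IplImage* out = cvCloneImage(left);
    int xmax = (out->width < right->width) ? out->width : right->width;

    // Weight rises linearly from 0 at x0 to 1 at x1, then stays at 1.
    for (int y = 0; y < out->height && y < right->height; y++) {
        for (int x = x0; x < xmax; x++) {
            double w = (x >= x1) ? 1.0 : (double)(x - x0) / (x1 - x0);
            CvScalar pl = cvGet2D(left,  y, x);
            CvScalar pr = cvGet2D(right, y, x);
            CvScalar po;
            for (int c = 0; c < 4; c++)
                po.val[c] = (1.0 - w) * pl.val[c] + w * pr.val[c];
            cvSet2D(out, y, x, po);
        }
    }
    cvSaveImage("blended.jpg", out);
    return 0;
}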
4.2.5 LSF Technique

In this section I use the least square fitting (LSF) technique to approximate an energy function in terms of pixel intensity, which yields a seamless video mosaic. The following is a sample of pixel values printed along a seam artifact.

Figure 26: (a) 2D pixel values within rows and columns; (b) 1D pixel values within a row (the original intensity profile I(x) for x = −5, ..., 4).

As illustrated in Figure 26(b), a group of nearby pixels with visible artifacts is analyzed at a local level as a 1D graph. A number of m = 2k + 1 equations is obtained from m pixels (here 10 pixels). A Taylor series approximation of the 1D patch near the centre 0 depends on the n + 1 constants corresponding to the intensity derivatives at the patch origin:

I(x) ≈ I(0) + x ∂I/∂x(0) + (x^2 / 2!) ∂^2 I/∂x^2(0) + ... + (x^n / n!) ∂^n I/∂x^n(0)   (41)

The image intensity changes as we move from a position (x, y) by a small amount Δ in an arbitrary direction θ, which takes us to the position (x + Δ cos θ, y + Δ sin θ). The change in intensity along the x direction (θ = 0) can be computed as the distance moved times the derivative of the intensity with respect to x; in other words, the change in intensity for a small movement is the inner product between the gradient and a vector describing the movement. For a unit step along x at the point P(x0, y0),

ΔI = I(x0 + 1, y0) − I(x0, y0) ≈ ∂I/∂x (x0, y0)   (42)

The LSF method approximates a set of 1D patches based on the sum of squared errors, in either a linear or a non-linear model. The error at a point is the difference between the true interpolated value and the approximated value. The approximation can be carried out in either continuous or discrete form; the discrete LSF is based on the interpolated points (x_i, I(x_i)) (as shown in Figure 26 above) for i = 0, 1, ..., m − 1. The curve to be fitted is an n-degree polynomial that best represents all points. Solving the polynomial system in terms of the derivatives minimizes the fit error along the 1D patch (the x axis); the solution vector gives the values of the polynomial derivatives at the patch centre for the positions x = −5, ..., 4. For a vector v = (v_1, v_2, ..., v_m), the p-norm used to measure the error is

||v||_p = ( Σ_i |v_i|^p )^{1/p}   (43)

The solution minimizes, in the 2-norm, the error vector with entries v_i = I_i − (X d)_i,

|| I − X d ||_2 = ( Σ_i ( I_i − (X d)_i )^2 )^{1/2}   (44)

where the known intensities I = (I(−5), ..., I(4))^T are related to the unknown derivative vector d = ( I(0), ∂I/∂x(0), ..., ∂^n I/∂x^n(0) )^T at the known positions x_i by

I ≈ X d   (45)

with the m×(n+1) design matrix whose i-th row is

( 1 , x_i , x_i^2 / 2! , ... , x_i^n / n! )   (46)

so that the derivative vector d minimizing (44) is the least square fit.

Alternatively, setting up the least square approximation with a polynomial of degree n produces a system of (n + 1) × (n + 1) linear equations. The objective is to find the coefficient values that minimize the sum of squared errors. Consider the settings c_0 = I(0), c_1 = ∂I/∂x(0), c_2 = (1/2!) ∂^2 I/∂x^2(0), ..., c_n = (1/n!) ∂^n I/∂x^n(0), and y_i = I_i. The minimization requires setting ∂E/∂c_0 = ∂E/∂c_1 = ... = ∂E/∂c_n = 0, where E is the sum of squared errors. The equations can then be generalized to fitting a polynomial of degree n to a set of m points with the least square fitting technique, giving the normal equations

[ m          Σ x_i        ...   Σ x_i^n      ] [ c_0 ]   [ Σ y_i       ]
[ Σ x_i      Σ x_i^2      ...   Σ x_i^{n+1}  ] [ c_1 ] = [ Σ x_i y_i   ]   (47)
[ ...        ...          ...   ...          ] [ ... ]   [ ...         ]
[ Σ x_i^n    Σ x_i^{n+1}  ...   Σ x_i^{2n}   ] [ c_n ]   [ Σ x_i^n y_i ]

where all sums run over i = 0, ..., m − 1.

The algorithm used is:

a) Define a window patch centred at pixel (x, I(x)).
b) Fit an n-degree polynomial to the window intensities, i.e. find d minimizing ||I − X d||.
c) Assign the polynomial derivatives at x = 0 to the pixel at the window centre.
d) Move one pixel over so that the window is centred at pixel (i = x + 1, y).
e) Repeat a)-d) until the window patch reaches pixel (i = m, y).

As a result, within a single 1D patch the following values of I(x) were estimated using least square fitting.
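As a small numeric illustration of this 1D fit, the sketch below solves the normal equations (47) for a degree-1 polynomial over a 10-pixel window. The sample intensities are assumptions chosen to roughly reproduce the profile of Figures 26 and 27, so the printed fit comes out close to f(x) = 47 − 8.5x.

#include <cstdio>

int main() {
    const int m = 10;
    double xs[m], I[m];
    // Window positions x = -5 ... 4 with assumed intensities along the seam.
    double samples[m] = { 90, 81, 73, 64, 55, 47, 38, 30, 21, 13 };
    for (int i = 0; i < m; i++) { xs[i] = i - 5; I[i] = samples[i]; }

    // Normal equations for degree 1 (eq. (47) with n = 1):
    //   [ m       sum(x)   ] [c0]   [ sum(I)   ]
    //   [ sum(x)  sum(x^2) ] [c1] = [ sum(x*I) ]
    double sx = 0, sxx = 0, sI = 0, sxI = 0;
    for (int i = 0; i < m; i++) {
        sx += xs[i]; sxx += xs[i]*xs[i];
        sI += I[i];  sxI += xs[i]*I[i];
    }
    double det = m*sxx - sx*sx;
    double c0 = (sI*sxx - sx*sxI) / det;
    double c1 = (m*sxI - sx*sI) / det;
    printf("fitted patch: I(x) = %.2f + %.2f*x\n", c0, c1);
    return 0;
}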
Figure 27: Approximation using a polynomial LSF (original profile I(x) and the estimated fit f(x) = 47 − 8.5x).

Figure 28: Experiment 1, final result with visible artifacts.

Figure 29: Experiment 1, final result with the LSF technique.

Figure 30: Experiment 2, result with visible artifacts.

Figure 31: Experiment 2, final result with the LSF technique.

CHAPTER 5: System Architecture and Implementation

Figure 32: System architecture and configuration.

As illustrated in Figure 32 above, my current implementation of the seamless immersive display system uses four colour CCD cameras connected to a USB hub. The receiver side contains a Pentium IV 600 MHz PC connected to an immersive display. Two PCs (Pentium IV 1700 MHz) are connected through a 100 Mbps Ethernet network, and two digital projectors are driven from these PCs. Since we do not rely on any high-end workstation, the cost of our system is not high. Readers are referred to the following webpage for the supplementary video.

5.1 Scene from Camera

An indoor scene contains nearby objects, so the cameras should be placed as close together as possible in order to minimize the error. Figure 8 shows the arrangement of cameras: Figure 8(a) shows all four captured views, and the stitched panorama is shown in Figure 8(b). There are minor artifacts in the final panoramic video; when an object approaches the cameras, there is a slight ghosting artifact due to the violation of the single-centre-of-projection assumption. For the outdoor case, objects are expected to be far away, so the cameras can be positioned farther apart without significant error due to parallax. In our experiment, we capture the environment surrounding our department building.

5.2 Performance

Latency is an important issue in any tele-immersive application. My current implementation assumes that the cameras and the remote viewer are within the same building. The system can attain a latency of less than 220 milliseconds. Table 1 shows the breakdown of latency in our system; it is contributed by multiple components, including video encoding, network buffering, network latency and rendering.

TABLE 1: Latency breakdown
  Source                 Latency (milliseconds)
  Video compression      15 - 30 ms
  Video decompression    15 - 30 ms
  Network buffering      120 ms
  Network latency        10 ms
  Rendering              < 30 ms
  Total                  < 220 ms

TABLE 2: Performance of different video codecs
  Video codec     avg. fps    Compression ratio
  Motion JPEG     24.256      16.63
  MPEG-4          8.479       113.10
  Indeo           1.250       34.04

Table 2 shows the performance with different video codecs. Under our high-bandwidth network (100 Mbps), Motion JPEG outperforms the other compression methods. Park and Kenyon [26] pointed out that a maximum latency of 100 milliseconds is required for tele-operation; the current implementation is therefore suitable for passive viewing but may not be suitable for tele-operation. To satisfy such a latency requirement, I need to reduce the time spent on network buffering: currently I assume the network is very noisy and keep a large buffer. Another bottleneck is the speed of the video codec.
Note that I am currently using low-end PCs (Pentium IV 1700 MHz) and a software video codec. The performance could be significantly improved by using high-end PCs and a hardware video codec.

5.3 Scalability

The system is designed to be scalable: the number of video cameras for scene capture can be increased if necessary. Using four video cameras, we are able to cover almost the whole viewing hemisphere, resulting in about 150° horizontal FOV and about 110° vertical FOV. We investigated the system by varying the number of video cameras used. In the current arrangement, each basic capture unit (two cameras) captures a vertical band of the scene, and each additional basic capture unit increases the horizontal FOV by approximately 35°; hence about eleven units are enough to cover the whole 360° FOV. We have also measured the bandwidth usage against the number of cameras (Figure 15(a)); the relationship is roughly linear. The current configuration still does not utilize the whole available bandwidth, so there is room to scale up and more cameras can be added. With a network bandwidth of 100 Mbps, more than 20 cameras can be supported. If the Ethernet does not offer such high efficiency, we can increase the compression ratio to accommodate more video streams in the same network, as a trade-off against image quality.

5.4 Limitations

The generated panoramic video is sometimes not sharp enough; in some cases a ghosting effect is observable, and there is a problem of discontinuity in the colour tone. These issues are mainly due to the following reasons:

1. The video streams may not be well stitched because of errors in selecting the feature points used for correspondence. Another possible problem is violation of the far-away-scene assumption when some objects are too close to the cameras.
2. During run time, the depth of objects may change and cause stitching errors.
3. Ordinary CCD video cameras are usually equipped with built-in automatic gain control (AGC), which automatically adjusts the brightness and contrast. Different parts of the scene have different lighting conditions, so different cameras (pointing at different parts of the scene) may have different gain values. This results in varying tone across the stitched images. The tone problem can be solved by using video cameras with a computer-controllable exposure unit; note that the exposure of all cameras should be adjusted in phase in order to maintain tone consistency.
4. In capturing the video signals, the video merger halves the resolution in both width and height and hence reduces the image quality.
5. The video codec degrades the video quality through compression. Setting a lower compression ratio can certainly improve the image quality, but at the cost of increased network bandwidth.

CHAPTER 6: Conclusion and Future Work

I have addressed the implementation of a real-time seamless immersive display system based on a video panorama module that can stitch images from webcams with sufficient overlap. The panoramic video stream can be generated on the fly without specially designed video capture devices: by employing common CCD cameras, we reconstruct the remote environment on the immersive display. The system does not depend on any parameters to be set by the user. I have shown that the resulting implementation with the LSF method is fast and applicable.
For most cases the outputs are visually satisfactory, with smooth transitions across the image boundaries, although there is still a need for further improvement. The current fitting-based objective works well for small resolutions (up to 640 × 480): for a video flow of 15 fps, real-time video stitching based on a key frame takes a computation time lower than 67 ms per frame, while it becomes slower for resolutions above 640 × 480. Furthermore, there was a problem of exposure differences arising from strong light changes, and the presence of parallax and distortion led to unstable feature correspondences with few inliers, which implies that the camera and lens settings must be held fixed.

The second main area of our future work is to develop a prototype and algorithms to stitch overlapping images even in the presence of parallax and exposure differences. We are also interested in methods to compensate for other artifacts seen in panoramic mosaics, such as variable colour balance, glare, and lens flare. Finally, I plan to extend the system to include 3D sound and remote-control functions, which will be especially useful for industrial tele-immersive applications and remote exploration. Currently we use four cameras, which cover around 150° FOV; I will increase the possible viewing directions by adding more cameras to the system.

APPENDIX A: Camera Placement

Parallax is the difference in the apparent direction of an object as seen from two different viewpoints. If two cameras see a 3D point with a parallax of less than half of the minimum angle subtended by a pixel, then the projected pixel coordinates deviate by less than half a pixel, as if the cameras shared a common optical centre. This can be formulated as the inequality

D ≥ 2 De / Ae

where D is the distance from the cameras to the closest object, De is the distance between the two optical centres, and Ae is the angle subtended by one pixel. In our current setup for the outdoor scene, De is approximately 3 meters and Ae is 0.00356 radian, so the minimum distance D of the closest object should be at least 2 × 3 / 0.00356 ≈ 1685 m, i.e. about 1.684 kilometers, in order to keep the error within a pixel.

Figure 33: Parallax of two cameras.

APPENDIX B: Software Development

In this system I have implemented the CFLTK Image Processing System, using the Fast Light Toolkit (FLTK) for the GUI, integrated with C/C++ and the Intel OpenCV library for image processing. The structure of the classes and their files is as follows.

Header files: DirectShow.h, Stitch.h, Window.h
Source files: DirectShow.cpp, Main.cpp, Stitch.cpp, Window.cpp

Figure 34: CFLTK Image Processing System 2010.

1. Compute gradient using the Sobel operator

Edges characterize boundaries and are therefore a problem of fundamental importance in image processing. Edges in images are areas with strong intensity contrast, a jump in intensity from one pixel to the next. Edge-detecting an image significantly reduces the amount of data and filters out useless information, while preserving the important structural properties of the image. There are many ways to perform edge detection, but the majority of methods may be grouped into two categories, gradient and Laplacian. The gradient method detects edges by looking for the maximum and minimum in the first derivative of the image.
Laplacian methods search for zero crossing in the second derivative of the image to find edges. 71 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 Figure 35: Image gradients The number of masks used for edge detection is almost limitless. Researchers have used different techniques to derive masks and then experimented with them to discover more masks such as Kirsch, Prewitt, and Sobel masks. In this system I prefer to use Sobel operator as edge detector. Sobel operator based on 1D analysis, the theory can be carried over to two-dimensional as long as there is an accurate approximation to calculate the derivative of a two-dimensional image. Sobel perform a 2D spatial gradient measurement on at each point in an input grayscale image. The Sobel edge detector uses a pair of 3x3 convolution mask, one estimating the gradients in the x-direction and other estimating the gradient in y-direction. A convolution mask is usually much smaller than the actual image. As a result, the mask is slid over the image manipulate a square of pixels at a time. 72 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 Figure 36: Sobel Edge Detector void Stitch::compute_Gradient_Sobel(IplImage *img, CvMat* I_x, CvMat* I_y){ int width = img->width; int height = img->height; int i,j,i2,j2; double value_x, value_y; CvScalar curr_pixel; double sobel_x_data[3][3]; double sobel_y_data[3][3]; sobel_x_data[0][0] = -1; sobel_x_data[0][1] = 0; sobel_x_data[0][2] = 1; sobel_x_data[1][0] = -2; sobel_x_data[1][1] = 0; sobel_x_data[1][2] = 2; sobel_x_data[2][0] = -1; sobel_x_data[2][1] = 0; sobel_x_data[2][2] = 1; sobel_y_data[0][0] = 1; sobel_y_data[0][1] = 2; sobel_y_data[0][2] = 1; sobel_y_data[1][0] = 0; sobel_y_data[1][1] = 0; sobel_y_data[1][2] = 0; sobel_y_data[2][0] = -1; sobel_y_data[2][1] = -2; sobel_y_data[2][2] = -1; CvMat sobel_x = cvMat(3,3,CV_64FC1,sobel_x_data); CvMat sobel_y = cvMat(3,3,CV_64FC1,sobel_y_data); for(i=0; i= height || j+j2 < 0 || j+j2 >= width) continue; curr_pixel = cvGet2D(img,i+i2,j+j2); value_x += curr_pixel.val[0]*cvmGet(&sobel_x,i2+1,j2+1); value_y += curr_pixel.val[0]*cvmGet(&sobel_y,i2+1,j2+1); } 73 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 } // set Intensities for x and y directions cvmSet(I_x,i,j,(value_x)); cvmSet(I_y,i,j,(value_y)); } } } 2. 
Compute Corner Uniqueness int Stitch::compute_Corner_Uniqueness(CvPoint *corner, int num, double *corner_cost,CvPoint curr_point, double curr_cost) { int i,j; int idxnum = 0, newidx; int *idx; int isNeighbor = 0; // to record the neighborhood cornerpoint should be deleted idx = (int*) malloc(sizeof(int)* num); if(num == 0){ // the first point // add curr_point into queue corner[num] = cvPoint(curr_point.x, curr_point.y); corner_cost[num++] = curr_cost; }else{ // compare the curr_point with the points in queue for(i=0; i 0){ // delete the false alarm points corner[idx[0]] = cvPoint(curr_point.x, curr_point.y);; corner_cost[idx[0]] = curr_cost; // more than one false alarm points detected if(idxnum > 1){ // start from the 2nd point newidx = idx[1]; for(i=1; iheight; int width = img->width; int width_size; double g11,g12,g22; double corner_cost[MAX_CORNERPOINT_NUM]; double curr_cost; CvPoint curr_point; CvMat *G = cvCreateMat(2,2,CV_32FC1); CvMat *U = cvCreateMat(2,2,CV_32FC1); CvMat *V = cvCreateMat(2,2,CV_32FC1); CvMat *D = cvCreateMat(2,2,CV_32FC1); // set window size if(W_SIZE%2 == 0){ printf("error for window size\n"); return 0; }else width_size = (W_SIZE-1)/2; // compute the gradient I_x, I_y CvMat *I_x = cvCreateMat(height,width,CV_64FC1); CvMat *I_y = cvCreateMat(height,width,CV_64FC1); compute_Gradient_Sobel(img, I_x, I_y); double factor = 10000; // check each pixel exclude the boundary for(i=B_SIZE; i= height || j+j2 < 0 || j+j2 >= width) continue; g11 += pow(cvmGet(I_x,i+i2,j+j2),2.0)/factor; g12 += cvmGet(I_x,i+i2,j+j2)*cvmGet(I_y,i+i2,j+j2)/factor; g22 += pow(cvmGet(I_y,i+i2,j+j2),2.0)/factor; } } cvmSet(G,0,0,g11); cvmSet(G,0,1,g12); cvmSet(G,1,0,g12); cvmSet(G,1,1,g22); cvSVD(G, D, U, V, CV_SVD_U_T|CV_SVD_V_T); curr_cost = cvmGet(D,1,1); if(curr_cost > T_SMALLEST_EIG) num = compute_Corner_Uniqueness(corner, num, corner_cost, curr_point,curr_cost); if(num >= MAX_CORNERPOINT_NUM){ 77 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 printf("error. MAX_CORNERPOINT_NUM reached!"); return -1; } } cvReleaseMat(&G); cvReleaseMat(&U); cvReleaseMat(&V); cvReleaseMat(&D); cvReleaseMat(&I_x); cvReleaseMat(&I_y); return num; } 4. Compute Corner Matching from NCC The following I use the normalized form of cross correlation preferred for feature matching applications which does not have a simple frequency domain expression. Normalized cross correlation has been computed in the spatial domain for this reason. The correlation between two images is a standard approach to feature detection as well as a component of more sophisticated techniques. Unfortunately the normalized form of correlation (correlation coefficient) preferred in template matching does not have a correspondingly simple and efficient frequency domain expression. For this reason normalized cross-correlation I computed in the spatial domain. Due to the computational cost of spatial domain convolution, several inexact but fast spatial domain matching methods have also been developed. The use of cross-correlation for template matching is motivated by the distance measure (squared Euclidean distance) (53) (where f is the image and the sum is over x, y under the window containing the feature t 2positioned at u,v). In the expansion of d (54) 22t(x,u,y,v)f(x,y),,the term is constant. 
If the term is approximately constant 78 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 c(u,v),f(x,y)t(x,u,y,v),then the remaining cross-correlation term is a measure x,y of the similarity between the image and the feature. The correlation coefficient overcomes these difficulties by normalizing the image and feature vectors to unit length, yielding a cosine-like correlation coefficient (55) ,where is the mean of the feature and is the mean of f(x, y) in the region under the fu,v feature. We refer to as normalized cross-correlation. It is clear that normalized cross-correlation (NCC) is not the ideal approach to feature tracking since it is not invariant with respect to imaging scale, rotation, and perspective distortions. These limitations have been addressed in various schemes including some that incorporate NCC as a component. int Stitch::compute_CornerPointMatching_NCC(IplImage *img1, IplImage *img2, CvPoint *p1, int num1, CvPoint *p2, int num2, CvPoint2D64f *m1, CvPoint2D64f *m2){ ------------------------------- ------------------------------- for(i=0; i= height || cur_x+jj < 0|| cur_x+jj >=width) continue; intensity = cvGet2D(img1, cur_y+ii, cur_x+jj); mean1 += intensity.val[0]; intensity = cvGet2D(img2, match_y+ii, match_x+jj); 79 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 mean2 += intensity.val[0]; available_num++; } } mean1 /= available_num; mean2 /= available_num; v1 = 0; v2 = 0; v3 = 0; for(ii=-W_SIZE_MATCH; ii= height || cur_x+jj < 0|| cur_x+jj >=width) continue; intensity = cvGet2D(img1, cur_y+ii, cur_x+jj); tmp1 = intensity.val[0] - mean1; intensity = cvGet2D(img2, match_y+ii, match_x+jj); tmp2 = intensity.val[0] - mean2; v1 += tmp1*tmp2; v2 += pow(tmp1, 2.0); v3 += pow(tmp2, 2.0); } } cur_value = v1 / sqrt(v2*v3); if(cur_value > MAX_value){ // a better match MAX_value = cur_value; nccvalues[idx] = cur_value; m2[idx].x = (double)match_x; m2[idx].y = (double)match_y; matchedidx[idx] = j; } } check = 0; for(j=0; j sample_cnt){ // for a randomly chosen non-colinear correspondances iscolinear = true; 82 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 while(iscolinear == true){ iscolinear = false; for(i=0; i MAX_num || (numinlier == MAX_num && curr_dist_std width){ for(k=0;k<3;k++) wdata[i*wstep+j*wchannels+k]= idata1[i*step+j*channels+k]; } } } // backprojection for interpolation for(i=0;idata.db[0]=j; pi->data.db[1]=i; pi->data.db[2]=1; cvMatMul(H, pi, po); curpj = (int)(po->data.db[0]/po->data.db[2]); curpi = (int)(po->data.db[1]/po->data.db[2]); if(curpj>=image_1->width || curpj<0) continue; if(curpi>=hm || curpi<0) continue; for(k=0;k<3;k++){ wdata[i*wstep+j*wchannels+k] = idata2[curpi*step+curpj*channels+k]; } ------------------------------- ------------------------------- } } ------------------------------- ------------------------------- } 84 A Real-Time Seamless Immersive Display System Video Based-Panorama 8/20/2011 APPENDIX C: List of Publications and Achievements [1]. Kondela. E and Dong. J. H. Adaptive Seamless Video Mosaic using LSF Technique. World Computer Journal (IPCV’2010), pp 647 – 653. Las Vegas, Nevada, USA. [2]. Emmanuel. S. K. and Dong. J. H. Automatic Seamless Video Mosaic from Webcams using LSF Technique. Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on pp 17 – 24. San Francisco, CA, USA. [3]. Kondela. E. S. and Huang. D. J. (2011). A Real-Time Hand-fingertips Tracking System for Human- Computer Interaction. 
BIBLIOGRAPHY

[4]. Milgram, D. L. (1975). Computer methods for creating photomosaics. IEEE Transactions on Computers; Slama, C. C., editor (1980). Manual of Photogrammetry. American Society of Photogrammetry, Falls Church, Virginia, fourth edition.
[5]. Triggs, B., et al. (1999). Bundle adjustment – a modern synthesis. In International Workshop on Vision Algorithms, pages 298–372, Springer, Kerkyra, Greece.
[6]. Uyttendaele, M., Eden, A., and Szeliski, R. (2001). Eliminating ghosting and exposure artifacts in image mosaics. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'2001), pages 509–516, Kauai, Hawaii.
[7]. Szeliski, R. (1996). Video mosaics for virtual environments. IEEE Computer Graphics and Applications, 16(2), 22–30.
[8]. Mann, S. and Picard, R. W. (1994). Virtual bellows: Constructing high-quality images from video. In First IEEE International Conference on Image Processing (ICIP-94), pages 363–367, Austin.
[9]. Szeliski, R. and Shum, H.-Y. (1997). Creating full view panoramic image mosaics and texture-mapped models. Computer Graphics (SIGGRAPH'97 Proceedings), 251–258.
[10]. Chen, S. E. (1995). QuickTime VR – an image-based approach to virtual environment navigation. Computer Graphics (SIGGRAPH'95), 29–38.
[11]. Okutomi, M. and Kanade, T. (1993). A multiple baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(4), 353–363.
[12]. R. Szeliski. Image alignment and stitching: A tutorial. Tech. Report.
[13]. R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Second edition, Cambridge University Press, 2003.
[14]. D. Nister. Preemptive RANSAC for live structure and motion estimation. IEEE International Conference on Computer Vision (Nice, France), October 2003, pp. 199–206.
[15]. Open source computer vision library home page.
[16]. P. Perez, M. Gangnet, and A. Blake. Poisson image editing. Proceedings of SIGGRAPH 2003, pages 313–318, July 2003.
[17]. Snavely, N., Seitz, S. M., and Szeliski, R. Photo tourism: exploring photo collections in 3D. ACM Transactions on Graphics (TOG), July 2006, Volume 25, Issue 3.
[18]. R. Szeliski. Image mosaicing for tele-reality applications. In IEEE Workshop on Applications of Computer Vision, pages 44–53, 1994.
[19]. Kondela, E. and Dong, J. H. Adaptive Seamless Video Mosaic using LSF Technique. World Computer Journal (IPCV'10), pp 647–653, Las Vegas, Nevada, USA.
[20]. Conrad J. Poelman and Takeo Kanade. A Paraperspective Factorization Method for Shape and Motion Recovery. 1997, 19, 97–108.
[21]. Michitaka Hirose, Tetsuro Ogi, and Toshio Yamada. Integrating live video for immersive environments. IEEE MultiMedia, 6(3):14–22, July–September 1999.
[22]. C. D. Kuglin and D. C. Hines. The phase correlation image alignment method. In Conference on Cybernetics and Society, pages 163–165, September 1975.
[23]. A. Majumder, G. Meenakshisundaram, W. B. Seales, and H. Fuchs. Immersive teleconferencing: A new algorithm to generate seamless panoramic imagery. In Proceedings of ACM Multimedia, pages 169–178, November 1999.
[24]. D. L. Milgram. Adaptive techniques for photo mosaicing. IEEE Transactions on Computers, C-26:1175–1180, 1977.
[25]. Jane Mulligan and Kostas Daniilidis. View-independent scene acquisition for tele-presence. In Proceedings of the International Symposium on Augmented Reality, October 2000.
[26]. Shree K. Nayar. Omnidirectional video camera. In Proceedings of the 1997 DARPA Image Understanding Workshop, May 1997.
[27]. Network Working Group. Real Time Streaming Protocol (RTSP). Request for Comments 2326, April 1998.
[28]. Panoram Technologies, Inc.
[29]. Kyoung Shin Park and Robert V. Kenyon. Effects of network characteristics on human performance in a collaborative virtual environment. In Proceedings of IEEE VR '99, pages 104–111, Houston, TX, March 1999.
[30]. C. Partridge. Isochronous applications do not require jitter-controlled networks. Request for Comments 1257, September 1991.
[31]. Shmuel Peleg and Joshua Herman. Panoramic mosaics by manifold projection. In Proceedings of Computer Vision and Pattern Recognition, pages 338–343, June 1997.
[32]. Venkata N. Peri and Shree K. Nayar. Generation of perspective and panoramic video from omnidirectional video. In Proceedings of the DARPA Image Understanding Workshop, pages 243–245, 1997.
[33]. R. Raskar, M. S. Brown, R. Yang, W. C. Chen, G. Welch, H. Towles, B. Seales, and H. Fuchs. Multi-projector displays using camera-based registration. In Proceedings of IEEE Visualization 1999, pages 161–168, San Francisco, CA, October 24–29, 1999.
[34]. Ramesh Raskar. Immersive planar display using roughly aligned projectors. In Proceedings of IEEE Virtual Reality 2000, March 18–22, 2000.
[35]. Ramesh Raskar, Greg Welch, Matt Cutts, Adam Lake, Lev Stesin, and Henry Fuchs. The office of the future: A unified approach to image-based modeling and spatially immersive displays. In Proceedings of SIGGRAPH 98, pages 179–188, July 1998.
[36]. B. S. Reddy and B. N. Chatterji. An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Transactions on Image Processing, 5(8), 1996.
[37]. Harpreet S. Sawhney, Steve Hsu, and Rakesh Kumar. Robust video mosaicing through topology inference and local to global alignment. In Proceedings of the European Conference on Computer Vision, 1998.
[38]. Daniel R. Schikore, Richard A. Fischer, Randall Frank, Ross Gaunt, John Hobson, and Brad Whitlock. High-resolution multiprojector display wall. IEEE Computer Graphics and Applications, 20(4):38–44, July/August 2000.
[39]. Prashant J. Shenoy, Pawan Goyal, and Harrick M. Vin. Issues in multimedia server design. Computing Surveys, 27(4):636–639, 1995.
[40]. Heung-Yeung Shum and Li-Wei He. Rendering with concentric mosaics. In Proceedings of SIGGRAPH 99, pages 299–306, August 1999.
[41]. Heung-Yeung Shum and Richard Szeliski. Panoramic image mosaics. Technical Report MSR-TR-97-23, Microsoft Research, 1997.
[42]. Rahul Swaminathan and Shree K. Nayar. Polycameras: Camera clusters for wide angle imaging. Technical Report CUCS-012-99, Columbia University, New York, 1999.
[43]. Richard Szeliski. Image mosaicing for tele-reality applications. Technical Report CRL 94/2, DEC Cambridge Research Lab, May 1994.
[44]. Richard Szeliski. Video mosaics for virtual environments. Virtual Reality, pages 22–30, March 1996.
[45]. Richard Szeliski and Heung-Yeung Shum. Creating full view panoramic image mosaics and texture-mapped models. In Proceedings of SIGGRAPH 97, pages 251–258, August 1997.
[46]. The PowerWall Team. PowerWall.
[47]. Trimension Systems Ltd.
[48]. R. Tsai. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras. IEEE Journal of Robotics and Automation, pages 323–344, 1987.
[49]. Yalin Xiong and Ken Turkowski. Creating image-based VR using a self-calibrating fisheye lens. In Proceedings of CVPR, 1997.
[50]. P. Dani and S. Chaudhuri. Automated assembling of images: Image montage preparation. In Pattern Recognition, volume 28, pages 431–445, March 1995.
[51]. Alternate Realities Corporation.
[52]. D. Anderson, S. Tzou, R. Wahbe, R. Govindan, and M. Andrews. Support for live digital audio and video. In Proceedings of the 10th International Conference on Distributed Computing Systems, pages 54–61, Paris, France, May 1990.
[53]. J. Baldwin, A. Basu, and H. Zhang. Panoramic video with predictive windows for telepresence applications. In Proc. 1999 IEEE International Conference on Robotics and Automation, Detroit, U.S.A., May 1999.
[54]. D. Browning, C. Cruz-Neira, D. Sandin, and T. DeFanti. The CAVE automatic virtual environment: Projection-based virtual environments and disability. In Proceedings of the First Annual International Conference, Virtual Reality and People with Disabilities, January 1993.
[55]. Peter J. Burt and Edward H. Adelson. A multiresolution spline with application to image mosaics. ACM Transactions on Graphics, 2(4):217–236, 1983.
[56]. Carolina Cruz-Neira, Daniel J. Sandin, and Thomas A. DeFanti. Surround-screen projection-based virtual reality: The design and implementation of the CAVE. In Proceedings of SIGGRAPH 93, pages 135–142, August 1993.
[57]. P. Dani and S. Chaudhuri. Automated assembling of images: Image montage preparation. In Pattern Recognition, volume 28, pages 431–445, March 1995.
[58]. J. Davis. Mosaics of scenes with moving objects. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1998.
[59]. Fakespace Systems.
[60]. O. Faugeras. Three-Dimensional Computer Vision. MIT Press, 1993.
[61]. Domenico Ferrari. Design and applications of a delay jitter control scheme for packet-switching internetworks. In Proceedings of the Second International Conference on Network and Operating System Support for Digital Audio and Video, 1991.
[62]. J. Foote and D. Kimber. FlyCam: Practical panoramic video and automatic camera control. In Proceedings of the IEEE International Conference on Multimedia and Expo, volume 3, pages 1419–1422, 2000.
[63]. Joshua Gluckman, Shree K. Nayar, and Keith J. Thoresz. Real-time omnidirectional and panoramic stereo. In Proceedings of the 1998 DARPA Image Understanding Workshop, California, USA, November 1998.
[64]. Manuel Guillen Gonzalez, Phil Holifield, and Martin Varley. Improved video mosaic construction by accumulated alignment error distribution. In Proceedings of the British Machine Vision Conference, 1998.
[65]. Şevket Gümüştekin and Richard W. Hall. Mosaic image generation on a flattened Gaussian sphere. In Proceedings of the IEEE Workshop on Applications of Computer Vision, pages 50–55, 1996.
[66]. Janne Heikkila and Olli Silven. A four-step camera calibration procedure with implicit image correction. In Proceedings of the Conference on Computer Vision and Pattern Recognition, 1997.
[67]. Szeliski, R., Avidan, S. and Anandan, P. (2000). Layer extraction from multiple images containing reflections and transparency. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'2000), pp 246–253, Hilton Head Island.
[68]. Fabbri, R., Costa, L. D. F., Torelli, J. C. and Bruno, O. M. (2008). 2D Euclidean Distance Transform Algorithms: A Comparative Survey. ACM Computing Surveys, 40(1).
[69]. Engels, C., Stewenius, H. and Nister, D. (2006). Bundle Adjustment Rules. In Photogrammetric Computer Vision (PCV'06), Bonn, Germany.
[70]. Baudisch, P., Tan, D., Steedly, D., Uyttendaele, M., Rudolph, E., Pal, C. and Szeliski, R. (2006). An exploration of user interface designs for real-time panoramic photography. Australasian Journal of Information Systems, 13(2).
[71]. Emmanuel, S. K. and Dong, J. H. Automatic Seamless Video Mosaic from Webcams using LSF Technique. Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference, pp 17–24, San Francisco, CA, USA.
[72]. Szeliski, R., Winder, S., Uyttendaele, M. and Steedly, D. (2008). Fast Poisson Blending using Multi-Splines. Technical Report MSR-TR-2008-58, Microsoft Research.
[73]. Bay, H., Ferrari, V. and Van Gool, L. (2005). Wide-baseline stereo matching with line segments. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pp 329–336, San Diego, CA, USA.