Theses
This page gives links to selected theses written by VASE Lab members.
- Position Sensing and Augmented
Reality by David Johnston (2001).
Abstract. One of the greatest challenges for the emerging discipline of Augmented Reality (AR) is solving the visual registration problem, i.e., aligning virtual computer graphics accurately over the real-world scene to provide the user with a usefully "augmented" interactive experience. To achieve this goal, the position and orientation of the user's head must be measured with high accuracy and low latency. There is no general-purpose technology that works outdoors or indoors (with unlimited range) to accomplish this.
Addressing the application of reconstructing ancient Roman buildings in situ which once existed at a now green field archaeological park in Colchester, an AR system has been developed. This consists of an optical look-through stereoscopic VR headset and a wearable computer, with GPS and computer vision being used for position determination. The Visual Positioning System (VPS) finds and identifies specially designed targets within the scene to work out camera location. The targets have unique signatures when the image is turned into a Region Adjacency Graph (RAG) resulting in robustness and reliability (no false positives). The GPS system uses two receivers: one on the archaeological tourist and one at a base station in order to perform differential processing for greater positional accuracy. Initial evaluations were made of the ADXL202 "accelerometer on a chip" and of the LAMBDA method for high accuracy GPS positioning, with a view towards a multi-component hybrid solution for the registration problem. Currently, the VPS can work unassisted indoors or automatically initialise the registration for use of the lower accuracy GPS outdoors. The systems developed are inexpensive: the VPS uses laser-printed targets and a commodity PC for the visual processing, while the GPS is based on two inexpensive Garmin G12 receivers. The software developed in-house has been put into the public domain.
- Periscopic Stereo and Large-Scale
Scene Reconstruction by Eddie Moxey (2002).
Abstract. The capture of three dimensional structure from two dimensional images has received considerable attention in computer vision. Existing work has concentrated on use of stereo camera systems and the reconstruction of small objects. Recently, single cameras in motion have been used to capture sections of scenery which are subsequently reconstructed by skilled technicians with a selection of computer vision and graphical modelling tools. However, large-scale, automated, reconstruction of scenery is limited by the "where to look next" problem. A number of imaging systems have been proposed to solve this problem but none have been realized. Periscopic stereo is a novel concept which implements stereo imaging using a single camera. A rotating mirror scans the horizon while a fixed relative geometry is maintained between the virtual stereo cameras.
This dissertation presents, for the first time, a practical design for a periscopic stereo head and investigates the computer vision tools necessary for 3D reconstruction from periscopic image data. It identifies two possibilities for processing periscopic image data: "corrected", where a two-dimensional rotation is applied to the image plane prior to standard stereo processing, and "uncorrected", which ignores the "tumbling" effect inherent in periscopic image data until the final stage of reconstruction. This "late" correction circumvents a problem apparent in many existing stereo algorithms: resolving disparity measurements for imaged scene structure that is parallel to the corresponding epipolar lines.
Many of the existing stereo processing tools used in the course of this research require little modification, but all have revealed issues requiring resolution that were not immediately apparent in previous treatments. This investigation stops short of the actual construction of 3D models but presents a method of generating the sets of depth data required for large-scale scene reconstruction. Feature extraction, image data correspondence, camera calibration and the generation of depth information from periscopic image data are all covered in the context of this dissertation. In particular, a new method of combining existing camera calibration techniques, termed "calibration in a box", is presented together with conclusions regarding the tools and techniques employed.
While periscopic stereo is still in development, it is the only imaging system reported to date which is likely to be capable of large-scale, autonomous, 3D scene reconstruction, with particular application to remote operation in hazardous environments.
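As an illustration of the "corrected" processing route described in this abstract, the following is a minimal sketch of a two-dimensional rotation applied to image-plane coordinates. It is not taken from the thesis: the function name `derotate_points`, the assumption that the tumble angle is already known for each frame, and the choice of rotation centre are all hypothetical.

```python
import math

def derotate_points(points, tumble_angle_rad, centre):
    """Rotate image points about `centre` by -tumble_angle_rad.

    Assumption: the image has "tumbled" by tumble_angle_rad (anticlockwise
    in a conventional x-right, y-up frame), so applying the opposite
    rotation restores an upright image plane.
    """
    cx, cy = centre
    c = math.cos(-tumble_angle_rad)
    s = math.sin(-tumble_angle_rad)
    corrected = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        # Standard 2D rotation of the offset from the centre.
        corrected.append((cx + c * dx - s * dy, cy + s * dx + c * dy))
    return corrected

# Hypothetical usage: undo a 90-degree tumble about the origin.
upright = derotate_points([(1.0, 0.0)], math.pi / 2, (0.0, 0.0))
```

In the "uncorrected" route the same rotation would instead be deferred until after disparity has been measured.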
- Systems and Services for Wearable
Computers by Neill Newman (2002).
Abstract. The use of both portable computing and mobile communication has increased dramatically in the last few years. Mobile devices combining computing and communications are now being explored and there is competition between manufacturers to provide more features and push the technology.
Integrating an increasing number of features into a small package creates additional problems to those of mobile operation. Contextual considerations such as the location and activity of the user become relevant to the interaction between the human and computer. Therefore a mobile computer should be able to perceive the environment and adjust the presentation of information automatically.
The aims of this thesis are to analyse the capabilities of some mobile interaction devices; to design a user interface system which takes into account these capabilities; and to integrate this user interface with a software framework which enables the machine to perceive the environment and react accordingly.
The thesis starts by specifying and detailing the construction of a mobile platform for the remainder of this work. This wearable computer consists of a small PC with a head-mounted display and a commercial portable keyboard called a Twiddler. A study investigates these interaction devices and contrasts the Twiddler and various head-mounted displays with a standard keyboard and mouse. The results show that it is possible to design a user interface which can increase the speed and accuracy of use of the Twiddler and head-mounted display devices, but generally they perform poorly in comparison to the normal desktop devices. There are also indications of increased fatigue and user frustration when using these mobile interaction devices.
The results of the study show that current desktop user interfaces are not as efficient in a mobile situation as they are in a desktop situation. This has prompted the author to investigate alternative user interface systems for mobile computing. A prototype software architecture called Sulawesi is presented which attempts to address the lack of an alternative user interface research platform. Sulawesi has been designed to encompass contextual awareness, agent-based systems, and multi-modal user interfaces in a single development framework.
The software architecture allows physical and hybrid sensors and rendering mechanisms to be abstracted from applications, and a management layer allows communication between these subsystems. The user can command the system via constrained natural language statements. Speech recognition or textual input allows the user to command the machine, and a dedicated user interface, tailored to the head-mounted display, or speech rendition is used for output.
Also, a system which allows dedicated agents to process information from a sensor, and to affect the rendition of information to the user, has been incorporated into Sulawesi. A prototype agent which uses contextual information to affect how the information is displayed to the user is also described in this work, along with several novel mobile applications that make use of contextual information.
- Architectures for Untethered
Augmented Reality Using Wearable Computers by Panagiotis
Abstract. One of the most interesting fields of computer technology is that of Virtual Reality. People are fascinated by being immersed in three-dimensional, computer-synthesised virtual worlds. There are many example applications such as interactive visualisation and representation for entertainment and education, modelling of construction, manufacturing and maintenance processes, architecture, medicine, annotation and simulations for training. One step further is the notion of augmented reality (AR) where, unlike virtual reality where the user sees only virtual worlds, he or she can see both the real and the virtual at the same time.
One of the potential applications of augmented reality is the 3D reconstruction of archaeological sites in situ, where the user can be immersed while maintaining a composite view of the real and the virtual surroundings. By using an untethered, mobile, body-worn computer with a see-through head-mounted display and equipped with location and orientation sensors, the user can roam in the composite world as if the scene were entirely real.
The research effort described here concerns the construction of such an AR application, centred around the Roman remains in the Gosbecks Archaeological Park on the outskirts of Colchester. Two generations of wearable computers have been implemented. The first, similar to earlier prototypes, provided a test-bed for initial, in-the-field tests, in order to prove the concept and gain practical experience. As anticipated, this prototype provided inadequate performance; however the lessons learned influenced the design of a second, more mature platform. The second wearable, designed and built on the experience gained, is a novel prototype with improved processing power and ergonomics, low power consumption and low cost. The prototypes use GPS, the Global Positioning System, for measuring location and a magnetic sensor integrated into the head-mounted display for determining the wearer's direction of gaze.
A novel wearable AR framework was developed to work with both wearables but primarily to exploit the hardware rendering capabilities of the second prototype. The application software was written in C using the OpenGL graphics library for 3D rendering. The framework encompasses optimisation techniques such as view frustum culling and levels of detail in order to improve rendering speed.
The second wearable computer and the framework were used for fairly extensive field testing, in order to determine the accuracy and stability of the position and orientation sensing mechanisms. In addition, the system was assessed in-the-field by users by means of a novel, questionnaire-based user assessment. The assessment investigated the usability of the second wearable running the wearable AR framework, exploring its ergonomics, visual output quality, positional accuracy and stability as well as the sense of presence and overall impression.
The ultimate aim of this research was to ascertain whether wearable AR can be delivered in a form that can be used by the general public for in situ guides for archaeological sites at a reasonable cost. The findings show that functionality is achievable, though the cost is higher, and the location accuracy lower, than desired.
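The abstract above mentions view frustum culling and levels of detail as rendering optimisations. The sketch below shows one common form of each idea in Python for brevity; the thesis software was written in C with OpenGL, and none of the names here (`Plane`, `sphere_visible`, `select_lod`) or the threshold values come from it. It assumes each model is wrapped in a bounding sphere and that the frustum planes have unit normals pointing inwards.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

@dataclass(frozen=True)
class Plane:
    # Plane nx*x + ny*y + nz*z + d = 0, with a unit normal pointing
    # towards the inside of the view frustum (an assumption of this sketch).
    nx: float
    ny: float
    nz: float
    d: float

    def signed_distance(self, p: Point) -> float:
        return self.nx * p[0] + self.ny * p[1] + self.nz * p[2] + self.d

def sphere_visible(centre: Point, radius: float, planes: Sequence[Plane]) -> bool:
    """Keep an object unless its bounding sphere lies wholly outside a plane."""
    return all(plane.signed_distance(centre) >= -radius for plane in planes)

def select_lod(distance: float, thresholds: Sequence[float]) -> int:
    """Return a level-of-detail index: 0 is the most detailed model.

    `thresholds` are increasing distances; beyond the last one the
    coarsest level, len(thresholds), is used.
    """
    for level, limit in enumerate(thresholds):
        if distance < limit:
            return level
    return len(thresholds)

# Hypothetical frustum with only near (z >= 1) and far (z <= 100) planes.
planes = [Plane(0.0, 0.0, 1.0, -1.0), Plane(0.0, 0.0, -1.0, 100.0)]
```

A renderer would typically call `sphere_visible` first and only then choose a level of detail for the objects that survive culling.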
- Towards the Automatic Construction
of Machine Vision Systems using Genetic Programming by Olly
Abstract. Computer vision is a topic that has interested researchers and commercial organisations alike for some time: it provides both a considerable intellectual challenge, and a wide variety of useful applications, some of which are now becoming ubiquitous. In this thesis the author has studied means by which vision software may be constructed automatically using Genetic Programming (GP), a technique that learns how to write programs during a simulation of Darwinian evolution. This research addresses the question of how one might create more "complete" vision systems using GP, beyond simply proving the applicability of evolutionary learning to particular image processing tasks. Research into making Genetic Programming more suitable for deployment as a generic learning tool is presented, evaluated and assessed, and novel means by which multi-stage vision systems can be constructed from evolved components are described. The author does not claim to have invented significant new paradigms in either GP or mainstream computer vision; rather, the focus is on bridging the gap between task-specific applications and a generic learning framework. An architecture for creating such applications is presented, along with software that permits non-expert users to create vision systems rapidly, of a complexity first equal to, then beyond, that so far published by GP researchers.
Olly's thesis won the BMVA's Sullivan thesis prize in 2010, awarded to the best thesis examined during the calendar year 2009. You're welcome to have a go with Jasmine, Olly's GP framework and graphical front-end. It will work on any system that supports a reasonably up-to-date Java installation.
- Objective Methods of Evaluating
Colour Image Segmentation by Hassan Almuhairi (2010).
Abstract. Image segmentation constitutes an important step in automatic object recognition. However, there is still no agreement on a mathematical model that can represent the segmentation process. As a result, a large variety of segmentation algorithms have been introduced into the image processing literature.
The lack of a single standard segmentation solution has led to further research that provides different evaluation models, frameworks and a small variety of image data-sets for testing purposes. A researcher again faces a dilemma of choice: it is hard to decide on a standard evaluation solution to choose an appropriate algorithm. If the goal is to extensively test and evaluate the segmentation algorithms, the task reported in this thesis, then there are: many segmentation algorithms; a variety of input images; different input parameters; and diverse evaluation methods. Consequently, this evaluation task can prove to be computationally highly demanding and it may prove hard to approach an optimal solution.
To help research in this field, the author has firstly investigated the current range of segmentation algorithms, with the aim of composing a 'navigation map' of the available algorithms and evaluation methods. The thesis also proposes an evaluation methodology.
The research specifically involved using and customising a scripted evaluation framework that performed real-time colour image segmentation and as a result provided an objective assessment of the best-quality image segmentation algorithm for a given application. To enhance the computational performance of the framework: firstly, a cluster computer was used to enhance throughput; and secondly, a genetic algorithm module was added to the evaluation process to improve the evaluation's search efficiency. Furthermore, the introduction of a time-factor into the genetic algorithm proved to be beneficial in a variety of ways explored in the thesis.
Hassan's work was awarded a Young Emirati Researchers Prize in early 2012.
- Improving the Effectiveness of
Local Feature Detection by Shoaib Ehsan (2012).
Abstract. The last few years have seen the emergence of sophisticated computer vision systems that target complex real-world problems. Although several factors have contributed to this success, the ability to solve the image correspondence problem by utilizing local image features in the presence of various image transformations has made a major contribution. The trend of solving the image correspondence problem by a three-stage system that comprises detection, description, and matching is prevalent in most vision systems today. This thesis concentrates on improving the local feature detection part of this three-stage pipeline, generally targeting the image correspondence problem. The thesis presents offline and online performance metrics that reflect real-world performance for local feature detectors and shows how they can be utilized for building more effective vision systems, confirming in a statistically meaningful way that these metrics work. It then shows how knowledge of individual feature detectors’ functions allows them to be combined and made into an integral part of a robust vision system. Several improvements to feature detectors’ performance in terms of matching accuracy and speed of execution are presented. Finally, the thesis demonstrates how resource-efficient architectures can be designed for local feature detection methods, especially for embedded vision applications.
Shoaib's thesis won the BMVA's Sullivan thesis prize in 2013, awarded to the best thesis examined during the calendar year 2012.
- Low-Level Image Features and
Navigation Systems for Visually Impaired People by Nadia
Abstract. This thesis is concerned with the development of a computer-aided autonomous navigation system for a visually-impaired person. The system is intended to work in both indoor and outdoor locations and is based around the use of camera systems and computer vision.
Following a review of the literature to identify previous work in navigation systems for the blind, the accurate location of image features is shown to be of vital importance for a vision-based navigation system. There are many operators that identify image features and it is shown that existing methods for identifying which has the best performance are inconsistent. A statistically valid evaluation and comparison methodology is established, centred around the use of McNemar's test and ANOVA.
It is shown that these statistical tests require a larger number of test images than is commonly used in the literature to establish which feature operators perform best. A ranking of feature operators is produced based on this rigorous statistical approach and compared with similar rankings in the literature.
Corner detectors are especially useful for a navigation system because they identify the boundaries of obstacles. However, the results from our testing suggest that the internal angle of a corner is one factor in determining whether a corner is detected correctly. Hence an in-depth study of the angular sensitivity of corner detectors is presented. This leads to the development of a pair of descriptors, known as CMIE and AMIE, which describe corners. Experiments show that these descriptors can be computed at video rate and are effective at matching corners in successive frames of video sequences.
Finally, a complete navigation system is presented. This makes use of both a conventional colour camera and a depth sensor combined in a device known as the Microsoft Kinect. It is shown that the system performs robustly in both indoor and outdoor environments, giving audio feedback to the user when an obstacle is detected. Audio instructions for obstacle avoidance are also given. Testing of the system by both blindfolded and blind users demonstrates its effectiveness.
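The abstract above refers to McNemar's test for comparing feature operators. As a minimal sketch of the textbook form of that test (with continuity correction), and not of the thesis's own procedure, two operators A and B can be compared on the same test images by counting the images on which only one of them succeeds. The function name `mcnemar` and the example counts are hypothetical.

```python
import math

def mcnemar(only_a_succeeds, only_b_succeeds):
    """Return (chi-squared statistic, two-sided p-value) for McNemar's test.

    Uses the continuity-corrected statistic (|b - c| - 1)^2 / (b + c),
    clamped at zero, with one degree of freedom. Images on which both
    operators succeed or both fail do not enter the statistic.
    """
    b, c = only_a_succeeds, only_b_succeeds
    n = b + c
    if n == 0:
        return 0.0, 1.0  # no discordant images: no evidence of a difference
    chi2 = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of chi-squared with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: A alone succeeds on 15 images, B alone on 5.
chi2, p_value = mcnemar(15, 5)
```

Because only the discordant images contribute, a small test set can leave the statistic with too few counts to separate two operators, which is consistent with the abstract's point about needing more test images.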
- User Tracking Methods for
Augmented Reality Applications in Cultural Heritage by Erkan
Abstract. Augmented Reality provides an entertaining means for displaying 3D reconstructions of ancient buildings in situ for cultural heritage. Finding the pose (position and orientation) of the user is crucial for such applications, since this information is used to define the viewpoint for rendering the models. Images acquired from a camera can be used as the background for such augmentations. To make the most of this available information, these images can also be utilized to find a pose estimate.
This thesis presents contributions for vision-based methods for estimating the pose of the user in both indoor and outdoor environments. First an evaluation of different feature detectors is presented, making use of spatial statistics to analyse the distribution of the features across the image, a property that is shown to affect the accuracy of the homography calculated from these features.
An analysis of various filtering methods used for tracking was performed and an implementation of a SLAM system is presented. This implementation faced several problems, in particular insufficient tracking accuracy arising from linearity problems. An alternative, keyframe-based tracking algorithm is presented.
Continuing with vision-based approaches, a Kinect sensor was also used to find the pose of a user for in situ augmentations, making use of the natural features in the environment. Skeleton tracking was also found to be beneficial for such applications.
The thesis then investigates combining the vision-based estimates with measurements from other sensors, GPS and IMU, in order to improve the tracking accuracy in outdoor environments. The idea of using multiple models was investigated using a novel fuzzy rule-based approach to decide on the model that results in improved accuracy and faster convergence for the fusion filter.
Finally, several AR applications are presented that make use of these methods. The first one is for in situ augmentation for displaying historical columns and augmenting users, the second is a virtual visit to an ancient building and the third is a game which can also be played inside the augmentation of the building in the second application.
- Agricultural Produce Grading by
Computer Vision Based on Genetic Programming by Panitnat
Abstract. An objective of computer vision is to imitate the ability of the human visual system. Computer vision has been put forward to produce a wide range of applications. Most vision software does not operate alone; machine learning is involved in many vision systems. Some vision systems are developed to replace human workers because they operate more reliably, precisely and quickly, and because some tasks are dangerous for humans.
This thesis presents contributions to extend a vision system based on genetic programming to solve classification problems. Instances in the field of agricultural produce are employed to verify the system performance. A new method is proposed to determine the shape and appearance of reconstructed 3D objects. The reconstruction is based on 2D images taken by a few cameras in arbitrary positions. Furthermore, new techniques are presented to extract properties of 3D objects: morphological, colour and textural features.
New techniques are proposed to incorporate new features and new classes of samples into a GP classifier. For the former, the new feature is accommodated into an existing solution by mutation. For the latter, as generating a multi-class classifier is based on a binary decomposition approach, a binary classifier of the new class is produced and executed before the series of the original binary classifiers. Both cases are intended to be done with less computation than evolving a new classifier from scratch.
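The last paragraph describes adding a new class by running a new binary classifier before the existing series of binary classifiers. A minimal sketch of that ordering is given below. It is only an illustration: the type aliases, the `classify` and `add_class` helpers, the feature layout and the hand-written lambdas standing in for evolved GP classifiers are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
BinaryClassifier = Callable[[Features], bool]
Cascade = List[Tuple[str, BinaryClassifier]]

def classify(cascade: Cascade, features: Features, default: str = "unknown") -> str:
    """Run the binary classifiers in order; the first to accept wins."""
    for label, accepts in cascade:
        if accepts(features):
            return label
    return default

def add_class(cascade: Cascade, label: str, accepts: BinaryClassifier) -> Cascade:
    """Return a new cascade with the new class tested before the originals.

    The existing classifiers are reused unchanged, so nothing is re-evolved.
    """
    return [(label, accepts)] + list(cascade)

# Hypothetical features: (diameter in mm, redness in [0, 1]).
original: Cascade = [
    ("apple", lambda f: f[0] > 60),
    ("cherry", lambda f: f[0] <= 60),
]
extended = add_class(original, "grape", lambda f: f[0] < 30 and f[1] < 0.5)
label = classify(extended, (20.0, 0.2))
```

Only the classifier for the new class has to be produced, which matches the abstract's aim of needing less computation than evolving a complete multi-class classifier again.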