This page gives links to selected theses written by VASE Lab
members.
- Objective Methods of
Evaluating Colour Image Segmentation by Hassan
Almuhairi (2010).
Abstract. Image segmentation constitutes an important step
in automatic object recognition. However, there is still no agreement
on a mathematical model that can represent the segmentation
process. As a result, a large variety of segmentation algorithms have
been introduced into the image processing literature.
The lack of a single standard segmentation solution has led to
further research that provides different evaluation models, frameworks
and a small variety of image data-sets for testing purposes. A
researcher again faces a dilemma of choice: it is hard to decide on a
standard evaluation solution to choose an appropriate algorithm. If
the goal is to test and evaluate segmentation algorithms extensively,
which is the task reported in this thesis, then there are many
segmentation algorithms, a variety of input images, different input
parameters and diverse evaluation methods to consider. Consequently, this
evaluation task can prove to be computationally highly demanding and
it may prove hard to approach an optimal solution.
To help research in this field, the author has firstly investigated
the current range of segmentation algorithms, with the aim of
composing a 'navigation map' of the available algorithms and
evaluation methods. The thesis also proposes an evaluation
methodology.
The research specifically involved using and customising a scripted
evaluation framework that performed real-time colour image
segmentation and as a result provided an objective assessment of the
best-quality image segmentation algorithm for a given application. Two
measures improved the computational performance of the framework: firstly, a
cluster computer was used to increase throughput; and secondly, a
genetic algorithm module was added to the evaluation process to
improve the evaluation's search efficiency. Furthermore, the
introduction of a time-factor into the genetic algorithm proved to be
beneficial in a variety of ways explored in the thesis.
Hassan's work was awarded a Young Emirati Researchers Prize in
early 2012.
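As an illustration of the time-factor idea mentioned in the abstract above, here is a minimal Python sketch in which a toy genetic algorithm scores each candidate by its quality minus a runtime penalty. It is not the thesis's framework: the function names, the linear penalty, the `time_weight` value and the `toy_evaluate` stand-in are all assumptions made for illustration.

```python
import random

def time_aware_fitness(quality, runtime_s, time_weight=0.1):
    """Score a candidate: quality minus a linear runtime penalty (assumed form)."""
    return quality - time_weight * runtime_s

def evolve(evaluate, bounds, pop_size=20, generations=30, seed=0):
    """Tiny elitist GA over one real parameter; evaluate(x) -> (quality, runtime_s)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]

    def score(x):
        quality, runtime_s = evaluate(x)
        return time_aware_fitness(quality, runtime_s)

    for _ in range(generations):
        # Keep the better half, then refill with mutated copies of the survivors.
        pop.sort(key=score, reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        for parent in survivors:
            child = parent + rng.gauss(0.0, 0.05 * (hi - lo))
            children.append(min(hi, max(lo, child)))  # clamp to the bounds
        pop = survivors + children
    return max(pop, key=score)

def toy_evaluate(x):
    # Hypothetical stand-in for running a segmentation algorithm:
    # quality peaks at x = 0.7, while runtime grows linearly with x.
    quality = 1.0 - (x - 0.7) ** 2
    runtime_s = 2.0 * x
    return quality, runtime_s

best = evolve(toy_evaluate, bounds=(0.0, 1.0))
```

With the penalty switched on, the preferred parameter in this toy problem shifts from the pure-quality optimum (0.7) towards a cheaper setting (about 0.6), which is the kind of trade-off a time factor is meant to expose.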
- Position Sensing and
Augmented Reality by David Johnston (2001).
Abstract. One of the greatest challenges for the emerging
discipline of Augmented Reality (AR) is solving the visual
registration problem, i.e. aligning virtual computer graphics
accurately over the real world scene to provide the user with a
usefully "augmented" interactive experience. To achieve this goal the
position and orientation of the user's head must be measured with high
accuracy and low latency. There is no general purpose technology that
works outdoors or indoors (with unlimited range) to accomplish
this.
An AR system has been developed for reconstructing, in situ, ancient
Roman buildings that once stood at what is now a green-field
archaeological park in Colchester. The system consists of an optical
look-through stereoscopic VR
headset and a wearable computer, with GPS and computer vision being
used for position determination. The Visual Positioning System (VPS)
finds and identifies specially designed targets within the scene to
work out camera location. The targets have unique signatures when the
image is turned into a Region Adjacency Graph (RAG), resulting in
robustness and reliability (no false positives). The GPS system uses
two receivers: one on the archaeological tourist and one at a base
station in order to perform differential processing for greater
positional accuracy. Initial evaluations were made of the ADXL202
"accelerometer on a chip" and of the LAMBDA method for high accuracy
GPS positioning, with a view towards a multi-component hybrid solution
for the registration problem. Currently, the VPS can work unassisted
indoors or automatically initialise the registration for use of the
lower accuracy GPS outdoors. The systems developed are inexpensive:
the VPS uses laser-printed targets and a commodity PC for the visual
processing, while the GPS is based on two inexpensive Garmin G12
receivers. The software developed in-house has been put into the
public domain.
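To make the Region Adjacency Graph idea in the abstract above concrete, the following minimal Python sketch builds a RAG from a small labelled image and derives a simple graph signature. The abstract does not describe the target design or the signature actually used, so the 4-connectivity rule, the `degree_signature` function and the example label image are assumptions for illustration only.

```python
def region_adjacency_graph(labels):
    """Map each region label to the set of labels it touches (4-connectivity)."""
    graph = {}
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            graph.setdefault(a, set())  # isolated regions still become nodes
            # Checking the right and down neighbours covers every adjacent pair once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    if a != b:
                        graph[a].add(b)
                        graph.setdefault(b, set()).add(a)
    return graph

def degree_signature(graph):
    """Hypothetical signature: the sorted number of neighbours of each region."""
    return tuple(sorted(len(neighbours) for neighbours in graph.values()))

# Hypothetical target: background 0, a frame 1, and two enclosed blobs 2 and 3.
target = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 2, 1, 3, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
rag = region_adjacency_graph(target)
signature = degree_signature(rag)
```

Because the graph depends only on which regions touch, a signature of this kind is unchanged by scaling or rotating the target, which suggests why a graph-based description can be robust.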
- Periscopic Stereo and
Large-Scale Scene Reconstruction by Eddie Moxey
(2002).
Abstract. The capture of three dimensional structure from
two dimensional images has received considerable attention in computer
vision. Existing work has concentrated on use of stereo camera systems
and the reconstruction of small objects. Recently, single cameras in
motion have been used to capture sections of scenery which are
subsequently reconstructed by skilled technicians with a selection of
computer vision and graphical modelling tools. However, large-scale,
automated, reconstruction of scenery is limited by the "where to look
next" problem. A number of imaging systems have been proposed to solve
this problem but none have been realized. Periscopic stereo is a novel
concept which implements stereo imaging using a single camera. A
rotating mirror scans the horizon while a fixed relative geometry is
maintained between the virtual stereo cameras.
This dissertation presents, for the first time, a practical design for
a periscopic stereo head and investigates the computer vision tools
necessary for 3D reconstruction from periscopic image data. It
identifies two possibilities for processing periscopic image
data: "corrected", where a two dimensional rotation is applied to
the image plane prior to standard stereo processing, and "uncorrected",
which ignores the "tumbling" effect inherent in periscopic image data
until the final stage of reconstruction. This "late" correction
circumvents the problem, apparent in many existing stereo algorithms,
of resolving disparity measurement in imaged scene structure which is
parallel with corresponding epipolar lines.
Many of the existing stereo processing tools used in the course of
this research require little modification, but all have revealed
issues requiring resolution that were not immediately apparent in
previous treatments. This investigation stops short of the actual
construction of 3D models but presents a method of generating the sets
of depth data required for large-scale scene reconstruction. Feature
extraction, image data correspondence, camera calibration and the
generation of depth information from periscopic image data are all
covered in the context of this dissertation. In particular, a new
method of combining existing camera calibration techniques, termed
"calibration in a box", is presented together with conclusions
regarding the tools and techniques employed.
While periscopic stereo is still in development, it is the only
imaging system reported to date which is likely to be capable of
large-scale, autonomous, 3D scene reconstruction, with particular
application to remote operation in hazardous environments.
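The "corrected" processing route described in the abstract above applies a two dimensional rotation to the image plane. A minimal Python sketch of such a rotation for a single image point follows. It assumes the roll angle of each frame is already known (for example from the mirror position) and that rotation is about a given image centre. The function name and parameters are hypothetical and not taken from the dissertation.

```python
import math

def derotate_point(x, y, roll_rad, centre=(0.0, 0.0)):
    """Rotate image point (x, y) by -roll_rad about centre to undo image roll."""
    cx, cy = centre
    dx, dy = x - cx, y - cy
    c, s = math.cos(-roll_rad), math.sin(-roll_rad)
    # Standard 2D rotation matrix applied to the offset from the centre.
    return (cx + c * dx - s * dy, cy + s * dx + c * dy)

# A point that was at (1, 0) before a 90 degree roll appears at (0, 1);
# de-rotating by the same angle brings it back.
restored = derotate_point(0.0, 1.0, math.pi / 2)
```

Applying the same mapping to every pixel (with interpolation) would give an upright image that standard stereo tools can consume, whereas the "uncorrected" route defers this step to the reconstruction stage.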
- Systems and Services for
Wearable Computers by Neill Newman (2002).
Abstract. The use of both portable computing and mobile
communication has increased dramatically in the last few years.
Mobile devices combining computing and communications are now being
explored and there is competition between manufacturers to provide
more features and push the technology.
Integrating an increasing number of features into a small package
creates additional problems to those of mobile operation. Contextual
considerations such as the location and activity of the user become
relevant to the interaction between the human and computer. Therefore
a mobile computer should be able to perceive the environment and
adjust the presentation of information automatically.
The aims of this thesis are to analyse the capabilities of some mobile
interaction devices; to design a user interface system which takes
into account these capabilities; and to integrate this user interface
with a software framework which enables the machine to perceive the
environment and react accordingly.
The thesis starts by specifying and detailing the construction of a
mobile platform for the remainder of this work. This wearable
computer consists of a small PC with a head-mounted display and a
commercial portable keyboard called a Twiddler. A study investigates
these interaction devices and contrasts the Twiddler and various
head-mounted displays with a standard keyboard and mouse. The results
show that it is possible to design a user interface which can increase
the speed and accuracy of use of the Twiddler and head-mounted display
devices, but generally they perform poorly in comparison to the normal
desktop devices. There are also indications of increased fatigue and
user frustration when using these mobile interaction devices.
The results of the study show that current
desktop user interfaces are not as efficient in a mobile situation as
they are in a desktop situation. This has prompted the author to
investigate alternative user interface systems for mobile computing.
A prototype software architecture called Sulawesi is presented which
attempts to address the lack of an alternative user interface research
platform. Sulawesi has been designed to combine contextual
awareness, agent-based systems, and multi-modal user interfaces into a
single development framework.
The software architecture allows physical and hybrid sensors and
rendering mechanisms to be abstracted from applications, and a
management layer allows communication between these subsystems. The
user can command the system via constrained natural language
statements. Speech recognition or textual input allows the user to
command the machine, and either a dedicated user interface, tailored to
the head-mounted display, or speech rendition is used for output.
Also, a system which allows dedicated agents to process information
from a sensor, and to affect the rendition of information to the user,
has been incorporated into Sulawesi. A prototype agent which uses
contextual information to affect how the information is displayed to
the user is also described in this work, along with several novel
mobile applications that make use of contextual information.
- Towards the Automatic
Construction of Machine Vision Systems using Genetic
Programming by Olly Oechsle (2009).
Abstract. Computer vision is a topic that has interested
researchers and commercial organisations alike for some time: it
provides both a considerable intellectual challenge, and a wide
variety of useful applications, some of which are now becoming
ubiquitous. In this thesis the author has studied means by which
vision software may be constructed automatically using Genetic
Programming (GP) — a technique that learns how to write programs
during a simulation of Darwinian evolution. This research addresses
the question of how one might create more "complete" vision systems
using GP, beyond simply proving the applicability of evolutionary
learning to particular image processing tasks. Research into making
Genetic Programming more suitable for deployment as a generic learning
tool is presented, evaluated and assessed, and novel means by which
multi-stage vision systems can be constructed from evolved components
are described. The author does not claim to have invented significant
new paradigms in either GP or mainstream computer vision —
rather, the focus is on bridging the gap between task-specific
applications and a generic learning framework. An architecture for
creating such applications is presented, along with software that
permits non-expert users to create vision systems rapidly, of a
complexity first equal to, then beyond, that so far published by GP
researchers.
Olly's thesis won the BMVA's Sullivan thesis prize in
2010, awarded to the best thesis examined during the calendar year
2009. You're welcome to have a go with Jasmine, Olly's GP
framework and graphical front-end. It will work on any system that
supports a reasonably up-to-date Java installation.
- Evolutionary
Personal Information Filtering: Combating information overload caused
by habitual Web surfing using an evolutionary personal content-based
recommender system by John Pagonis (2009).
Abstract. This thesis examines the construction of an
evolutionary personal content recommender system and the genetic
algorithm classifier used in it. The author presents an evolutionary
personal Web content recommendation system that works with implicit
user feedback only. At the core of this recommender lies a novel
genetic algorithm based document ranking categorisation technique
named Engene that dispenses with explicit user feedback during
training and operation. Engene is a one-class soft document
classifier. The approach taken allows a genetic algorithm to evolve
better filters without needing a user's constant attention during
evolution. In other GA-based systems user attention is necessary
during evolution in order to provide feedback on the system's
performance for training and operation of the classifier. The Engene
filtering engine operates unattended. To the author's best knowledge
there is no other genetic algorithm document ranking categorisation
technique that does so. The thesis gives a critique of this genetic
algorithm classifier and of the shortcomings of GA-based text
classification that led to Engene. An analysis of how competitive
this genetic algorithm text filtering technique is as a
stand-alone batch classifier is also given. Additionally, the author
investigates human factors such as how users use a personal digest and
which layout they prefer. Based on the same user
study, this thesis offers an analysis of how implicit user feedback of
a Web based recommender can be mapped to operations that can drive
Engene and machine learning in general.
- Architectures
for Untethered Augmented Reality Using Wearable Computers by
Panagiotis Ritsos (2006).
Abstract. One of the most interesting fields of computer
technology is that of Virtual Reality. People are fascinated by being
immersed in three-dimensional, computer-synthesised virtual
worlds. There are many example applications such as interactive
visualisation and representation for entertainment and education,
modelling of construction, manufacturing and maintenance processes,
architecture, medicine, annotation and simulations for training. One
step further is the notion of augmented reality (AR) where, unlike
virtual reality where the user sees only virtual worlds, he or she can
see both the real and the virtual at the same time.
One of the potential applications of augmented reality is the 3D
reconstruction of archaeological sites in situ, where the
user can be immersed while maintaining a composite view of the real
and the virtual surroundings. By using an untethered, mobile,
body-worn computer with a see-through head-mounted display,
equipped with location and orientation sensors, the user can roam in
the composite world as if the scene were entirely real.
The research effort described here concerns the construction of
such an AR application, centred around the Roman remains in the
Gosbecks Archaeological Park on the outskirts of Colchester. Two
generations of wearable computers have been implemented. The first,
similar to earlier prototypes, provided a test-bed for initial,
in-the-field tests, in order to prove the concept and gain practical
experience. As anticipated, this prototype provided inadequate
performance; however the lessons learned influenced the design of a
second, more mature platform. The second wearable, designed and
built on the experience gained, is a novel prototype with improved
processing power and ergonomics, low power consumption and low
cost. The prototypes use GPS, the Global Positioning System, for
measuring location and a magnetic sensor integrated into the
head-mounted display for determining the wearer's direction of
gaze.
A novel wearable AR framework was developed to work with both
wearables but primarily to exploit the hardware rendering capabilities
of the second prototype. The application software was written in C
using the OpenGL graphics library for 3D rendering. The framework
encompasses optimisation techniques such as view frustum culling and
levels of detail in order to improve rendering speed.
The second wearable computer and the framework were used for fairly
extensive field testing, in order to determine the accuracy and
stability of the position and orientation sensing mechanisms. In
addition, the system was assessed in the field by users by means of a
novel, questionnaire-based user assessment. The assessment
investigated the usability of the second wearable running the wearable
AR framework, exploring its ergonomics, visual output quality,
positional accuracy and stability as well as the sense of presence and
overall impression.
The ultimate aim of this research was to ascertain whether wearable
AR can be delivered in a form that can be used by the general public
for in situ guides for archaeological sites at a reasonable
cost. The findings show that functionality is achievable, though the
cost is higher, and the location accuracy lower, than desired.
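The abstract above names view frustum culling and levels of detail as the framework's rendering optimisations. The thesis software was written in C with OpenGL; the sketch below uses Python only to show the two ideas compactly and is not the author's code. The plane representation, the bounding-sphere test, the distance thresholds and the box-shaped example frustum are all assumptions for illustration.

```python
def sphere_in_frustum(centre, radius, planes):
    """Return False only when the bounding sphere lies wholly outside one plane.

    Each plane is (a, b, c, d) with a unit-length, inward-pointing normal,
    so a point is inside that plane when a*x + b*y + c*z + d >= 0.
    """
    x, y, z = centre
    for a, b, c, d in planes:
        if a * x + b * y + c * z + d < -radius:
            return False  # entirely on the outside of this plane: cull
    return True

def select_lod(distance, thresholds=(10.0, 30.0)):
    """Pick a level of detail: 0 is the finest, len(thresholds) the coarsest."""
    for level, limit in enumerate(thresholds):
        if distance < limit:
            return level
    return len(thresholds)

# Hypothetical box-shaped viewing volume: -5 <= x <= 5 and -100 <= z <= -1.
planes = [
    (1.0, 0.0, 0.0, 5.0),     # left:  x >= -5
    (-1.0, 0.0, 0.0, 5.0),    # right: x <= 5
    (0.0, 0.0, -1.0, -1.0),   # near:  z <= -1
    (0.0, 0.0, 1.0, 100.0),   # far:   z >= -100
]
visible = sphere_in_frustum((0.0, 0.0, -10.0), 1.0, planes)
culled = not sphere_in_frustum((20.0, 0.0, -10.0), 1.0, planes)
```

A renderer built along these lines would skip objects whose bounding spheres fail the test and draw the remainder with the model variant chosen by `select_lod`, reducing the geometry sent to the graphics hardware.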