This module explores the algorithms and software that let a computer 'understand' the content of images and videos. This is far from being a solved problem and research into it is extraordinarily active, with the UK contributing some of the most important results globally. The development of the discipline is so rapid that about half the content of the module couldn't have been taught ten years ago. Given what it is trying to do, you might think that vision is an area of artificial intelligence, placing it squarely in the realm of computer science. While this is true, it is far from the whole story, with researchers also having backgrounds in disciplines such as electronic engineering, mathematics, physics, psychology and medicine. The contributions of the various disciplines will become more apparent as you learn about vision techniques.
In fact, this is not one module but two, for it is delivered simultaneously to final-year undergraduate students (CE316) and to postgraduate students (CE866). Laboratories are also joint between CE316 and CE866 but the examinations for the two modules are different.
The last decade or so has seen three major technological influences on computer vision. Firstly, digital imaging has transformed the capture of image and video data from something that pushed at the boundaries of real-time hardware and software into an everyday process. The second influence is cheap data storage and processing, allowing large quantities of image and video data to be stored and manipulated in reasonable timescales. The final influence, a consequence of having more CPU cycles to burn, is an expansion in the use of machine learning and an improvement in the learning algorithms available.
Machine learning has actually been employed in real-world vision systems for a long time, though until recently this module stopped short of discussing it. However, the last few years have seen machine learning become much more prevalent in vision techniques, so there is an introduction to it in the second half of the module. It won't turn you into an expert in machine learning but it should give you an appreciation of what it is — and sometimes isn't — capable of doing.
Your friendly neighbourhood lecturer, Adrian Clark, has been researching digital image analysis and computer vision all his professional life. His PhD was in processing digital imagery from electron microscopes and his postdoctoral work was some of the earliest to explore the use of parallel computers for image processing (Google "ICL DAP"). He worked in industry, developing robust, real-time vision systems before joining Essex. Although he also researches virtual and augmented reality (the underlying maths is essentially the same as for vision), he has three major research interests:
Using genetic programming, a form of machine learning, to build computer vision systems from components. We have had some spectacular successes with this approach and I'll show you some of them towards the end of the module.
Reconstructing 3D models from images, captured either by humans or robots. We have reconstructed archaeological finds from photographs taken shortly after they were uncovered, recording their appearance in fine detail before any conservation has taken place. This was done most significantly for the Fenwick Treasure, an important discovery made as Colchester's department store was being extended. We have also been very successful in building 3D models of coral reefs in Indonesia and the Caribbean from imagery acquired by Jon Chamberlain, another academic in CSEE, using a rapid capture system that he and I designed. The 3D models we reconstruct are good enough to print (I have a colour 3D printer in my research lab), and you will see this work when we consider vision in a 3D world in the second half of the module.
Inspired by the problems encountered when developing vision systems in industry, I am a fervent evangelist of statistical approaches for evaluating and comparing the performance of vision systems. The material in lectures and laboratories on this topic is right at the forefront of research, and you'll evaluate systems during many of the laboratory experiments.
Lectures | Thursday 09:00–11:00, weeks 16–25 | Ivor Crewe Hall A |
---|---|---|
Laboratories | Monday 09:00–11:00 or 16:00–18:00, weeks 17–25 | CSEE Lab 7 |
The lectures are live events. In them, I explain important techniques used in computer vision: you'll see how they work, how they're programmed and what they're used for; and you'll be able to ask me questions if there are things I haven't made clear. Unlike many academics, I do not simply drone on while presenting a set of overheads; instead, I write and sketch to explain things so you can see how ideas are transformed into algorithms.
Each lecture has a comprehensive set of support material:
As the discipline is moving so quickly, no textbook that you can buy is really able to keep up, so the lecture notes are the principal source material for you to work from. The notes are available as a single PDF file rather than separate chunks to make it easier for you to see how some ideas and principles underlie several techniques.
Each time you do one of the quizzes, you'll have a random selection of ten questions from a question bank and the order of the choices will be different. They are a little easier than the questions you'll experience in the formal progress tests discussed below but are a good way of checking that you understand things.
Even though I provide much of the formal content in the notes and lectures, that does not mean that you shouldn't make notes yourself. The notes deliberately have a wide margin to make it easy for you to add your own notes alongside the text.
The laboratories are where you put the theory explained during lectures into practice by working through a series of exercises. It is essential that you record what you are doing and what you find during them because there are two progress tests during the module and they are specifically on the labs. They are open-book tests, so you are welcome to use any books etc that you like — especially your records of the experiments. (I'll go over this process during the first lecture in case it isn't clear from this explanation.)
The laboratories expect you to write code in Python. This is by far the most widely-used language in computer vision research and development, and AI in general, mostly because its write–compile–run cycle is so short. All of the laboratories should be fine under Linux, Windows or macOS, so you should be able to work in an environment you are already familiar with — though you are expected to be able to run software from the command line rather than from within your editor or IDE.
If you don't know Python, or all knowledge of it has slipped from your mind since you were taught it, you might find my notes on Python helpful. These date from when I taught an MSc module on Python programming and go from nothing to writing reasonably sophisticated programs.
In the laboratory exercises, images are represented as numpy ("Numerical Python") arrays, making it possible for you to employ the power of both numpy and scipy ("Scientific Python") in your solutions as well as OpenCV, the widely-used, open-source computer vision package intended for real-time applications. In the later laboratories, you'll see results obtained from machine learning packages Scikit-learn and TensorFlow (the latter in conjunction with Keras). You don't need to install these to do the experiments. Of course, if your project involves machine learning, you'll probably have them installed anyway.
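To give a feel for what "an image is a numpy array" means in practice, here is a minimal sketch. It builds a small synthetic grey-scale image rather than reading a file, and the names `image`, `negative` and `histogram` are illustrative choices of mine rather than anything taken from the lab scripts. Images read with OpenCV's `cv2.imread` are numpy arrays too, so the same kind of operations apply to them.

```python
import numpy as np

# A synthetic 4x6 grey-scale image: 8-bit pixels, indexed as image[row, column].
image = np.zeros((4, 6), dtype=np.uint8)
image[1:3, 2:5] = 200          # a bright rectangle on a dark background

print(image.shape)             # (rows, columns)
print(image.dtype)             # uint8, i.e. values in the range 0..255

# Whole-image arithmetic needs no explicit loops: this is the negative.
negative = 255 - image

# A grey-level histogram: how many pixels have each value 0..255.
histogram = np.bincount(image.ravel(), minlength=256)
print(histogram[0], histogram[200])
```

Working on whole arrays like this, rather than looping over pixels in Python, is usually both shorter and much faster.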
If all this sounds very frightening, don't worry: I'll explain it all during the lectures.
If something doesn't make sense to you, don't be afraid to ask! Drop me an email explaining what the problem is and I'll either reply by email or we can meet up (in person or by Zoom) to discuss it.
If you have a specific learning difficulty and would like a version of the lecture notes and laboratory scripts using different fonts, contrast, background colour or whatever, please do contact your lecturer. He spent part of a summer vacation writing software to do this so please make use of it rather than struggle on!
topic | teaching material | test your understanding |
---|---|---|
The entire book of notes; Summary sheet you will be provided with in the exams; Programming in Python | | |
How much do you know before the module starts? | | |
Module overview | overheads | quiz on the module's organization |
Introduction to Computer Vision | overheads | quiz, Python code, C++ code, histogram worksheet, solutions |
The Human Visual System (not examinable) | quiz | |
Convolution | overheads | quiz, convolution worksheets, solutions |
Low-level vision | overheads | quiz |
Evaluating vision systems | overheads | quiz |
Intermediate level vision | overheads | quiz |
Looking at humans | overheads | quiz |
Vision in a 3D world (section 9.6 onwards is not examinable) | overheads | quiz |
High-level vision with machine learning; Deep learning and neural networks | overheads | quiz |
Getting to grips with the Unix command line | shell syntax | shell reference, Emacs reference [print double-sided and fold concertina-style along the lines between columns] |
The labs have been re-written this year to make them easier for people to work through. This means some problems may have crept in; if you find one, please do let the author know and he will fix it pronto! This applies even to seemingly trivial things like typos.
Note that the labs are self-paced rather than intended to be done one per week. When you finish one, just go on and do the next. As will be clear from the schedule below, labs 1–4 are assessed in the first progress test and 5–9 in the second.
script | data | quiz |
---|---|---|
1. Getting to grips with OpenCV | sxcv.zip | quiz |
2. Histograms | [also sxcv.zip] | quiz |
3. Colour images | 03-colour.zip | quiz |
4. Processing regions | 04-regions.zip | quiz |
First progress test | | |
5. Counting stomata | 05-stomata.zip | quiz |
6. Broken biscuits | 06-biscuits.zip | quiz |
7. Counting cars | 07-counting-cars.zip | quiz |
8. Stereo | 08-stereo.zip | quiz |
9. Machine learning | 09-ml.zip | quiz |
Second progress test | | |
There are two progress tests, scheduled for:
CE316 | week 21, week 25 |
---|---|
CE866 | week 21, week 25 |
(Yes, they are both at the same time.) As discussed at length above and in the first chapter of the notes, these are open-book tests that focus on the experiments. For both CE316 and CE866, each progress test is worth 20% of the overall module mark.
The remainder of your marks come from an examination held in the first part of the summer term. This is worth 60% of the overall module mark. Previous examinations are available via the Moodle site.
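The weighting above can be sketched as a short calculation. This assumes that each component is marked out of 100 and that the components are simply combined by weight, with no rounding or capping rules; the marks used are hypothetical and purely for illustration.

```python
# Weights taken from the text: two progress tests at 20% each, exam at 60%.
WEIGHTS = {"test1": 0.20, "test2": 0.20, "exam": 0.60}

def overall_mark(marks):
    """Combine component marks (each assumed to be out of 100) by weight."""
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)

# Hypothetical marks, purely for illustration.
example = {"test1": 70, "test2": 60, "exam": 55}
print(round(overall_mark(example), 1))
```

With these made-up marks the contributions are 14, 12 and 33, giving an overall mark of 59.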
OpenCV and the Python environment described in lectures are part of the standard installation on the machines in CSEE's computer laboratories. Of course, you can also install OpenCV on your own computer, either under Unix (macOS, Linux, etc) or Windows. The way you do that for all three operating systems is described in the first chapter of the lecture notes.
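Once you have installed OpenCV on your own machine, a quick check along the following lines should tell you whether Python can find it and which version you have. This is a minimal sketch that assumes the standard `cv2` Python bindings; the helper name `opencv_version` is my own, and the procedure in the first chapter of the notes may differ.

```python
def opencv_version():
    """Return the installed OpenCV version string, or None if cv2 is missing."""
    try:
        import cv2
    except ImportError:
        return None
    return cv2.__version__

version = opencv_version()
if version is None:
    print("OpenCV's Python bindings (cv2) are not installed")
else:
    print("OpenCV version:", version)
```

Save this in a file and run it from the command line; if it reports that `cv2` is missing, revisit the installation instructions.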
In such a fast-moving subject area, printed textbooks are almost always out of date — and this is especially the case in the use of machine learning in computer vision. A couple of widely-recommended texts are:
Roy Davies' book Computer and Machine Vision: Theory, Algorithms, Practicalities (4th edition, Academic Press, 2012). This is a good book but unfortunately one you would have to buy. The third edition is in the Library too and that would be fine to use. (There's a new edition just about to appear as I write this.)
Richard Szeliski's book Computer Vision: Algorithms and Applications (Springer, 2010). This is intended for graduates with some vision experience; it is not really suitable for newbies.
To be honest, it's best to ask questions in lectures and lab sessions if you are having trouble with the principles of the subject, and to use a search engine to help you with programming difficulties. Of course, as mentioned above, do contact your lecturer for help too!
Regarding computer vision programming, there are about half a dozen books in the library that describe earlier versions of OpenCV, so looking through one of those might help you if you get stuck. Other places to look are:
The official OpenCV documentation — be sure to look at the right version: chapter 1 of the notes explains how to find out which version you're using.
Jan Erik Solem's book Programming Computer Vision with Python (O'Reilly, 2012). Rather than OpenCV, this book explains how to use pure Python, with numpy and scipy extensions, to carry out computer vision tasks.
You might also be interested to look at some online resources:
CVonline, a collection of tutorials and papers which explain techniques
HIPR2, a compendium of interactive demonstrations of mostly low-level vision techniques
You'll see from my demonstrations that I work by typing commands into a terminal window, and the commands I use are identical under Linux and macOS. Although it requires a little effort to get to grips with the command line, it is a tremendously efficient way to interact with the machine, especially if what you want to do isn't catered for by something pointy-clicky or you're accessing the machine remotely. If your Unix (Linux, macOS) skills need improvement, here are some places to look:
A Linux tutorial, written by Michael Stonebank at Surrey.
The first chapter of my notes on Programming in Python.
Some locally-written notes about program development under Linux.
A series of tutorials about the Unix shell.
Web page maintained by Adrian F. Clark using Emacs, the One True Editor ;-)