From your previous laboratories and what you find on the Web, you could be forgiven for thinking that computer vision is only about classification tasks: stomata detection, identifying broken biscuits, and so on. However, that is far from being the case.
An important area of computer vision is using it to understand the real world in 3D. This might be for robot navigation, driverless cars and so on, but it applies equally to the movie industry. In fact, most of the special effects you see in movies would not be possible without computer vision. We don't have time to solve either of those two problems in a single lab session; instead, we shall investigate how accurately distance can be computed from a pair of cameras.
Imagine that you have been approached by a company, Wivenhoe Intelligent Systems Engineering (WISE), because of the extensive knowledge of computer vision you have acquired through the Computer Vision module. ;-) WISE is working on a stereo system for capturing faces in 3D. To aid their development work, they are using a modified version of the Candide face model, which they have "imaged" from a simulation of their stereo rig using the POV-ray ray-tracer. A ray-tracer is a program that takes a description of the components of a 3D scene, including lighting, camera and appearance, and calculates how it should appear. POV-ray is particularly attractive because this description is a kind of program, which you can write by hand or (as your lecturer usually does) generate from another program, and then render to produce images.
WISE's virtual stereo rig has identical cameras arranged so that their optical axes are precisely parallel --- one of the advantages of working in simulation is that this can be guaranteed. This means that it should be possible to use the equation described in the lecture notes on stereo:
\[Z = \frac{fB}{D}\]
where \(Z\) is the distance to the object, \(f\) the focal length, \(B\) the baseline and \(D\) the disparity (parallax) of a feature between the left and right images. They have supplied you with a zip-file containing programs and imagery.
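As a minimal sketch of how the equation behaves, the Python below evaluates \(Z = fB/D\) for a few disparities. The function name and the focal length of 700 pixels are assumptions made purely for illustration; they are not values taken from the WISE rig. Note that if \(f\) and \(D\) are both in pixels, \(Z\) comes out in the units of \(B\).

```python
def depth_from_disparity(f_px, baseline_mm, disparity_px):
    """Distance Z = f * B / D for a parallel-axis stereo pair.

    f_px and disparity_px are in pixels, so Z has the units of the baseline.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return f_px * baseline_mm / disparity_px


# Hypothetical focal length (NOT the rig's real value) and a 150 mm baseline.
F_PX = 700.0
B_MM = 150.0

for d in (150.0, 200.0, 250.0):
    z = depth_from_disparity(F_PX, B_MM, d)
    print(f"D = {d:5.1f} px  ->  Z = {z:6.1f} mm")
```

Larger disparities correspond to nearer objects, and because \(Z\) varies as \(1/D\), a one-pixel error in \(D\) matters more for distant points than for near ones.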
The virtual cameras are at \((x,y,z)\) locations \((\pm75,300,800)\), which means that \(B = 150\) mm. Vertex 5 of Candide, the tip of her nose, is at location \((0,289,280)\); hence, the distance from the cameras to it, \(Z\), is approximately \(800 - 280 = 520\) mm (why only approximately?).
Measure the locations of that point in the images from the left and right cameras: you can use xv to display each image and press the middle mouse button on the tip of her nose to read the position; other pieces of software can be used for this too. Estimate your accuracy in reading the position and, using all this information, work out with a calculator the focal length and its uncertainty ("error") as described in the lecture notes.
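If you want to check your calculator work, the sketch below rearranges the stereo equation to \(f = ZD/B\) and propagates the reading error into \(f\). It assumes that \(Z\) and \(B\) are exact and that the two position readings have the same independent uncertainty; the lecture notes may use a different convention, in which case follow them. The function name and the pixel readings are hypothetical and are not measurements from the supplied imagery.

```python
import math


def focal_length_with_error(x_left, x_right, sigma_x, Z=520.0, B=150.0):
    """Return (f, sigma_f) in pixels from f = Z * D / B.

    Assumes Z and B are exact and that both x positions are read with the
    same independent uncertainty sigma_x (in pixels).
    """
    D = abs(x_left - x_right)            # disparity in pixels
    sigma_D = math.sqrt(2.0) * sigma_x   # two independent readings combined
    f = Z * D / B
    sigma_f = f * sigma_D / D            # relative error of f equals that of D
    return f, sigma_f


# Hypothetical readings, NOT measured from the supplied images.
f, sigma_f = focal_length_with_error(x_left=400.0, x_right=200.0, sigma_x=1.0)
print(f"f = {f:.1f} +/- {sigma_f:.1f} pixels")
```

With these made-up readings the script prints `f = 693.3 +/- 4.9 pixels`; your own measurements will of course give different numbers.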
The calculation above gives a rough estimate of the camera's focal length. However, it should be possible to do better using a computer vision process known as camera calibration, and there are routines in OpenCV to perform it. They were designed for a person to move a calibration target of known size, which looks like a chessboard, around in the field of view; images should be captured in a variety of positions and orientations, and the locations of the corners of the pattern are used to compute several camera parameters, including the focal length.
To this end, a series of images has been rendered using POV-ray with the same cameras as for Candide but with a model of a calibration target; these are in the zip-file along with the program you need to use. Make sure you understand what this calibration program does and then run it as follows:
python calibrate.py
It should output the computed focal length and an estimate of its error (uncertainty).
Finally, using the focal length output by the calibration program, calculate the distance to the tip of Candide's nose and its uncertainty --- you will find a worked example of how to do the calculations in the lecture notes. Is this more accurate than the rough-and-ready calculation in the first part of the experiment? Even though everything here was done on a computer, why might it not produce an answer that is exactly correct?
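One common way to attach an uncertainty to \(Z\), assuming that the baseline \(B\) is exact and that the errors in \(f\) and \(D\) are independent, is to combine their relative errors in quadrature. The symbols \(\sigma_f\), \(\sigma_D\) and \(\sigma_Z\) for the uncertainties are introduced here for illustration; if the worked example in the lecture notes uses a different convention, follow that instead.

\[\frac{\sigma_Z}{Z} = \sqrt{\left(\frac{\sigma_f}{f}\right)^2 + \left(\frac{\sigma_D}{D}\right)^2}\]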
Web page maintained by Adrian F. Clark using Emacs, the One True Editor ;-)