In this tutorial, I'll introduce a basic machine vision system capable of solving several detection problems. It uses some imaging code from the Java Machine Learning Library, a basic genetic learning technique, and an algorithm called AdaBoost.
Please note - this is not a walkthrough. It is intended to give an overview of this kind of system - you can then look at the source code to fully understand how it works.
The system uses a basic learning technique (a Genetic Algorithm) to search for solutions to a particular problem. These solutions use basic data (features) from the image, together with a threshold to make a binary decision about each image, deciding whether it is something that we are interested in, or not.
The kind of problem we will solve is a detection problem. That is, we would like to look at an image and locate all objects of a particular type within it. In this case we will use the system to learn a detector which can recognise human faces.
In order to locate the objects, we move a small "window" across the image, obtaining a true/false decision at each point. We can then overlay the results on the whole image, which should show the locations where faces are to be found.
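As a rough sketch of that scanning process (assuming a hypothetical FaceDetector with a boolean classify() method, and a fixed window size and step):

import java.awt.Point;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

// Slide a fixed-size window across the image and record the positions
// where the detector reports "true". FaceDetector is a placeholder name.
static List<Point> scan(BufferedImage image, FaceDetector detector,
                        int windowW, int windowH, int step) {
    List<Point> hits = new ArrayList<>();
    for (int y = 0; y + windowH <= image.getHeight(); y += step) {
        for (int x = 0; x + windowW <= image.getWidth(); x += step) {
            BufferedImage window = image.getSubimage(x, y, windowW, windowH);
            if (detector.classify(window)) {
                hits.add(new Point(x, y)); // a face may be at (x, y)
            }
        }
    }
    return hits;
}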
A feature consists of one or more adjacent rectangular areas. The average pixel intensity of each area is calculated, and the feature returns the difference between the areas.
A low value will indicate that the two areas appear to be about the same, whereas a high difference indicates that one rectangle is darker than the other. Depending on the shapes of the rectangles, this may indicate that we're looking at the nose of a person, or their eyes, or something more subtle.
Each feature is defined by five important values: the offset in X and Y of the first rectangle from the top-left corner of the window, and the width and height of the rectangles. These values are expressed as fractions of the window size, between 0 and 1, so they can be applied to a window of any size. The final value determines which combination of rectangles is used; there are three choices of these "Haar features":
- Horizontal difference between two rectangles
- Vertical difference between two rectangles
- Horizontal difference between three rectangles
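To make this concrete, here is a sketch of how the first kind (the horizontal two-rectangle feature) might be evaluated. The helper averageIntensity() is hypothetical, and the rectangles are assumed to fit within the window:

import java.awt.image.BufferedImage;

// Horizontal two-rectangle Haar feature sketch. x, y, w, h are fractions
// (0-1) of the window size, as described above.
static double horizontalFeature(BufferedImage window,
                                double x, double y, double w, double h) {
    int px = (int) (x * window.getWidth());
    int py = (int) (y * window.getHeight());
    int pw = (int) (w * window.getWidth());
    int ph = (int) (h * window.getHeight());
    double left  = averageIntensity(window, px,      py, pw, ph);
    double right = averageIntensity(window, px + pw, py, pw, ph);
    return left - right; // near zero: similar; large: one side is darker
}

// Mean grey level of a rectangle (band 0 of the raster).
static double averageIntensity(BufferedImage img, int x, int y, int w, int h) {
    long sum = 0;
    for (int j = y; j < y + h; j++)
        for (int i = x; i < x + w; i++)
            sum += img.getRaster().getSample(i, j, 0);
    return (double) sum / (w * h);
}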
As the system is expected to learn, we need to give it some examples of what is a face, and what is not. There are two sets of images - a set of faces cropped using the Image Grabber and a set of pictures of things that are not faces. We'll expect the detector to return "true" for all the faces, and "false" for all the non-faces. We can figure out a metric for how good each detector is simply by counting how many mistakes it makes on this data.
You'll notice that there are a lot more instances of "non faces" than there are of faces.
As it isn't easy to guess which features will work well and which will not on our training data, we'll use a simple learning system to find good ones automatically. Many learning algorithms exist which are capable of solving this kind of problem, but we'll use a Genetic Algorithm (GA).
It is very hard for a single feature to be able to discriminate all faces from all non faces, so we'll actually evolve a "detector", which may use several features. It will add up the results of each feature and then use a threshold to determine whether the image is "true" or "false". We'll set it up so that the result of each feature can be either positive or negative, thus allowing the detector to make decisions based on the difference between two or more features.
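In outline, that decision might look like this (a sketch; Feature and its value() method are placeholder names, and signs[] holds +1 or -1 for each feature):

// Sum the signed feature outputs and compare against a threshold.
static boolean classify(BufferedImage window, List<Feature> features,
                        double[] signs, double threshold) {
    double total = 0;
    for (int i = 0; i < features.size(); i++) {
        total += signs[i] * features.get(i).value(window);
    }
    return total > threshold; // true = "face", false = "non face"
}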
The Genetic Algorithm helps us to find a combination of features which works well to differentiate our training data. The GA starts by creating a population of detectors at random, and then breeds new detectors based on the ones which performed best. As the process continues, better and better solutions should be found.
The aim of genetic learning methods is to choose two good "parents" and combine them in some way to produce an individual that is a mixture of the two. It is hoped that the offspring will perform slightly better than its parents, although this isn't always the case. In our implementation, a new detector is created by taking one feature from each parent in turn, so that it has half of each parent's features. We also implement the idea of "mutation", where a feature is occasionally "miscopied".
Parents are chosen during a "tournament", where a set number of individuals is randomly selected from the population and the highest performing one is chosen. Larger tournaments, with more individuals, encourage only the very best to be selected as parents, whereas smaller ones allow a wider variety of good parents to be chosen.
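A minimal sketch of tournament selection, assuming a Detector class with a getMistakes() count (fewer mistakes is better):

import java.util.List;
import java.util.Random;

// Pick tournamentSize detectors at random; return the best of them.
static Detector selectParent(List<Detector> population,
                             int tournamentSize, Random random) {
    Detector best = null;
    for (int i = 0; i < tournamentSize; i++) {
        Detector candidate = population.get(random.nextInt(population.size()));
        if (best == null || candidate.getMistakes() < best.getMistakes()) {
            best = candidate;
        }
    }
    return best;
}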
These ideas are borrowed from Darwin's theory of evolution by natural selection, and from genetics. While at the mercy of good and bad luck, this type of learning algorithm can be shown to perform substantially better than producing detectors by random chance alone.
GA Procedure
1. Create a random population of N individuals
2. Evaluate every member of the population
3. Stop if the best individual is perfect, or if the maximum number of generations has been reached
4. Create a new, empty population
5. Select two parents, using tournament selection
6. Create offspring from both parents and add them to the new population
7. If the new population size is less than N, go to step 5
8. Go to step 2
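Translated into code, the procedure might look roughly like this (a sketch: randomPopulation(), evaluate(), bestOf() and crossover() are placeholder names, and selectParent() is the tournament shown earlier):

List<Detector> population = randomPopulation(N);                   // step 1
for (int gen = 0; gen < maxGenerations; gen++) {                   // step 3b
    for (Detector d : population) d.setMistakes(evaluate(d));      // step 2
    if (bestOf(population).getMistakes() == 0) break;              // step 3a
    List<Detector> next = new ArrayList<>();                       // step 4
    while (next.size() < N) {                                      // step 7
        Detector a = selectParent(population, tournamentSize, random); // step 5
        Detector b = selectParent(population, tournamentSize, random);
        next.add(crossover(a, b));                                 // step 6
    }
    population = next;                                             // step 8
}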
Creating Offspring
// Take one feature from each parent in turn.
for (int i = 0; i < feature_count; i++) {
    offspring.setFeature(i, (i % 2 == 0) ? parentA.getFeature(i)
                                         : parentB.getFeature(i));
}
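Mutation can be slotted into the same loop; a sketch, where mutationRate is a small probability (say 0.05) and Feature.random() is an assumed factory method:

// Occasionally "miscopy" by substituting a randomly generated feature.
if (random.nextDouble() < mutationRate) {
    offspring.setFeature(i, Feature.random(random));
}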
As we've already mentioned, we have many more examples of "non faces" than "faces", which is reasonable considering the variety of different objects in the world. However, if we have a training set consisting of 90% "non faces", it is quite possible for a detector to simply classify everything as a "non face", and immediately make mistakes on only 10% of the data. At face value this score is quite good, but the detector is actually useless.
Even if the solution achieved 95%, again a good-sounding score, it may still have classified half of the faces wrongly, which again isn't acceptable. We have to ensure that the solution is able to solve all the data correctly.
Genetic learning systems are very good at finding "good" solutions to problems, but they are not always able to find the "perfect" solution that we're looking for. This is because the randomness which initially discovers good solutions actually becomes more and more destructive as the solutions become better, and therefore more fragile.
So it is unlikely that one GA detector will be able to separate all the data perfectly - it will probably make some mistakes. This is where the boosting process comes in. The AdaBoost algorithm requests that a solution be learned to solve the training data, then decides which pieces of data are solved easily and which are not. It then increases a "weight" associated with the difficult data, so that the learning system places more importance on those items when it is run again. AdaBoost keeps re-running the learning process until a combination of detectors has been found which can solve all the data. In our example, we may need up to ten separate detectors to solve the problem. The whole learning process should take about 20 minutes.
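For reference, the reweighting step of the standard AdaBoost algorithm looks roughly like this (a sketch of the textbook version, not necessarily the exact code in the library):

// weights[i] is the importance of sample i; correct[i] records whether
// the latest detector classified that sample correctly.
double error = 0;
for (int i = 0; i < weights.length; i++)
    if (!correct[i]) error += weights[i];

double alpha = 0.5 * Math.log((1 - error) / error); // detector's trust score

double total = 0;
for (int i = 0; i < weights.length; i++) {
    // misclassified samples gain weight, correctly classified ones lose it
    weights[i] *= Math.exp(correct[i] ? -alpha : alpha);
    total += weights[i];
}
for (int i = 0; i < weights.length; i++)
    weights[i] /= total; // renormalise so the weights sum to 1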
The example is not trivial, so it may take you a little while to understand, but some complexity is required in order to produce a system which is capable of solving these tasks.
Training Data
The first program supplies training data to the system. This consists of a set of images, some of which contain faces, and some of which do not.
See: FaceTrainingData.java
Genetic Algorithm
The second part of the system is the Genetic Algorithm, found in a file called GeneticLearningSystem.java. Any learning system consists of some means of "searching" for solutions, and a means of evaluating which solutions are best. In this genetic system there are two search operators: the first is "crossover", where the features of two parents are combined to form a single "offspring", and the second is "mutation", which simulates an error in copying values, leading to slightly different features being used.
See: GeneticLearningSystem.java
Evaluation Function
The evaluation function counts how many pieces of training data are classified correctly by the classifier, and how many mistakes it makes. The number of mistakes is equivalent to the "fitness" of the solution, with a perfect solution making zero mistakes.
See: weakLearn() in MachineVisionSystem.java
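In outline, and taking the AdaBoost weights described earlier into account, the evaluation might look like this (a sketch; the accessor names on AdaBoostSample are assumptions):

// Sum the weights of the misclassified samples; zero means a perfect detector.
static double mistakes(Detector d, List<AdaBoostSample> samples,
                       double[] weights) {
    double total = 0;
    for (int i = 0; i < samples.size(); i++) {
        boolean predicted = d.classify(samples.get(i).getImage());
        boolean actual = samples.get(i).getClassID() == 1; // 1 = face
        if (predicted != actual) total += weights[i];
    }
    return total;
}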
Detector
The detector consists of a set of features, together with a threshold, so that it can make binary decisions. The threshold is calculated directly to a suitable value, while the features are each evolved using the Genetic Algorithm.
See: FeatureBasedDetector.java
The features produce a numeric output, but we need to make a boolean classification saying whether an image is a face or not. Therefore we use a threshold to define the point which marks the boundary between the two classifications. In this case we calculate the threshold dynamically, by looking at the average output on the "true" training data and the average output on the "false" training data, and choosing the threshold to be exactly in between, in the hope that it will split the training data into two separate groups.
See: ThresholdClassifier.java
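The calculation itself is simple; a sketch, assuming the detector's raw outputs on each class have already been collected into two arrays:

// Threshold halfway between the mean output on faces and on non-faces.
static double chooseThreshold(double[] faceOutputs, double[] nonFaceOutputs) {
    return (mean(faceOutputs) + mean(nonFaceOutputs)) / 2.0;
}

static double mean(double[] values) {
    double sum = 0;
    for (double v : values) sum += v;
    return sum / values.length;
}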
Boosted Learning System
The boosted learning system is responsible for starting the Genetic Algorithm. If the solution evolved by the Genetic Algorithm is not good enough, it updates the weights in the evaluation function and starts the learning process again. As I've already written the AdaBoost code, we only need to extend the AdaBoost class, which demands that two methods be implemented: the first provides the training data to AdaBoost, and the second starts a learning process - it can be any kind of learning, so long as the resulting detector is compatible.
See: MachineVisionSystem.java
Solution
The eventual solution is a set of detectors evolved by the Genetic Algorithm. AdaBoost assigns a score to each of them, indicating how much each is to be trusted. When the solution is used to classify an image, each detector is executed in turn, and its "score" is added to either the "true" or the "false" case, depending on how it classifies the image. When all the detectors have been executed, the case with the highest total score is deemed to be the overall classification.
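A sketch of that weighted vote (alphas[] holds the trust scores that AdaBoost assigned to the detectors):

// Each detector adds its trust score to the case it votes for.
static boolean classify(BufferedImage image,
                        List<Detector> detectors, double[] alphas) {
    double trueScore = 0, falseScore = 0;
    for (int i = 0; i < detectors.size(); i++) {
        if (detectors.get(i).classify(image)) trueScore += alphas[i];
        else                                  falseScore += alphas[i];
    }
    return trueScore > falseScore; // the larger total wins
}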
This class is serialisable, so you can save it to disk and use it in other experiments!
See: AdaBoost library source code
Training data is split into two subdirectories - true and false. The FaceTrainingData class simply extracts all the images from each directory and instantiates them as AdaBoostSample objects, which are part of my AdaBoost library. An AdaBoostSample is simply a storage "box" which holds a piece of training data (in this case an image) together with a class identifier, which here is a number: "1" indicates a face, and "0" indicates a non-face image.
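The loading step amounts to something like the following sketch (the directory handling and the exact AdaBoostSample constructor are assumptions):

import javax.imageio.ImageIO;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Read every image in each subdirectory and label it 1 (face) or 0 (non-face).
static List<AdaBoostSample> loadSamples(File trueDir, File falseDir)
        throws IOException {
    List<AdaBoostSample> samples = new ArrayList<>();
    for (File f : trueDir.listFiles())
        samples.add(new AdaBoostSample(ImageIO.read(f), 1)); // 1 = face
    for (File f : falseDir.listFiles())
        samples.add(new AdaBoostSample(ImageIO.read(f), 0)); // 0 = non-face
    return samples;
}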
The bulk of the logic is handled in MachineVisionSystem.java, which implements two methods required by its parent class, AdaBoost. You don't need to understand the workings of AdaBoost (although feel free to browse the code and look!), except to know that it requires some means of accessing the training data, which is achieved through the getSamples() method, and some means of calling a learning method which can produce a new solution.
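The overall shape is something like this sketch (getSamples() is mentioned above; the learning method's real name, signature and return type are placeholders, as is evolveDetector()):

public class MachineVisionSystem extends AdaBoost {
    private List<AdaBoostSample> trainingData;
    private GeneticLearningSystem geneticAlgorithm;

    @Override
    public List<AdaBoostSample> getSamples() {
        return trainingData; // hand the labelled images to AdaBoost
    }

    // Placeholder for the second required method: run one learning pass
    // (here, the Genetic Algorithm) using the current sample weights.
    @Override
    public Detector learn(double[] sampleWeights) {
        return geneticAlgorithm.evolveDetector(sampleWeights);
    }
}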
To run the learning process you can either run the main() method in MachineVisionSystem, or run main() on MachineVisionSystemGUI, which is an extension of MachineVisionSystem that allows you to run the program visually, and also lets you test and save the detector.
After running the GUI, click the "Start" button to start the learning process. You can also set various parameters at run time, such as the maximum number of boosting steps to take, and the Genetic Algorithm parameters:
GA Generations - The number of generations for which to run the evolution.
GA Population Size - How many individuals are in the population in each generation. Higher numbers may increase the variety, and therefore the quality, of the best individuals, but are slower to run.
GA Tournament Size - How many individuals take part in the tournament selection where parents are chosen. A higher number increases the selective pressure (choosing only the very best parents); a lower number decreases it.
Once you start the learning process, a solution will begin to be learned. This may take 20 minutes or longer. The status message in the bottom-left corner will inform you of the current stage of AdaBoost, with the result on the training data also being displayed. Once the result reaches 100%, or the number of AdaBoost runs has been exhausted, the learning is complete.
As each detector is evolved, a rough preview of what it looks like is displayed in the main window. Positive and negative features are represented by green and red colours respectively. Remember that each feature is defined by fractional values, so the window can be scaled to detect faces of any size.
Results of running the detector on a set of images. You'll notice that the detector is quite specific to one "size" of face - the faces further back are not detected because they are a little too small. You often have to run the detector at several resolutions to capture all the faces in an image.