A short quiz on vision system evaluation

Adrian F. Clark

You are developing software for the Police to show mugshots of suspects to the witness of a crime. Which of the following is the best approach to take?
- minimize the number of false negatives
- minimize the number of false positives
- maximize the number of true positives, even if the false positive rate is high
- maximize the number of true positives
For this type of application, we want to show anyone who stands any chance of being the perpetrator; it doesn't particularly matter if we have a high false positive rate as long as the true positive rate is high.
What is a false negative?
- A false result from an algorithm that should be negative
- A false result from an algorithm
- A negative result that is false
- A false result from an algorithm that should have succeeded
A false negative arises when an algorithm reports failure (or crashes) when it should have succeeded.
What do we do if we want to see if algorithms' performances differ?
- Look up the Z-score in one-tailed tables
- Look up the Z-score in two-tailed tables
- Look up the Z-score in binomial tables
- Look up the Z-score in Normal distribution tables
When we ask only whether algorithms' performances differ, it doesn't matter which is better than the other, so we use two-tailed tables.
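As a rough sketch of what a table look-up amounts to, the tail probabilities of a standard Normal Z-score can be computed directly. The function names here are illustrative only; the quiz itself refers just to printed tables.

```python
import math

def two_tailed_p(z: float) -> float:
    """Probability of a Z-score at least this extreme in EITHER direction."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard Normal variable
    return math.erfc(abs(z) / math.sqrt(2.0))

def one_tailed_p(z: float) -> float:
    """Probability of a Z-score at least this large in ONE direction only."""
    # P(Z >= z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# The sign of z is irrelevant for the two-tailed question "do they differ?"
print(two_tailed_p(1.96), two_tailed_p(-1.96))
# The one-tailed value is half the two-tailed one for positive z
print(one_tailed_p(1.96))
```

For a positive Z-score the two-tailed probability is twice the one-tailed one, which is why the choice of table matters when deciding significance.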
What assumption underlies a null hypothesis test?
- that the algorithms differ in performance
- that there is no performance difference between algorithms
- that the algorithms don't work
- that the algorithms return a null result
The null hypothesis test assumes that there is no difference in performance between algorithms, then examines whether the statistics support that assumption or provide evidence that it is wrong.
What is 'ground truth'?
- the true values obtained by an algorithm
- values obtained by an algorithm that are known to be true
- images of the ground
- data known to be correct
Ground truth is data (usually images) for which the correct answer is known; it is used for training and testing algorithms.
You are developing an automatic passport system for use by immigration, where pictures of people are compared to those in their passports. Which of the following is the best approach to take?
- minimize the number of false positives
- maximize the number of true positives, even if the false positive rate is high
- minimize the number of false negatives
- maximize the number of true positives
For this type of application, we need to keep the number of false positives as low as possible; otherwise, we would admit lots of people who don't look like the picture on their passports.
What is a false positive?
- A positive result that is false
- A false result from an algorithm that should be correct
- A false result from an algorithm
- A true result from an algorithm that is incorrect
A false positive arises when an algorithm reports success but has actually found an incorrect result.
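To make the four possible outcomes concrete, here is a minimal sketch that tallies them by comparing an algorithm's yes/no answers with the known correct answers. It assumes each test case reduces to a boolean decision; the function name and the example data are made up for illustration.

```python
from collections import Counter

def count_outcomes(predicted, actual):
    """Tally TP, FP, FN and TN from paired boolean results.

    predicted: what the algorithm reported (True = success/match)
    actual:    the known correct answer for the same case
    """
    counts = Counter(TP=0, FP=0, FN=0, TN=0)
    for p, a in zip(predicted, actual):
        if p and a:
            counts["TP"] += 1   # reported success, and it was right
        elif p and not a:
            counts["FP"] += 1   # reported success, but the result is wrong
        elif not p and a:
            counts["FN"] += 1   # reported failure, but should have succeeded
        else:
            counts["TN"] += 1   # correctly reported failure
    return counts

# Hypothetical results for five test cases
predicted = [True, True, False, False, True]
actual    = [True, False, True, False, True]
print(dict(count_outcomes(predicted, actual)))
# prints {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 1}
```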
Which test is most appropriate for comparing algorithms' performances?
- Laplace's test
- Canny's test
- McNemar's test
- Gauss's test
McNemar's test is the most appropriate test for comparing algorithms; it is a chi-squared test with one degree of freedom for paired data.
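A minimal sketch of the calculation follows, assuming both algorithms have been run on the same test cases and using one common continuity-corrected form of the statistic. The names n_sf (first algorithm succeeded, second failed) and n_fs (the reverse) are labels chosen here, and clamping the corrected difference at zero is a choice made in this sketch, not part of the test's definition.

```python
import math

def mcnemar(n_sf: int, n_fs: int):
    """Return (chi-squared statistic, two-tailed p-value) for paired results.

    n_sf: cases where algorithm A succeeded and algorithm B failed
    n_fs: cases where algorithm A failed and algorithm B succeeded
    Cases where both succeed or both fail carry no information here.
    """
    n = n_sf + n_fs
    if n == 0:
        return 0.0, 1.0  # the algorithms never disagreed
    # Continuity-corrected statistic, clamped so it cannot go negative
    diff = max(abs(n_sf - n_fs) - 1, 0)
    chi2 = diff ** 2 / n
    # Upper tail of chi-squared with one degree of freedom
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Hypothetical counts: A alone succeeded 15 times, B alone 5 times
chi2, p = mcnemar(15, 5)
print(chi2, p)
```

With these made-up counts the statistic is 4.05, which exceeds 3.84, the 5% critical value for chi-squared with one degree of freedom, so the null hypothesis of no performance difference would be rejected at that level.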
Which corner of a ROC curve indicates the best performance?
- upper right
- lower left
- lower right
- upper left
We want the smallest number of false positives for the largest number of true positives, so the best performance is the upper left corner of the plot.
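The following sketch shows how the points of a ROC curve can be generated, assuming the algorithm returns a confidence score per case and that the test data contain at least one positive and one negative case. The false positive rate is taken as the horizontal axis and the true positive rate as the vertical axis; the function name and data are illustrative.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    One point is produced per distinct score used as a threshold,
    from the strictest threshold to the most lenient.
    """
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # a threshold above every score accepts nothing
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and not l)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical scores that separate the two classes perfectly
scores = [0.9, 0.8, 0.7, 0.6]
labels = [True, True, False, False]
print(roc_points(scores, labels))
# prints [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
```

The point (0.0, 1.0) in this output is the upper left corner: every true positive found with no false positives.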
When evaluating vision systems, it is normal to:
- use the same training and test sets
- have different training and test sets
- train on the training set and test using both training and test sets
- train on all data but test on only the test set
We know that algorithms work better on the data they were trained on than on unseen data; hence, we use different training and test sets.
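One simple way to keep the two sets apart is sketched below, assuming the data can be held as a list of independent cases. The function name, the 25% test fraction and the fixed seed are arbitrary choices for illustration.

```python
import random

def split_train_test(items, test_fraction=0.25, seed=0):
    """Shuffle the cases and split them into disjoint training and test sets."""
    shuffled = list(items)
    # A seeded generator makes the split repeatable between runs
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    test = shuffled[:n_test]
    train = shuffled[n_test:]
    return train, test

# Hypothetical data set of 20 case identifiers
train, test = split_train_test(range(20))
print(len(train), len(test))
# No case appears in both sets
print(set(train) & set(test))
```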