Vision by learning

Computer vision systems are normally bespoke, constructed by an experienced developer to solve a specific task. Many modern vision systems involve an element of machine learning, but in the vast majority of cases the machine learning component only aids classification, one of the final stages of a complete system. The early stages, which usually involve segmenting features from the background and grouping those features into objects, are built by hand and then fine-tuned until they work reliably for the task in question. This process is slow: a typical vision PhD student spends a large proportion of their research time tuning a system.

This research explores a different approach. Our aim is to devise a set of generic vision components and to use machine learning not to tune a human-defined pipeline of components, but to learn the best way of combining the individual components into a complete vision system. The time-consuming work of trying out different algorithms to see whether they work, and then tuning them to yield their best performance, is therefore done by a computer rather than a human.
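To make the idea concrete, here is a minimal sketch in Python of what "learning how to combine components" could look like. It is an illustration only, not the method used in this research: the three components (`threshold`, `smooth`, `invert`), the 1-D toy "image", the pixel-agreement score and the exhaustive search are all assumptions chosen to keep the example small. A real system would use richer components and a proper learning algorithm in place of brute-force enumeration.

```python
from itertools import permutations

# Hypothetical generic components. Each maps a 1-D "image" (a list of
# numbers) to a new list of the same length.

def threshold(img):
    """Binarise: 1 where the pixel exceeds the image mean, else 0."""
    mean = sum(img) / len(img)
    return [1 if p > mean else 0 for p in img]

def smooth(img):
    """3-tap moving average, replicating the edge pixels."""
    padded = [img[0]] + list(img) + [img[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3
            for i in range(len(img))]

def invert(img):
    """Flip intensities about the maximum value in the image."""
    top = max(img)
    return [top - p for p in img]

COMPONENTS = {"threshold": threshold, "smooth": smooth, "invert": invert}

def run_pipeline(names, img):
    """Apply the named components to the image, in order."""
    for name in names:
        img = COMPONENTS[name](img)
    return img

def score(names, examples):
    """Mean fraction of output pixels that equal the target mask."""
    total = 0.0
    for img, target in examples:
        out = run_pipeline(names, img)
        total += sum(1 for o, t in zip(out, target) if o == t) / len(target)
    return total / len(examples)

def search(examples, max_len=3):
    """Try every ordering of distinct components up to max_len long
    and keep the first one that achieves the highest score."""
    best_names, best_score = (), -1.0
    for length in range(1, max_len + 1):
        for names in permutations(COMPONENTS, length):
            s = score(names, examples)
            if s > best_score:
                best_names, best_score = names, s
    return best_names, best_score

# One labelled example (made up for illustration): a bright object at
# indices 4 to 6, plus a single noise spike at index 1.
EXAMPLES = [
    ([0, 9, 0, 0, 8, 9, 8, 0, 0, 0],
     [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]),
]

best, best_score = search(EXAMPLES)
print(best, best_score)
```

On this toy example the search settles on smoothing followed by thresholding: thresholding alone keeps the noise spike, while smoothing first suppresses it. The point of the sketch is only that the ordering is discovered from labelled examples rather than chosen by the developer.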