According to an article published on February 21, 2007 in Technology Review, a group of neuroscientists at MIT has developed a computer model that mimics the human visual system to accurately detect and recognize objects, such as cars and motorcycles, on a busy street. According to Thomas Serre, a neuroscientist at MIT, vision systems of this kind could soon be used in surveillance systems or in smart sensors that alert drivers to the presence of pedestrians or other obstacles.
For years, researchers have tried to mimic biological vision systems because of how well they perform. But teaching a computer to classify objects has turned out to be more complicated than it first seemed, says Serre, who did the work with Tomaso Poggio. First, to recognize a given type of object, the computer needs a template, a computational representation of that object, which is what allows it to distinguish, for example, a car from things that are not cars. At the same time, the template must be flexible enough to accommodate all kinds of cars at different angles and positions and under different lighting conditions.
The best way to achieve this is to train a learning algorithm on a series of images so that it extracts the features they have in common. Serre and Poggio believe that the human visual system follows a similar approach, but that it relies on a hierarchy of successive layers in the visual cortex. The first layers of the cortex would thus detect the simplest features of an object, and the later ones would combine that information to form our perception of the object as a whole.
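The article does not say how such a flexible template is built; one minimal way to picture it is to average the feature vectors of many labeled examples and classify new images by their nearest class template. Everything below (the feature vectors, their dimensions, the random data) is an assumption for illustration, not the researchers' actual method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for feature vectors extracted from labeled
# training images (the article does not specify the feature pipeline).
car_features = rng.normal(loc=1.0, scale=0.5, size=(20, 8))
noncar_features = rng.normal(loc=-1.0, scale=0.5, size=(20, 8))

# "Template" = the average of a class's training features, which
# smooths over differences in angle, position, and lighting.
car_template = car_features.mean(axis=0)
noncar_template = noncar_features.mean(axis=0)

def classify(features):
    """Label a new image's features by the nearest class template."""
    d_car = np.linalg.norm(features - car_template)
    d_noncar = np.linalg.norm(features - noncar_template)
    return "car" if d_car < d_noncar else "not a car"

# A new sample drawn from the same distribution as the car class.
test_vec = rng.normal(loc=1.0, scale=0.5, size=8)
print(classify(test_vec))
```

Averaging many examples is what gives the template its tolerance: no single training image has to match the new one exactly.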
To test their theory, Serre and Poggio worked with Stanley Bileschi of MIT and Lior Wolf of Tel Aviv University in Israel to build a computer model with 10 million computational units designed to behave like groups of neurons in the visual cortex. As in the visual cortex, the units are arranged in layers.
First, the simplest units extract rudimentary features from the scene (for example, oriented edges) by analyzing very small groups of pixels. More complex units then analyze larger portions of the image and recognize features related to the size or position of objects. Each successive layer extracts increasingly complex features, such as the distance between two parts of an object or the orientation angles of those parts. This allows the system to recognize the same object from different angles.
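The alternation described above, with small oriented-edge detectors feeding larger pooling units, can be sketched as a toy two-layer pipeline. The filter design, pooling size, and test image are all assumptions; the real model's 10 million units are far richer than this sketch:

```python
import numpy as np

def oriented_filters(size=7, n_orientations=4):
    """Build simple oriented-edge filters, stand-ins for the
    receptive fields of the model's 'simple' units."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    filters = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations
        # Odd-symmetric edge pattern at angle theta, windowed by a Gaussian.
        f = np.sin(2 * np.pi * (xs * np.cos(theta) + ys * np.sin(theta)) / size)
        f *= np.exp(-(xs**2 + ys**2) / (2 * (size / 3) ** 2))
        filters.append(f - f.mean())
    return filters

def simple_layer(image, filters):
    """S layer: each unit responds to an oriented edge in a small patch."""
    h, w = image.shape
    k = filters[0].shape[0]
    out = np.zeros((len(filters), h - k + 1, w - k + 1))
    for i, f in enumerate(filters):
        for y in range(h - k + 1):
            for x in range(w - k + 1):
                out[i, y, x] = abs(np.sum(image[y:y + k, x:x + k] * f))
    return out

def complex_layer(responses, pool=4):
    """C layer: max-pool each map over a neighborhood, giving
    tolerance to small shifts in an object's position."""
    n, h, w = responses.shape
    out = np.zeros((n, h // pool, w // pool))
    for i in range(n):
        for y in range(h // pool):
            for x in range(w // pool):
                out[i, y, x] = responses[i, y * pool:(y + 1) * pool,
                                         x * pool:(x + 1) * pool].max()
    return out

# A tiny synthetic "scene": a single vertical bar.
img = np.zeros((32, 32))
img[:, 14:18] = 1.0

s1 = simple_layer(img, oriented_filters())
c1 = complex_layer(s1)
print(s1.shape, c1.shape)  # simple-layer and pooled complex-layer map sizes
```

Stacking more such simple/complex pairs is what lets later layers respond to relations between parts rather than to raw pixels.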
When they tested the system, the results were very good, competitive with those of the best systems on the market. And because the system learns, its accuracy improves as it analyzes more images.
So far, the system has only been designed to analyze static images. But according to Serre, this mirrors the human visual system, in which one pathway handles form and another handles motion. The team is now working on incorporating a parallel system that works with video.
Source: Technology Review