Police and security teams guarding airports, docks and border crossings against terrorist attack or illegal entry need to know immediately when someone enters a prohibited area, and who that person is. Networks of surveillance cameras typically monitor these at-risk locations 24 hours a day, but the cameras can generate far more images than human operators can analyze.
Now a system being developed by Christopher Amato, a postdoc at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), can perform this analysis more accurately and in a fraction of the time it would take a human camera operator. “You can’t have a person staring at every single screen, and even if you did the person might not know exactly what to look for,” Amato says. “For example, a person is not going to be very good at searching through pages and pages of faces to try to match [an intruder] with a known criminal or terrorist.”
Existing computer vision systems designed to carry out this task automatically tend to be fairly slow, Amato says. “Sometimes it’s important to come up with an alarm immediately, even if you are not yet positive exactly what is happening,” he says. “If something bad is going on, you want to know about it as soon as possible.”
So Amato and colleagues Komal Kapoor, Nisheeth Srivastava and Paul Schrater at the University of Minnesota are developing a system that uses mathematics to balance accuracy (so that the system does not trigger an alarm every time a cat walks in front of the camera, for example) against the speed needed to let security staff act on an intrusion as quickly as possible.
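The speed-versus-accuracy trade-off described above can be made concrete with a simple expected-cost comparison. The sketch below is not the team's method; it is a one-step simplification that assumes a belief `p` that an intrusion is under way, hypothetical costs for a false alarm, a missed intrusion and each second of delay, and one extra detector with a known run time and accuracy. It picks whichever action (alarm now, ignore, or run the detector first and act on its verdict) has the lowest expected cost.

```python
def expected_costs(p, false_alarm_cost, miss_cost, delay_cost_per_s,
                   test_seconds, test_accuracy):
    """Expected cost of each action, given belief p that an intrusion is happening.

    All parameters are hypothetical illustration values, not figures from the project.
    """
    # Raising the alarm now is only a mistake if there is no intruder.
    alarm_now = (1 - p) * false_alarm_cost
    # Ignoring the scene is only a mistake if there is an intruder.
    ignore = p * miss_cost
    # Running one more detector costs time (which matters only if an intruder is
    # really there), after which we act on its verdict. Assumes the detector is
    # equally accurate on intrusions and on empty scenes.
    run_test = (p * delay_cost_per_s * test_seconds
                + p * (1 - test_accuracy) * miss_cost
                + (1 - p) * (1 - test_accuracy) * false_alarm_cost)
    return {"alarm_now": alarm_now, "ignore": ignore, "run_test": run_test}


def best_action(**params):
    """Return the name of the action with the lowest expected cost."""
    costs = expected_costs(**params)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    scenario = dict(false_alarm_cost=1.0, miss_cost=100.0, delay_cost_per_s=5.0,
                    test_seconds=2.0, test_accuracy=0.95)
    for p in (0.0, 0.05, 0.3):
        print(p, best_action(p=p, **scenario))
```

With these made-up numbers, a scene that is certainly empty is ignored, a faintly suspicious one is worth two extra seconds of analysis, and a scene with a 30 percent chance of an intruder triggers an immediate alarm because waiting would cost more than a possible false alarm.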
For camera-based surveillance systems, operators typically have a range of computer vision algorithms they could use to analyze the video feed. These include skin-detection algorithms, which can identify a person in an image, and background-detection systems, which can spot unusual objects or something moving through the scene.
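To illustrate the background-detection idea, here is a minimal background-subtraction sketch in plain Python. It is an assumption for illustration, not one of the algorithms the team evaluated: frames are grayscale images stored as lists of rows of 0-255 integers, the background is a running average, and the class name `BackgroundModel` and its thresholds are hypothetical. A frame is flagged when enough pixels differ sharply from the learned background.

```python
class BackgroundModel:
    """Minimal running-average background subtraction (illustrative only)."""

    def __init__(self, first_frame, alpha=0.05, pixel_threshold=30,
                 min_changed_fraction=0.01):
        # Start by treating the first frame as the empty scene.
        self.background = [[float(v) for v in row] for row in first_frame]
        self.alpha = alpha                      # how quickly the background adapts
        self.pixel_threshold = pixel_threshold  # grey-level change that counts as "different"
        self.min_changed_fraction = min_changed_fraction

    def process(self, frame):
        """Return (motion_detected, fraction_of_changed_pixels) for one frame."""
        changed = 0
        total = 0
        for r, row in enumerate(frame):
            bg_row = self.background[r]
            for c, value in enumerate(row):
                if abs(value - bg_row[c]) > self.pixel_threshold:
                    changed += 1
                # Blend every pixel into the background so slow lighting changes
                # are absorbed; a new object that stays put also fades in over time.
                bg_row[c] = (1 - self.alpha) * bg_row[c] + self.alpha * value
                total += 1
        fraction = changed / total
        return fraction >= self.min_changed_fraction, fraction


if __name__ == "__main__":
    empty = [[100] * 10 for _ in range(10)]
    model = BackgroundModel(empty)
    print(model.process(empty))     # identical to the background
    intruder = [row[:] for row in empty]
    for r in range(3, 6):
        for c in range(3, 6):
            intruder[r][c] = 200    # a bright 3x3 blob enters the scene
    print(model.process(intruder))
```

Production systems generally use more robust models (for example, per-pixel mixtures of Gaussians), but the principle of comparing each frame against a learned picture of the empty scene is the same.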
To decide which of these algorithms to use in a given situation, Amato’s system first carries out a learning phase, in which it assesses how well each piece of software performs in the type of setting where it will be deployed, such as an airport. To do this, it runs each algorithm on the scene to determine how long it takes to perform an analysis and how certain it is of the answer it produces. It then feeds this information into its mathematical framework, known as a partially observable Markov decision process (POMDP).
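The learning phase, and the way its measurements could feed a POMDP, can be sketched as follows. This is a simplified illustration under stated assumptions, not Amato’s implementation: each detector is a hypothetical Python callable that returns True when it thinks an intruder is present, calibration data is a non-empty list of frames labeled with the ground truth, and “certainty” is summarized as the detector’s hit rate and false-alarm rate, with add-one smoothing so no probability is exactly 0 or 1. The hypothetical `update_belief` function performs only the observation step of a POMDP belief update, using Bayes’ rule; a full POMDP would also model how the scene changes over time and would choose actions according to a policy.

```python
import time


def profile_detector(detector, labeled_frames):
    """Measure one detector's average run time and observation probabilities.

    labeled_frames: non-empty list of (frame, intrusion_present) pairs, where the
    second element is the ground-truth bool. Rates use add-one (Laplace) smoothing.
    """
    times = []
    detections = {True: [], False: []}
    for frame, intrusion_present in labeled_frames:
        start = time.perf_counter()
        detected = bool(detector(frame))
        times.append(time.perf_counter() - start)
        detections[intrusion_present].append(detected)

    def smoothed_rate(results):
        # (successes + 1) / (trials + 2) keeps the estimate strictly between 0 and 1.
        return (sum(results) + 1) / (len(results) + 2)

    return {
        "seconds": sum(times) / len(times),
        "p_detect_given_intrusion": smoothed_rate(detections[True]),
        "p_detect_given_clear": smoothed_rate(detections[False]),
    }


def update_belief(belief, detected, profile):
    """Bayes-rule update of P(intrusion) after seeing one detector's verdict."""
    if detected:
        like_intrusion = profile["p_detect_given_intrusion"]
        like_clear = profile["p_detect_given_clear"]
    else:
        like_intrusion = 1 - profile["p_detect_given_intrusion"]
        like_clear = 1 - profile["p_detect_given_clear"]
    numerator = like_intrusion * belief
    return numerator / (numerator + like_clear * (1 - belief))


if __name__ == "__main__":
    # Toy stand-in: a "frame" is just a brightness score, and the detector
    # fires when the score exceeds 5. Both are hypothetical.
    calibration = [(9, True), (8, True), (7, True), (2, False), (1, False), (6, False)]
    profile = profile_detector(lambda frame: frame > 5, calibration)
    print(profile)
    print(update_belief(0.5, True, profile))   # belief rises after a detection
    print(update_belief(0.5, False, profile))  # and falls after a non-detection
```

The two numbers recorded per detector, how long it takes and how informative its verdict is, are the kind of quantities a planner needs in order to judge whether running a slower but more reliable algorithm is worth the delay.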