This article includes a list of references, related reading, or external links, but its sources remain unclear because it lacks inline citations .(May 2023) |
Artificial intelligence for video surveillance utilizes computer software programs that analyze the audio and images from video surveillance cameras in order to recognize humans, vehicles, objects, attributes, and events. Security contractors program the software to define restricted areas within the camera's view (such as a fenced off area, a parking lot but not the sidewalk or public street outside the lot) and program for times of day (such as after the close of business) for the property being protected by the camera surveillance. The artificial intelligence ("A.I.") sends an alert if it detects a trespasser breaking the "rule" set that no person is allowed in that area during that time of day. [1]
The A.I. program functions by using machine vision. Machine vision is a series of algorithms, or mathematical procedures, which work like a flow-chart or series of questions to compare the object seen with hundreds of thousands of stored reference images of humans in different postures, angles, positions and movements. The A.I. asks itself if the observed object moves like the reference images, whether it is approximately the same size height relative to width, if it has the characteristic two arms and two legs, if it moves with similar speed, and if it is vertical instead of horizontal. Many other questions are possible, such as the degree to which the object is reflective, the degree to which it is steady or vibrating, and the smoothness with which it moves. Combining all of the values from the various questions, an overall ranking is derived which gives the A.I. the probability that the object is or is not a human. If the value exceeds a limit that is set, then the alert is sent. It is characteristic of such programs that they are self-learning to a degree, learning, for example that humans or vehicles appear bigger in certain portions of the monitored image – those areas near the camera – than in other portions, those being the areas farthest from the camera.
In addition to the simple rule restricting humans or vehicles from certain areas at certain times of day, more complex rules can be set. The user of the system may wish to know if vehicles drive in one direction but not the other. Users may wish to know that there are more than a certain preset number of people within a particular area. The A.I. is capable of maintaining surveillance of hundreds of cameras simultaneously. Its ability to spot a trespasser in the distance or in rain or glare is superior to humans' ability to do so.
This type of A.I. for security is known as "rule-based" because a human programmer must set rules for all of the things for which the user wishes to be alerted. This is the most prevalent form of A.I. for security. Many video surveillance camera systems today include this type of A.I. capability. The hard-drive that houses the program can either be located in the cameras themselves or can be in a separate device that receives the input from the cameras.
A newer, non-rule based form of A.I. for security called "behavioral analytics" has been developed. This software is fully self-learning with no initial programming input by the user or security contractor. In this type of analytics, the A.I. learns what is normal behaviour for people, vehicles, machines, and the environment based on its own observation of patterns of various characteristics such as size, speed, reflectivity, color, grouping, vertical or horizontal orientation and so forth. The A.I. normalises the visual data, meaning that it classifies and tags the objects and patterns it observes, building up continuously refined definitions of what is normal or average behaviour for the various observed objects. After several weeks of learning in this fashion it can recognise when things break the pattern. When it observes such anomalies it sends an alert. For example, it is normal for cars to drive in the street. A car seen driving up onto a sidewalk would be an anomaly. If a fenced yard is normally empty at night, then a person entering that area would be an anomaly.
Limitations in the ability of humans to vigilantly monitor video surveillance live footage led to the demand for artificial intelligence that could better serve the task. Humans watching a single video monitor for more than twenty minutes lose 95% of their ability to maintain attention sufficient to discern significant events. [2] With two monitors this is cut in half again. [3] Given that many facilities have dozens or even hundreds of cameras, the task is clearly beyond human ability. In general, the camera views of empty hallways, storage facilities, parking lots or structures are exceedingly boring and thus attention quickly diminishes. When multiple cameras are monitored, typically employing a wall monitor or bank of monitors with split screen views and rotating every several seconds between one set of cameras and the next, the visual tedium is quickly overwhelming. While video surveillance cameras proliferated with great adoption by users ranging from car dealerships and shopping plazas to schools and businesses to highly secured facilities such as nuclear plants, it was recognized in hindsight that video surveillance by human officers (also called "operators") was impractical and ineffective. Extensive video surveillance systems were relegated to merely recording for possible forensic use to identify someone, after the fact of a theft, arson, attack or incident. Where wide angle camera views were employed, particularly for large outdoor areas, severe limitations were discovered even for this purpose due to insufficient resolution. [4] In these cases it is impossible to identify the trespasser or perpetrator because their image is too tiny on the monitor.[ citation needed ]
In response to the shortcomings of human guards to watch surveillance monitors long-term, the first solution was to add motion detectors to cameras. It was reasoned that an intruder's or perpetrator's motion would send an alert to the remote monitoring officer obviating the need for constant human vigilance. The problem was that in an outdoor environment there is constant motion or changes of pixels that comprise the total viewed image on screen. The motion of leaves on trees blowing in the wind, litter along the ground, insects, birds, dogs, shadows, headlights, sunbeams and so forth all comprise motion. This caused hundreds or even thousands of false alerts per day, rendering this solution inoperable except in indoor environments during times of non-operating hours.
The next evolution reduced false alerts to a degree but at the cost of complicated and time-consuming manual calibration. Here, changes of a target such as a person or vehicle relative to a fixed background are detected. Where the background changes seasonally or due to other changes, the reliability deteriorates over time. The economics of responding to too many false alerts again proved to be an obstacle and this solution was not sufficient.
Machine learning of visual recognition relates to patterns and their classification. [5] [6] True video analytics can distinguish the human form, vehicles and boats or selected objects from the general movement of all other objects and visual static or changes in pixels on the monitor. It does this by recognizing patterns. When the object of interest, for example a human, violates a preset rule, for example that the number of people shall not exceed zero in a pre-defined area during a defined time interval, then an alert is sent. A red rectangle or so-called "bounding box" will typically automatically follow the detected intruder, and a short video clip of this is sent as the alert.
The detection of intruders using video surveillance has limitations based on economics and the nature of video cameras. Typically, cameras outdoors are set to a wide angle view and yet look out over a long distance. Frame rate per second and dynamic range to handle brightly lit areas and dimly lit ones further challenge the camera to actually be adequate to see a moving human intruder. At night, even in illuminated outdoor areas, a moving subject does not gather enough light per frame per second and so, unless quite close to the camera, will appear as a thin wisp or barely discernible ghost or completely invisible. Conditions of glare, partial obscuration, rain, snow, fog, and darkness all compound the problem. Even when a human is directed to look at the actual location on a monitor of a subject in these conditions, the subject will usually not be detected. The A.I. is able to impartially look at the entire image and all cameras' images simultaneously. Using statistical models of degrees of deviation from its learned pattern of what constitutes the human form it will detect an intruder with high reliability and a low false alert rate even in adverse conditions. [7] Its learning is based on approximately a quarter million images of humans in various positions, angles, postures, and so forth.
A one megapixel camera with the onboard video analytics was able to detect a human at a distance of about 350' and an angle of view of about 30 degrees in non-ideal conditions. Rules could be set for a "virtual fence" or intrusion into a pre-defined area. Rules could be set for directional travel, object left behind, crowd formation and some other conditions. Artificial intelligence for video surveillance is widely used in China. See Mass surveillance in China.
One of the most powerful features of the system is that a human officer or operator, receiving an alert from the A.I., could immediately talk down over outdoor public address loudspeakers to the intruder. This had high deterrence value as most crimes are opportunistic and the risk of capture to the intruder becomes so pronounced when a live person is talking to them that they are very likely to desist from intrusion and to retreat. The security officer would describe the actions of the intruder so that the intruder had no doubt that a real person was watching them. The officer would announce that the intruder was breaking the law and that law enforcement was being contacted and that they were being video-recorded. [8]
The police receive a tremendous number of false alarms from burglar alarms. In fact the security industry reports that over 98% of such alarms are false ones. Accordingly, the police give very low priority response to burglar alarms and can take from twenty minutes to two hours to respond to the site. By contrast, the video analytic-detected crime is reported to the central monitoring officer, who verifies with his or her own eyes that it is a real crime in progress. He or she then dispatches to the police who give such calls their highest priority.
While rule-based video analytics worked economically and reliably for many security applications there are many situations in which it cannot work. [9] For an indoor or outdoor area where no one belongs during certain times of day, for example overnight, or for areas where no one belongs at any time such as a cell tower, traditional rule-based analytics are perfectly appropriate. In the example of a cell tower the rare time that a service technician may need to access the area would simply require calling in with a pass-code to put the monitoring response "on test" or inactivated for the brief time the authorized person was there.
But there are many security needs in active environments in which hundreds or thousands of people belong all over the place all the time. For example, a college campus, an active factory, a hospital or any active operating facility. It is not possible to set rules that would discriminate between legitimate people and criminals or wrong-doers.
Using behavioral analytics, a self-learning, non-rule-based A.I. takes the data from video cameras and continuously classifies objects and events that it sees. For example, a person crossing a street is one classification. A group of people is another classification. A vehicle is one classification, but with continued learning a public bus would be discriminated from a small truck and that from a motorcycle. With increasing sophistication, the system recognizes patterns in human behavior. For example, it might observe that individuals pass through a controlled access door one at a time. The door opens, the person presents their proximity card or tag, the person passes through and the door closes. This pattern of activity, observed repeatedly, forms a basis for what is normal in the view of the camera observing that scene. Now if an authorized person opens the door but a second "tail-gating" unauthorized person grabs the door before it closes and passes through, that is the sort of anomaly that would create an alert. This type of analysis is much more complex than the rule-based analytics. While the rule-based analytics work mainly to detect intruders into areas where no one is normally present at defined times of day, the behavioral analytics works where people are active to detect things that are out of the ordinary.
A fire breaking out outdoors would be an unusual event and would cause an alert, as would a rising cloud of smoke. Vehicles driving the wrong way into a one-way driveway would also typify the type of event that has a strong visual signature and would deviate from the repeatedly observed pattern of vehicles driving the correct one-way in the lane. Someone thrown to the ground by an attacker would be an unusual event that would likely cause an alert. This is situation-specific. So if the camera viewed a gymnasium where wrestling was practiced the A.I. would learn it is usual for one human to throw another to the ground, in which case it would not alert on this observation.
The A.I. does not know or understand what a human is, or a fire, or a vehicle. It is simply finding characteristics of these things based on their size, shape, color, reflectivity, angle, orientation, motion, and so on. It then finds that the objects it has classified have typical patterns of behavior. For example, humans walk on sidewalks and sometimes on streets but they don't climb up the sides of buildings very often. Vehicles drive on streets but don't drive on sidewalks. Thus the anomalous behavior of someone scaling a building or a vehicle veering onto a sidewalk would trigger an alert.
Typical alarm systems are designed to not miss true positives (real crime events) and to have as low of a false alarm rate as possible. In that regard, burglar alarms miss very few true positives but have a very high false alarm rate even in the controlled indoor environment. Motion detecting cameras miss some true positives but are plagued with overwhelming false alarms in an outdoor environment. Rule-based analytics reliably detect most true positives and have a low rate of false positives but cannot perform in active environments, only in empty ones. Also they are limited to the simple discrimination of whether an intruder is present or not.
Something as complex or subtle as a fight breaking out or an employee breaking a safety procedure is not possible for a rule based analytics to detect or discriminate. With behavioral analytics, it is. Places where people are moving and working do not present a problem. However, the A.I. may spot many things that appear anomalous but are innocent in nature. For example, if students at a campus walk on a plaza, that will be learned as normal. If a couple of students decided to carry a large sheet outdoors flapping in the wind, that might indeed trigger an alert. The monitoring officer would be alerted to look at his or her monitor and would see that the event is not a threat and would then ignore it. The degree of deviation from norm that triggers an alert can be set so that only the most abnormal things are reported. However, this still constitutes a new way of human and A.I. interaction not typified by the traditional alarm industry mindset. This is because there will be many false alarms that may nevertheless be valuable to send to a human officer who can quickly look and determine if the scene requires a response. In this sense, it is a "tap on the shoulder" from the A.I. to have the human look at something.
Because so many complex things are being processed continuously, the software samples down to the very low resolution of only 1 CIF to conserve computational demand. The 1 CIF resolution means that an object the size of a human will not be detected if the camera utilized is wide angle and the human is more than sixty to eighty feet distant depending on conditions. Larger objects like vehicles or smoke would be detectable at greater distances.
The utility of artificial intelligence for security does not exist in a vacuum, and its development was not driven by purely academic or scientific study. Rather, it is addressed to real-world needs, and hence, economic forces. Its use for non-security applications such as operational efficiency, shopper heat-mapping of display areas (meaning how many people are in a certain area in retail space), and attendance at classes are developing uses. [10] Humans are not as well qualified as A.I. to compile and recognize patterns consisting of very large data sets requiring simultaneous calculations in multiple remote viewed locations. There is nothing natively human about such awareness. Such multitasking has been shown to defocus human attention and performance. A.I.s have the ability to handle such data. For the purposes of security interacting with video cameras they functionally have better visual acuity than humans or the machine approximation to it. For judging subtleties of behaviors or intentions of subjects or degrees of threat, humans remain far superior at the present state of the technology. So the A.I. in security functions to broadly scan beyond human capability and to vet the data to a first level of sorting of relevance and to alert the human officer who then takes over the function of assessment and response.
Security in the practical world is economically determined so that the expenditure of preventative security will never typically exceed the perceived cost of the risk to be avoided. Studies have shown that companies typically only spend about one twenty-fifth the amount on security that their actual losses cost them. [11] [ predatory publisher ] What by pure economic theory should be an equivalence or homeostasis, thus falls vastly short of it. One theory that explains this is cognitive dissonance, or the ease with which unpleasant things like risk can be shunted from the conscious mind. Nevertheless, security is a major expenditure, and comparison of the costs of different means of security is always foremost amongst security professionals.
Another reason that future security threats or losses are under-assessed is that often only the direct cost of a potential loss is considered instead of the spectrum of consequential losses that are concomitantly experienced. For example, the vandalism-destruction of a custom production machine in a factory or of a refrigerated tractor-trailer would result in a long replacement time during which customers could not be served, resulting in loss of their business. A violent crime will have extensive public relations damage for an employer, beyond the direct liability for failing to protect the employee.
Behavioral analytics uniquely functions beyond simple security and, due to its ability to observe breaches in standard patterns of protocols, it can effectively find unsafe acts of employees that may result in workers comp or public liability incidents. Here too, the assessment of future incidents' costs falls short of the reality. A study by Liberty Mutual Insurance Company showed that the cost to employers is about six times the direct insured cost, since uninsured costs of consequential damages include temporary replacement workers, hiring costs for replacements, training costs, managers' time in reports or court, adverse morale on other workers, and effect on customer and public relations. [12] The potential of A.I. in the form of behavioral analytics to proactively intercept and prevent such incidents is significant.
Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g. in the forms of decisions. Understanding in this context means the transformation of visual images into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.
Physical security describes security measures that are designed to deny unauthorized access to facilities, equipment, and resources and to protect personnel and property from damage or harm. Physical security involves the use of multiple layers of interdependent systems that can include CCTV surveillance, security guards, protective barriers, locks, access control, perimeter intrusion detection, deterrent systems, fire protection, and other systems designed to protect persons and property.
Closed-circuit television (CCTV), also known as video surveillance, is the use of closed-circuit television cameras to transmit a signal to a specific place, on a limited set of monitors. It differs from broadcast television in that the signal is not openly transmitted, though it may employ point-to-point, point-to-multipoint (P2MP), or mesh wired or wireless links. Even though almost all video cameras fit this definition, the term is most often applied to those used for surveillance in areas that require additional security or ongoing monitoring.
Surveillance is the monitoring of behavior, many activities, or information for the purpose of information gathering, influencing, managing, or directing. This can include observation from a distance by means of electronic equipment, such as closed-circuit television (CCTV), or interception of electronically transmitted information like Internet traffic. It can also include simple technical methods, such as human intelligence gathering and postal interception.
An intrusion detection system (IDS) is a device or software application that monitors a network or systems for malicious activity or policy violations. Any intrusion activity or violation is typically either reported to an administrator or collected centrally using a security information and event management (SIEM) system. A SIEM system combines outputs from multiple sources and uses alarm filtering techniques to distinguish malicious activity from false alarms.
Motion detection is the process of detecting a change in the position of an object relative to its surroundings or a change in the surroundings relative to an object. It can be achieved by either mechanical or electronic methods. When it is done by natural organisms, it is called motion perception.
A security alarm is a system designed to detect intrusions, such as unauthorized entry, into a building or other areas, such as a home or school. Security alarms protect against burglary (theft) or property damage, as well as against intruders. Examples include personal systems, neighborhood security alerts, car alarms, and prison alarms.
A passive infrared sensor is an electronic sensor that measures infrared (IR) light radiating from objects in its field of view. They are most often used in PIR-based motion detectors. PIR sensors are commonly used in security alarms and automatic lighting applications.
A motion detector is an electrical device that utilizes a sensor to detect nearby motion. Such a device is often integrated as a component of a system that automatically performs a task or alerts a user of motion in an area. They form a vital component of security, automated lighting control, home control, energy efficiency, and other useful systems.
An Internet Protocol camera, or IP camera, is a type of digital video camera that receives control data and sends image data via an IP network. They are commonly used for surveillance, but, unlike analog closed-circuit television (CCTV) cameras, they require no local recording device, only a local area network. Most IP cameras are webcams, but the term IP camera or netcam usually applies only to those that can be directly accessed over a network connection.
Perimeter surveillance radar (PSR) is a class of radar sensors that monitor activity surrounding or on critical infrastructure areas such as airports, seaports, military installations, national borders, refineries and other critical industry and the like. Such radars are characterized by their ability to detect movement at ground level of targets such as an individual walking or crawling towards a facility. Such radars typically have ranges of several hundred metres to over 10 kilometres.
A flame detector is a sensor designed to detect and respond to the presence of a flame or fire, allowing flame detection. Responses to a detected flame depend on the installation, but can include sounding an alarm, deactivating a fuel line, and activating a fire suppression system. When used in applications such as industrial furnaces, their role is to provide confirmation that the furnace is working properly; it can be used to turn off the ignition system though in many cases they take no direct action beyond notifying the operator or control system. A flame detector can often respond faster and more accurately than a smoke or heat detector due to the mechanisms it uses to detect the flame.
Countersurveillance refers to measures that are usually undertaken by the public to prevent surveillance, including covert surveillance. Countersurveillance may include electronic methods such as technical surveillance counter-measures, which is the process of detecting surveillance devices. It can also include covert listening devices, visual surveillance devices, and countersurveillance software to thwart unwanted cybercrime, such as accessing computing and mobile devices for various nefarious reasons. More often than not, countersurveillance will employ a set of actions (countermeasures) that, when followed, reduce the risk of surveillance. Countersurveillance is different from sousveillance, as the latter does not necessarily aim to prevent or reduce surveillance.
The Unattended Ground Sensor (UGS) are a variety of small sensors, generally covert, dedicated to detect and identify activities on the ground such as enemy soldiers or vehicles. UGS come as systems with an integrated communication network and processing capabilities.
Smart Eye AB, is a Swedish artificial intelligence (AI) company founded in 1999 and headquartered in Gothenburg, Sweden. Smart Eye develops Human Insight AI, technology that understands, supports and predicts human behavior in complex environments. Smart Eye develops and deploys several core technologies that help gain insights from subtle and nuanced changes in human behavior, reactions and expressions. These technologies include head tracking, eye tracking, facial expression analysis and Emotion AI, activity and object detection, and multimodal sensor data analysis.
Fraud represents a significant problem for governments and businesses and specialized analysis techniques for discovering fraud using them are required. Some of these methods include knowledge discovery in databases (KDD), data mining, machine learning and statistics. They offer applicable and successful solutions in different areas of electronic fraud crimes.
Video content analysis or video content analytics (VCA), also known as video analysis or video analytics (VA), is the capability of automatically analyzing video to detect and determine temporal and spatial events.
The Domain Awareness System is the largest digital surveillance system in the world as part of the Lower Manhattan Security Initiative in partnership between the New York Police Department and Microsoft to monitor New York City. It allows the NYPD to track surveillance targets and gain detailed information about them, and is overseen by the counterterrorism bureau.
Sound recognition is a technology, which is based on both traditional pattern recognition theories and audio signal analysis methods. Sound recognition technologies contain preliminary data processing, feature extraction and classification algorithms. Sound recognition can classify feature vectors. Feature vectors are created as a result of preliminary data processing and linear predictive coding.
Human bycatch is a term for people who are unintentionally caught on film, in photos, or acoustically recorded on equipment used to monitor wildlife or habitats for the purpose of conservation, or environmental law enforcement. It comes from the term bycatch, which is used in fishing practices to designate non-target species that are caught in a fishing net. Nearly every remote monitoring study contains human by-catch, yet there are no standardized rules or policies regarding what the researchers can or should do with their data.