Video content analysis

Video content analysis or video content analytics (VCA), also known as video analysis or video analytics (VA), is the capability of automatically analyzing video to detect and determine temporal and spatial events.

This technical capability is used in a wide range of domains including entertainment, [1] video retrieval and video browsing, [2] health care, retail, automotive, transport, home automation, flame and smoke detection, safety, and security. [3] The algorithms can be implemented as software on general-purpose machines or as hardware in specialized video processing units.

Many different functionalities can be implemented in VCA. Video motion detection is one of the simpler forms, in which motion is detected with regard to a fixed background scene. More advanced functionalities include video tracking [4] and egomotion estimation. [5]
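As a rough illustration of motion detection against a fixed background scene, the following sketch compares each grayscale frame with a stored background image and reports motion when enough pixels have changed. The function names, the `threshold` of 25 intensity levels and the `min_fraction` of 1% are assumptions chosen for illustration, not values taken from any particular VCA product.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame as rows of 0-255 intensities


def motion_mask(background: Frame, frame: Frame, threshold: int = 25) -> List[List[bool]]:
    """Mark pixels whose intensity differs from the background by more than threshold."""
    return [
        [abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def motion_detected(background: Frame, frame: Frame,
                    threshold: int = 25, min_fraction: float = 0.01) -> bool:
    """Report motion when the share of changed pixels reaches min_fraction."""
    mask = motion_mask(background, frame, threshold)
    total = sum(len(row) for row in mask)
    changed = sum(sum(row) for row in mask)
    return total > 0 and changed / total >= min_fraction
```

With an 8×8 background of uniform intensity, changing a single pixel by a large amount alters about 1.6% of the frame, which is enough to trigger detection under these default settings. Real systems usually also update the background over time to cope with lighting changes, which this sketch does not attempt.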

Based on the internal representation that VCA generates in the machine, it is possible to build other functionalities, such as video summarization, [6] identification, behavior analysis, or other forms of situation awareness.

VCA relies on good input video, so it is often combined with video enhancement technologies such as video denoising, image stabilization, unsharp masking, and super-resolution. [citation needed]

Functionalities

Several articles provide an overview of the modules involved in the development of video analytic applications. [7] [8] The following list gives known functionalities, each with a short description.

Function: Description
Dynamic masking: Blocking a part of the video signal based on the signal itself, for example because of privacy concerns.
Flame and smoke detection: IP cameras with intelligent video surveillance technology can detect flame and smoke in 15–20 seconds or even less by using a built-in DSP chip. The chip runs algorithms that analyze the captured video for flame and smoke characteristics such as color chrominance, flickering ratio, shape, pattern and moving direction.
Egomotion estimation: Egomotion estimation is used to determine the location of a camera by analyzing its output signal.
Motion detection: Motion detection is used to determine the presence of relevant motion in the observed scene.
Shape recognition: Shape recognition is used to recognize shapes in the input video, for example circles or squares. This functionality is typically used in more advanced functionalities such as object detection.
Object detection: Object detection is used to determine the presence of a type of object or entity, for example a person or car. Other examples include fire and smoke detection.
Recognition: Face recognition and automatic number plate recognition are used to recognize, and therefore possibly identify, persons or cars.
Style detection: Style detection is used in settings where the video signal has been produced, for example for television broadcast. Style detection detects the style of the production process. [9]
Tamper detection: Tamper detection is used to determine whether the camera or output signal has been tampered with.
Video tracking: Video tracking is used to determine the location of persons or objects in the video signal, possibly with regard to an external reference grid.
Video error level analysis: Video error level analysis (VELA) is tamper analysis of video scene content using free software.
Object co-segmentation: Joint object discovery, classification and segmentation of targets in one or multiple related video sequences.
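To make the dynamic masking entry in the list above more concrete, here is a minimal sketch that blanks out rectangular regions of a grayscale frame. It assumes that some upstream detector (not shown, and hypothetical here) has already derived the boxes from the video signal, for example around faces or windows. The `Box` layout and the `apply_privacy_mask` name are illustrative choices.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame as rows of 0-255 intensities
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def apply_privacy_mask(frame: Frame, boxes: List[Box], fill: int = 0) -> Frame:
    """Return a copy of frame with every box region overwritten by fill."""
    masked = [row[:] for row in frame]  # leave the input frame untouched
    height = len(masked)
    for top, left, bottom, right in boxes:
        # Clip each box to the frame so out-of-range boxes are harmless.
        for y in range(max(top, 0), min(bottom, height)):
            width = len(masked[y])
            for x in range(max(left, 0), min(right, width)):
                masked[y][x] = fill
    return masked
```

Returning a copy rather than editing in place keeps the unmasked frame available for the analytics pipeline while only the masked version is shown or stored.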

Commercial applications

VCA is a relatively new technology, with numerous companies releasing VCA-enhanced products in the mid-2000s. [10] [11] [12] While there are many applications, the track records of different VCA solutions differ widely. Functionalities such as motion detection, people counting and gun detection are available as commercial off-the-shelf products and are believed to have a decent track record (for example, even freeware such as dsprobotics Flowstone can handle movement and color analysis). In response to the COVID-19 pandemic, many software manufacturers introduced new public health analytics such as face mask detection and social distancing tracking. [13] [14] [15]

In many domains VCA is implemented on CCTV systems, either distributed on the cameras (at the edge) or centralized on dedicated processing systems. Video Analytics and Smart CCTV are commercial terms for VCA in the security domain. In the UK, the BSIA has developed an introductory guide for VCA in the security domain. [16] Audio analytics can also be used to complement video analytics. [17]

Video management software manufacturers are constantly expanding the range of available video analytics modules. Suspect tracking technology makes it possible to follow all of a subject's movements: where they came from, and when, where, and how they moved. Within a particular surveillance system, the indexing technology can locate people with similar features who were within the cameras' fields of view during a specific period of time. Usually, the system finds many different people with similar features and presents them as snapshots. The operator only needs to click on the images of the subjects to be tracked. Within a minute or so, it is possible to trace all the movements of a particular person, and even to create a step-by-step video of those movements.

Kinect is an add-on peripheral for the Xbox 360 gaming console that uses VCA for part of the user input. [18]

In the retail industry, VCA is used to track shoppers inside the store. [19] In this way, a heatmap of the store can be obtained, which is useful for store design and marketing optimisation. Other applications include measuring dwell time when shoppers look at a product and detecting removed or left items.
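The heatmap and dwell-time ideas can be sketched as follows, assuming that a tracker (not shown) already supplies shopper positions as floor coordinates sampled at a fixed interval. The grid cell size, the units and both function names are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]  # floor position, e.g. in metres (assumed units)
Cell = Tuple[int, int]       # grid cell index (column, row)


def build_heatmap(positions: Iterable[Point], cell_size: float = 1.0) -> Dict[Cell, int]:
    """Count how many tracked positions fall into each square floor cell."""
    counts: Counter = Counter()
    for x, y in positions:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return dict(counts)


def dwell_seconds(heatmap: Dict[Cell, int], cell: Cell, sample_interval: float) -> float:
    """Approximate total time spent in a cell, given a fixed sampling interval in seconds."""
    return heatmap.get(cell, 0) * sample_interval
```

Cells with high counts correspond to the hot areas of the store. Multiplying a count by the sampling interval gives a crude aggregate dwell time for that cell across all shoppers; per-shopper dwell time would need track identities, which this sketch leaves out.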

The quality of VCA in the commercial setting is difficult to determine. It depends on many variables such as use case, implementation, system configuration and computing platform. Typical methods to get an objective idea of the quality in commercial settings include independent benchmarking [20] and designated test locations.
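One common way to turn a benchmark run into numbers is to compare the events a system reports with an annotated ground truth and compute precision and recall. The sketch below assumes events are identified by integer IDs (for example, indices of annotated clips); this is a generic evaluation measure given for illustration and not the scoring procedure of any specific benchmark named in this article.

```python
from typing import Set, Tuple


def precision_recall(detected: Set[int], ground_truth: Set[int]) -> Tuple[float, float]:
    """Return (precision, recall) for detected event IDs against annotated ones."""
    true_positives = len(detected & ground_truth)
    # Precision: share of reported events that were real.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: share of real events that were reported.
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

A system that raises many false alarms scores low on precision, while one that misses events scores low on recall, so both values are usually reported together.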

VCA has been used for crowd management purposes, notably at The O2 Arena in London and The London Eye.

Law enforcement

Police and forensic scientists analyse CCTV video when investigating criminal activity. Surveys have shown that up to 75% of cases involve CCTV. Police use software, such as Kinesense, which performs video content analysis to search long videos for key events and find suspects. [21] [22]

Academic research

Video content analysis is a subset of computer vision and thereby of artificial intelligence. Two major academic benchmark initiatives are TRECVID, [23] which uses a small portion of i-LIDS video footage, and the PETS Benchmark Data. [24] They focus on functionalities such as tracking, left luggage detection and virtual fencing. Benchmark video datasets such as UCF101 [25] enable action recognition research that incorporates temporal and spatial visual attention with convolutional neural networks and long short-term memory. Video analysis software is also being paired with footage from body-worn and dashboard cameras in order to more easily redact footage for public disclosure and to identify events and people in videos. [26]

The EU is funding an FP7 project called P-REACT [27] to integrate video content analytics on embedded systems with police and transport security databases. [28]

Artificial intelligence

Artificial intelligence for video surveillance uses computer software programs that analyze the audio and images from video surveillance cameras in order to recognize humans, vehicles, objects and events. Security contractors program the software to define restricted areas within the camera's view (such as a fenced-off area, or a parking lot but not the sidewalk or public street outside the lot) and times of day (such as after the close of business) for the property being protected by the camera surveillance. The artificial intelligence ("A.I.") sends an alert if it detects a trespasser breaking the "rule" that no person is allowed in that area during that time of day.
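The kind of rule described above can be sketched as a restricted zone combined with a time window. The example assumes that a person detector (not shown) already supplies a position in image coordinates; the polygon test uses standard ray casting, and all names, such as `should_alert`, are hypothetical rather than drawn from any vendor's software.

```python
from datetime import time
from typing import List, Tuple

Point = Tuple[float, float]  # position in image coordinates (assumed)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True when point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_time_window(now: time, start: time, end: time) -> bool:
    """True when now is in [start, end), including windows that span midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_alert(position: Point, now: time, zone: List[Point],
                 start: time, end: time) -> bool:
    """Alert when a detected person is inside the zone during the restricted hours."""
    return in_time_window(now, start, end) and point_in_polygon(position, zone)
```

For instance, with a square zone and restricted hours from 18:00 to 06:00, a person detected inside the zone at 23:00 would raise an alert, while the same detection at noon, or a detection outside the zone, would not.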

References

  1. KINECT Archived 2010-09-12 at the Wayback Machine, add-on peripheral for the Xbox 360 console
  2. Dimitrova, Nevenka, et al. "Applications of video-content analysis and retrieval." IEEE multimedia 9.3 (2002): 42-55.
  3. VCA usage increase in British Security Archived 2014-03-16 at the Wayback Machine, BSIA report
  4. Cavaliere, Danilo, Vincenzo Loia, and Sabrina Senatore. "Towards an ontology design pattern for UAV video content analysis." IEEE Access 7 (2019): 105342-105353.
  5. Cavaliere, Danilo; Loia, Vincenzo; Saggese, Alessia; Senatore, Sabrina; Vento, Mario (2019-08-15). "A human-like description of scene events for a proper UAV-based video content analysis". Knowledge-Based Systems. 178: 163–175. doi:10.1016/j.knosys.2019.04.026. ISSN 0950-7051. S2CID 155625544.
  6. Ma, Yu-Fei, et al. "A user attention model for video summarization." Proceedings of the tenth ACM international conference on Multimedia. 2002.
  7. Nik Gagvani, Introduction to Video Analytics
  8. Cheng Peng, Video Analytics
  9. Style detection Archived 2016-03-03 at the Wayback Machine, Cees G.M. Snoek et al., Detection of TV news monologues by style analysis, ICME'04
  10. Kwet, Michael (2020-01-27). "The Rise of Smart Camera Networks, and Why We Should Ban Them". The Intercept. Retrieved 2020-10-19.
  11. "Aimetis", Wikipedia, 2020-01-28, retrieved 2020-10-19
  12. "Infographic: History of Video Surveillance". IFSEC Global | Security and Fire News and Resources. 2013-12-12. Retrieved 2020-10-19.
  13. "COVID-19 makes face mask detection essential video analytics - asmag.com". www.asmag.com. Retrieved 2020-10-06.
  14. Looveren, Pieter van de. "Functionality Beyond Security: The Advent of Open Platform Cameras". www.securityinformed.com. Retrieved 2020-10-06.
  15. "StackPath". www.securityinfowatch.com. 9 July 2020. Retrieved 2020-10-06.
  16. British Industry VCA Guide Archived 2018-05-17 at the Wayback Machine, 262 An Introduction to Video Content Analysis Industry Guide
  17. UK based startup that provides audio analytics into the CCTV industry
  18. "Project Natal 101". Microsoft. 2009-06-01. Archived from the original on 2012-01-21. Retrieved 2009-06-02.
  19. "Heat map Intelligent module". Archived from the original on 2017-07-30. Retrieved 2016-07-13.
  20. i-Lids [permanent dead link], Benchmarking initiative by the UK Home Office
  21. "Northgate offers police forces improved CCTV analysis system". Archived from the original on 4 March 2016. Retrieved 29 Dec 2015.
  22. "Northgate teams with Dublin tech firm Kinesense to help police video analysis". Risk Manager Online. Retrieved 26 May 2014.
  23. TRECVID, Academic benchmark initiative by NIST
  24. PETS Benchmark Data Archived 2006-09-24 at the Wayback Machine, Performance Evaluation of Tracking and Surveillance (PETS) by University of Reading
  25. Center, UCF (2013-10-17). "UCF101 – Action Recognition Data Set". CRCV. Retrieved 2018-09-12.
  26. "Police Body Cameras Will Do More Than Just Record You | Fast Company | The Future Of Business". Fast Company. 2017-03-03. Retrieved 2017-03-08.
  27. P-REACT Project Website
  28. "Kinesense launches P-REACT, an FP7 project against Petty Crime". 7 April 2014. Retrieved 27 May 2014.