Crowd counting

The Million Man March, Washington, D.C., October 1995 was the focus of a large crowd counting dispute.

Crowd counting is the act of counting the total number of people present in a certain area; the people in that area are called a crowd. The most direct method is to count each person individually. For example, turnstiles are often used to count precisely the number of people entering an event. [1]

Modern understanding

Since the early 2000s, the understanding of the phrase “crowd counting” has shifted. Methods have moved from simple head counts to approaches based on clusters and density maps, which has brought several improvements. Crowd counting can also be defined as estimating the number of people present in a single picture. [2]

Methods of counting crowds

Owing to rapid progress in technology and the growth of convolutional neural networks (CNNs) over the last decade, the use of CNNs in crowd counting has grown sharply. CNN-based methods can largely be grouped under the regression-based and density-based models described below. [3]

Jacobs' method

The most common technique for counting crowds at protests and rallies is Jacobs' method, named for its inventor, Herbert Jacobs. Jacobs' method involves dividing the area occupied by a crowd into sections, determining an average number of people in each section, and multiplying by the number of sections occupied. According to a report by Life's Little Mysteries, technologies sometimes used to assist such estimations include "lasers, satellites, aerial photography, 3-D grid systems, recorded video footage and surveillance balloons, usually tethered several blocks around an event's location and flying 400 to 800 feet (120 to 240 meters) overhead." [2]
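The arithmetic behind Jacobs' method can be sketched in a few lines. This is a minimal illustration, not a tool used by any agency: the function name, the sampled head counts, and the number of occupied sections are all made up for the example.

```python
from statistics import mean


def jacobs_estimate(sampled_counts, occupied_sections):
    """Estimate crowd size as (average people per section) x (sections occupied).

    sampled_counts: head counts taken from a few sampled sections (hypothetical data).
    occupied_sections: number of equally sized sections the crowd occupies.
    """
    if not sampled_counts:
        raise ValueError("at least one sampled section is required")
    if occupied_sections < 0:
        raise ValueError("occupied_sections must be non-negative")
    return round(mean(sampled_counts) * occupied_sections)


# Three sampled sections averaging 50 people, 120 sections occupied.
print(jacobs_estimate([40, 50, 60], 120))  # prints 6000
```

The estimate is only as good as the sampled sections: if density varies a lot across the area, sampling more sections (or estimating dense and sparse zones separately) would reduce the error.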

Direct regression-based counting

This crowd counting method applies regression to global image features computed over the whole image. Global image features describe the overall properties of the photo or of a region within it. For example, global image features include “contour representations, shape descriptions, texture features.” [4]

Because the spatial distribution of objects is not taken into account, regression cannot localise objects. [5] Additionally, because this model estimates crowd density from descriptions of crowd patterns, it does not track individuals. [2] This makes regression-based models efficient on crowded pictures; they are best suited to images in which the density per pixel is very high.

Earlier crowd counting methods employed classical regression models. [6]
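As a rough illustration of regression-based counting, the sketch below fits ordinary least squares from a per-image feature vector to a head count. Ordinary least squares is used here only as a stand-in for "classical regression models"; the two features (foreground pixels and edge pixels) and all training numbers are hypothetical.

```python
import numpy as np


def fit_count_regressor(features, counts):
    """Fit count ~ w . features + b by ordinary least squares."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(counts, dtype=float)
    # Append a column of ones so the last weight acts as the intercept b.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return weights


def predict_count(weights, feature_vector):
    """Predict the crowd count for one image from its global feature vector."""
    x = np.append(np.asarray(feature_vector, dtype=float), 1.0)
    return float(x @ weights)


# Made-up training set: [foreground_pixels, edge_pixels] per image and its true count.
train_features = [[1000, 200], [2000, 450], [3000, 600], [4000, 900]]
train_counts = [10, 21, 30, 42]

weights = fit_count_regressor(train_features, train_counts)
estimate = predict_count(weights, [2500, 500])
print(f"estimated count: {estimate:.1f}")
```

Note that the model outputs a single number per image: it says nothing about where in the image the people are, which is the localisation limitation described above.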

Density-based counting

An object density map assigns a density value to each location in the image, so that the number of objects in a particular area is obtained by integrating (summing) the density over that area. [5] Because the density values are estimated from low-level image features, density-based counting offers the advantages of regression-based models while also retaining location information. [5] Localisation of information refers to maintaining this location information.
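The relationship between a density map and a count can be illustrated with a small sketch. It assumes a common construction in which each annotated head position contributes a Gaussian blob normalised to sum to one; the function name, the kernel width `sigma`, and the head coordinates are illustrative choices, not taken from the cited papers.

```python
import numpy as np


def density_map(shape, head_points, sigma=4.0):
    """Build a density map with one unit of mass per annotated head.

    shape: (height, width) of the image.
    head_points: (row, col) head positions, assumed to lie inside the image.
    """
    height, width = shape
    ys, xs = np.mgrid[0:height, 0:width]
    density = np.zeros(shape, dtype=float)
    for row, col in head_points:
        kernel = np.exp(-((ys - row) ** 2 + (xs - col) ** 2) / (2 * sigma ** 2))
        # Normalise so each person contributes exactly 1 to the total,
        # even when the blob is clipped by the image border.
        density += kernel / kernel.sum()
    return density


heads = [(10, 12), (30, 40), (31, 44)]  # hypothetical annotations
dmap = density_map((64, 64), heads)

total_count = dmap.sum()             # approximately 3.0, one per head
left_half_count = dmap[:, :32].sum() # close to 1: only one head lies in the left half
print(round(total_count, 3), round(left_half_count, 3))
```

Summing over a sub-region gives the estimated count for that region, which is how a density map keeps location information that a single regressed number would lose.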

Strengthening crowd counting

To use the above-mentioned models effectively, a large amount of training data is needed. In practice, however, the available data is often limited to the original images. To compensate, techniques such as random cropping are employed. Random cropping refers to randomly choosing sub-images from the original image.

After several iterations of random cropping, the sub-images are fed into the machine learning algorithm to help it generalize better.
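Random cropping can be sketched as follows. This is a minimal example using NumPy arrays as images; the function name, crop size, and seed are assumptions made for illustration.

```python
import numpy as np


def random_crops(image, crop_height, crop_width, num_crops, seed=None):
    """Return num_crops randomly positioned sub-images of a fixed size."""
    rng = np.random.default_rng(seed)
    height, width = image.shape[:2]
    if crop_height > height or crop_width > width:
        raise ValueError("crop is larger than the image")
    crops = []
    for _ in range(num_crops):
        # Pick the top-left corner so the whole crop stays inside the image.
        top = int(rng.integers(0, height - crop_height + 1))
        left = int(rng.integers(0, width - crop_width + 1))
        crops.append(image[top:top + crop_height, left:left + crop_width].copy())
    return crops


image = np.zeros((100, 120, 3), dtype=np.uint8)  # placeholder for a real photo
patches = random_crops(image, crop_height=32, crop_width=48, num_crops=5, seed=0)
print(len(patches), patches[0].shape)  # prints 5 (32, 48, 3)
```

When the training labels are head positions or density maps, the same crop window would have to be applied to the labels so that each sub-image keeps a matching count.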

To tackle the problems associated with crowd counting in high-density areas, density-based counting methods can be employed together with image pyramids, in which the same image is represented at several resolutions. Image pyramids are generally employed for crowd counting in places where people gather to perform rituals or practice their religious beliefs, because people appear at different scales in different locations within the image.
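An image pyramid can be sketched as a list of progressively smaller copies of the same image. The version below is a simplification that halves a grayscale image by averaging 2x2 blocks; production pyramids usually blur with a Gaussian filter before downsampling, and the function name and level count are illustrative.

```python
import numpy as np


def image_pyramid(image, levels=3):
    """Return a list of grayscale images, each half the size of the previous one."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        current = pyramid[-1]
        # Trim to even dimensions so the image splits cleanly into 2x2 blocks.
        h = current.shape[0] // 2 * 2
        w = current.shape[1] // 2 * 2
        if h < 2 or w < 2:
            break  # too small to halve again
        trimmed = current[:h, :w]
        # Average each 2x2 block to produce one pixel of the next level.
        halved = trimmed.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(halved)
    return pyramid


gray = np.ones((64, 48))  # placeholder for a real grayscale photo
levels = image_pyramid(gray, levels=3)
print([level.shape for level in levels])  # prints [(64, 48), (32, 24), (16, 12)]
```

A counting model can then be run on every level, so that people who appear large near the camera and small far away are each seen at a scale the model handles well. Processing every level is also what makes the approach costly.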

However, because the algorithms required for image pyramids are very expensive to run, it is often impractical to depend on these methods. As a result, deep fusion models can be used instead. [7]

These deep fusion models employ “neural network(s) to promote the density map regression accuracy.” [8] The models first mark the location of each person in the picture. They then determine the density map of the area using the “pedestrian’s location, shape, and perspective distortion.” [8] Because the algorithm scans the image over many iterations, and because bodies in a crowd often overlap one another, people are counted by their heads.

Importance

Crowd counting plays an important role in “public safety, assembly language, and video surveillance” [9] among many other things. Without adequate crowd control and planning, terrible accidents can occur. One of the most notable is the Hillsborough disaster, which took place on April 15, 1989, in Sheffield, England. Crowd counts can also be contentious: Louis Farrakhan threatened to sue the Washington, D.C. Park Police for announcing that only 400,000 people attended the 1995 Million Man March, which he organized.

At events held in streets or parks rather than in an enclosed venue, crowd counting is more difficult and less precise. For many events, especially political rallies or protests, the number of people in a crowd carries political significance and count results are controversial. For example, many of the global protests against the Iraq War drew widely differing counts from organizers on one side and the police on the other.

References

  1. "What are Turnstiles? (with pictures)". EasyTechJunkie. Retrieved 2022-10-11.
  2. Loy, Chen Change; Chen, Ke; Gong, Shaogang; Xiang, Tao (2021). "Fine-Grained Crowd Counting". IEEE Transactions on Image Processing. 30: 2114–2126. arXiv:2007.06146. doi:10.1109/TIP.2021.3049938. PMID 33439838. S2CID 220496399.
  3. Chu, Huanpeng; Tang, Jilin; Hu, Haoji (2021-10-01). "Attention guided feature pyramid network for crowd counting". Journal of Visual Communication and Image Representation. 80: 103319. doi:10.1016/j.jvcir.2021.103319. ISSN 1047-3203. S2CID 241591128.
  4. Lisin, Dimitri A.; Mattar, Marwan A.; Blaschko, Matthew B.; Benfield, Mark C.; Learned-Miller, Erik G. "Combining Local and Global Image Features for Object Class Recognition" (PDF).
  5. Kang, D.; Ma, Z.; Chan, A. B. (May 2019). "Beyond Counting: Comparisons of Density Maps for Crowd Analysis Tasks—Counting, Detection, and Tracking". IEEE Transactions on Circuits and Systems for Video Technology. 29 (5): 1408–1422. arXiv:1705.10118. doi:10.1109/TCSVT.2018.2837153. S2CID 19706288.
  6. Delussu, Rita; Putzu, Lorenzo; Fumera, Giorgio (2022). "Scene-specific crowd counting using synthetic training images". Pattern Recognition. 124: 108484. doi:10.1016/j.patcog.2021.108484. hdl:11584/341493. S2CID 245109866.
  7. Khan, Sultan Daud; Salih, Yasir; Zafar, Basim; Noorwali, Abdulfattah (2021-09-28). "A Deep-Fusion Network for Crowd Counting in High-Density Crowded Scenes". International Journal of Computational Intelligence Systems. 14 (1): 168. doi:10.1007/s44196-021-00016-x. ISSN 1875-6883.
  8. Tang, Siqi; Pan, Zhisong; Zhou, Xingyu (2017-01-01). "Low-Rank and Sparse Based Deep-Fusion Convolutional Neural Network for Crowd Counting". Mathematical Problems in Engineering. 2017: 1–11. doi:10.1155/2017/5046727.
  9. Chu, Huanpeng; Tang, Jilin; Hu, Haoji (2021-10-01). "Attention guided feature pyramid network for crowd counting". Journal of Visual Communication and Image Representation. 80: 103319. doi:10.1016/j.jvcir.2021.103319. ISSN 1047-3203. S2CID 241591128.