Pedestrian detection

Pedestrian detection is an essential and significant task in any intelligent video surveillance system, as it provides fundamental information for the semantic understanding of video footage. It also has an obvious extension to automotive applications because of its potential to improve safety systems. Many car manufacturers (e.g., Volvo, Ford, GM, Nissan) offered pedestrian detection as an ADAS option as of 2017.

Challenges

Pedestrian detection is complicated by the large variability of human appearance: people wear clothing of many styles, adopt a wide range of poses and articulations, carry occluding accessories, and frequently occlude one another in crowded scenes. Detectors must also cope with changes in scale and illumination and with cluttered backgrounds.

Existing approaches

Despite these challenges, pedestrian detection remains an active research area in computer vision, and numerous approaches have been proposed.

Holistic detection

Detectors are trained to search for pedestrians in the video frame by scanning the whole frame. The detector "fires" if the image features inside the local search window meet certain criteria. Some methods employ global features such as edge templates, [1] while others use local features such as histogram of oriented gradients (HOG) descriptors. [2] The drawback of this approach is that its performance is easily degraded by background clutter and occlusion.
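
As a concrete illustration, here is a minimal sketch of this sliding-window scheme using OpenCV's bundled HOG descriptor and its pre-trained linear SVM people detector; the file name and parameter values are illustrative, not prescriptive:

    import cv2

    img = cv2.imread("frame.jpg")          # illustrative input frame

    # OpenCV ships a HOG descriptor with a pre-trained linear SVM for
    # upright people (64x128 detection window).
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    # detectMultiScale slides the window over an image pyramid; each
    # returned box is a location where the classifier "fired".
    boxes, weights = hog.detectMultiScale(img, winStride=(8, 8),
                                          padding=(8, 8), scale=1.05)
    for (x, y, w, h) in boxes:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)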

Part-based detection

Pedestrians are modeled as collections of parts. Part hypotheses are first generated by learning local features, which include edgelet [3] and orientation features. [4] These part hypotheses are then joined to form the best assembly of pedestrian hypotheses. Although this approach is attractive, part detection is itself a difficult task. Implementations of this approach follow a standard procedure for processing the image data: first build a densely sampled image pyramid, compute features at each scale, perform classification at all possible locations, and finally apply non-maximum suppression to generate the final set of bounding boxes. [5]
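
A skeleton of that pipeline is sketched below under stated assumptions: `score_fn` is a hypothetical stand-in for a trained part-based classifier, and the window size, stride, and thresholds are illustrative:

    import cv2

    def detect(img, score_fn, win=(64, 128), step=8, scale=1.2, thresh=0.5):
        # Densely sampled image pyramid: classify every window at every
        # scale, then prune overlapping hits with non-maximum suppression.
        boxes, s = [], 1.0
        while img.shape[0] >= win[1] and img.shape[1] >= win[0]:
            for y in range(0, img.shape[0] - win[1] + 1, step):
                for x in range(0, img.shape[1] - win[0] + 1, step):
                    score = score_fn(img[y:y + win[1], x:x + win[0]])
                    if score > thresh:
                        boxes.append((x * s, y * s, win[0] * s, win[1] * s, score))
            img = cv2.resize(img, (int(img.shape[1] / scale),
                                   int(img.shape[0] / scale)))
            s *= scale  # remember how much this pyramid level was downscaled
        return non_max_suppression(boxes)

    def iou(a, b):
        # Intersection-over-union of two (x, y, w, h, score) boxes.
        ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
        iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
        inter = ix * iy
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union else 0.0

    def non_max_suppression(boxes, iou_thresh=0.5):
        # Greedy NMS: keep the best-scoring box, discard its overlaps.
        kept = []
        for b in sorted(boxes, key=lambda b: b[4], reverse=True):
            if all(iou(b, k) < iou_thresh for k in kept):
                kept.append(b)
        return kept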

Patch-based detection

In 2005, Leibe et al. [6] proposed an approach combining detection and segmentation, called the Implicit Shape Model (ISM). A codebook of local appearance is learned during training. At detection time, extracted local features are matched against the codebook entries, and each match casts one vote for a pedestrian hypothesis. Final detection results are obtained by further refining those hypotheses. An advantage of this approach is that only a small number of training images is required.
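
A toy sketch of the voting step, under stated assumptions: `extract_features` and `codebook` are hypothetical stand-ins for the learned local-feature extractor and for the codebook of appearances with their offsets to the object centre:

    import numpy as np

    def ism_votes(image, extract_features, codebook, match_thresh=0.7):
        # Each local feature is matched against the learned codebook;
        # every match casts one vote for the object centre it predicts.
        votes = []
        for desc, (x, y) in extract_features(image):
            for entry in codebook:                 # entries: descriptor + offset
                if np.dot(desc, entry["descriptor"]) > match_thresh:
                    dx, dy = entry["offset"]       # learned offset to centre
                    votes.append((x + dx, y + dy))
        # Dense clusters of votes are pedestrian hypotheses, to be
        # refined (e.g. by mean-shift) into final detections.
        return np.asarray(votes)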

Motion-based detection

When conditions permit (fixed camera, stationary lighting, etc.), background subtraction can help to detect pedestrians. Background subtraction classifies the pixels of video streams as either background, where no motion is detected, or foreground, where motion is detected. This procedure highlights the silhouettes (the connected components in the foreground) of every moving element in the scene, including people. An algorithm has been developed [7] [8] at the University of Liège to analyze the shape of these silhouettes in order to detect humans. Since methods that consider the silhouette as a whole and perform a single classification are, in general, highly sensitive to shape defects, a part-based method that splits the silhouettes into a set of smaller regions has been proposed to decrease the influence of defects. Unlike other part-based approaches, these regions do not have any anatomical meaning. This algorithm has been extended to the detection of humans in 3D video streams. [9]
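
A minimal sketch of the first stage only, using OpenCV's MOG2 background subtractor to produce a foreground mask and extract silhouettes as connected components; the file name and area threshold are illustrative, and the silhouette-shape analysis of [7] [8] would then run on each component:

    import cv2

    cap = cv2.VideoCapture("video.avi")            # illustrative file name
    backsub = cv2.createBackgroundSubtractorMOG2(detectShadows=True)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        fg = backsub.apply(frame)                  # per-pixel fg/bg labels
        # MOG2 marks shadows as 127; keep only confident foreground (255).
        fg = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)[1]
        n, labels, stats, _ = cv2.connectedComponentsWithStats(fg)
        for i in range(1, n):                      # label 0 is the background
            x, y, w, h, area = stats[i]
            if area > 500:                         # ignore small noise blobs
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)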

Detection using multiple cameras

Fleuret et al. [10] suggested a method for integrating multiple calibrated cameras to detect multiple pedestrians. In this approach, the ground plane is partitioned into uniform, non-overlapping grid cells, typically 25 cm by 25 cm. The detector produces a probabilistic occupancy map (POM), which estimates the probability that each grid cell is occupied by a person. Given two to four synchronized video streams taken at eye level and from different angles, the method combines a generative model with dynamic programming to accurately follow up to six individuals across thousands of frames, in spite of significant occlusions and lighting changes. It can also derive metrically accurate trajectories for each of them.
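
For orientation, a toy sketch of the underlying data structure with illustrative dimensions; the probability assignment shown is a placeholder, not the generative model of Fleuret et al.:

    import numpy as np

    CELL = 0.25                               # cell size in metres
    EXTENT_X, EXTENT_Y = 20.0, 30.0           # monitored ground area (m)
    # One occupancy probability per non-overlapping ground-plane cell.
    pom = np.zeros((int(EXTENT_Y / CELL), int(EXTENT_X / CELL)))

    def cell_of(x_m, y_m):
        # Map a metric ground-plane position to its grid cell index.
        return int(y_m / CELL), int(x_m / CELL)

    # Evidence fused from the calibrated views would raise the occupancy
    # probability of, e.g., the cell at ground position (4.1 m, 7.3 m):
    pom[cell_of(4.1, 7.3)] = 0.9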

Related Research Articles

Boosting (machine learning)

In machine learning, boosting is an ensemble meta-algorithm used primarily to reduce bias, and also variance, in supervised learning, and a family of machine learning algorithms that convert weak learners into strong ones. Boosting is based on the question posed by Kearns and Valiant: "Can a set of weak learners create a single strong learner?" A weak learner is defined as a classifier that is only slightly correlated with the true classification. In contrast, a strong learner is a classifier that is arbitrarily well correlated with the true classification.
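
A minimal illustration with scikit-learn, whose AdaBoost implementation uses depth-1 decision trees ("stumps") as the default weak learner; the synthetic dataset is purely for demonstration:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier

    X, y = make_classification(n_samples=1000, random_state=0)
    # Each boosting round re-weights the training data so that the next
    # stump concentrates on previously misclassified examples.
    clf = AdaBoostClassifier(n_estimators=100).fit(X, y)
    print(clf.score(X, y))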

Canny edge detector

The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. It was developed by John F. Canny in 1986. Canny also produced a computational theory of edge detection explaining why the technique works.
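
A one-call example with OpenCV; the two numbers are the hysteresis thresholds of the multi-stage algorithm, and the image path is illustrative:

    import cv2

    gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 100, 200)   # low/high hysteresis thresholds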

Image segmentation

In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.

The scale-invariant feature transform (SIFT) is a computer vision algorithm to detect, describe, and match local features in images, invented by David Lowe in 1999. Applications include object recognition, robotic mapping and navigation, image stitching, 3D modeling, gesture recognition, video tracking, individual identification of wildlife and match moving.
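
A short OpenCV sketch of detecting and matching SIFT features between two images (paths are illustrative), with Lowe's ratio test to discard ambiguous matches:

    import cv2

    img1 = cv2.imread("a.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("b.jpg", cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    # Keep a match only if its best candidate is clearly better than
    # the second-best one (Lowe's ratio test).
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]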

Feature (computer vision)

In computer vision and image processing, a feature is a piece of information about the content of an image; typically about whether a certain region of the image has certain properties. Features may be specific structures in the image such as points, edges or objects. Features may also be the result of a general neighborhood operation or feature detection applied to the image. Other examples of features are related to motion in image sequences, or to shapes defined in terms of curves or boundaries between different image regions.

The outline of computer vision provides an overview of, and topical guide to, the field of computer vision.

Structure from motion (SfM) is a photogrammetric range imaging technique for estimating three-dimensional structures from two-dimensional image sequences that may be coupled with local motion signals. It is studied in the fields of computer vision and visual perception. In biological vision, SfM refers to the phenomenon by which humans can recover 3D structure from the projected 2D (retinal) motion field of a moving object or scene.

Takeo Kanade

Takeo Kanade is a Japanese computer scientist and one of the world's foremost researchers in computer vision. He is the U.A. and Helen Whitaker Professor at the Carnegie Mellon School of Computer Science. He has approximately 300 peer-reviewed academic publications and holds around 20 patents.

Articulated body pose estimation in computer vision is the study of algorithms and systems that recover the pose of an articulated body, which consists of joints and rigid parts, from image-based observations. It is one of the longest-lasting problems in computer vision because of the complexity of the models that relate observation with pose, and because of the variety of situations in which it would be useful.

The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. This method is similar to that of edge orientation histograms, scale-invariant feature transform descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy.
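
For illustration, scikit-image exposes HOG directly; the parameter values below mirror the common Dalal–Triggs settings (9 orientation bins, 8×8-pixel cells, overlapping 2×2-cell blocks), and the built-in test image stands in for a real detection window:

    from skimage import color, data
    from skimage.feature import hog

    img = color.rgb2gray(data.astronaut())   # built-in test image
    # Gradient orientations are histogrammed per cell, contrast-normalised
    # over overlapping blocks, and concatenated into one descriptor.
    features = hog(img, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), block_norm="L2-Hys")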

In computer vision, the bag-of-words model, sometimes called the bag-of-visual-words model, can be applied to image classification or retrieval by treating image features as words. In document classification, a bag of words is a sparse vector of occurrence counts of words; that is, a sparse histogram over the vocabulary. In computer vision, a bag of visual words is a vector of occurrence counts of a vocabulary of local image features.
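
A hedged sketch of building such a vector: cluster a pool of local descriptors into a visual vocabulary, then count nearest-word assignments per image. `all_descriptors` and `image_descriptors` are hypothetical placeholders for output from a local feature extractor such as SIFT or ORB:

    import numpy as np
    from sklearn.cluster import KMeans

    k = 200                                      # vocabulary size
    # `all_descriptors`: stacked local descriptors from the training set.
    vocab = KMeans(n_clusters=k).fit(all_descriptors)

    def bow_histogram(image_descriptors):
        # Assign each descriptor to its nearest visual word and count.
        words = vocab.predict(image_descriptors)
        hist = np.bincount(words, minlength=k).astype(float)
        return hist / hist.sum()                 # normalised word counts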

Object recognition – technology in the field of computer vision for finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, even though the appearance of an object may vary with viewpoint, size, and scale, or when the object is translated or rotated. Objects can even be recognized when they are partially obstructed from view. This task is still a challenge for computer vision systems. Many approaches to the task have been implemented over multiple decades.

Part-based models refer to a broad class of detection algorithms used on images, in which various parts of the image are used separately to determine whether and where an object of interest exists. Among these methods, a very popular one is the constellation model, which refers to schemes that seek to detect a small number of features and their relative positions in order to determine whether or not the object of interest is present.

In computer vision, 3D object recognition involves recognizing and determining 3D information, such as the pose, volume, or shape, of user-chosen 3D objects in a photograph or range scan. Typically, an example of the object to be recognized is presented to a vision system in a controlled environment, and then for an arbitrary input such as a video stream, the system locates the previously presented object. This can be done either off-line, or in real-time. The algorithms for solving this problem are specialized for locating a single pre-identified object, and can be contrasted with algorithms which operate on general classes of objects, such as face recognition systems or 3D generic object recognition. Due to the low cost and ease of acquiring photographs, a significant amount of research has been devoted to 3D object recognition in photographs.

Object detection

Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including image retrieval and video surveillance.

Local binary patterns (LBP) is a type of visual descriptor used for classification in computer vision. LBP is a particular case of the Texture Spectrum model proposed in 1990 and was first described in 1994. It has since been found to be a powerful feature for texture classification; it has further been determined that when LBP is combined with the histogram of oriented gradients (HOG) descriptor, detection performance improves considerably on some datasets. A comparison of several improvements of the original LBP in the field of background subtraction was made in 2015 by Silva et al., and a full survey of the different versions of LBP can be found in Bouwmans et al.
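
A minimal scikit-image example computing uniform LBP codes over a built-in test image:

    from skimage import data
    from skimage.feature import local_binary_pattern

    img = data.camera()                          # built-in grayscale image
    # Each pixel is coded by thresholding its P neighbours on a circle
    # of radius R against it; "uniform" merges rotation-equivalent codes.
    lbp = local_binary_pattern(img, P=8, R=1, method="uniform")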

Oriented FAST and rotated BRIEF (ORB) is a fast, robust local feature detector, first presented by Ethan Rublee et al. in 2011, that can be used in computer vision tasks like object recognition or 3D reconstruction. It is based on the FAST keypoint detector and a modified version of the visual descriptor BRIEF (Binary Robust Independent Elementary Features). Its aim is to provide a fast and efficient alternative to SIFT.
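
A minimal OpenCV example (the image path is illustrative); the resulting binary descriptors are typically matched with Hamming distance:

    import cv2

    gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=500)
    # FAST keypoints with orientation, described by rotated BRIEF
    # binary strings.
    keypoints, descriptors = orb.detectAndCompute(gray, None)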

Saliency map

In computer vision, a saliency map is an image that highlights the regions on which people's eyes focus first. The goal of a saliency map is to reflect the degree of importance of a pixel to the human visual system. For example, in a photograph of a fort under light clouds, a viewer looks first at the fort and the clouds, so those regions should be highlighted in the saliency map. Saliency maps engineered in artificial or computer vision are typically not the same as the actual saliency map constructed by biological or natural vision.

Integral Channel Features (ICF), also known as ChnFtrs, is a method for object detection in computer vision. It uses integral images to extract features such as local sums, histograms, and Haar-like features from multiple registered image channels. This method was exploited extensively by Dollár et al. in their work on pedestrian detection, first described at BMVC in 2009.

Michael J. Black

Michael J. Black is an American-born computer scientist working in Tübingen, Germany. He is a founding director at the Max Planck Institute for Intelligent Systems where he leads the Perceiving Systems Department in research focused on computer vision, machine learning, and computer graphics. He is also an Honorary Professor at the University of Tübingen.

References

  1. C. Papageorgiou and T. Poggio, "A Trainable Pedestrian Detection System", International Journal of Computer Vision (IJCV), 1:15–33, 2000.
  2. N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection", IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 1:886–893, 2005.
  3. B. Wu and R. Nevatia, "Detection of Multiple, Partially Occluded Humans in a Single Image by Bayesian Combination of Edgelet Part Detectors", IEEE International Conference on Computer Vision (ICCV), 1:90–97, 2005.
  4. K. Mikolajczyk, C. Schmid, and A. Zisserman, "Human Detection Based on a Probabilistic Assembly of Robust Part Detectors", European Conference on Computer Vision (ECCV), volume 3021/2004, pages 69–82, 2004.
  5. H. Cho, P. E. Rybski, A. Bar-Hillel, and W. Zhang, "Real-time Pedestrian Detection with Deformable Part Models".
  6. B. Leibe, E. Seemann, and B. Schiele, "Pedestrian Detection in Crowded Scenes", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1:878–885, 2005.
  7. O. Barnich, S. Jodogne, and M. Van Droogenbroeck, "Robust Analysis of Silhouettes by Morphological Size Distributions", Advanced Concepts for Intelligent Vision Systems (ACIVS), pages 734–745, 2006.
  8. S. Piérard, A. Lejeune, and M. Van Droogenbroeck, "A Probabilistic Pixel-based Approach to Detect Humans in Video Streams", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 921–924, 2011.
  9. S. Piérard, A. Lejeune, and M. Van Droogenbroeck, "3D Information Is Valuable for the Detection of Humans in Video Streams", Proceedings of 3D Stereo MEDIA, pages 1–4, 2010.
  10. F. Fleuret, J. Berclaz, R. Lengagne, and P. Fua, "Multi-Camera People Tracking with a Probabilistic Occupancy Map", IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2):267–282, February 2008.