Small object detection

Small object detection is a particular case of object detection where various techniques are employed to detect small objects in digital images and videos. "Small objects" are objects with a small pixel footprint in the input image. In areas such as aerial imagery, state-of-the-art object detection techniques often underperform because the objects of interest are small.

Uses

An example of object tracking

Small object detection has applications in various fields, such as video surveillance (traffic video surveillance, [1] [2] small object retrieval, [3] [4] and anomaly detection [5] ), maritime surveillance, drone surveying, traffic flow analysis, [6] and object tracking.

Problems with small objects

Shadow and drone movement effect

Because small objects occupy few pixels, factors such as shadows [14] and the movement of the capturing platform, for example a drone, [15] further degrade detection quality.

Methods

Various methods [16] are available to detect small objects; they fall into three categories:

YOLOv5 detection result
YOLOv5 and SAHI interface
YOLOv7 detection output

Improving existing techniques

There are various ways to detect small objects with existing techniques; some of them are described below.

Choosing a data set that has small objects

A machine learning model's output depends on how well it is trained. [17] The data set must therefore include small objects if the model is to detect them. Modern detectors such as YOLO also rely on anchors; [18] recent versions of YOLO (starting from YOLOv5 [19] ) use an auto-anchor algorithm to find good anchors based on the distribution of object sizes in the data set. For this to work, the data set must contain small objects.
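
As a quick check on a candidate data set, one can measure what fraction of its annotations count as "small" (the COCO benchmark, for example, defines small objects as those covering less than 32 × 32 pixels). A minimal sketch, with a hypothetical `small_object_fraction` helper and made-up box sizes:

```python
def small_object_fraction(boxes, threshold=32 * 32):
    """Fraction of boxes whose pixel area falls below `threshold`.

    `boxes` is a list of (width, height) tuples in pixels; the
    default 32x32 threshold follows the COCO small-object definition.
    """
    if not boxes:
        return 0.0
    small = sum(1 for w, h in boxes if w * h < threshold)
    return small / len(boxes)

# Hypothetical annotations: two small boxes, two larger ones.
boxes = [(12, 20), (30, 30), (64, 48), (100, 120)]
print(small_object_fraction(boxes))  # → 0.5
```

If the fraction is near zero, the auto-anchor step has nothing small to fit anchors to, and small-object performance will suffer regardless of the detector used.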

Generating more data via augmentation, if required

Deep learning models have a very large number of parameters whose values are set during training, so good training requires data of sufficient quantity and quality. [20] Data augmentation is a useful technique for generating more diverse data [17] from an existing data set.
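
As an illustration, two common augmentations, horizontal flipping and random cropping, can be sketched in a few lines of NumPy (the function names and the toy image are illustrative, not from any particular library):

```python
import numpy as np

def horizontal_flip(image):
    # Mirror the image along its width axis.
    return image[:, ::-1]

def random_crop(image, crop_h, crop_w, rng):
    # Cut a random crop_h x crop_w window out of the image.
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w]

rng = np.random.default_rng(0)
img = np.arange(6 * 8 * 3).reshape(6, 8, 3)  # toy 6x8 RGB image
print(horizontal_flip(img).shape)         # (6, 8, 3)
print(random_crop(img, 4, 4, rng).shape)  # (4, 4, 3)
```

In a real detection pipeline, the bounding-box annotations must be flipped, shifted, and clipped to match each augmented image.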

Increasing image capture resolution and model’s input resolution

Higher resolutions provide more pixels, and therefore more features, per object, so the model can learn from them more effectively. For example, a bike in a 1280 × 1280 image covers roughly four times as many pixels as the same bike in a 640 × 640 image.
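
The effect is easy to quantify: doubling the resolution quadruples an object's pixel area. A small sketch with a hypothetical helper:

```python
def scaled_pixel_area(area_px, base_res, new_res):
    """Pixel area of the same object when the capture (or model input)
    resolution changes from base_res to new_res (same aspect ratio)."""
    return area_px * (new_res / base_res) ** 2

# A bike covering 32 x 32 = 1024 pixels in a 640 x 640 image:
print(scaled_pixel_area(1024, 640, 1280))  # → 4096.0 (four times the pixels)
```

The trade-off is compute: model memory and inference cost also grow with the square of the input resolution, which is one motivation for the tiling approach described below.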

Auto learning anchors

Anchor size plays a vital role in small object detection. [21] Instead of hand-picking anchors, algorithms can derive them from the data set; YOLOv5, for example, uses a k-means-based algorithm to define anchor sizes.
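
A plain Euclidean k-means over annotated (width, height) pairs illustrates the idea (YOLOv5's auto-anchor additionally uses an IoU-style fitness measure and genetic refinement, which this sketch omits; the box sizes here are made up):

```python
import numpy as np

def kmeans_anchors(wh, k, iters=50, seed=0):
    """Cluster (width, height) pairs into k anchor sizes with plain
    Euclidean k-means; returns anchors sorted by area."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # Assign each box to its nearest center.
        d = ((wh[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # Move each center to the mean of its assigned boxes.
        for j in range(k):
            if (labels == j).any():
                centers[j] = wh[labels == j].mean(0)
    return centers[np.argsort(centers.prod(1))]

# Hypothetical box sizes: a cluster of small and a cluster of large objects.
wh = np.array([[10, 12], [12, 10], [11, 11],
               [90, 100], [100, 90], [95, 95]], dtype=float)
print(kmeans_anchors(wh, 2))
```

With a data set dominated by small objects, the resulting anchors shrink accordingly, which is exactly the behaviour hand-picked anchors often miss.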

Tiling approach during training and inference

State-of-the-art object detectors accept only a fixed input size and rescale the input image to match it. This rescaling may deform or shrink the small objects in the image. The tiling approach [22] helps when an image has a higher resolution than the model's fixed input size: instead of scaling the image down, it is broken into tiles, which are then used in training. The same approach is used during inference as well.
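
The tile layout can be computed from the image size, the tile size, and a chosen overlap (overlap avoids cutting objects at tile borders). A sketch with a hypothetical `tile_coordinates` helper:

```python
def tile_coordinates(img_w, img_h, tile, overlap):
    """Top-left corners of fixed-size square tiles covering the image,
    with a given overlap in pixels between neighbouring tiles."""
    step = tile - overlap
    xs = list(range(0, max(img_w - tile, 0) + 1, step))
    ys = list(range(0, max(img_h - tile, 0) + 1, step))
    # Make sure the right and bottom edges are covered.
    if xs[-1] + tile < img_w:
        xs.append(img_w - tile)
    if ys[-1] + tile < img_h:
        ys.append(img_h - tile)
    return [(x, y) for y in ys for x in xs]

# A 1920x1080 frame cut into 640x640 tiles with 128 px overlap:
tiles = tile_coordinates(1920, 1080, 640, 128)
print(len(tiles))  # → 8
```

Each tile is then fed to the detector at its native resolution, so a small object keeps its full pixel footprint instead of being scaled down with the rest of the frame.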

Feature Pyramid Network (FPN)

A feature pyramid network [23] learns features at multiple scales; variants include Twin Feature Pyramid Networks (TFPN) [24] and the Extended Feature Pyramid Network (EFPN). [25] An FPN helps preserve the features of small objects as they pass through successive convolution layers.
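
The top-down pathway of an FPN can be sketched in NumPy by upsampling each coarser level and adding it to the lateral map below it (the 1 × 1 lateral and 3 × 3 smoothing convolutions of a real FPN are omitted, and the feature maps are toy arrays):

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of a (H, W, C) feature map.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_top_down(features):
    """Merge a fine-to-coarse list of feature maps FPN-style:
    each level becomes its lateral map plus the upsampled coarser level,
    so high-resolution levels (where small objects live) also carry
    semantic information from deeper layers."""
    merged = [features[-1]]
    for lateral in reversed(features[:-1]):
        merged.append(lateral + upsample2x(merged[-1]))
    return merged[::-1]  # finest level first

# Three hypothetical pyramid levels with matching channel depth.
feats = [np.ones((8, 8, 4)), np.ones((4, 4, 4)), np.ones((2, 2, 4))]
out = fpn_top_down(feats)
print([m.shape for m in out])  # [(8, 8, 4), (4, 4, 4), (2, 2, 4)]
```

The finest output map keeps its spatial resolution while accumulating context from the coarser levels, which is what lets the detection head find small objects there.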

Add-on techniques

Instead of modifying existing methods, some add-on techniques can be placed directly on top of existing approaches to detect smaller objects. One such technique is Slicing Aided Hyper Inference (SAHI). [26] The image is sliced into multiple overlapping patches of different sizes, whose dimensions are defined by hyper-parameters. During fine-tuning, the patches are resized while maintaining their aspect ratio, and the resized patches are then used to train the model.
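
After per-patch inference, detections must be shifted back into full-image coordinates, and duplicates arising from overlapping patches merged, typically with non-maximum suppression. A sketch with hypothetical helpers (not the SAHI library's own API):

```python
def to_global(box, patch_origin):
    """Shift an (x1, y1, x2, y2) box from patch to full-image coordinates."""
    ox, oy = patch_origin
    x1, y1, x2, y2 = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)

def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def merge_patch_detections(dets, iou_thresh=0.5):
    """Greedy NMS over (score, box) detections pooled from all patches."""
    dets = sorted(dets, reverse=True)  # highest score first
    kept = []
    for score, box in dets:
        if all(iou(box, k) < iou_thresh for _, k in kept):
            kept.append((score, box))
    return kept

# The same object seen in two overlapping patches:
d1 = (0.9, to_global((10, 10, 30, 30), (0, 0)))
d2 = (0.8, to_global((2, 2, 22, 22), (8, 8)))  # maps to (10, 10, 30, 30)
print(merge_patch_detections([d1, d2]))  # keeps only the 0.9 detection
```

This merging step is what makes the patch-based scheme transparent to the rest of the pipeline: downstream code sees a single set of full-image detections.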

Well-optimised techniques for small object detection

Various deep learning techniques focus specifically on small object detection, e.g., Feature-Fused SSD [27] and YOLO-Z. [28] Such methods address how to preserve the features of small objects as they pass through convolutional networks.

Other applications

Small object detection is also applied in crowd surveillance and counting, [29] [30] [31] [32] vehicle re-identification, [33] animal detection, [34] [35] [36] [37] and fish detection. [38]


References

  1. Saran K B; Sreelekha G (2015). "Traffic video surveillance: Vehicle detection and classification". 2015 International Conference on Control Communication & Computing India (ICCC). Trivandrum, Kerala, India: IEEE. pp. 516–521. doi:10.1109/ICCC.2015.7432948. ISBN   978-1-4673-7349-4. S2CID   14779393.
  2. Nemade, Bhushan (2016-01-01). "Automatic Traffic Surveillance Using Video Tracking". Procedia Computer Science. Proceedings of International Conference on Communication, Computing and Virtualization (ICCCV) 2016. 79: 402–409. doi: 10.1016/j.procs.2016.03.052 . ISSN   1877-0509.
  3. Guo, Haiyun; Wang, Jinqiao; Xu, Min; Zha, Zheng-Jun; Lu, Hanqing (2015-10-13). "Learning Multi-view Deep Features for Small Object Retrieval in Surveillance Scenarios". Proceedings of the 23rd ACM international conference on Multimedia. MM '15. New York, NY, USA: Association for Computing Machinery. pp. 859–862. doi:10.1145/2733373.2806349. ISBN   978-1-4503-3459-4. S2CID   9041849.
  4. Galiyawala, Hiren; Raval, Mehul S.; Patel, Meet (2022-05-20). "Person retrieval in surveillance videos using attribute recognition". Journal of Ambient Intelligence and Humanized Computing. doi:10.1007/s12652-022-03891-0. ISSN   1868-5145. S2CID   248951090.
  5. Ingle, Palash Yuvraj; Kim, Young-Gab (2022-05-19). "Real-Time Abnormal Object Detection for Video Surveillance in Smart Cities". Sensors. 22 (10): 3862. Bibcode:2022Senso..22.3862I. doi: 10.3390/s22103862 . ISSN   1424-8220. PMC   9143895 . PMID   35632270.
  6. Tsuboi, Tsutomu; Yoshikawa, Noriaki (2020-03-01). "Traffic flow analysis in Ahmedabad (India)". Case Studies on Transport Policy. 8 (1): 215–228. doi: 10.1016/j.cstp.2019.06.001 . ISSN   2213-624X. S2CID   195543435.
  7. Redmon, Joseph; Divvala, Santosh; Girshick, Ross; Farhadi, Ali (2016-05-09). "You Only Look Once: Unified, Real-Time Object Detection". arXiv: 1506.02640 [cs.CV].
  8. Redmon, Joseph; Farhadi, Ali (2016-12-25). "YOLO9000: Better, Faster, Stronger". arXiv: 1612.08242 [cs.CV].
  9. Redmon, Joseph; Farhadi, Ali (2018-04-08). "YOLOv3: An Incremental Improvement". arXiv: 1804.02767 [cs.CV].
  10. Bochkovskiy, Alexey; Wang, Chien-Yao; Liao, Hong-Yuan Mark (2020-04-22). "YOLOv4: Optimal Speed and Accuracy of Object Detection". arXiv: 2004.10934 [cs.CV].
  11. Wang, Chien-Yao; Bochkovskiy, Alexey; Liao, Hong-Yuan Mark (2021-02-21). "Scaled-YOLOv4: Scaling Cross Stage Partial Network". arXiv: 2011.08036 [cs.CV].
  12. Li, Chuyi; Li, Lulu; Jiang, Hongliang; Weng, Kaiheng; Geng, Yifei; Li, Liang; Ke, Zaidan; Li, Qingyuan; Cheng, Meng; Nie, Weiqiang; Li, Yiduo; Zhang, Bo; Liang, Yufei; Zhou, Linyuan; Xu, Xiaoming (2022-09-07). "YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications". arXiv: 2209.02976 [cs.CV].
  13. Wang, Chien-Yao; Bochkovskiy, Alexey; Liao, Hong-Yuan Mark (2022-07-06). "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors". arXiv: 2207.02696 [cs.CV].
  14. Zhang, Mingrui; Zhao, Wenbing; Li, Xiying; Wang, Dan (2020-12-11). "Shadow Detection of Moving Objects in Traffic Monitoring Video". 2020 IEEE 9th Joint International Information Technology and Artificial Intelligence Conference (ITAIC). Vol. 9. Chongqing, China: IEEE. pp. 1983–1987. doi:10.1109/ITAIC49862.2020.9338958. ISBN   978-1-7281-5244-8. S2CID   231824327.
  15. "Interactive workshop "How drones are changing the world we live in"". 2016 Integrated Communications Navigation and Surveillance (ICNS). Herndon, VA: IEEE. 2016. pp. 1–17. doi:10.1109/ICNSURV.2016.7486437. ISBN   978-1-5090-2149-9. S2CID   21388151.
  16. Nguyen, Nhat-Duy; Do, Tien; Ngo, Thanh Duc; Le, Duy-Dinh (2020). "An Evaluation of Deep Learning Methods for Small Object Detection". Journal of Electrical and Computer Engineering. 2020: 1–18. doi: 10.1155/2020/3189691 .
  17. Gong, Zhiqiang; Zhong, Ping; Hu, Weidong (2019). "Diversity in Machine Learning". IEEE Access. 7: 64323–64350. doi: 10.1109/ACCESS.2019.2917620 . ISSN   2169-3536. S2CID   206491718.
  18. Christiansen, Anders (2022-06-10). "Anchor Boxes — The key to quality object detection". Medium. Retrieved 2022-09-14.
  19. Jocher, Glenn; Chaurasia, Ayush; Stoken, Alex; Borovec, Jirka; NanoCode012; Kwon, Yonghye; TaoXie; Michael, Kalen; Fang, Jiacong (2022-08-17). "ultralytics/yolov5: v6.2 - YOLOv5 Classification Models, Apple M1, Reproducibility, ClearML and Deci.ai integrations". doi:10.5281/zenodo.3908559. Retrieved 2022-09-14.
  20. "The Size and Quality of a Data Set | Machine Learning". Google Developers. Retrieved 2022-09-14.
  21. Zhong, Yuanyi; Wang, Jianfeng; Peng, Jian; Zhang, Lei (2020-01-26). "Anchor Box Optimization for Object Detection". arXiv: 1812.00469 [cs.CV].
  22. Unel, F. Ozge; Ozkalayci, Burak O.; Cigla, Cevahir (2019). "The Power of Tiling for Small Object Detection". 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Long Beach, CA, USA: IEEE. pp. 582–591. doi:10.1109/CVPRW.2019.00084. ISBN   978-1-7281-2506-0. S2CID   198903617.
  23. Lin, Tsung-Yi; Dollár, Piotr; Girshick, Ross; He, Kaiming; Hariharan, Bharath; Belongie, Serge (2017-04-19). "Feature Pyramid Networks for Object Detection". arXiv: 1612.03144 [cs.CV].
  24. Liang, Yi; Changjian, Wang; Fangzhao, Li; Yuxing, Peng; Qin, Lv; Yuan, Yuan; Zhen, Huang (2019). "TFPN: Twin Feature Pyramid Networks for Object Detection". 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI). Portland, OR, USA: IEEE. pp. 1702–1707. doi:10.1109/ICTAI.2019.00251. ISBN   978-1-7281-3798-8. S2CID   211211764.
  25. Deng, Chunfang; Wang, Mengmeng; Liu, Liang; Liu, Yong (2020-04-09). "Extended Feature Pyramid Network for Small Object Detection". arXiv: 2003.07021 [cs.CV].
  26. Akyon, Fatih Cagatay; Altinuc, Sinan Onur; Temizel, Alptekin (2022-07-12). "Slicing Aided Hyper Inference and Fine-Tuning for Small Object Detection". 2022 IEEE International Conference on Image Processing (ICIP). pp. 966–970. arXiv: 2202.06934 . doi:10.1109/ICIP46576.2022.9897990. ISBN   978-1-6654-9620-9. S2CID   246823962.
  27. Cao, Guimei; Xie, Xuemei; Yang, Wenzhe; Liao, Quan; Shi, Guangming; Wu, Jinjian (2018-04-10). "Feature-fused SSD: Fast detection for small objects". In Dong, Junyu; Yu, Hui (eds.). Ninth International Conference on Graphic and Image Processing (ICGIP 2017). Vol. 10615. SPIE. pp. 381–388. arXiv: 1709.05054 . Bibcode:2018SPIE10615E..1EC. doi:10.1117/12.2304811. ISBN   9781510617414. S2CID   20592770.
  28. Benjumea, Aduen; Teeti, Izzeddin; Cuzzolin, Fabio; Bradley, Andrew (2021-12-23). "YOLO-Z: Improving small object detection in YOLOv5 for autonomous vehicles". arXiv: 2112.11798 [cs.CV].
  29. Rajendran, Logesh; Shyam Shankaran, R (2021). "Bigdata Enabled Realtime Crowd Surveillance Using Artificial Intelligence and Deep Learning". 2021 IEEE International Conference on Big Data and Smart Computing (BigComp). Jeju Island, Korea (South): IEEE. pp. 129–132. doi:10.1109/BigComp51126.2021.00032. ISBN   978-1-7281-8924-6. S2CID   232236614.
  30. Sivachandiran, S.; Mohan, K. Jagan; Nazer, G. Mohammed (2022-03-29). "Deep Transfer Learning Enabled High-Density Crowd Detection and Classification using Aerial Images". 2022 6th International Conference on Computing Methodologies and Communication (ICCMC). Erode, India: IEEE. pp. 1313–1317. doi:10.1109/ICCMC53470.2022.9753982. ISBN   978-1-6654-1028-1. S2CID   248131806.
  31. Santhini, C.; Gomathi, V. (2018). "Crowd Scene Analysis Using Deep Learning Network". 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT). pp. 1–5. doi:10.1109/ICCTCT.2018.8550851. ISBN   978-1-5386-3702-9. S2CID   54438440.
  32. Sharath, S.V.; Biradar, Vidyadevi; Prajwal, M.S.; Ashwini, B. (2021-11-19). "Crowd Counting in High Dense Images using Deep Convolutional Neural Network". 2021 IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER). Nitte, India: IEEE. pp. 30–34. doi:10.1109/DISCOVER52564.2021.9663716. ISBN   978-1-6654-1244-5. S2CID   245707782.
  33. Wang, Hongbo; Hou, Jiaying; Chen, Na (2019). "A Survey of Vehicle Re-Identification Based on Deep Learning". IEEE Access. 7: 172443–172469. doi: 10.1109/ACCESS.2019.2956172 . ISSN   2169-3536. S2CID   209319743.
  34. Santhanam, Sanjay; B, Sudhir Sidhaarthan; Panigrahi, Sai Sudha; Kashyap, Suryakant Kumar; Duriseti, Bhargav Krishna (2021-11-26). "Animal Detection for Road safety using Deep Learning". 2021 International Conference on Computational Intelligence and Computing Applications (ICCICA). Nagpur, India: IEEE. pp. 1–5. doi:10.1109/ICCICA52458.2021.9697287. ISBN   978-1-6654-2040-2. S2CID   246663727.
  35. Li, Nopparut; Kusakunniran, Worapan; Hotta, Seiji (2020). "Detection of Animal Behind Cages Using Convolutional Neural Network". 2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON). Phuket, Thailand: IEEE. pp. 242–245. doi:10.1109/ECTI-CON49241.2020.9158137. ISBN   978-1-7281-6486-1. S2CID   221086279.
  36. Oishi, Yu; Matsunaga, Tsuneo (2010). "Automatic detection of moving wild animals in airborne remote sensing images". 2010 IEEE International Geoscience and Remote Sensing Symposium. pp. 517–519. doi:10.1109/IGARSS.2010.5654227. ISBN   978-1-4244-9565-8. S2CID   16812504.
  37. Ramanan, D.; Forsyth, D.A.; Barnard, K. (2006). "Building models of animals from video". IEEE Transactions on Pattern Analysis and Machine Intelligence. 28 (8): 1319–1334. doi:10.1109/TPAMI.2006.155. ISSN   0162-8828. PMID   16886866. S2CID   1699015.
  38. Cui, Suxia; Zhou, Yu; Wang, Yonghui; Zhai, Lujun (2020). "Fish Detection Using Deep Learning". Applied Computational Intelligence and Soft Computing. 2020: 1–13. doi: 10.1155/2020/3738108 .