Visual sensor network

A visual sensor network is a network of spatially distributed smart camera devices capable of processing and fusing images of a scene from a variety of viewpoints into some form more useful than the individual images. A visual sensor network may be a type of wireless sensor network, and much of the theory and application of the latter applies to the former. The network generally consists of the cameras themselves, which have some local image processing, communication and storage capabilities, and possibly one or more central computers, where image data from multiple cameras is further processed and fused (this processing may, however, simply take place in a distributed fashion across the cameras and their local controllers). Visual sensor networks also provide some high-level services to the user so that the large amount of data can be distilled into information of interest using specific queries. [1][2][3]
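The division of labour described above (local processing at each camera, fusion at a central node) can be sketched in a few lines. Everything here, including the `local_summary`/`fuse` names and the toy "bright event" criterion, is an illustrative assumption rather than any standard API:

```python
# Minimal sketch of the distributed pattern: each camera node reduces its
# raw frame to a compact local summary, and a central process fuses the
# per-node summaries instead of the raw images.

def local_summary(frame):
    """Camera-side processing: reduce a raw frame (a 2D list of pixel
    intensities) to a compact descriptor -- here just the mean brightness
    and the count of 'bright' pixels (a hypothetical event criterion)."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bright = sum(1 for p in pixels if p > 200)
    return {"mean": mean, "bright_pixels": bright}

def fuse(summaries):
    """Central fusion: combine per-camera summaries into one scene-level
    answer (e.g. did any viewpoint see a bright event?)."""
    return {
        "scene_mean": sum(s["mean"] for s in summaries) / len(summaries),
        "event": any(s["bright_pixels"] > 0 for s in summaries),
    }

frames = [
    [[10, 20], [30, 40]],      # camera A: dark view
    [[250, 240], [10, 10]],    # camera B: sees a bright event
]
result = fuse([local_summary(f) for f in frames])
```

The point of the sketch is the bandwidth saving: each camera transmits a few numbers rather than a full image, which is the motivation behind the "distilled into information of interest" claim above.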

Smart camera

A smart camera or intelligent camera is a machine vision system which, in addition to image capture circuitry, is capable of extracting application-specific information from the captured images, along with generating event descriptions or making decisions that are used in an intelligent and automated system. A smart camera is a self-contained, standalone vision system with a built-in image sensor in the housing of an industrial video camera. It contains all necessary communication interfaces, e.g. Ethernet, as well as industrial 24 V I/O lines for connection to a PLC, actuators, relays or pneumatic valves, and it is not necessarily larger than an industrial or surveillance camera. In machine vision, such capabilities are generally developed to the point where they are ready for use in individual applications. This architecture has the advantage of a more compact volume than PC-based vision systems and often achieves lower cost, at the expense of a somewhat simpler user interface. Less powerful versions are often referred to as smart sensors.

Wireless sensor network

Wireless sensor network (WSN) refers to a group of spatially dispersed and dedicated sensors for monitoring and recording the physical conditions of the environment and organizing the collected data at a central location. WSNs measure environmental conditions like temperature, sound, pollution levels, humidity, wind, and so on.

Sensor fusion

Sensor fusion is the combining of sensory data, or data derived from disparate sources, such that the resulting information has less uncertainty than would be possible if these sources were used individually. Uncertainty reduction in this context can mean more accurate, more complete, or more dependable information, or refer to the result of an emerging view, such as stereoscopic vision.
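A classic illustration of this uncertainty reduction (not taken from the text above) is inverse-variance weighting of two independent measurements of the same quantity; the fused variance is always smaller than either input's:

```python
def fuse_measurements(x1, var1, x2, var2):
    """Inverse-variance weighted fusion of two independent measurements
    of the same quantity. The fused estimate has lower variance than
    either input -- the 'uncertainty reduction' of sensor fusion."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    fused = (w1 * x1 + w2 * x2) / (w1 + w2)
    fused_var = 1.0 / (w1 + w2)  # always below min(var1, var2)
    return fused, fused_var

# Two sensors measure the same temperature with different noise levels;
# the fused estimate leans toward the more certain sensor.
est, var = fuse_measurements(20.0, 4.0, 22.0, 1.0)
```

Here the fused estimate lands at 21.6, closer to the low-variance sensor's reading, with variance 0.8, below both inputs.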


The primary difference between visual sensor networks and other types of sensor networks is the nature and volume of information the individual sensors acquire: unlike most sensors, cameras are directional in their field of view, and they capture a large amount of visual information which may be partially processed independently of data from other cameras in the network. Alternatively, one may say that while most sensors measure some value such as temperature or pressure, visual sensors measure patterns. In light of this, communication in visual sensor networks differs substantially from traditional sensor networks.

Field of view

The field of view (FoV) is the extent of the observable world that is seen at any given moment. In the case of optical instruments or sensors it is a solid angle through which a detector is sensitive to electromagnetic radiation.
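For a camera modelled as a pinhole, the angular field of view along one axis follows directly from the sensor size and focal length; the helper name below is an illustrative assumption:

```python
import math

def field_of_view_deg(sensor_width_mm, focal_length_mm):
    """Angular field of view of a camera along one axis, from the pinhole
    model: FoV = 2 * atan(sensor_width / (2 * focal_length))."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

# A full-frame sensor (36 mm wide) behind a 50 mm lens gives a horizontal
# field of view of roughly 40 degrees.
fov = field_of_view_deg(36.0, 50.0)
```

The same formula applied to the sensor height (or diagonal) gives the vertical (or diagonal) field of view.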


Visual sensor networks are most useful in applications involving area surveillance, tracking, and environmental monitoring. Of particular use in surveillance applications is the ability to perform a dense 3D reconstruction of a scene and to store data over a period of time, so that operators can view events as they unfold over any interval (including the current moment) from any arbitrary viewpoint in the covered area, even allowing them to "fly" around the scene in real time. High-level analysis using object recognition and other techniques can intelligently track objects (such as people or cars) through a scene, and even determine what they are doing, so that certain activities can be automatically brought to the operator's attention. Another possibility is the use of visual sensor networks in telecommunications, where the network would automatically select the "best" view (perhaps even an arbitrarily generated one) of a live event.

Surveillance

In espionage and counterintelligence, surveillance is the monitoring of behavior, activities, or other changing information for the purpose of influencing, managing, directing, or protecting people. This can include observation from a distance by means of electronic equipment or interception of electronically transmitted information. It can also include simple no- or relatively low-technology methods such as human intelligence agents and postal interception. The word surveillance comes from a French phrase for "watching over" and is in contrast to more recent developments such as sousveillance.

See also

Computer vision is an interdisciplinary scientific field that deals with how computers can be made to gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do.

Related Research Articles

Automatic image annotation is the process by which a computer system automatically assigns metadata in the form of captioning or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from a database.

Hyperspectral imaging

Hyperspectral imaging, like other spectral imaging, collects and processes information from across the electromagnetic spectrum. The goal of hyperspectral imaging is to obtain the spectrum for each pixel in the image of a scene, with the purpose of finding objects, identifying materials, or detecting processes. There are two general branches of spectral imagers. There are push broom scanners and the related whisk broom scanners, which read images over time, and snapshot hyperspectral imaging, which uses a staring array to generate an image in an instant.

The image fusion process is defined as gathering all the important information from multiple images and including it in fewer images, usually a single one. This single image is more informative and accurate than any individual source image and contains all the necessary information. The purpose of image fusion is not only to reduce the amount of data but also to construct images that are more appropriate and understandable for human and machine perception. In computer vision, multisensor image fusion is the process of combining relevant information from two or more images into a single image; the resulting image is more informative than any of the input images.
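The simplest fusion rule consistent with the description above is a pixel-wise average of aligned images; this toy sketch uses plain nested lists rather than any particular imaging library:

```python
def average_fusion(images):
    """Pixel-wise average of several aligned grayscale images (given as
    equal-sized 2D lists) -- one of the simplest image fusion rules."""
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / len(images)
             for c in range(cols)] for r in range(rows)]

# Two aligned 2x2 images fused into one.
a = [[0, 100], [200, 50]]
b = [[100, 100], [0, 150]]
fused = average_fusion([a, b])
```

Averaging suppresses uncorrelated noise but also blurs detail that appears in only one source; the selection-based rules used in multi-focus fusion (discussed later in this article) address that weakness.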

Sensor node

A sensor node, also known as a mote, is a node in a sensor network that is capable of performing some processing, gathering sensory information and communicating with other connected nodes in the network. A mote is a node but a node is not always a mote.

Virtual graffiti consists of virtual objects and/or digital messages, images, multimedia or other annotations or graphics applied to public locations, landmarks or surfaces such as walls, train stations, bridges, etc. Virtual graffiti applications utilize augmented reality and ubiquitous computing to anchor virtual graffiti to physical landmarks or objects in the real world. The virtual content is then viewable through devices such as personal computers, smartglasses, set-top boxes or mobile handsets, such as mobile phones or PDAs. The virtual world provides content, graphics, and applications to the user that are not available in the real world. Virtual graffiti is a novel initiative aimed at delivering messaging and social multimedia content to mobile applications and devices based on the location, identity, and community of the participating entity.

An indoor positioning system (IPS) is a system used to locate objects or people inside a building using lights, radio waves, magnetic fields, acoustic signals, or other sensory information. There are several commercial systems on the market, but there is no standard for an IPS system.
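One common building block of such systems is trilateration: recovering a position from range measurements to known anchors. The sketch below is a minimal 2D version under ideal (noise-free) assumptions, not any specific commercial system's method:

```python
import math

def trilaterate(p1, r1, p2, r2, p3, r3):
    """2D position from ranges to three known anchor points, obtained by
    subtracting the first circle equation from the other two, which
    yields a 2x2 linear system in (x, y)."""
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21  # zero iff the anchors are collinear
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det)

# Anchors at (0,0), (10,0) and (0,10); the true position is (3, 4),
# so the ranges are the exact distances to it.
x, y = trilaterate((0, 0), 5.0,
                   (10, 0), math.hypot(7, 4),
                   (0, 10), math.hypot(3, 6))
```

Real IPS deployments must cope with noisy ranges (e.g. from signal strength), so they typically solve an over-determined least-squares version of the same system.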

A sensor grid integrates wireless sensor networks with grid computing concepts to enable real-time sensor data collection and the sharing of computational and storage resources for sensor data processing and management. It is an enabling technology for building large-scale infrastructures, integrating heterogeneous sensor, data and computational resources deployed over a wide area, to undertake complicated surveillance tasks such as environmental monitoring.

An area of computer vision is active vision, sometimes also called active computer vision. An active vision system is one that can manipulate the viewpoint of the camera(s) in order to investigate the environment and get better information from it.

Activity recognition aims to recognize the actions and goals of one or more agents from a series of observations on the agents' actions and the environmental conditions. Since the 1980s, this research field has captured the attention of several computer science communities due to its strength in providing personalized support for many different applications and its connection to many different fields of study such as medicine, human-computer interaction, or sociology.

3D reconstruction

In computer vision and computer graphics, 3D reconstruction is the process of capturing the shape and appearance of real objects. This process can be accomplished either by active or passive methods. If the model is allowed to change its shape in time, this is referred to as non-rigid or spatio-temporal reconstruction.
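The core passive-reconstruction step can be illustrated in the plane: a point's position is recovered by intersecting the viewing rays from two cameras. This is a simplified 2D sketch of triangulation, with hypothetical function names:

```python
import math

def triangulate_2d(c1, theta1, c2, theta2):
    """Intersect two bearing rays (camera centre + viewing angle) in the
    plane: the simplest passive-reconstruction step, recovering a point
    from its direction as seen in two views."""
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # Solve c1 + t1*d1 = c2 + t2*d2 for t1 by Cramer's rule on the
    # 2x2 system [d1, -d2] @ [t1, t2]^T = c2 - c1.
    det = d1[0] * (-d2[1]) - d1[1] * (-d2[0])  # zero iff rays are parallel
    rx, ry = c2[0] - c1[0], c2[1] - c1[1]
    t1 = (rx * (-d2[1]) - ry * (-d2[0])) / det
    return (c1[0] + t1 * d1[0], c1[1] + t1 * d1[1])

# Cameras at (0,0) and (4,0) both observe a point at (2,2); each camera
# reports only the bearing toward it, and triangulation recovers it.
p = triangulate_2d((0.0, 0.0), math.atan2(2, 2),
                   (4.0, 0.0), math.atan2(2, -2))
```

Full 3D reconstruction generalizes this idea: rays from calibrated cameras rarely intersect exactly because of noise, so methods minimize the reprojection error instead of intersecting rays directly.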

Foveated imaging

Foveated imaging is a digital image processing technique in which the image resolution, or amount of detail, varies across the image according to one or more "fixation points." A fixation point indicates the highest resolution region of the image and corresponds to the center of the eye's retina, the fovea.
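A crude way to realise a fixation-dependent level of detail is to assign each pixel a coarseness level that grows with its distance from the fixation point; the function below is an illustrative sketch of that idea, not a standard algorithm:

```python
import math

def detail_map(width, height, fx, fy, full_res_radius):
    """Per-pixel level of detail for foveated rendering: 0 means full
    detail at the fixation point (fx, fy), and the level increases by one
    for every `full_res_radius` of distance from it, so detail falls off
    away from the 'fovea'."""
    return [[int(math.hypot(x - fx, y - fy) // full_res_radius)
             for x in range(width)] for y in range(height)]

# Fixation at the top-left corner of a 4x4 image: detail level 0 near the
# corner, coarser levels toward the opposite corner.
lod = detail_map(4, 4, 0, 0, 2.0)
```

A renderer would then downsample each pixel's neighbourhood by a factor tied to its level, concentrating bandwidth and storage on the fixation region.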

Visual privacy is the relationship between the collection and dissemination of visual information, the expectation of privacy, and the legal issues surrounding them. Cameras are now ubiquitous: they are among the most common sensors found in electronic devices, ranging from smartphones and tablets to laptops and surveillance cameras. However, the privacy and trust implications surrounding them limit their ability to blend seamlessly into our computing environment. In particular, large-scale camera networks have created increasing interest in understanding the advantages and disadvantages of such deployments. It is estimated that over 4 million cameras are deployed in the UK. Due to increasing security concerns, camera networks have continued to proliferate in other countries such as the United States. While the impact of such systems continues to be evaluated, tools for controlling how these camera networks are used, and for modifying the images or video sent to end users, have been explored in parallel.

Dust Networks, Inc. is an American company specializing in the design and manufacture of wireless sensor networks for industrial applications including process monitoring, condition monitoring, asset management, Environment, Health and Safety (EHS) monitoring and power management. They were acquired by Linear Technology, Inc in December 2011, which in turn was acquired by Analog Devices, Inc in 2017. The Dust Networks product team operates in the IoT Networking Platforms group of Analog Devices.

Unattended ground sensor

The unattended ground sensor (UGS) is under development as part of the United States Army's Future Combat Systems Program. For information on currently fielded UGS systems, refer to the Current Force UGS Program or CF UGS.

Multi-focus image fusion is a technique that combines input images with different focus depths into an output image that preserves the in-focus information of each. In a visual sensor network (VSN), the sensors are cameras which record images and video sequences. In many VSN applications, a single camera cannot give a complete depiction of a scene because of the limited depth of focus of its optical lens: only objects at the lens's focal distance appear sharp, while other parts of the image are blurred. A VSN can capture images of the scene with different depths of focus using several cameras. Because cameras generate far more data than sensors such as pressure or temperature sensors, and because of limitations such as restricted bandwidth, energy consumption and processing time, it is essential to process the input images locally to reduce the amount of transmitted data. These constraints motivate multi-focus image fusion: a process which combines the input multi-focus images into a single image containing all of their important information, giving a more accurate description of the scene than any single input image.
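A minimal version of the "keep the in-focus source" rule can be sketched with a crude local-sharpness measure (absolute differences with neighbours standing in for a gradient magnitude); all names here are illustrative assumptions, and real methods use more robust focus measures such as the variance of the Laplacian:

```python
def sharpness(img, r, c):
    """Local sharpness proxy: sum of absolute differences with the right
    and lower neighbours (a crude gradient magnitude). In-focus regions
    have high local contrast, blurred regions low."""
    rows, cols = len(img), len(img[0])
    s = 0
    if c + 1 < cols:
        s += abs(img[r][c] - img[r][c + 1])
    if r + 1 < rows:
        s += abs(img[r][c] - img[r + 1][c])
    return s

def multifocus_fuse(img_a, img_b):
    """Per-pixel selection: keep the pixel from whichever aligned input is
    locally sharper, i.e. whichever camera had that region in focus."""
    rows, cols = len(img_a), len(img_a[0])
    return [[img_a[r][c] if sharpness(img_a, r, c) >= sharpness(img_b, r, c)
             else img_b[r][c] for c in range(cols)] for r in range(rows)]

# Image A is sharp (high-contrast) on the left and flat on the right;
# image B is the opposite. The fused result takes each side from the
# sharper source.
a = [[0, 255, 100, 100], [255, 0, 100, 100]]
b = [[100, 100, 0, 255], [100, 100, 255, 0]]
fused = multifocus_fuse(a, b)
```

Selection-based rules like this preserve detail that averaging would blur, at the cost of possible seams where the decision flips between sources.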


  1. Obraczka, K.; Manduchi, R.; Garcia-Luna-Aceves, J. J. (October 2002). "Managing the Information Flow in Visual Sensor Networks" (PDF). Proc. 5th Intl. Symposium on Wireless Personal Multimedia Communications. 3. pp. 1177–1181. doi:10.1109/WPMC.2002.1088364. ISBN 978-0-7803-7442-3.
  2. Akdere, M.; Cetintemel, U.; Crispell, D.; Jannotti, J.; Mao, J.; Taubin, G. (October 2006). "Data-Centric Visual Sensor Networks for 3D Sensing" (PDF). Proc. 2nd Intl. Conf. on Geosensor Networks.
  3. Castanedo, F.; Patricio, M. A.; García, J.; Molina, J. M. (2006). "Extending Surveillance Systems Capabilities Using BDI Cooperative Sensor Agents". In Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks (VSSN '06), Santa Barbara, California, USA, October 27, 2006. ACM Press, New York, NY. pp. 131–138.
  4. Qian, C.; Qi, H. (2008). "Coverage Estimation in the Presence of Occlusions for Visual Sensor Networks". DCOSS 2008. pp. 346–356.
  5. Soro, S.; Heinzelman, W. (2009). "A Survey of Visual Sensor Networks". Advances in Multimedia, vol. 2009, Article ID 640386, 21 pages. doi:10.1155/2009/640386.
  6. Bai, Y.; Qi, H. (2010). "Feature-Based Image Comparison for Semantic Neighbor Selection in Resource-Constrained Visual Sensor Networks". EURASIP Journal on Image and Video Processing, vol. 2010.