A visual sensor network, also known as a smart camera network or intelligent camera network, is a network of spatially distributed smart camera devices capable of processing and exchanging data, and of fusing images of a scene from a variety of viewpoints into some form more useful than the individual images. [1][2][3] A visual sensor network may be a type of wireless sensor network, and much of the theory and application of the latter applies to the former. The network generally consists of the cameras themselves, which have some local image processing, communication and storage capabilities, and possibly one or more central computers where image data from multiple cameras is further processed and fused (alternatively, this processing may take place in a distributed fashion across the cameras and their local controllers). Visual sensor networks also provide high-level services to the user, so that the large amount of data can be distilled into information of interest using specific queries. [4][5][6]
The primary difference between visual sensor networks and other types of sensor networks is the nature and volume of information the individual sensors acquire: unlike most sensors, cameras are directional in their field of view, and they capture a large amount of visual information which may be partially processed independently of data from other cameras in the network. Put another way, while most sensors measure a scalar value such as temperature or pressure, visual sensors measure patterns. In light of this, communication in visual sensor networks differs substantially from that in traditional sensor networks, as the sketch below illustrates.
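As a minimal sketch of this asymmetry (in Python, with illustrative names, and a coarse 8-bin histogram standing in for real on-board feature extraction such as edge or object detection), a scalar sensor can transmit its reading directly, while a camera node must reduce each frame to a compact descriptor before transmitting:

import numpy as np

def scalar_sensor_payload(temperature_c: float) -> bytes:
    # A conventional sensor transmits the measured value itself (a few bytes).
    return np.float32(temperature_c).tobytes()

def visual_sensor_payload(frame: np.ndarray) -> bytes:
    # A camera node measures a pattern; sending the raw frame would cost
    # width * height * 3 bytes, so it extracts a small descriptor locally.
    gray = frame.mean(axis=2)                     # crude grayscale
    hist, _ = np.histogram(gray, bins=8, range=(0, 255))
    return (hist / hist.sum()).astype(np.float32).tobytes()

frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
print(len(scalar_sensor_payload(21.5)))   # 4 bytes
print(len(visual_sensor_payload(frame)))  # 32 bytes instead of ~921,600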
Visual sensor networks are most useful in applications involving area surveillance, tracking, and environmental monitoring. Of particular use in surveillance applications is the ability to perform a dense 3D reconstruction of a scene and to store data over a period of time, so that operators can view events as they unfold at any point in time (including the current moment) from any arbitrary viewpoint in the covered area, even allowing them to "fly" around the scene in real time. High-level analysis using object recognition and other techniques can intelligently track objects (such as people or cars) through a scene, and even determine what they are doing, so that certain activities can be automatically brought to the operator's attention. Another possibility is the use of visual sensor networks in telecommunications, where the network would automatically select the "best" view (perhaps even an arbitrarily generated one) of a live event.
Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, and for extracting high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions. Understanding in this context means the transformation of visual images into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.
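A deliberately toy sketch of this pipeline, using a synthetic image and made-up thresholds rather than any real computer vision method: pixel data is reduced to a symbolic description, which in turn drives a decision.

import numpy as np

def describe(image: np.ndarray) -> dict:
    # Extract simple numerical/symbolic information from pixel data.
    bright = image > 200
    return {"bright_fraction": bright.mean(),
            "centroid": tuple(np.argwhere(bright).mean(axis=0)) if bright.any() else None}

def decide(description: dict) -> str:
    # Turn the description into an action-relevant decision.
    return "object present" if description["bright_fraction"] > 0.01 else "empty scene"

img = np.zeros((100, 100), dtype=np.uint8)
img[40:60, 40:60] = 255                     # a synthetic bright "object"
print(decide(describe(img)))                # -> object present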
Smartdust is a system of many tiny microelectromechanical systems (MEMS), such as sensors, robots, or other devices, that can detect, for example, light, temperature, vibration, magnetism, or chemicals. They are usually operated wirelessly on a computer network and are distributed over some area to perform tasks, usually sensing through radio-frequency identification. Without an antenna of much greater size, the range of these tiny communication devices is measured in millimeters, and they may be vulnerable to electromagnetic disablement and destruction by microwave exposure.
Wireless sensor networks (WSNs) refer to networks of spatially dispersed and dedicated sensors that monitor and record the physical conditions of the environment and forward the collected data to a central location. WSNs can measure environmental conditions such as temperature, sound, pollution levels, humidity and wind.
A smart camera (sensor) or intelligent camera (sensor), also known as a (smart or intelligent) vision sensor, optical sensor, or visual sensor, is a machine vision system which, in addition to image capture circuitry, is capable of extracting application-specific information from the captured images and of generating event descriptions or making decisions that are used in an intelligent and automated system. A smart camera is a self-contained, standalone vision system with a built-in image sensor in the housing of an industrial video camera. The vision system and the image sensor can be integrated into a single piece of hardware known as an intelligent or smart image sensor. A smart camera contains all necessary communication interfaces, e.g. Ethernet, as well as industry-proof 24 V I/O lines for connection to a PLC, actuators, relays or pneumatic valves, and can be either static or mobile. It is not necessarily larger than an industrial or surveillance camera. In machine vision, a capability generally denotes a degree of development such that the function is ready for use in individual applications. This architecture has the advantage of a more compact volume compared to PC-based vision systems and often achieves lower cost, at the expense of a somewhat simpler (or omitted) user interface. Smart cameras are also referred to by the more general term smart sensors.
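A hedged sketch of the control loop such a camera implies, where FakeCamera, FakeIOLine and FakePLC are stand-ins invented here for vendor-specific interfaces rather than any real API:

import time
import numpy as np

class FakeCamera:
    def capture(self):
        return np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)

class FakeIOLine:
    def set(self, level: bool):
        print(f"24V output {'HIGH' if level else 'LOW'}")

class FakePLC:
    def send_event(self, name: str):
        print(f"event -> PLC: {name}")

def inspect(frame) -> bool:
    # Application-specific analysis runs inside the camera housing;
    # this placeholder check stands in for a real inspection algorithm.
    return frame.mean() > 100

def run(camera, io_line, plc, cycles=3):
    for _ in range(cycles):
        frame = camera.capture()       # built-in image sensor
        ok = inspect(frame)            # on-board processing
        io_line.set(ok)                # e.g. 24V line to a relay or valve
        if not ok:
            plc.send_event("REJECT")   # event description over Ethernet
        time.sleep(0.05)

run(FakeCamera(), FakeIOLine(), FakePLC())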
Sensor fusion is the process of combining sensor data, or data derived from disparate sources, such that the resulting information has less uncertainty than would be possible if these sources were used individually. For instance, one could potentially obtain a more accurate location estimate of an indoor object by combining multiple data sources such as video cameras and WiFi localization signals. Uncertainty reduction in this case can mean more accurate, more complete, or more dependable, or refer to the result of an emerging view, such as stereoscopic vision.
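A standard worked example of this uncertainty reduction is inverse-variance weighting of two independent estimates, say a camera-based and a WiFi-based position; the numbers below are made up for illustration.

def fuse(x1, var1, x2, var2):
    # Weight each source by the inverse of its variance.
    w1, w2 = 1.0 / var1, 1.0 / var2
    x = (w1 * x1 + w2 * x2) / (w1 + w2)
    var = 1.0 / (w1 + w2)        # always smaller than var1 and var2
    return x, var

camera_est, camera_var = 4.2, 0.25   # metres, metres^2
wifi_est, wifi_var = 3.6, 1.0

x, var = fuse(camera_est, camera_var, wifi_est, wifi_var)
print(x, var)   # -> 4.08, 0.2 : fused variance below either source's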
In computer networking, linear network coding is a technique in which intermediate nodes, rather than simply relaying the packets they receive, transmit linear combinations of them, conveying data from source nodes to sink nodes.
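The classic illustration is the butterfly network over GF(2), where the linear combination is a bitwise XOR; the sketch below uses arbitrary example packets.

a = 0b10110100          # packet from source 1
b = 0b01101101          # packet from source 2

coded = a ^ b           # the bottleneck node transmits a + b over GF(2)

# Sink 1 received `a` directly and `coded` via the bottleneck:
recovered_b = coded ^ a
# Sink 2 received `b` directly and `coded` via the bottleneck:
recovered_a = coded ^ b

assert recovered_a == a and recovered_b == b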
Edge computing is a distributed computing paradigm that brings computation and data storage closer to the sources of data. This is expected to improve response times and save bandwidth. Edge computing is an architecture rather than a specific technology, and a topology- and location-sensitive form of distributed computing.
A wireless ad hoc network (WANET) or mobile ad hoc network (MANET) is a decentralized type of wireless network. The network is ad hoc because it does not rely on a pre-existing infrastructure, such as routers or wireless access points. Instead, each node participates in routing by forwarding data for other nodes. The determination of which nodes forward data is made dynamically on the basis of network connectivity and the routing algorithm in use.
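As a toy sketch of nodes forwarding on behalf of others, the following performs a naive breadth-first flood over whatever connectivity currently exists; real MANET routing protocols such as AODV or OLSR are far more selective.

from collections import deque

def flood(adjacency, source, dest):
    seen = {source}
    queue = deque([(source, [source])])
    while queue:
        node, path = queue.popleft()
        if node == dest:
            return path                      # one route found dynamically
        for neighbour in adjacency[node]:    # current radio connectivity
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [neighbour]))
    return None

# Connectivity changes as nodes move; routes are recomputed on the fly.
links = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(flood(links, "A", "D"))   # -> ['A', 'B', 'C', 'D']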
Image fusion is the process of gathering the important information from multiple images and including it in fewer images, usually a single one. This single image is more informative and accurate than any individual source image, and it contains all the necessary information. The purpose of image fusion is not only to reduce the amount of data but also to construct images that are more appropriate and understandable for human and machine perception. In computer vision, multisensor image fusion is the process of combining relevant information from two or more images into a single image; the resulting image is more informative than any of the input images.
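A minimal sketch of one such fusion rule, assuming two registered grayscale images and using gradient magnitude as a crude per-pixel sharpness measure (real methods rely on multiscale transforms):

import numpy as np

def local_sharpness(img):
    # Gradient magnitude as a simple per-pixel sharpness measure.
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

def fuse(img_a, img_b):
    # Keep each pixel from whichever source image is locally sharper.
    mask = local_sharpness(img_a) >= local_sharpness(img_b)
    return np.where(mask, img_a, img_b)

a = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # e.g. near focus
b = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # e.g. far focus
fused = fuse(a, b)
print(fused.shape)   # one image carrying the sharper content of both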
The Internet of things (IoT) describes devices with sensors, processing ability, software and other technologies that connect and exchange data with other devices and systems over the Internet or other communications networks. The Internet of things encompasses electronics, communication, and computer science engineering. "Internet of things" has been considered a misnomer because devices do not need to be connected to the public internet; they need only be connected to a network and be individually addressable.
An indoor positioning system (IPS) is a network of devices used to locate people or objects where GPS and other satellite technologies lack precision or fail entirely, such as inside multistory buildings, airports, alleys, parking garages, and underground locations.
A sensor grid integrates wireless sensor networks with grid computing concepts to enable real-time data collection and the sharing of computational and storage resources for sensor data processing and management. It is an enabling technology for building large-scale infrastructures, integrating heterogeneous sensor, data and computational resources deployed over a wide area, to undertake complicated surveillance tasks such as environmental monitoring.
Activity recognition aims to recognize the actions and goals of one or more agents from a series of observations on the agents' actions and the environmental conditions. Since the 1980s, this research field has captured the attention of several computer science communities due to its strength in providing personalized support for many different applications and its connection to many different fields of study such as medicine, human-computer interaction, or sociology.
Visual privacy is the relationship between the collection and dissemination of visual information, the expectation of privacy, and the legal issues surrounding them. Digital cameras are now ubiquitous: they are one of the most common sensors found in electronic devices, ranging from smartphones and tablets to laptops and surveillance cameras. However, the privacy and trust implications surrounding them limit their ability to blend seamlessly into computing environments. In particular, large-scale camera networks have created increasing interest in understanding the advantages and disadvantages of such deployments. It is estimated that over 4 million CCTV cameras are deployed in the UK. Due to increasing security concerns, camera networks have continued to proliferate across other countries such as the United States. While the impact of such systems continues to be evaluated, tools for controlling how these camera networks are used, and modifications to the images and video sent to end-users, have been explored in parallel.
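One simple example of such a modification is pixelating a sensitive region, such as a detected face or licence plate, before video leaves the network; the region coordinates below are assumed to come from a separate detector, which is out of scope here.

import numpy as np

def pixelate_region(frame, x, y, w, h, block=8):
    # Replace each block of the region with its mean colour.
    region = frame[y:y + h, x:x + w].copy()
    for by in range(0, h, block):
        for bx in range(0, w, block):
            blk = region[by:by + block, bx:bx + block]
            blk[:] = blk.mean(axis=(0, 1))
    out = frame.copy()
    out[y:y + h, x:x + w] = region
    return out

frame = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)
redacted = pixelate_region(frame, x=100, y=60, w=64, h=64)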
Tomasz Imieliński is a Polish-American computer scientist, most known in the areas of data mining, mobile computing, data extraction, and search engine technology. He is currently a professor of computer science at Rutgers University in New Jersey, United States.
Video content analysis or video content analytics (VCA), also known as video analysis or video analytics (VA), is the capability of automatically analyzing video to detect and determine temporal and spatial events.
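A minimal sketch of temporal event detection, using frame differencing with made-up thresholds; production analytics layer background modelling, tracking and classification on top of ideas like this.

import numpy as np

def motion_detected(prev_frame, frame, pixel_thresh=25, area_thresh=0.01):
    # Flag motion when enough pixels change between consecutive frames.
    diff = np.abs(frame.astype(int) - prev_frame.astype(int))
    changed = (diff > pixel_thresh).mean()   # fraction of changed pixels
    return changed > area_thresh

prev_f = np.zeros((240, 320), dtype=np.uint8)
cur = prev_f.copy()
cur[100:140, 100:140] = 200                  # a synthetic moving object
print(motion_detected(prev_f, cur))          # -> True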
Fog computing or fog networking, also known as fogging, is an architecture that uses edge devices to carry out a substantial amount of computation, storage, and communication locally before routing it over the Internet backbone.
Apache SINGA is an Apache top-level project for developing an open source machine learning library. It provides a flexible architecture for scalable distributed training, is extensible to run over a wide range of hardware, and has a focus on health-care applications.
In communication systems, transition refers to a computer science paradigm describing the change of communication mechanisms, i.e., functions of a communication system, in particular service and protocol components. In a transition, communication mechanisms within a system are replaced by functionally comparable mechanisms with the aim of ensuring the highest possible quality, e.g., as captured by the quality of service.
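A hedged sketch of this idea as a runtime strategy swap, with invented mechanism names and thresholds rather than any specific transition framework:

class Mechanism:
    def __init__(self, name, send):
        self.name, self.send = name, send

wifi = Mechanism("WiFi", lambda data: print(f"WiFi -> {data}"))
cellular = Mechanism("cellular", lambda data: print(f"cellular -> {data}"))

class TransitionEngine:
    def __init__(self, primary, fallback, loss_threshold=0.1):
        self.active, self.fallback = primary, fallback
        self.loss_threshold = loss_threshold

    def report_loss(self, loss_rate):
        # Replace the active mechanism with a functionally comparable one
        # when the measured quality of service falls below target.
        if loss_rate > self.loss_threshold:
            self.active, self.fallback = self.fallback, self.active
            print(f"transition: now using {self.active.name}")

engine = TransitionEngine(wifi, cellular)
engine.active.send("frame 1")     # WiFi -> frame 1
engine.report_loss(0.25)          # transition: now using cellular
engine.active.send("frame 2")     # cellular -> frame 2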