Visual privacy

Visual privacy is the relationship between the collection and dissemination of visual information, the expectation of privacy, and the legal issues surrounding them. Digital cameras are now ubiquitous. They are among the most common sensors found in electronic devices, from smartphones and tablets to laptops and surveillance cameras. However, the privacy and trust implications surrounding cameras limit their ability to blend seamlessly into computing environments. In particular, large-scale camera networks have created increasing interest in understanding the advantages and disadvantages of such deployments. It is estimated that over 7 million CCTV cameras are deployed in the UK.[1] Due to increasing security concerns, camera networks have continued to proliferate in other countries, such as the United States. While the impact of such systems continues to be evaluated, researchers have in parallel explored tools for controlling how these camera networks are used, as well as modifications to the images and video sent to end users.

Forms of Visual Data

Visual privacy is typically applied to particular technologies, including:

Technologies enhancing visual privacy

Various technologies have been explored to enhance or preserve privacy while still providing the information collected from camera networks. Most of these solutions are tailored to a target application and try to accomplish it in a privacy-preserving manner:

Visual privacy hence encompasses privacy-aware and privacy-preserving systems, which factor in compute design choices,[8] privacy policies regarding data sharing in collaborative and distributed environments, and data ownership itself. Privacy and trust are often interlinked, especially where the adoption and wide-scale acceptance of a technology is concerned. A fair and accurate computer vision model goes a long way toward ensuring both. Many developers are also now incorporating principles from Privacy by design. These include, but are not limited to, processing all sensitive user data on the edge client device, reducing data retention, and ensuring that the data is used only for its intended purpose.
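The three practices named above can be illustrated with a minimal sketch. It is not taken from any cited system: the record type, the 24-hour retention window, the purpose label, and all function names are hypothetical, and frames are assumed to be small grayscale grids held as lists of integers. On-device redaction is shown as simple block averaging (pixelation) of a rectangle that is assumed to lie fully inside the frame.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical policy values, chosen only for illustration.
MAX_RETENTION = timedelta(hours=24)
ALLOWED_PURPOSES = {"people_counting"}


@dataclass
class FrameRecord:
    """A grayscale frame (rows of 0-255 ints) plus privacy metadata."""
    pixels: list
    captured_at: datetime
    purpose: str


def pixelate_region(pixels, top, left, height, width, block=2):
    """Return a copy of the frame with one rectangle pixelated.

    Each block-by-block tile inside the rectangle is replaced by its
    integer mean, so fine detail is removed before the frame leaves
    the device. The rectangle must lie inside the frame.
    """
    out = [row[:] for row in pixels]
    for by in range(top, top + height, block):
        for bx in range(left, left + width, block):
            ys = range(by, min(by + block, top + height))
            xs = range(bx, min(bx + block, left + width))
            mean = sum(pixels[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def purge_expired(records, now, max_age=MAX_RETENTION):
    """Keep only the records that are no older than max_age."""
    return [r for r in records if now - r.captured_at <= max_age]


def may_use(record, requested_purpose):
    """Allow use only for the purpose the frame was collected for."""
    return (requested_purpose == record.purpose
            and requested_purpose in ALLOWED_PURPOSES)
```

In such a design the redaction step would run on the capturing device, so that only the pixelated copy is stored or transmitted, while the retention and purpose checks would be applied wherever records are kept.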

References

  1. "How Many CCTV Cameras in London? UK CCTV Numbers (Updated 2022)". Clarion UK. 2022-10-04. Retrieved 2024-06-18.
  2. Schiff, Jeremy; Meingast, Marci; Mulligan, Deirdre K.; Sastry, Shankar; Goldberg, Ken (October 2007). "Respectful Cameras: Detecting Visual Markers in Real-Time to Address Privacy Concerns". International Conference on Intelligent Robots and Systems (IROS). San Diego, California.
  3. "Street View revisits Manhattan".
  4. "Eptascape, Inc. MPEG-7 Video Analytics". www.eptascape.com. Archived from the original on 21 June 2008. Retrieved 13 January 2022.
  5. "Cardea: Context-Aware Visual Privacy Protection for Photo Taking and Sharing" (PDF). Archived from the original (PDF) on 2018-11-08. Retrieved 2023-12-24.
  6. Pittaluga, Francesco; Koppal, Sanjeev J. (June 2015). "Privacy preserving optics for miniature vision sensors". 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: IEEE. pp. 314–324. CiteSeerX 10.1.1.944.2193. doi:10.1109/CVPR.2015.7298628. ISBN 9781467369640. S2CID 14056410.
  7. Hinojosa, Carlos; Niebles, Juan Carlos; Arguello, Henry (October 2021). "Learning Privacy-preserving Optics for Human Pose Estimation". 2021 IEEE International Conference on Computer Vision (ICCV). Virtual, USA: IEEE/CVF. pp. 2573–2582.
  8. Koelle, Marion; Wolf, Katrin; Boll, Susanne (2018). "Beyond LED Status Lights - Design Requirements of Privacy Notices for Body-worn Cameras". Proceedings of the Twelfth International Conference on Tangible, Embedded, and Embodied Interaction. TEI '18. Stockholm, Sweden: ACM Press. pp. 177–187. doi:10.1145/3173225.3173234. ISBN 9781450355681. S2CID 3954480.