Visual privacy

Visual privacy is the relationship between the collection and dissemination of visual information, the expectation of privacy, and the legal issues surrounding them. Digital cameras are now ubiquitous: they are among the most common sensors found in electronic devices, from smartphones and tablets to laptops and surveillance cameras. However, the privacy and trust implications surrounding cameras limit their ability to blend seamlessly into computing environments. In particular, large-scale camera networks have created increasing interest in understanding the advantages and disadvantages of such deployments. It has been estimated that over 4 million CCTV cameras are deployed in the UK. [1] Due to increasing security concerns, camera networks have continued to proliferate in other countries such as the United States. While the impact of such systems continues to be evaluated, tools for controlling how these camera networks are used, and for modifying the images and video sent to end users, have been explored in parallel.
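Modifying images before they reach end users can be as simple as obscuring a sensitive region of each frame. The following minimal, pure-Python sketch pixelates a rectangular region of a grayscale frame by replacing each tile with its mean intensity. It assumes the region (for example, a face bounding box) has already been located by some detector; the function name `pixelate_region`, the `(top, left, height, width)` box format, and the list-of-lists frame layout are illustrative assumptions, not the method of any system cited here.

```python
from typing import List, Tuple

# Assumed frame format: rows of 0-255 grayscale intensities.
Frame = List[List[int]]
# Assumed region format: (top, left, height, width), e.g. a face box from a detector.
Box = Tuple[int, int, int, int]


def pixelate_region(frame: Frame, box: Box, block: int = 4) -> Frame:
    """Return a copy of `frame` with `box` replaced by coarse tile averages.

    Each block x block tile inside the box is set to its mean intensity,
    discarding fine detail (such as facial features) while leaving the
    rest of the frame untouched.
    """
    top, left, height, width = box
    out = [row[:] for row in frame]  # copy so the input frame is not modified
    bottom = min(top + height, len(frame))
    right = min(left + width, len(frame[0]) if frame else 0)
    for y0 in range(top, bottom, block):
        for x0 in range(left, right, block):
            ys = range(y0, min(y0 + block, bottom))
            xs = range(x0, min(x0 + block, right))
            tile = [frame[y][x] for y in ys for x in xs]
            mean = sum(tile) // len(tile)  # integer mean of the tile
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out
```

For example, `pixelate_region(frame, (0, 0, 4, 4))` replaces the top-left 4x4 area with a single mean value and returns a new frame, leaving `frame` unchanged. A larger `block` removes more detail; pixelation alone may not prevent re-identification, so a system might instead blur or fill the region entirely.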

Technologies

To enhance visual privacy, a number of different technologies have been suggested.
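One way such a technology can enhance privacy is to expose only the information an application actually needs. As a hypothetical example, a camera used to count people at an entrance could keep its processing on the device and publish only aggregate counts. The sketch below assumes that an on-device detector and tracker (not shown) already supply a track ID and a horizontal centroid for each person in each frame; the class name `LineCrossingCounter`, the vertical counting line at `line_x`, and the input format are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed detector/tracker output: (track_id, centroid_x) per person per frame.
Detection = Tuple[int, float]


@dataclass
class LineCrossingCounter:
    """Counts people crossing a virtual vertical line at x = line_x.

    Only the last position of each track and the aggregate totals are kept;
    frames and raw detections are never stored.
    """
    line_x: float
    entered: int = 0
    exited: int = 0
    _last_x: Dict[int, float] = field(default_factory=dict, repr=False)

    def update(self, detections: List[Detection]) -> None:
        for track_id, x in detections:
            prev = self._last_x.get(track_id)
            if prev is not None:
                if prev < self.line_x <= x:
                    self.entered += 1  # moved left-to-right across the line
                elif x < self.line_x <= prev:
                    self.exited += 1  # moved right-to-left across the line
            self._last_x[track_id] = x

    def report(self) -> Dict[str, int]:
        # The only data exported: no images and no track IDs.
        return {"entered": self.entered, "exited": self.exited}
```

A deployed system would also need to drop stale track IDs; this sketch keeps the last position of every track it has seen.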

Forms of visual data

Visual privacy is typically applied to particular technologies, including:

Systems

Many technologies have been explored that preserve privacy while still providing the information collected by camera networks. Most of these solutions depend on the target application and attempt to achieve its goals in a privacy-preserving manner:

Visual privacy therefore encompasses privacy-aware and privacy-preserving systems that take into account computational design choices, [8] privacy policies governing data sharing in collaborative and distributed environments, and data ownership itself. Privacy and trust are often interlinked, especially where the adoption and wide-scale acceptance of a technology are concerned, and a fair and accurate computer vision model goes a long way toward ensuring both. Many developers are also now incorporating principles from privacy by design. These include, but are not limited to, processing all sensitive user data on the client (edge) device, reducing data retention, and ensuring that data is not used for purposes other than those for which it was collected.
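Two of the privacy-by-design practices above, reduced data retention and using data only for its intended purpose, can be enforced in the storage layer. The following hypothetical sketch stores each record together with the purpose declared at collection time, deletes records older than a configurable retention window, and refuses reads for any other purpose. `RetentionStore`, `Record`, and the string-based purpose labels are assumptions for illustration, not an established API; on-device processing itself is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class Record:
    created: datetime
    payload: bytes  # e.g. an encoded clip or derived metadata
    purpose: str    # purpose declared when the data was collected


class RetentionStore:
    """Hypothetical store enforcing a retention window and purpose limitation."""

    def __init__(self, retention: timedelta) -> None:
        self.retention = retention
        self._records: Dict[int, Record] = {}
        self._next_id = 0

    def add(self, payload: bytes, purpose: str, now: datetime) -> int:
        rid = self._next_id
        self._next_id += 1
        self._records[rid] = Record(now, payload, purpose)
        return rid

    def purge_expired(self, now: datetime) -> int:
        """Delete records older than the retention window; return how many."""
        expired = [rid for rid, r in self._records.items()
                   if now - r.created > self.retention]
        for rid in expired:
            del self._records[rid]
        return len(expired)

    def get(self, rid: int, purpose: str, now: datetime) -> Optional[bytes]:
        """Return the payload only if it is unexpired and the purpose matches."""
        self.purge_expired(now)  # expired data is deleted before any read
        record = self._records.get(rid)
        if record is None:
            return None
        if record.purpose != purpose:
            raise PermissionError(f"record {rid} was not collected for {purpose!r}")
        return record.payload
```

Purging on every read keeps the sketch simple; a real deployment might instead run `purge_expired` on a schedule so that expired data is removed even when nothing reads it.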

References

  1. McCahill, M.; Norris, C. (2004). "From cameras to control rooms: the mediation of the image by CCTV operatives". CCTV and Social Control: The politics and practice of video surveillance - European and global perspectives.
  2. Schiff, Jeremy; Meingast, Marci; Mulligan, Deirdre K.; Sastry, Shankar; Goldberg, Ken (October 2007). "Respectful Cameras: Detecting Visual Markers in Real-Time to Address Privacy Concerns". International Conference on Intelligent Robots and Systems (IROS). San Diego, California.
  3. "Street View revisits Manhattan".
  4. "Eptascape, Inc. MPEG-7 Video Analytics". www.eptascape.com. Archived from the original on 21 June 2008. Retrieved 13 January 2022.
  5. "Cardea: Context–Aware Visual Privacy Protection for Photo Taking and Sharing" (PDF). Archived from the original (PDF) on 2018-11-08. Retrieved 2023-12-24.
  6. Pittaluga, Francesco; Koppal, Sanjeev J. (June 2015). "Privacy preserving optics for miniature vision sensors". 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: IEEE. pp. 314–324. CiteSeerX 10.1.1.944.2193. doi:10.1109/CVPR.2015.7298628. ISBN 9781467369640. S2CID 14056410.
  7. Hinojosa, Carlos; Niebles, Juan Carlos; Arguello, Henry (October 2021). "Learning Privacy-preserving Optics for Human Pose Estimation". 2021 IEEE International Conference on Computer Vision (ICCV). Virtual, USA: IEEE/CVF. pp. 2573–2582.
  8. Koelle, Marion; Wolf, Katrin; Boll, Susanne (2018). "Beyond LED Status Lights - Design Requirements of Privacy Notices for Body-worn Cameras". Proceedings of the Twelfth International Conference on Tangible, Embedded, and Embodied Interaction. TEI '18. Stockholm, Sweden: ACM Press. pp. 177–187. doi:10.1145/3173225.3173234. ISBN 9781450355681. S2CID 3954480.