Gerald Friedland (born 1978, Berlin) is a Principal Scientist at Amazon Web Services and an adjunct professor at the Electrical Engineering and Computer Science Department of the University of California, Berkeley. [1] [2]
Friedland received his master's and doctoral degrees in computer science from the Free University of Berlin in 2002 and 2006, respectively. [3] His PhD advisor was Raúl Rojas. He then moved to the International Computer Science Institute, where he completed a postdoc under Nelson Morgan before continuing there as a research scientist and group leader. [4] He subsequently worked as a Principal Data Scientist at Lawrence Livermore National Laboratory before co-founding Brainome, Inc. [5] He is a faculty fellow of the Berkeley Institute for Data Science, where he has run a discussion group [6] since 2018 on the implications of using information theory as a universal tool for modeling. This work resulted in a book published in 2024. [7]
Friedland is a computer scientist specializing in the processing and analysis of multimedia data and machine learning. [8] He is best known as the original author of the widely used "Simple Interactive Object Extraction" image and video segmentation algorithm, [9] [10] [11] [12] [13] [14] [15] [16] created as part of his PhD thesis, [17] [18] and as the co-author of a textbook on Multimedia Computing. [19] He also led the initiative to create and release the YFCC100M corpus (see also: List of datasets for machine learning research), [20] [21] [22] the largest freely available research corpus of consumer-produced videos and images. He co-founded the field of geolocation estimation for images and videos, sometimes also referred to as placing. [23] [24] [25] Friedland has also frequently uncovered privacy risks in multimedia publishing practices [26] [27] [28] [29] [30] [31] [32] [33] and heads the development of the teachingprivacy.org [34] portal, which provides educational materials for US high schools as part of AP Computer Science Principles and the Code.org initiative. Friedland is also the co-creator of MOVI, an open-source speech recognition board that allows the creation of cloudless voice interfaces [35] for Internet of Things devices.
Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and for extracting high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions. Understanding in this context means the transformation of visual images into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.
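To make the step from pixel data to symbolic information concrete, the following minimal sketch (not tied to any system by Friedland) classifies an image with a pre-trained torchvision model; the file name "example.jpg" and the choice of ResNet-18 are illustrative assumptions.

```python
# Minimal sketch: from raw pixels to a symbolic decision (an ImageNet label).
# Assumes torch and torchvision are installed; "example.jpg" is a placeholder.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()

img = Image.open("example.jpg").convert("RGB")   # placeholder input image
batch = weights.transforms()(img).unsqueeze(0)   # preprocessing bundled with the weights

with torch.no_grad():
    logits = model(batch)

# The numerical output (logits) is reduced to a symbolic description (a class name).
label = weights.meta["categories"][logits.argmax(dim=1).item()]
print("symbolic description of the image:", label)
```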
A CAPTCHA is a type of challenge–response test used in computing to determine whether the user is human in order to deter bot attacks and spam.
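As a toy sketch of the challenge-response idea (not a production CAPTCHA design), the code below renders a random code as a slightly noisy image with Pillow and then checks the user's reply; the image size, font, and amount of distortion are arbitrary assumptions.

```python
# Toy challenge-response sketch: render a random code as a noisy image,
# then compare the user's reply to the stored answer. Illustrative only;
# real CAPTCHA systems use much stronger distortions and server-side state.
import random
import string
from PIL import Image, ImageDraw, ImageFont

def make_challenge(length=5):
    code = "".join(random.choices(string.ascii_uppercase + string.digits, k=length))
    img = Image.new("RGB", (160, 60), "white")
    draw = ImageDraw.Draw(img)
    draw.text((20, 22), " ".join(code), fill="black", font=ImageFont.load_default())
    # A few random lines act as simple visual noise.
    for _ in range(6):
        draw.line([(random.randint(0, 160), random.randint(0, 60)),
                   (random.randint(0, 160), random.randint(0, 60))], fill="gray")
    return img, code

def verify(reply, code):
    return reply.strip().upper() == code

challenge_img, answer = make_challenge()
challenge_img.save("captcha.png")        # image shown to the user
print(verify(input("Type the code shown in captcha.png: "), answer))
```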
Automatic image annotation is the process by which a computer system automatically assigns metadata in the form of captioning or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from a database.
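The retrieval side of automatic annotation can be illustrated with a minimal inverted index that maps keywords to image IDs; the annotations below are placeholders standing in for the output of an annotation model, and all file names are made up.

```python
# Minimal sketch of keyword-based image retrieval: an inverted index
# built from (hypothetical) automatically generated annotations.
from collections import defaultdict

# Placeholder annotator output: image file -> predicted keywords.
annotations = {
    "beach_001.jpg": ["beach", "sea", "sky"],
    "city_042.jpg": ["building", "street", "sky"],
    "dog_007.jpg": ["dog", "grass"],
}

# Build the inverted index: keyword -> set of images tagged with it.
index = defaultdict(set)
for image_id, keywords in annotations.items():
    for kw in keywords:
        index[kw].add(image_id)

def search(query_keywords):
    """Return images tagged with every keyword in the query."""
    sets = [index[kw] for kw in query_keywords]
    return set.intersection(*sets) if sets else set()

print(search(["sky"]))            # {'beach_001.jpg', 'city_042.jpg'}
print(search(["sky", "street"]))  # {'city_042.jpg'}
```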
The International Computer Science Institute (ICSI) is an independent, non-profit research organization located in Berkeley, California, United States. Since its founding in 1988, ICSI has maintained an affiliation agreement with the University of California, Berkeley, where several of its members hold faculty appointments.
Simple interactive object extraction (SIOX) is an algorithm for extracting foreground objects from color images and videos with very little user interaction. It has been implemented as the "foreground selection" tool in GIMP, as part of the tracer tool in Inkscape, and as a function in ImageJ and Fiji (as a plug-in). Experimental implementations were also reported for Blender and Krita. Although the algorithm was originally designed for videos, virtually all implementations use SIOX primarily for still image segmentation. In fact, it is often said to be the current de facto standard for this task in the open-source world.
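SIOX itself ships inside the applications listed above rather than as a stand-alone library. As an illustration of the same kind of seeded foreground extraction, the sketch below uses OpenCV's GrabCut, a different interactive segmentation algorithm, with a user-drawn rectangle standing in for the interaction; the image path and rectangle coordinates are placeholders.

```python
# Seeded foreground extraction in the spirit of SIOX, illustrated with
# OpenCV's GrabCut (a different interactive segmentation algorithm).
import cv2
import numpy as np

img = cv2.imread("input.jpg")                        # placeholder image
mask = np.zeros(img.shape[:2], dtype=np.uint8)
rect = (50, 50, 300, 200)                            # user interaction: rough box around the object
bgd_model = np.zeros((1, 65), dtype=np.float64)
fgd_model = np.zeros((1, 65), dtype=np.float64)

cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Pixels marked as definite or probable foreground form the extracted object.
fg_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
cv2.imwrite("foreground.png", img * fg_mask[:, :, None])
```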
Thomas Shi-Tao Huang was a Chinese-born American computer scientist, electrical engineer, and writer. He was a researcher and professor emeritus at the University of Illinois at Urbana-Champaign (UIUC). Huang was one of the leading figures in computer vision, pattern recognition and human computer interaction.
Michael S. Lew is a scientist in multimedia information search and retrieval at Leiden University, Netherlands. He has published over a dozen books and 150 scientific articles in the areas of content based image retrieval, computer vision, and deep learning. Notably, he had the most cited paper in the ACM Transactions on Multimedia, one of the top 10 most cited articles in the history of the ACM SIGMM, and the most cited article from the ACM International Conference on Multimedia Information Retrieval in 2008 and also in 2010. He was the opening keynote speaker for the 9th International Conference on Visual Information Systems, the Editor-in-Chief of the International Journal of Multimedia Information Retrieval (Springer), the co-founder of influential conferences such as the International Conference on Image and Video Retrieval, and the IEEE Workshop on Human Computer Interaction. He was also a founding member of the international advisory committee for the TRECVID video retrieval evaluation project, chair of the steering committee for the ACM International Conference on Multimedia Retrieval and a member of the ACM SIGMM Executive Committee. In addition, his work on convolutional fusion networks in deep learning won the best paper award at the 23rd International Conference on Multimedia Modeling. His work is frequently cited in both scientific and popular news sources.
Stephen Malvern Omohundro is an American computer scientist whose areas of research include Hamiltonian physics, dynamical systems, programming languages, machine learning, machine vision, and the social implications of artificial intelligence. His current work uses rational economics to develop safe and beneficial intelligent technologies for better collaborative modeling, understanding, innovation, and decision making.
Ricky J. Sethi is an Assistant Professor of Computer Science at Fitchburg State University and the Director of Research for The Madsci Network. He was appointed as a National Science Foundation (NSF) Computing Innovation Fellow by the Computing Community Consortium and the Computing Research Association. He has contributed significantly in the fields of machine learning, computer vision, social computing, and science education/eLearning.
Ingemar J. Cox is Professor and Director of Research in the Department of Computer Science at University College London, where he is Head of the Future Media Group, and he is Professor in the Machine Learning department at the University of Copenhagen. Between 2003 and 2008, he was Director of UCL's Adastral Park Campus.
In computer vision, rigid motion segmentation is the process of separating regions, features, or trajectories from a video sequence into coherent subsets of space and time. These subsets correspond to independent, rigidly moving objects in the scene. The goal of this segmentation is to differentiate and extract the meaningful rigid motion from the background and analyze it. Image segmentation techniques label pixels as belonging to groups with certain characteristics at a particular time; here, pixels are segmented depending on their relative movement over a period of time, i.e., the duration of the video sequence.
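A simplified illustration of segmenting by motion: compute dense optical flow between two frames with OpenCV and split moving pixels from the static background by thresholding the flow magnitude. Full rigid motion segmentation additionally groups the motion into independent rigid models, which this toy sketch does not do; the frame paths and the 1-pixel threshold are assumptions.

```python
# Toy motion-based segmentation: threshold dense optical flow magnitude
# to separate moving pixels from the static background.
import cv2
import numpy as np

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)   # placeholder frames
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
magnitude = np.linalg.norm(flow, axis=2)                    # per-pixel displacement in pixels

moving = (magnitude > 1.0).astype(np.uint8) * 255           # assumed threshold: 1 px/frame
cv2.imwrite("moving_regions.png", moving)
```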
Emotion recognition is the process of identifying human emotion. People vary widely in their accuracy at recognizing the emotions of others. Use of technology to help people with emotion recognition is a relatively nascent research area. Generally, the technology works best if it uses multiple modalities in context. To date, most work has been conducted on automating the recognition of facial expressions from video, spoken expressions from audio, written expressions from text, and physiology as measured by wearables.
In computer vision, a saliency map is an image that highlights either the region on which people's eyes focus first or the most relevant regions for machine learning models. The goal of a saliency map is to reflect the degree of importance of a pixel to the human visual system or an otherwise opaque ML model.
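One concrete way to compute a bottom-up saliency map is the spectral residual method, available in OpenCV's contrib modules; the sketch below assumes the opencv-contrib-python package is installed and uses a placeholder image path. Gradient-based saliency maps for neural networks follow the same idea of scoring per-pixel importance.

```python
# Compute a bottom-up saliency map with OpenCV's spectral residual
# implementation (requires the opencv-contrib-python package).
import cv2

img = cv2.imread("scene.jpg")                                # placeholder image
saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
ok, saliency_map = saliency.computeSaliency(img)             # float map, roughly in [0, 1]

if ok:
    cv2.imwrite("saliency.png", (saliency_map * 255).astype("uint8"))
```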
Subhasis Chaudhuri is an Indian electrical engineer and former director of the Indian Institute of Technology, Bombay. He is a former K. N. Bajaj Chair Professor of the Department of Electrical Engineering of IIT Bombay. He is known for his pioneering studies on computer vision and is an elected fellow of all three major Indian science academies, viz. the National Academy of Sciences, India, the Indian Academy of Sciences, and the Indian National Science Academy. He is also a fellow of the Institute of Electrical and Electronics Engineers and the Indian National Academy of Engineering. The Council of Scientific and Industrial Research, the apex agency of the Government of India for scientific research, awarded him the Shanti Swarup Bhatnagar Prize for Science and Technology, one of the highest Indian science awards, in 2004 for his contributions to Engineering Sciences.
René Vidal is a Chilean electrical engineer and computer scientist who is known for his research in machine learning, computer vision, medical image computing, robotics, and control theory. He is the Herschel L. Seder Professor of the Johns Hopkins Department of Biomedical Engineering, and the founding director of the Mathematical Institute for Data Science (MINDS).
Shih-Fu Chang is a Taiwanese American computer scientist and electrical engineer noted for his research on multimedia information retrieval, computer vision, machine learning, and signal processing.
Gregory D. Hager is the Mandell Bellmore Professor of Computer Science and founding director of the Johns Hopkins Malone Center for Engineering in Healthcare at Johns Hopkins University.
Jiebo Luo is a Chinese-American computer scientist, the Albert Arendt Hopeman Professor of Engineering and Professor of Computer Science at the University of Rochester. He is interested in artificial intelligence, data science and computer vision.
Jiaya Jia is a tenured professor of the Department of Computer Science and Engineering at The Chinese University of Hong Kong (CUHK). He is an IEEE Fellow, the associate editor-in-chief of one of IEEE's flagship journals, the Transactions on Pattern Analysis and Machine Intelligence (TPAMI), and a member of the editorial board of the International Journal of Computer Vision (IJCV).
Edward Y. Chang is a computer scientist, academic, and author. Since 2019, he has been an adjunct professor of Computer Science at Stanford University and Visiting Chair Professor of Bioinformatics and Medical Engineering at Asia University.