The Overhead Imagery Research Data Set (OIRDS) is a collection of open-source, annotated overhead images that computer vision researchers can use to aid in the development of algorithms. [1] Most computer vision and machine learning algorithms function by training on a large set of example data. [2] Further, for many academic and industry researchers, the availability of truth-labeled test data helps drive algorithm research.
While a great deal of terrestrial imagery is available on the Internet from various sources, there are few (if any) repositories of overhead imagery. The limited overhead imagery that is found via sources such as Google Earth or Google Maps is copyrighted or may have limited use. [3]
The initial ~1,000 images in the OIRDS are focused on an Automatic Target Detection (ATD) task for passenger vehicles. Passenger vehicles in the OIRDS consist of cars, trucks, vans, and pickups. The vehicle data set is composed of USGS and VIVID images, all of which are color RGB images. The annotations that describe the images are documented in detail in [4].
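As a rough illustration of how such annotations might be consumed, the sketch below reads per-target vehicle records and crops image chips for training or inspection. The actual OIRDS annotation schema is documented in [4]; the CSV layout and column names used here are hypothetical placeholders, not the data set's real format.

    import csv
    from PIL import Image

    def load_annotations(csv_path):
        # Hypothetical OIRDS-style annotation file: one row per target,
        # with a source image, a vehicle type, and a bounding box.
        targets = []
        with open(csv_path, newline="") as f:
            for row in csv.DictReader(f):
                targets.append({
                    "image": row["image_file"],
                    "vehicle_type": row["vehicle_type"],  # car, truck, van, pickup
                    "bbox": tuple(int(row[k]) for k in ("x0", "y0", "x1", "y1")),
                })
        return targets

    def extract_chip(target, image_dir="images"):
        # Crop a single vehicle chip from its source image.
        img = Image.open(f"{image_dir}/{target['image']}")
        return img.crop(target["bbox"])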
OIRDS v1.0 was released in September 2009. This version contains ~900 annotated images with ~1,800 targets identified. [1]
The current OIRDS data set contains only vehicle annotations; other target types are not included. Additionally, recent trends in computer vision incorporate image context into many detection and classification problems. Context annotations are not currently provided, though researchers are encouraged to contribute them. [4]
Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, and the extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions. Understanding in this context means the transformation of visual images into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.
Corpus linguistics is the study of a language as that language is expressed in its text corpus, its body of "real world" text. Corpus linguistics proposes that a reliable analysis of a language is more feasible with corpora collected in the field—the natural context ("realia") of that language—with minimal experimental interference.
Avinash C. Kak is a professor of Electrical and Computer Engineering at Purdue University who has conducted pioneering research in several areas of information processing. His most noteworthy contributions deal with algorithms, languages, and systems related to networks, robotics, and computer vision. Born in Srinagar, Kashmir, he earned his BE at the University of Madras and his PhD at the Indian Institute of Technology Delhi. He joined the faculty of Purdue University in 1971.
Scientific visualization is an interdisciplinary branch of science concerned with the visualization of scientific phenomena. It is also considered a subset of computer graphics, a branch of computer science. The purpose of scientific visualization is to graphically illustrate scientific data to enable scientists to understand, illustrate, and glean insight from their data. Research into how people read and misread various types of visualizations is helping to determine what types and features of visualizations are most understandable and effective in conveying information.
NASA WorldWind is an open-source virtual globe. According to the website, "WorldWind is an open source virtual globe API. WorldWind allows developers to quickly and easily create interactive visualizations of 3D globe, map and geographical information. Organizations around the world use WorldWind to monitor weather patterns, visualize cities and terrain, track vehicle movement, analyze geospatial data and educate humanity about the Earth." It was first developed by NASA in 2003 for use on personal computers and then further developed in concert with the open source community since 2004. As of 2017, a web-based version of WorldWind is available online. An Android version is also available.
Ensembl genome database project is a scientific project at the European Bioinformatics Institute, which provides a centralized resource for geneticists, molecular biologists and other researchers studying the genomes of our own species and other vertebrates and model organisms. Ensembl is one of several well known genome browsers for the retrieval of genomic information.
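As one illustration of programmatic retrieval of genomic information, Ensembl exposes a public REST API at rest.ensembl.org. The short sketch below looks up a gene by symbol; the endpoint and response fields follow the publicly documented API, but should be verified against the current documentation before use.

    import requests

    # Retrieve the record for the human BRCA2 gene from the Ensembl REST API.
    resp = requests.get(
        "https://rest.ensembl.org/lookup/symbol/homo_sapiens/BRCA2",
        headers={"Content-Type": "application/json"},
        timeout=30,
    )
    resp.raise_for_status()
    gene = resp.json()
    # Print the stable gene ID and its genomic coordinates.
    print(gene["id"], gene["seq_region_name"], gene["start"], gene["end"])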
Articulated body pose estimation in computer vision is the study of algorithms and systems that recover the pose of an articulated body, which consists of joints and rigid parts, using image-based observations. It is one of the longest-lasting problems in computer vision because of the complexity of the models that relate observation with pose, and because of the variety of situations in which it would be useful.
LabelMe is a project created by the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) which provides a dataset of digital images with annotations. The dataset is dynamic, free to use, and open to public contribution. The most applicable use of LabelMe is in computer vision research. As of October 31, 2010, LabelMe has 187,240 images, 62,197 annotated images, and 658,992 labeled objects.
The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. This method is similar to that of edge orientation histograms, scale-invariant feature transform descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy.
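A minimal sketch using scikit-image's hog function shows the pieces this description names: orientation bins, a dense grid of uniformly spaced cells, and overlapping block normalization. It assumes scikit-image is installed; parameter values are the library's common defaults, not anything prescribed by the original HOG paper.

    from skimage import data
    from skimage.feature import hog

    image = data.camera()  # built-in grayscale test image
    features, hog_image = hog(
        image,
        orientations=9,          # number of gradient-orientation bins per cell
        pixels_per_cell=(8, 8),  # dense grid of uniformly spaced cells
        cells_per_block=(2, 2),  # overlapping blocks for local contrast normalization
        block_norm="L2-Hys",
        visualize=True,          # also return a rendering of the histograms
    )
    print(features.shape)  # flattened descriptor vector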
Caltech 101 is a data set of digital images created in September 2003 and compiled by Fei-Fei Li, Marco Andreetto, Marc'Aurelio Ranzato and Pietro Perona at the California Institute of Technology. It is intended to facilitate computer vision research and is most applicable to techniques involving image recognition, classification, and categorization. Caltech 101 contains a total of 9,146 images, split between 101 distinct object categories and a background category. Provided with the images are a set of annotations describing the outlines of each image, along with a Matlab script for viewing.
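For illustration, Caltech 101 can also be loaded through torchvision's built-in dataset class, a sketch that assumes torchvision is installed (the images are downloaded on first use; this loader is independent of the Matlab tooling shipped with the data set).

    from torchvision import datasets

    # Download (if needed) and open Caltech 101; each item is a PIL image
    # paired with an integer category index.
    caltech = datasets.Caltech101(root="data", download=True)
    image, label = caltech[0]
    print(len(caltech), caltech.categories[label])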
In computer vision, the problem of object categorization from image search is the problem of training a classifier to recognize categories of objects, using only the images retrieved automatically with an Internet search engine. Ideally, automatic image collection would allow classifiers to be trained with nothing but the category names as input. This problem is closely related to that of content-based image retrieval (CBIR), where the goal is to return better image search results rather than training a classifier for image recognition.
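In outline, the approach reduces to collecting search results per category name and training on the resulting noisy labels. In the hypothetical sketch below, search_images and train_classifier are caller-supplied stand-ins for any image search API and any learning procedure; neither is a real library call.

    def build_category_classifier(category_names, search_images, train_classifier,
                                  per_category=500):
        # Train a recognizer using only category names as supervision.
        # search_images and train_classifier are hypothetical hooks supplied
        # by the caller.
        images, labels = [], []
        for label, name in enumerate(category_names):
            for img in search_images(query=name, max_results=per_category):
                images.append(img)   # noisy: results include mislabeled outliers
                labels.append(label)
        # Because the labels are only as good as the search engine's ranking,
        # a robust learner or an explicit outlier-filtering step is typically
        # needed before the classifier becomes useful.
        return train_classifier(images, labels)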
Stanley A. Klein is an American psychophysicist. He is Professor of Vision Science and Optometry at the University of California, Berkeley and a member of the Berkeley Visual Processing Laboratory. He was a consulting editor for Attention, Perception, & Psychophysics, a publication of the Psychonomic Society which promotes the communication of scientific research in psychology and allied sciences. His major area of research has been neurotechnology, a field of science that studies the body and mind through the nervous system by electronics and mechanisms. He was the co-chair for the SPIE meetings on human vision. Klein has authored and co-authored numerous papers on visual perception in the human brain. He is currently interested in the intersection of religion and science.
Pietro Perona is an Italian-American educator and computer scientist. He is the Allan E. Puckett Professor of Electrical Engineering and Computation and Neural Systems at the California Institute of Technology and director of the National Science Foundation Engineering Research Center in Neuromorphic Systems Engineering. He is known for his research in computer vision and is the director of the Caltech Computational Vision Group.
Mapillary is a service for sharing crowdsourced geotagged photos, developed by remote company Mapillary AB, based in Malmö, Sweden. Mapillary was launched in 2013 and acquired by Facebook, Inc. in 2020. It is one of the few alternative platforms offering street-level imagery comparable to Google Street View.
BisQue is a free, open source web-based platform for the exchange and exploration of large, complex datasets. It is being developed at the Vision Research Lab at the University of California, Santa Barbara. BisQue specifically supports large scale, multi-dimensional multimodal-images and image analysis. Metadata is stored as arbitrarily nested and linked tag/value pairs, allowing for domain-specific data organization. Image analysis modules can be added to perform complex analysis tasks on compute clusters. Analysis results are stored within the database for further querying and processing. The data and analysis provenance is maintained for reproducibility of results. BisQue can be easily deployed in cloud computing environments or on computer clusters for scalability. BisQue has been integrated into the NSF Cyberinfrastructure project CyVerse. The user interacts with BisQue via any modern web browser.
Emotion recognition is the process of identifying human emotion. People vary widely in their accuracy at recognizing the emotions of others. Use of technology to help people with emotion recognition is a relatively nascent research area. Generally, the technology works best if it uses multiple modalities in context. To date, the most work has been conducted on automating the recognition of facial expressions from video, spoken expressions from audio, written expressions from text, and physiology as measured by wearables.
The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million images have been hand-annotated by the project to indicate what objects are pictured and in at least one million of the images, bounding boxes are also provided. ImageNet contains more than 20,000 categories, with a typical category, such as "balloon" or "strawberry", consisting of several hundred images. The database of annotations of third-party image URLs is freely available directly from ImageNet, though the actual images are not owned by ImageNet. Since 2010, the ImageNet project runs an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of one thousand non-overlapping classes.
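As a small illustration, a network pretrained on the 1,000-class ILSVRC subset can be applied to new images through torchvision. The sketch assumes torchvision 0.13 or later for the weights API; the choice of ResNet-50 is arbitrary, not anything mandated by the challenge.

    import torch
    from torchvision import models

    # Load a ResNet-50 trained on the ILSVRC 1000-class subset of ImageNet,
    # along with the matching resize/crop/normalize preprocessing.
    weights = models.ResNet50_Weights.IMAGENET1K_V2
    model = models.resnet50(weights=weights).eval()
    preprocess = weights.transforms()

    def classify(pil_image):
        # Return the top predicted ILSVRC class name and its probability.
        batch = preprocess(pil_image).unsqueeze(0)
        with torch.no_grad():
            probs = model(batch).softmax(dim=1)
        score, idx = probs.max(dim=1)
        return weights.meta["categories"][idx.item()], score.item()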
Computer Vision Annotation Tool (CVAT) is a free, open source, web-based image and video annotation tool which is used for labeling data for computer vision algorithms. Originally developed by Intel, CVAT is designed for use by a professional data annotation team, with a user interface optimized for computer vision annotation tasks.
Olga Russakovsky is an Assistant Professor of Computer Science at Princeton University. Her research investigates computer vision and machine learning. She was one of the leaders of the ImageNet Large Scale Visual Recognition challenge and has been recognised by MIT Technology Review as one of the world's top young innovators.