Concept-based image indexing

Concept-based image indexing, also called "description-based" or "text-based" image indexing/retrieval, refers to retrieval from text-based indexing of images that may employ keywords, subject headings, captions, or natural language text (Chen & Rasmussen, 1999). It is contrasted with content-based image retrieval (CBIR). Indexing is also a technique used in CBIR.
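As a rough illustration of text-based indexing, the sketch below builds a keyword index from image captions and answers a query by intersecting the matching sets. It is a minimal sketch, not a method taken from the cited literature: the image identifiers, captions, and the function names build_index and search are all hypothetical.

```python
from collections import defaultdict

# Hypothetical image records; identifiers and captions are invented for illustration.
IMAGES = {
    "img001.jpg": "Sunset over a mountain lake",
    "img002.jpg": "Political cartoon about a lake cleanup",
    "img003.jpg": "Mountain village at sunset",
}


def build_index(captions):
    """Map each lowercase caption word to the set of images it describes."""
    index = defaultdict(set)
    for image_id, caption in captions.items():
        for word in caption.lower().split():
            index[word].add(image_id)
    return index


def search(index, query):
    """Return the images whose captions contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(IMAGES)
print(sorted(search(index, "sunset mountain")))  # ['img001.jpg', 'img003.jpg']
```

A real system would normally draw its terms from a controlled vocabulary or subject headings rather than raw caption words, but the lookup structure is the same.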

Content-based image retrieval (CBIR), also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR), is the application of computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in large databases. Content-based image retrieval contrasts with traditional concept-based approaches.

Chu (2001) confirms that there are two distinct research groups, one employing the content-based approach and the other the description-based approach. However, research in the content-based domain currently dominates the field, while the description-based approach has less visibility.

Related Research Articles

Information retrieval (IR) is the activity of obtaining information system resources relevant to an information need from a collection. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for metadata that describe data, and for databases of texts, images or sounds.

Wide Area Information Server (WAIS) is a client–server text searching system that uses the ANSI standard Z39.50, "Information Retrieval Service Definition and Protocol Specifications for Library Applications" (Z39.50:1988), to search index databases on remote computers. It was developed in the late 1980s as a project of Thinking Machines, Apple Computer, Dow Jones, and KPMG Peat Marwick.

Information science is a field primarily concerned with the analysis, collection, classification, manipulation, storage, retrieval, movement, dissemination, and protection of information. Practitioners within and outside the field study application and usage of knowledge in organizations along with the interaction between people, organizations, and any existing information systems with the aim of creating, replacing, improving, or understanding information systems. Historically, information science is associated with computer science, psychology, and technology. However, information science also incorporates aspects of diverse fields such as archival science, cognitive science, commerce, law, linguistics, museology, management, mathematics, philosophy, public policy, and social sciences.

In information science and information retrieval, relevance denotes how well a retrieved document or set of documents meets the information need of the user. Relevance may include concerns such as timeliness, authority or novelty of the result.

Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is mainly in information science and computer science. The problems are overlapping, however, and there is therefore interdisciplinary research on document classification.

Bibliographic coupling, like co-citation, is a similarity measure that uses citation analysis to establish a similarity relationship between documents. Bibliographic coupling occurs when two works reference a common third work in their bibliographies. It is an indication that a probability exists that the two works treat a related subject matter.
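The measure can be sketched as the size of the overlap between two reference lists. This is a minimal sketch under stated assumptions: the paper and reference labels are placeholders, and coupling_strength is a hypothetical helper name rather than an established API.

```python
def coupling_strength(refs_a, refs_b):
    """Count the works cited by both documents (their shared references)."""
    return len(set(refs_a) & set(refs_b))


# Placeholder bibliographies; the labels do not refer to real publications.
paper_x = ["ref1", "ref2", "ref3", "ref4"]
paper_y = ["ref3", "ref4", "ref5"]
paper_z = ["ref6"]

print(coupling_strength(paper_x, paper_y))  # 2
print(coupling_strength(paper_x, paper_z))  # 0
```

A higher count suggests, but does not prove, that the two documents treat related subject matter.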

Library and information science (LIS), also called "library and information studies", is a merging of library science and information science. The joint term is associated with schools of library and information science. In the last part of the 1960s, schools of librarianship, which generally developed from professional training programs to university institutions during the second half of the 20th century, began to add the term "information science" to their names. The first school to do this was at the University of Pittsburgh in 1964. More schools followed during the 1970s and 1980s, and by the 1990s almost all library schools in the USA had added information science to their names. Although there are exceptions, similar developments have taken place in other parts of the world. In Denmark, for example, the "Royal School of Librarianship" changed its English name to The Royal School of Library and Information Science in 1997. Exceptions include Tromsø, Norway, where the term documentation science is the preferred name of the field; France, where information science and communication studies form one interdiscipline; and Sweden, where the fields of archival science, library science and museology have been integrated as Archival, Library and Museum studies.

Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science. An alternate name for the process in the context of search engines designed to find web pages on the Internet is web indexing.

A focused crawler is a web crawler that collects Web pages that satisfy some specific property, by carefully prioritizing the crawl frontier and managing the hyperlink exploration process. Some predicates may be based on simple, deterministic and surface properties. For example, a crawler's mission may be to crawl pages from only the .jp domain. Other predicates may be softer or comparative, e.g., "crawl pages about baseball", or "crawl pages with large PageRank". An important page property pertains to topics, leading to topical crawlers. For example, a topical crawler may be deployed to collect pages about solar power, swine flu, or even more abstract concepts like controversy while minimizing resources spent fetching pages on other topics. Crawl frontier management may not be the only device used by focused crawlers; they may use a Web directory, a Web text index, backlinks, or any other Web artifact.
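Prioritizing the crawl frontier can be sketched with a priority queue over a small in-memory link graph. This is a minimal sketch, not a production crawler: the PAGES data, the word-overlap topic_score, and the rule that a link inherits the topical score of the page it was found on are all assumptions made for illustration.

```python
import heapq

# Hypothetical in-memory "web": page id -> (page text, outgoing links).
PAGES = {
    "a": ("solar power news", ["b", "c"]),
    "b": ("baseball scores", ["d"]),
    "c": ("solar panel review", ["e"]),
    "d": ("cooking recipes", []),
    "e": ("solar power storage", []),
}


def topic_score(text, topic_words):
    """Fraction of the words in the text that belong to the topic vocabulary."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(1 for word in words if word in topic_words) / len(words)


def focused_crawl(seed, topic_words, max_pages):
    """Visit up to max_pages pages, expanding links from on-topic pages first."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < max_pages:
        _, page = heapq.heappop(frontier)
        text, links = PAGES[page]
        visited.append(page)
        score = topic_score(text, topic_words)
        for link in links:
            if link not in seen:
                seen.add(link)
                # Assumption: a link inherits the score of the page that cites it.
                heapq.heappush(frontier, (-score, link))
    return visited


print(focused_crawl("a", {"solar", "power"}, max_pages=4))  # ['a', 'b', 'c', 'e']
```

With a budget of four pages, page "d" is never fetched because it was discovered only through an off-topic page, which is the resource saving the paragraph above describes.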

A web search query is a query based on a specific search term that a user enters into a web search engine to satisfy his or her information needs. Web search queries are distinctive in that they are often plain text or hypertext with optional search-directives. They vary greatly from standard query languages, which are governed by strict syntax rules as command languages with keyword or positional parameters.

Subject indexing is the act of describing or classifying a document by index terms or other symbols in order to indicate what the document is about, to summarize its content or to increase its findability. In other words, it is about identifying and describing the subject of documents. Indexes are constructed, separately, on three distinct levels: terms in a document such as a book; objects in a collection such as a library; and documents within a field of knowledge.

James Ze Wang is a Chinese American computer scientist. He is a professor of the College of Information Sciences and Technology at Pennsylvania State University. He is also an affiliated professor of the Molecular, Cellular, and Integrative Biosciences Program; the Computational Science Graduate Minor; and the Social Data Analytics Graduate Program. He is co-director of the Intelligent Information Systems Laboratory. He was a visiting professor of the Robotics Institute at Carnegie Mellon University from 2007 to 2008. In 2011 and 2012, he served as a program manager in the Office of International Science and Engineering at the National Science Foundation. He is the second son of Chinese mathematician Wang Yuan.

A concept search is an automated information retrieval method that is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.

Aboutness is a term used in library and information science (LIS), linguistics, philosophy of language, and philosophy of mind. In LIS, it is often considered synonymous with subject (documents). In the philosophy of mind it has been often considered synonymous with intentionality, perhaps since John Searle (1983). In the philosophy of logic and language it is understood as the way a piece of text relates to a subject matter or topic.

Knowledge retrieval (KR) seeks to return information in a structured form, consistent with human cognitive processes as opposed to simple lists of data items. It draws on a range of fields including epistemology, cognitive psychology, cognitive neuroscience, logic and inference, machine learning and knowledge discovery, linguistics, and information technology.

Folksonomy is the system in which users apply public tags to online items, typically to make those items easier for themselves or others to find later. Over time, this can give rise to a classification system based on those tags and how often they are applied or searched for, in contrast to a taxonomic classification designed by the owners of the content and specified when it is published. This practice is also known as collaborative tagging, social classification, social indexing, and social tagging. Folksonomy was originally "the result of personal free tagging of information [...] for one's own retrieval", but online sharing and interaction expanded it into collaborative forms. Social tagging is the application of tags in an open online environment where the tags of other users are available to others. Collaborative tagging is tagging performed by a group of users. This type of folksonomy is commonly used in cooperative and collaborative projects such as research, content repositories, and social bookmarking.

Regional Information Center for Science and Technology (RICeST) is an Iranian governmental organisation established to promote the production and distribution of scientific information in Iran and Islamic countries, providing reference, study and bibliographical information and related services. It also undertakes scientometrics based on its databases of scientific products of Iran and Islamic countries.

Multimedia information retrieval (MMIR) is a research discipline of computer science that aims at extracting semantic information from multimedia data sources. Data sources include directly perceivable media such as audio, image and video; indirectly perceivable sources such as text, semantic descriptions and biosignals; and non-perceivable sources such as bioinformation and stock prices. The methodology of MMIR can be organized in three groups:

  1. Methods for the summarization of media content (feature extraction). The result of feature extraction is a description.
  2. Methods for the filtering of media descriptions.
  3. Methods for the categorization of media descriptions into classes.

Jack Mills was a British librarian and classification researcher, who worked for more than sixty years in the study, teaching, development and promotion of library classification and information retrieval, principally as a major figure in the British school of facet analysis which builds on the traditions of Henry E. Bliss and S.R. Ranganathan.

Pauline Atherton Cochrane is an American librarian and one of the most highly cited authors in the field of library and information sciences. She is considered a leading researcher in the campaign to redesign catalogues and indexes to provide improved online subject access in library and information services as well as "a leading teacher and theorist in cataloging, indexing, and information access."

References

Ahmad, K., M. Tariq, B. Vrusias and C. Handy. 2003. Corpus-based thesaurus construction for image retrieval in specialist domains. In Sebastiani, F. (ed.). Proceedings of the 25th European Conference on Information Retrieval Research (ECIR-03). 502–510. Heidelberg: Springer Verlag.

Angeles, M. (1998). Information Organization and Information Use of Visual Resources Collections. VRA Bulletin, 25(3), 51-58. http://urlgreyhot.com/personal/publications/information_organization_and_information_use_of_visual_resources?PHPSESSID=05f07e15bb719a05b4c621657f8cd897 [permanent dead link]

Chen, H.-L., & Rasmussen, E.M. (1999). Intellectual access to images. Library Trends, 48(2), 291–302.

Chu, H. T. (2001). Research in image indexing and retrieval as reflected in the literature. Journal of the American Society for Information Science and Technology, 52(12), 1011-1018.

Fidel, R.; Hahn, T. B.; Rasmussen, E. M. & Smith, P. J. (1994). Challenges in Indexing Electronic Text and Images. Medford, NJ: Learned Information. (ASIS Monograph Series)

Heidorn, P. B. & Sandore, B. (Eds.). (1997). Digital Image Access & Retrieval: Proceedings of the 1996 Clinic on Library Applications of Data Processing. Illinois: University of Illinois, Graduate School of Library and Information Science.

Jörgensen, C. (2003). Image Retrieval. Theory and Research. Lanham, Maryland: Scarecrow Press.

Landbeck, C. R. (2002). The organization and categorization of political cartoons: An exploratory study. The Florida State University, School of Information Studies. (Master of Science thesis). https://web.archive.org/web/20120331122537/http://etd.lib.fsu.edu/theses/available/etd-06272003-144515/unrestricted/crl01.pdf

Lamy-Rousseau, F. (1984). Classification des images, materiels et donnees = Classification of images, materials and data. 2nd ed. Longueuil, Quebec: F. Lamy-Rousseau.

Panofsky, E. (1962). Studies in Iconology: Humanistic themes in the art of the Renaissance. New York: Harper & Row.

Rasmussen, E. M. (1997). Indexing images. Annual Review of Information Science and Technology, 32, 169-196.

Shatford, S. (1986). Analyzing the Subject of a Picture: A Theoretical Approach. Cataloging and Classification Quarterly, 6(3), 39-62.

Wang, J. Z. (2001). Integrated Region-Based Image Retrieval. Boston, MA: Kluwer Academic Publishers. Book review: http://www-db.stanford.edu/~wangz/project/kluwer/1/review.pdf

Wang, Xin; Erdelez, Sanda; Allen, Carla; Anderson, Blake; Cao, Hongfei & Shyu, Chi-Ren (2011). Role of Domain Knowledge in Developing User-Centered Medical-Image Indexing. Journal of the American Society for Information Science and Technology, early view October 2011. doi:10.1002/asi.21686

Warden, G.; Dunbar, D.; Wanczycki, C. & O'Hanley, S. (2002). The Subject Analysis of Images: Past, Present and Future. https://web.archive.org/web/20080726185732/http://www.slais.ubc.ca/PEOPLE/students/student-projects/C_Wanczycki/libr517/homepage.html

Ørnager, S. (1997). Image retrieval - Theoretical analysis and empirical user studies on accessing information in images. Proceedings of the ASIS annual meeting, 34, 202-211.