TREC Genomics

The TREC Genomics track was a workshop held under the auspices of NIST to evaluate systems for information retrieval and related technologies in the genomics domain. It ran annually from 2003 to 2007, with the task set modified each year; tasks included information retrieval, document classification, GeneRIF prediction, and question answering.

Information retrieval (IR) is the activity of obtaining, from a collection, information resources that are relevant to an information need. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and searching for metadata that describe data, as well as searching databases of texts, images or sounds.
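As a minimal illustration of the content-based (full-text) indexing mentioned above, the following Python sketch builds a toy inverted index and answers a conjunctive keyword query; the sample documents and function names are invented for illustration and do not come from any particular IR system.

from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    term_sets = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()

docs = {
    1: "BRCA1 mutations and breast cancer risk",
    2: "gene expression profiling in breast tissue",
    3: "protein folding and structure prediction",
}
index = build_inverted_index(docs)
print(search(index, "breast cancer"))  # {1}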

Genomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes. In contrast to genetics, which refers to the study of individual genes and their roles in inheritance, genomics aims at the collective characterization and quantification of genes, which direct the production of proteins with the assistance of enzymes and messenger molecules. In turn, proteins make up body structures such as organs and tissues, control chemical reactions, and carry signals between cells. Genomics also involves the sequencing and analysis of genomes through the use of high-throughput DNA sequencing and bioinformatics to assemble and analyze the function and structure of entire genomes. Advances in genomics have triggered a revolution in discovery-based research and systems biology to facilitate understanding of even the most complex biological systems such as the brain.

Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is mainly in information science and computer science. The problems are overlapping, however, and there is therefore interdisciplinary research on document classification.
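As a small sketch of algorithmic (as opposed to manual) document classification, the following Python example trains a simple text classifier; it assumes scikit-learn is available, and the categories and training texts are invented purely for illustration.

# Minimal sketch of algorithmic document classification (assumes scikit-learn).
# The categories and training texts below are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "BRCA1 mutation linked to hereditary breast cancer",
    "tumor suppressor gene expression in carcinoma",
    "court ruling on patent dispute over sequencing methods",
    "new legislation regulates clinical trial reporting",
]
train_labels = ["genomics", "genomics", "legal", "legal"]

# TF-IDF features followed by a multinomial Naive Bayes classifier.
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(train_texts, train_labels)

print(classifier.predict(["gene expression in breast cancer cells"]))  # expected: ['genomics']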

Related Research Articles

Cross-language information retrieval (CLIR) is a subfield of information retrieval dealing with retrieving information written in a language different from the language of the user's query. The term "cross-language information retrieval" has many synonyms, of which the following are perhaps the most frequent: cross-lingual information retrieval, translingual information retrieval, multilingual information retrieval. The term "multilingual information retrieval" refers more generally both to technology for retrieving from multilingual collections and to technology that has been adapted to handle material in a language other than the one it was built for. Cross-language information retrieval refers more specifically to the use case where users formulate their information need in one language and the system retrieves relevant documents in another. To do so, most CLIR systems use various translation techniques, and CLIR approaches are commonly classified by the translation resources they rely on.

In information science and information retrieval, relevance denotes how well a retrieved document or set of documents meets the information need of the user. Relevance may include concerns such as timeliness, authority or novelty of the result.

The Text REtrieval Conference (TREC) is an ongoing series of workshops focusing on a list of different information retrieval (IR) research areas, or tracks. It is co-sponsored by the National Institute of Standards and Technology (NIST) and the Intelligence Advanced Research Projects Activity, and began in 1992 as part of the TIPSTER Text program. Its purpose is to support and encourage research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies and to increase the speed of lab-to-product transfer of technology.

A GeneRIF or Gene Reference Into Function is a short statement about the function of a gene. GeneRIFs provide a simple mechanism for allowing scientists to add to the functional annotation of genes described in the Entrez Gene database. In practice, function is construed quite broadly. For example, there are GeneRIFs that discuss the role of a gene in a disease, GeneRIFs that point the reader towards a review article about the gene, and GeneRIFs that discuss the structure of a gene. However, the stated intent is for GeneRIFs to be about gene function. Currently over half a million GeneRIFs have been created for genes from almost 1000 different species.

Relevance feedback is a feature of some information retrieval systems. The idea behind relevance feedback is to take the results that are initially returned from a given query, to gather user feedback, and to use information about whether or not those results are relevant to perform a new query. We can usefully distinguish between three types of feedback: explicit feedback, implicit feedback, and blind or "pseudo" feedback.
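As a sketch of the blind ("pseudo") feedback variant, the following Python function applies a Rocchio-style update, moving a bag-of-words query vector toward the centroid of the top-ranked documents; the parameter names, weights, and term vectors are illustrative and not tied to any specific system.

# Sketch of Rocchio-style pseudo-relevance ("blind") feedback on bag-of-words
# vectors. Parameter values are illustrative, not from a specific system.
from collections import Counter

def rocchio_expand(query_vec, top_docs, alpha=1.0, beta=0.75):
    """Move the query vector toward the centroid of the top-ranked documents."""
    expanded = Counter({term: alpha * weight for term, weight in query_vec.items()})
    for doc_vec in top_docs:
        for term, weight in doc_vec.items():
            expanded[term] += beta * weight / len(top_docs)
    return expanded

query = Counter({"gene": 1.0, "therapy": 1.0})
top_ranked = [Counter({"gene": 2, "vector": 1}), Counter({"therapy": 1, "retroviral": 1})]
print(rocchio_expand(query, top_ranked))  # expanded query with new terms like "vector"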

Zettair is a compact text search engine for indexing and search of HTML collections. It is open-source software developed by a group of researchers at RMIT University.

In information retrieval, Okapi BM25 is a ranking function used by search engines to rank matching documents according to their relevance to a given search query. It is based on the probabilistic retrieval framework developed in the 1970s and 1980s by Stephen E. Robertson, Karen Spärck Jones, and others.
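The following Python function is a sketch of the BM25 scoring formula in its commonly cited form, with the frequently used default parameters k1 = 1.2 and b = 0.75; the tokenization and sample documents are deliberately simplified for illustration.

# Sketch of Okapi BM25 scoring for one document against a query.
import math

def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len, k1=1.2, b=0.75):
    """Score a document (list of terms) against a query (list of terms)."""
    score = 0.0
    doc_len = len(doc_terms)
    for term in query_terms:
        tf = doc_terms.count(term)                # term frequency in the document
        if tf == 0:
            continue
        n_t = doc_freqs.get(term, 0)              # number of documents containing the term
        idf = math.log((num_docs - n_t + 0.5) / (n_t + 0.5) + 1)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score

docs = [["gene", "expression", "in", "yeast"], ["protein", "structure", "prediction"]]
doc_freqs = {"gene": 1, "expression": 1, "in": 1, "yeast": 1, "protein": 1, "structure": 1, "prediction": 1}
avg_len = sum(len(d) for d in docs) / len(docs)
print(bm25_score(["gene", "expression"], docs[0], doc_freqs, len(docs), avg_len))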

Geographic information retrieval (GIR) or geographical information retrieval is the augmentation of information retrieval with geographic information. GIR aims at solving textual queries that include a geographic dimension, such as "What wars were fought in Greece?" or "restaurants in Beirut". It is common in GIR to separate the text indexing and analysis from the geographic indexing. Semantic similarity and word-sense disambiguation are important components of GIR. To identify place names, GIR often relies on gazetteers.

Terrier IR Platform is modular open-source software for the rapid development of large-scale information retrieval applications. Terrier was developed by members of the Information Retrieval Research Group, Department of Computing Science, at the University of Glasgow.

The mean reciprocal rank is a statistical measure for evaluating any process that produces a list of possible responses to a sample of queries, ordered by probability of correctness. The reciprocal rank of a query response is the multiplicative inverse of the rank of the first correct answer: 1 for first place, 1/2 for second place, 1/3 for third place and so on. The mean reciprocal rank is the average of the reciprocal ranks of results over a sample of queries Q: MRR = (1/|Q|) * sum over i of 1/rank_i, where rank_i is the rank of the first correct answer for the i-th query.
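A small Python sketch of this computation, where each entry of ranks is the rank of the first correct answer for one query (None when no correct answer was returned):

# Mean reciprocal rank over a sample of queries; a missing answer contributes 0.
def mean_reciprocal_rank(ranks):
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

print(mean_reciprocal_rank([1, 2, None, 3]))  # (1 + 1/2 + 0 + 1/3) / 4 ≈ 0.458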

Gordon V. Cormack is a professor in the David R. Cheriton School of Computer Science at the University of Waterloo and co-inventor of Dynamic Markov Compression.

Adversarial information retrieval is a topic in information retrieval related to strategies for working with a data source where some portion of it has been manipulated maliciously. Tasks can include gathering, indexing, filtering, retrieving and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.

The TRECVID evaluation meetings are an ongoing series of workshops focusing on a list of different information retrieval (IR) research areas in content-based retrieval and exploitation of digital video. TRECVID is co-sponsored by the National Institute of Standards and Technology (NIST) and other US government agencies. Various participating research groups make significant contributions. The goal of the workshops is to encourage research in content-based video retrieval and analysis by providing large test collections, realistic system tasks, uniform scoring procedures, and a forum for organizations interested in comparing their results.

A concept search is an automated information retrieval method that is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.
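One common way to approximate conceptual similarity is to compare dense vector representations ("embeddings") of queries and documents rather than raw terms; the following Python sketch ranks documents by cosine similarity, with the three-dimensional vectors made up purely for illustration.

# Toy concept-style ranking via cosine similarity of made-up embedding vectors.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

doc_embeddings = {
    "doc1": [0.9, 0.1, 0.0],   # roughly "about gene regulation"
    "doc2": [0.1, 0.8, 0.3],   # roughly "about court procedure"
}
query_embedding = [0.8, 0.2, 0.1]  # query about transcription factors

ranked = sorted(doc_embeddings,
                key=lambda d: cosine_similarity(query_embedding, doc_embeddings[d]),
                reverse=True)
print(ranked)  # ['doc1', 'doc2']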

The Cranfield experiments were computer information retrieval experiments conducted by Cyril W. Cleverdon at the College of Aeronautics at Cranfield in the 1960s, to evaluate the efficiency of indexing systems.

H5 is a privately held company specializing in information retrieval systems for the legal industry, with offices in San Francisco and New York City. Founded in 1999, H5 combines advanced proprietary information retrieval technologies with professional expertise in linguistics, statistics, computer science, law, information technology, process engineering and e-discovery. The company completed its third and latest round of venture capital funding in 2005; its primary investors are the private equity firms of Draper Fisher Jurvetson, Institutional Venture Partners (IVP), and Walden VC.

RetrievalWare is an enterprise search engine emphasizing natural language processing and semantic networks which was commercially available from 1992 to 2007 and is especially known for its use by government intelligence agencies.

Lucene Geographic and Temporal (LGTE) is an information retrieval tool developed at Technical University of Lisbon which can be used as a search engine or as an evaluation system for information retrieval techniques for research purposes. The first implementation powered by LGTE was the search engine of DIGMAP, a project co-funded by the community programme eContentplus between 2006 and 2008, which aimed to provide web services over old digitized maps contributed by a group of partners across Europe, including several national libraries.

The Conference and Labs of the Evaluation Forum, or CLEF, is an organization promoting research in multilingual information access. Its specific functions are to maintain an underlying framework for testing information retrieval systems and to create repositories of data for researchers to use in developing comparable standards. The organization has held a conference in Europe every September since a first constituting workshop in 2000. From 1997 to 1999, TREC, the similar evaluation conference organised annually in the USA, included a track for the evaluation of cross-language IR for European languages. This track was coordinated jointly by NIST and by a group of European volunteers that grew over the years. At the end of 1999, some of the participants decided to transfer the activity to Europe and set it up independently. The aim was to expand coverage to a larger number of languages and to focus on a wider range of issues, including monolingual system evaluation for languages other than English. Over the years, CLEF has been supported by a number of EU-funded projects and initiatives.