Anne O'Tate

Anne O'Tate is a free, web-based application [1] that analyses sets of records identified on PubMed, the bibliographic database of articles from over 5,500 biomedical journals worldwide. While PubMed has its own wide range of search options for identifying sets of records relevant to a researcher's query, it lacks the ability to analyse these sets further, a process that has been described as 'text mining' and 'drill down'. Anne O'Tate performs such analysis and can process sets of up to 25,000 PubMed records. [1]

Description

Once a set of articles has been identified using Anne O'Tate's PubMed-like interface and search syntax, the set can be analysed: words and concepts mentioned in specific 'fields' (sections) of PubMed records are displayed in order of frequency. [2] The fields that Anne O'Tate can display in this manner are:

Topics (MeSH)

This option may help to identify possible Medical Subject Headings (known as MeSH terms, but called 'Topics' by Anne O'Tate) for a subject that has no corresponding subject heading or 'entry term' (a cross-reference to the preferred MeSH term), or for which PubMed's automatic mapping process (which identifies a MeSH term and includes it in the search formulation) fails.

For instance, a search for articles on "Knowledge Transfer" (for which no corresponding MeSH or entry term exists) retrieves a set of some 530 studies in PubMed (as of August 2011); Anne O'Tate's analysis suggests that MeSH terms such as "Diffusion of Innovation" or "Information Dissemination" may be suitable additional concepts for retrieving a more 'sensitive' (comprehensive) set of references. This method of identifying possible MeSH terms is not available on PubMed.
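The frequency ranking described above can be illustrated with a minimal sketch. It is not Anne O'Tate's actual implementation: the record layout, the field name `mesh`, and the helper `rank_field` are assumptions made for illustration, and the three sample records are invented.

```python
from collections import Counter

# Hypothetical, invented records; each dict mimics one field of a PubMed record.
records = [
    {"pmid": "1", "mesh": ["Diffusion of Innovation", "Humans"]},
    {"pmid": "2", "mesh": ["Information Dissemination", "Diffusion of Innovation"]},
    {"pmid": "3", "mesh": ["Diffusion of Innovation", "Information Dissemination", "Humans"]},
]


def rank_field(records, field):
    """Count how many records mention each value of `field`, most frequent first."""
    counts = Counter()
    for record in records:
        # set() ensures a value repeated within one record is counted once
        counts.update(set(record.get(field, [])))
    # Sort by descending count, then alphabetically so ties have a stable order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranked = rank_field(records, "mesh")
for term, count in ranked:
    print(f"{count:3d}  {term}")
```

The same helper could rank any other list-valued field (for example authors or journals) by passing a different field name, assuming the records carry that field.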

Authors

This option may help with identifying authors who have written frequently about a given subject, or with identifying possible experts or peer reviewers.

Journals

Identifying journals which publish papers on the subject under investigation may assist with selecting suitable journals to consider for manuscripts or for detailed scanning for relevant articles ('hand searching' [3]) not found by the search on PubMed.

Other fields

Author affiliations (addresses) and the years of publication can also be analysed. ‘Important words’ from titles and abstracts which may "[...] have more frequent occurrences in the result subset than in the MEDLINE as a whole, thus they distinguish the result subset from the rest of MEDLINE" [4] can be identified and help with further refining a search on PubMed. [5] [6] [7]
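The idea of 'important words', meaning words that occur more often in the result subset than in MEDLINE as a whole, can be sketched as a simple ratio of document frequencies. This is an illustrative assumption, not the scoring Anne O'Tate actually uses: the function `characteristic_words`, the add-one smoothing, the `min_ratio` threshold, and all counts below are invented for the example.

```python
from collections import Counter


def characteristic_words(subset_docs, background_doc_freq, background_size, min_ratio=2.0):
    """Return (word, ratio) pairs for words over-represented in the subset.

    ratio = (share of subset documents containing the word)
            / (share of background documents containing the word)
    """
    subset_size = len(subset_docs)
    doc_freq = Counter()
    for text in subset_docs:
        # Count each word once per document (document frequency)
        doc_freq.update(set(text.lower().split()))

    scored = []
    for word, count in doc_freq.items():
        subset_rate = count / subset_size
        # Add-one smoothing avoids division by zero for unseen background words
        background_rate = (background_doc_freq.get(word, 0) + 1) / (background_size + 1)
        ratio = subset_rate / background_rate
        if ratio >= min_ratio:
            scored.append((word, ratio))
    # Highest ratio first; alphabetical order breaks ties
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Invented titles standing in for a result subset
subset_docs = [
    "knowledge transfer in health care",
    "knowledge transfer and policy",
    "the transfer of research knowledge",
]
# Invented document frequencies for a background collection of 1000 records
background_doc_freq = {
    "knowledge": 50, "transfer": 40, "the": 900, "in": 800, "of": 850,
    "and": 870, "health": 200, "care": 150, "policy": 30, "research": 300,
}

important = characteristic_words(subset_docs, background_doc_freq, background_size=1000)
for word, ratio in important:
    print(f"{word:10s} {ratio:6.2f}")
```

With these invented numbers, common words such as "the" or "of" fall below the threshold because they are roughly as frequent in the background as in the subset, while "transfer" and "knowledge" rank highest.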

History

Anne O'Tate (a pun on the word 'annotate') was developed by Neil R. Smalheiser and a team of researchers from the University of Illinois at Chicago. It is part of the Arrowsmith Project, whose other tools include "Arrowsmith" proper, a text-comparison application, [8] "Adam", a database of medical abbreviations, [9] "Author-ity", an author-disambiguation tool, [10] and "Compendium", a list of biomedical text mining tools. The Project is based on research led by Don R. Swanson at the University of Chicago, [11] which hosted the original tool. [12] Further research was led by Neil R. Smalheiser at the University of Illinois at Chicago, with funding from the National Institutes of Health. [13]

Other PubMed text-mining applications

A wide range of text-mining applications for PubMed have been developed, [4] each with its own interface, such as GoPubMed, ClusterMed, or PubReMiner. Only Anne O'Tate uses PubMed's standard interface, search syntax, and some of its functionality.

References

  1. Smalheiser, N. R.; Zhou, W.; Torvik, V. I. (2008). "Anne O'Tate: A tool to support user-driven summarization, drill-down and browsing of PubMed search results". Journal of Biomedical Discovery and Collaboration. 3: 2. doi:10.1186/1747-5333-3-2. PMC 2276193. PMID 18279519.
  2. Palidwor, G. A.; Andrade-Navarro, M. A. (2010). "MLTrends: Graphing MEDLINE term usage over time". Journal of Biomedical Discovery and Collaboration. 5: 1–6. doi:10.5210/disco.v5i0.2680. PMC 2990277. PMID 20333611.
  3. Langham, J.; Thompson, E.; Rowan, K. (1999). "Identification of randomized controlled trials from the emergency medicine literature: Comparison of hand searching versus MEDLINE searching". Annals of Emergency Medicine. 34 (1): 25–34. doi:10.1016/s0196-0644(99)70268-4. PMID 10381991.
  4. Lu, Z. (2011). "PubMed and beyond: A survey of web tools for searching biomedical literature". Database. 2011: baq036. doi:10.1093/database/baq036. PMC 3025693. PMID 21245076.
  5. Wilczynski, N. L.; Walker, C. J.; McKibbon, K. A.; Haynes, R. B. (1995). "Reasons for the loss of sensitivity and specificity of methodologic MeSH terms and textwords in MEDLINE". Proceedings. Symposium on Computer Applications in Medical Care: 436–440. PMC 2579130. PMID 8563319.
  6. Greenhalgh, T. (1997). "How to read a paper. The Medline database". BMJ (Clinical Research Ed.). 315 (7101): 180–183. doi:10.1136/bmj.315.7101.180. PMC 2127107. PMID 9251552.
  7. Smalheiser, N. R.; Zhou, W.; Torvik, V. I. (2011). "Distribution of 'Characteristic' Terms in MEDLINE Literatures". Information. 2 (2): 266–276. doi:10.3390/info2020266.
  8. Smalheiser, N. R.; Torvik, V. I.; Zhou, W. (2009). "Arrowsmith two-node search interface: A tutorial on finding meaningful links between two disparate sets of articles in MEDLINE". Computer Methods and Programs in Biomedicine. 94 (2): 190–197. doi:10.1016/j.cmpb.2008.12.006. PMC 2693227. PMID 19185946.
  9. Zhou, W.; Torvik, V. I.; Smalheiser, N. R. (2006). "ADAM: Another database of abbreviations in MEDLINE". Bioinformatics. 22 (22): 2813–2818. doi:10.1093/bioinformatics/btl480. PMID 16982707.
  10. Torvik, V. I.; Smalheiser, N. R. (2009). "Author Name Disambiguation in MEDLINE". ACM Transactions on Knowledge Discovery from Data. 3 (3): 1–29. doi:10.1145/1552303.1552304. PMC 2805000. PMID 20072710.
  11. Swanson, D. R.; Smalheiser, N. R. (Summer 1999). "Implicit Text Linkages between Medline Records: Using Arrowsmith as an Aid to Scientific Discovery" (PDF). Library Trends. 48 (1): 48–59. Retrieved July 4, 2011.
  12. "Arrowsmith-2 on Linux". The University of Chicago. Archived from the original on June 18, 2009. Retrieved July 4, 2011.
  13. Smalheiser, N. R. (October 2005). "The Arrowsmith Project: 2005 Status Report". Discovery Science. 8th international conference on discovery science. Lecture Notes in Computer Science. Vol. 3735. pp. 26–43. doi:10.1007/11563983_5. ISBN 978-3-540-29230-2.