Contact | |
---|---|
Research center | United States National Library of Medicine (NLM) |
Release date | January 1996 |
Access | |
Website | pubmed |
PubMed is a free database comprising primarily the MEDLINE database of references and abstracts on life sciences and biomedical topics. The United States National Library of Medicine (NLM) at the National Institutes of Health maintains the database as part of the Entrez system of information retrieval. [1]
From 1971 to 1997, online access to the MEDLINE database was primarily through institutional facilities, such as university libraries. [2] PubMed, first released in January 1996, ushered in the era of private, free, home- and office-based MEDLINE searching. [3] The PubMed system was offered free to the public starting in June 1997. [2]
In addition to MEDLINE, PubMed provides access to:
Many PubMed records contain links to full text articles, some of which are freely available, often in PubMed Central [5] and local mirrors, such as Europe PubMed Central. [6]
Information about the journals indexed in MEDLINE, and available through PubMed, is found in the NLM Catalog. [7]
As of 23 May 2023, PubMed has more than 35 million citations and abstracts dating back to 1966, selectively to the year 1865, and very selectively to 1809. As of the same date, 24.6 million of PubMed's records are listed with their abstracts, and 26.8 million records have links to full-text versions (of which 10.9 million articles are available in full text for free). [8] Over the 10 years ending 31 December 2019, an average of nearly one million new records were added each year.
In 2016, NLM changed the indexing system so that publishers are able to directly correct typos and errors in PubMed indexed articles. [9]
PubMed has been reported to include some articles published in predatory journals. MEDLINE and PubMed policies for the selection of journals for database inclusion are slightly different. Weaknesses in the criteria and procedures for indexing journals in PubMed Central may allow publications from predatory journals to leak into PubMed. [10]
A new PubMed interface was launched in October 2009 and encouraged the use of quick, Google-like search formulations; these have also been described as 'telegram' searches. [11] By default the results are sorted by Most Recent, but this can be changed to Best Match, Publication Date, First Author, Last Author, Journal, or Title. [12]
The PubMed website design and domain were updated in January 2020, and the new version, with updated and new features, became the default on 15 May 2020. [13] There was a critical reaction from many researchers who frequently use the site. [14]
PubMed/MEDLINE can be accessed via handheld devices, using for instance the "PICO" option (for focused clinical questions) created by the NLM. [15] A "PubMed Mobile" option, providing access to a mobile-friendly, simplified PubMed version, is also available. [16]
Simple searches on PubMed can be carried out by entering key aspects of a subject into PubMed's search window.
PubMed translates this initial search formulation, automatically adding field names, relevant MeSH (Medical Subject Headings) terms, synonyms, and Boolean operators, and 'nesting' the resulting terms appropriately. This enhances the search formulation significantly, in particular by routinely combining textwords and MeSH terms with the OR operator.
For optimal searches in PubMed, it is necessary to understand its core component, MEDLINE, and especially the MeSH (Medical Subject Headings) controlled vocabulary used to index MEDLINE articles. Optimal searches may also require complex search strategies, use of field names (tags), proper use of limits and other features; reference librarians and search specialists offer search services. [17] [18]
Entering free text into PubMed's search window is recommended only for unequivocal topics, for new interventions that do not yet have a MeSH heading, and for commercial brand names of medicines and proper nouns. It is also useful when there is no suitable heading or when the descriptor covers only part of the topic. A search using the MeSH thesaurus is more accurate and returns fewer irrelevant results. It also avoids a disadvantage of free-text searching, in which differences in spelling, singular and plural forms, and abbreviations have to be taken into account. On the other hand, articles recently added to the database that have not yet been assigned descriptors will not be found. Therefore, an exhaustive search must combine controlled-language headings with free-text terms. [19]
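The combination of a controlled heading with free-text terms can be sketched as a small query builder. This is only an illustration: the function name `build_combined_query` is hypothetical, and the example assumes the `[MeSH Terms]` and `[Title/Abstract]` field tags are the ones wanted for the heading and the free-text synonyms.

```python
def build_combined_query(mesh_heading, free_text_terms):
    """Join one MeSH heading and several free-text phrases with OR.

    The heading is tagged [MeSH Terms]; each free-text phrase is quoted and
    tagged [Title/Abstract] so that it is searched as text words.
    """
    parts = [f'"{mesh_heading}"[MeSH Terms]']
    parts += [f'"{term}"[Title/Abstract]' for term in free_text_terms]
    return "(" + " OR ".join(parts) + ")"


if __name__ == "__main__":
    # The heading catches indexed records; the free-text phrases catch
    # records that have not yet been assigned descriptors.
    print(build_combined_query("Myocardial Infarction", ["heart attack"]))
```

The resulting string can be pasted into the search window or passed to a programmatic query.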
When a journal article is indexed, numerous article parameters are extracted and stored as structured information. These parameters include article type (MeSH terms, e.g., "Clinical Trial"), secondary identifiers (MeSH terms), language, country of the journal, and publication history (e-publication date, print journal publication date).
The publication type parameter allows searching by the type of publication, including reports of various kinds of clinical research. [20]
Since July 2005, the MEDLINE article indexing process extracts identifiers from the article abstract and puts those in a field called Secondary Identifier (SI). The secondary identifier field stores accession numbers to various databases of molecular sequence data, gene expression or chemical compounds, and clinical trial IDs. For clinical trials, PubMed extracts trial IDs for the two largest trial registries: ClinicalTrials.gov (NCT identifier) and the International Standard Randomized Controlled Trial Number Register (ISRCTN identifier). [21]
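As a rough illustration of this kind of extraction (not the NLM's actual indexing code), trial identifiers can be picked out of abstract text with regular expressions. The sketch assumes the commonly used written forms: `NCT` followed by eight digits for ClinicalTrials.gov, and `ISRCTN` followed by eight digits for the ISRCTN registry. The function name and the sample sentence are hypothetical.

```python
import re

# Assumed identifier shapes: a registry prefix followed by eight digits.
NCT_PATTERN = re.compile(r"\bNCT\d{8}\b")
ISRCTN_PATTERN = re.compile(r"\bISRCTN\d{8}\b")


def extract_trial_ids(abstract):
    """Return the trial registry identifiers found in an abstract, by registry."""
    return {
        "ClinicalTrials.gov": NCT_PATTERN.findall(abstract),
        "ISRCTN": ISRCTN_PATTERN.findall(abstract),
    }


if __name__ == "__main__":
    # Made-up abstract text with made-up identifiers.
    sample = "Trial registration: NCT01234567; also registered as ISRCTN12345678."
    print(extract_trial_ids(sample))
```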
A reference which is judged particularly relevant can be marked and "related articles" can be identified. If relevant, several studies can be selected and related articles to all of them can be generated (on PubMed or any of the other NCBI Entrez databases) using the 'Find related data' option. The related articles are then listed in order of "relatedness". To create these lists of related articles, PubMed compares words from the title and abstract of each citation, as well as the MeSH headings assigned, using a powerful word-weighted algorithm. [22] The 'related articles' function has been judged to be so precise that the authors of a paper suggested it can be used instead of a full search. [23]
PubMed automatically links to MeSH terms and subheadings. Examples would be: "bad breath" links to (and includes in the search) "halitosis", "heart attack" to "myocardial infarction", "breast cancer" to "breast neoplasms". Where appropriate, these MeSH terms are automatically "expanded", that is, include more specific terms. Terms like "nursing" are automatically linked to "Nursing [MeSH]" or "Nursing [Subheading]". This feature is called Auto Term Mapping and is enacted, by default, in free text searching but not exact phrase searching (i.e. enclosing the search query with double quotes). [24] This feature makes PubMed searches more sensitive and avoids false-negative (missed) hits by compensating for the diversity of medical terminology. [24]
PubMed does not apply automatic term mapping in the following circumstances: when the phrase is enclosed in quotation marks (e.g., "kidney allograft"), when the term is truncated with an asterisk (e.g., kidney allograft*), and when the search uses field labels (e.g., Cancer [ti]). [19]
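The three cases can be expressed as a simple client-side check. This is a heuristic sketch for illustration only: the function name `mapping_bypass_reason` is hypothetical, and the check merely mirrors the rules stated in the text rather than reproducing PubMed's own parser.

```python
import re

# A field label is assumed to be any bracketed tag, such as [ti] or [MeSH Terms].
FIELD_TAG = re.compile(r"\[[^\[\]]+\]")


def mapping_bypass_reason(term):
    """Return why automatic term mapping would be skipped, or None if it applies."""
    if '"' in term:
        return "quoted phrase"
    if "*" in term:
        return "truncation"
    if FIELD_TAG.search(term):
        return "field label"
    return None


if __name__ == "__main__":
    for query in ['"kidney allograft"', "kidney allograft*", "Cancer [ti]", "heart attack"]:
        print(query, "->", mapping_bypass_reason(query))
```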
The PubMed optional facility "My NCBI" (with free registration) provides tools for
and a wide range of other options. [25] The "My NCBI" area can be accessed from any computer with web-access. An earlier version of "My NCBI" was called "PubMed Cubby". [26]
LinkOut is an NLM facility to link and make available full-text local journal holdings. [27] Some 3,200 sites (mainly academic institutions) participated in this NLM facility as of March 2010, from Aalborg University in Denmark to ZymoGenetics in Seattle. [28] Users at these institutions see their institution's logo within the PubMed search result (if the journal is held at that institution) and can access the full text. LinkOut is being consolidated with Outside Tool as of the major platform update coming in the summer of 2019. [29]
PubMed Commons allowed authors of articles to comment on articles indexed by PubMed. The feature was initially tested in a pilot mode (from 2013) and was made permanent in 2016. [30] In February 2018, PubMed Commons was discontinued because "usage has remained minimal". [31] [32]
askMEDLINE is a free-text, natural-language query tool for MEDLINE/PubMed, developed by the NLM and also suitable for handhelds. [33]
A PMID (PubMed identifier or PubMed unique identifier) [34] is a unique integer value, starting at 1, assigned to each PubMed record. A PMID is not the same as a PMCID (PubMed Central identifier), which is the identifier for all works published in the free-to-access PubMed Central. [35]
The assignment of a PMID or PMCID to a publication tells the reader nothing about the type or quality of the content. PMIDs are assigned to letters to the editor, editorial opinions, op-ed columns, and any other piece that the editor chooses to include in the journal, as well as peer-reviewed papers. The existence of the identification number is also not proof that the papers have not been retracted for fraud, incompetence, or misconduct. The announcement about any corrections to original papers may be assigned a PMID.
Each number that is entered in the PubMed search window is treated by default as if it were a PMID. Therefore, any reference in PubMed can be located using the PMID.
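The difference between the two identifier types can be illustrated with a small classifier. It is a sketch under stated assumptions: a PMID is taken to be a positive integer written in plain digits, and a PMCID is assumed to be written as the prefix `PMC` followed by digits. The function name `classify_identifier` is hypothetical.

```python
import re


def classify_identifier(value):
    """Label a string as 'PMID', 'PMCID', or 'unknown' based on its shape only."""
    value = value.strip()
    if re.fullmatch(r"[1-9]\d*", value):
        # Plain positive integer: treated as a PMID, as the search window does.
        return "PMID"
    if re.fullmatch(r"PMC\d+", value, flags=re.IGNORECASE):
        # Assumed PMCID shape: 'PMC' prefix plus digits.
        return "PMCID"
    return "unknown"


if __name__ == "__main__":
    for candidate in ["1", "PMC1234567", "abc"]:
        print(candidate, "->", classify_identifier(candidate))
```

A shape check like this says nothing about whether a record exists, or about the type or quality of its content.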
The National Library of Medicine leases the MEDLINE information to a number of private vendors such as Embase, Ovid, Dialog, EBSCO, Knowledge Finder and many other commercial, non-commercial, and academic providers. [36] As of October 2008, more than 500 licenses had been issued, more than 200 of them to providers outside the United States. As licenses to use MEDLINE data are available for free, the NLM in effect provides a free testing ground for a wide range [37] of alternative interfaces and third-party additions to PubMed, one of very few large, professionally curated databases that offer this option.
Lu identifies a sample of 28 current and free Web-based PubMed versions, requiring no installation or registration, which are grouped into four categories: [37]
As most of these and other alternatives rely essentially on PubMed/MEDLINE data leased under license from the NLM/PubMed, the term "PubMed derivatives" has been suggested. [37] Without the need to store about 90 GB of original PubMed Datasets, anybody can write PubMed applications using the eutils-application program interface as described in "The E-utilities In-Depth: Parameters, Syntax and More", by Eric Sayers, PhD. [48] Various citation format generators, taking PMID numbers as input, are examples of web applications making use of the eutils-application program interface. Sample web pages include Citation Generator – Mick Schroeder, Pubmed Citation Generator – Ultrasound of the Week, PMID2cite, and Cite this for me.
Alternative methods to mine the data in PubMed use programming environments such as Matlab, Python or R. In these cases, queries of PubMed are written as lines of code and passed to PubMed and the response is then processed directly in the programming environment. Code can be automated to systematically query with different keywords such as disease, year, organs, etc.
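A minimal Python sketch of this approach uses the E-utilities `esearch` endpoint. The helper names (`build_esearch_url`, `yearly_queries`, `fetch_pmids`) are hypothetical, and the sketch assumes JSON output and the `[dp]` (date of publication) field tag. `fetch_pmids` performs a network request and is defined here but not called.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=20):
    """Build an esearch request URL for the PubMed database."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def yearly_queries(keyword, years):
    """Map each year to a URL that searches the keyword within that year."""
    return {year: build_esearch_url(f"{keyword} AND {year}[dp]") for year in years}


def fetch_pmids(url):
    """Run the query and return the list of matching PMIDs (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)["esearchresult"]["idlist"]


if __name__ == "__main__":
    for year, url in yearly_queries("asthma", [2019, 2020]).items():
        print(year, url)
```

Building the URLs separately from fetching keeps the systematic part (keywords, years) easy to inspect before any request is sent. Automated scripts should also respect NCBI's usage limits.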
For bulk processing, the full PubMed database is available as XML which can be downloaded from an FTP server. The annual baseline is released in December, followed by daily update files. [49]
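Processing such XML files can be sketched with the standard library's streaming parser. The example assumes the usual PubMed element layout (`PubmedArticleSet` > `PubmedArticle` > `MedlineCitation` with `PMID` and `Article/ArticleTitle`). The embedded sample records are invented for illustration, and `iter_citations` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up sample in the assumed PubMed XML layout.
SAMPLE = b"""<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">100001</PMID>
      <Article><ArticleTitle>Hypothetical title one.</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">100002</PMID>
      <Article><ArticleTitle>Hypothetical title two.</ArticleTitle></Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def iter_citations(xml_stream):
    """Yield (pmid, title) pairs one record at a time, without loading the whole file."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == "PubmedArticle":
            pmid = elem.findtext("MedlineCitation/PMID")
            title = elem.findtext("MedlineCitation/Article/ArticleTitle")
            yield pmid, title
            elem.clear()  # release the finished record to keep memory use low


if __name__ == "__main__":
    for pmid, title in iter_citations(io.BytesIO(SAMPLE)):
        print(pmid, title)
```

For a compressed download, a file object from `gzip.open(path, "rb")` could be passed in place of the in-memory sample.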
In addition to its traditional role as a biomedical database, PubMed has become a common resource for training biomedical language models. [50] Recent examples include PubMedGPT, a 2.7B-parameter model trained on PubMed data by Stanford CRFM, and Microsoft's BiomedCLIP-PubMedBERT, which uses figure-caption pairs from PubMed Central for vision-language processing. These models illustrate the potential of PubMed data for AI in medical research and healthcare applications, and the growing intersection between large-scale data mining and AI development in the biomedical field.
The data accessible by PubMed can be mirrored locally using an unofficial tool such as MEDOC. [51]
Millions of PubMed records augment various open data datasets about open access, like Unpaywall. Data analysis tools like Unpaywall Journals are used by libraries to assist with big deal cancellations: libraries can avoid subscriptions for materials already served by instant open access via open archives like PubMed Central. [52]
The National Center for Biotechnology Information (NCBI) is part of the United States National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH). It is approved and funded by the government of the United States. The NCBI is located in Bethesda, Maryland, and was founded in 1988 through legislation sponsored by US Congressman Claude Pepper.
MEDLINE is a bibliographic database of life sciences and biomedical information. It includes bibliographic information for articles from academic journals covering medicine, nursing, pharmacy, dentistry, veterinary medicine, and health care. MEDLINE also covers much of the literature in biology and biochemistry, as well as fields such as molecular evolution.
The United States National Library of Medicine (NLM), operated by the United States federal government, is the world's largest medical library.
The Entrez Global Query Cross-Database Search System is a federated search engine, or web portal that allows users to search many discrete health sciences databases at the National Center for Biotechnology Information (NCBI) website. The NCBI is a part of the National Library of Medicine (NLM), which is itself a department of the National Institutes of Health (NIH), which in turn is a part of the United States Department of Health and Human Services. The name "Entrez" was chosen to reflect the spirit of welcoming the public to search the content available from the NLM.
Document retrieval is defined as the matching of some stated user query against a set of free-text records. These records could be any type of mainly unstructured text, such as newspaper articles, real estate records or paragraphs in a manual. User queries can range from multi-sentence full descriptions of an information need to a few words.
MedlinePlus is an online information service produced by the United States National Library of Medicine. The service provides curated consumer health information in English and Spanish with select content in additional languages. The site brings together information from the National Library of Medicine (NLM), the National Institutes of Health (NIH), other U.S. government agencies, and health-related organizations. There is also a site optimized for display on mobile devices, in both English and Spanish. In 2015, about 400 million people from around the world used MedlinePlus. The service is funded by the NLM and is free to users.
Medical Subject Headings (MeSH) is a comprehensive controlled vocabulary for the purpose of indexing journal articles and books in the life sciences. It serves as a thesaurus that facilitates searching. Created and updated by the United States National Library of Medicine (NLM), it is used by the MEDLINE/PubMed article database and by NLM's catalog of book holdings. MeSH is also used by ClinicalTrials.gov registry to classify which diseases are studied by trials registered in ClinicalTrials.
A health or medical library is designed to assist physicians, health professionals, students, patients, consumers, medical researchers, and information specialists in finding health and scientific information to improve, update, assess, or evaluate health care. Medical libraries are typically found in hospitals, medical schools, private industry, and in medical or health associations. A typical health or medical library has access to MEDLINE, a range of electronic resources, print and digital journal collections, and print reference books. The influence of open access (OA) and free searching via Google and PubMed has a major impact on the way medical libraries operate.
The Unified Medical Language System (UMLS) is a compendium of many controlled vocabularies in the biomedical sciences. It provides a mapping structure among these vocabularies and thus allows one to translate among the various terminology systems; it may also be viewed as a comprehensive thesaurus and ontology of biomedical concepts. UMLS further provides facilities for natural language processing. It is intended to be used mainly by developers of systems in medical informatics.
PubMed Central (PMC) is a free digital repository that archives open access full-text scholarly articles that have been published in biomedical and life sciences journals. As one of the major research databases developed by the National Center for Biotechnology Information (NCBI), PubMed Central is more than a document repository. Submissions to PMC are indexed and formatted for enhanced metadata, medical ontology, and unique identifiers which enrich the XML structured data for each article. Content within PMC can be linked to other NCBI databases and accessed via Entrez search and retrieval systems, further enhancing the public's ability to discover, read and build upon its biomedical knowledge.
Biomedical text mining refers to the methods and study of how text mining may be applied to texts and literature of the biomedical domain. As a field of research, biomedical text mining incorporates ideas from natural language processing, bioinformatics, medical informatics and computational linguistics. The strategies in this field have been applied to the biomedical literature available through services such as PubMed.
The Vancouver system, also known as Vancouver reference style or the author–number system, is a citation style that uses numbers within the text that refer to numbered entries in the reference list. It is popular in the physical sciences and is one of two referencing systems normally used in medicine, the other being the author–date, or "Harvard", system. Vancouver style is used by MEDLINE and PubMed.
Index Medicus (IM) is a curated subset of MEDLINE, which is a bibliographic database of life science and biomedical science information, principally scientific journal articles. From 1879 to 2004, Index Medicus was a comprehensive bibliographic index of such articles in the form of a print index or its onscreen equivalent. Medical history experts have said of Index Medicus that it is “America's greatest contribution to medical knowledge.”
GoPubMed was a knowledge-based search engine for biomedical texts. The Gene Ontology (GO) and Medical Subject Headings (MeSH) served as "Table of contents" in order to structure the millions of articles in the MEDLINE database. MeshPubMed was at one point a separate project, but the two were merged.
Europe PubMed Central is an open-access repository that contains millions of biomedical research works. It was known as UK PubMed Central until 1 November 2012.
The National Centre for Text Mining (NaCTeM) is a publicly funded text mining (TM) centre. It was established to provide support, advice and information on TM technologies and to disseminate information within the larger TM community, while also providing services and tools in response to the requirements of the United Kingdom academic community.
SafetyLit is a bibliographic database and online update of recently published scholarly research of relevance to those interested in the broad field of injury prevention and safety promotion. Initiated in 1995, SafetyLit is a project of the SafetyLit Foundation in cooperation with the San Diego State University College of Health & Human Services and the World Health Organization - Department of Violence and Injury Prevention.
GeneCards is a database of human genes that provides genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes. It is being developed and maintained by the Crown Human Genome Center at the Weizmann Institute of Science, in collaboration with LifeMap Sciences.
Anne O'Tate is a free, web-based application that analyses sets of records identified on PubMed, the bibliographic database of articles from over 5,500 biomedical journals worldwide. While PubMed has its own wide range of search options to identify sets of records relevant to a researcher's query, it lacks the ability to analyse these sets of records further, a process for which the terms text mining and drill down have been used. Anne O'Tate is able to perform such analysis and can process sets of up to 25,000 PubMed records.
Arrowsmith was a literature-based discovery system built by Don R. Swanson using the concept of undiscovered public knowledge. He called it Arrowsmith: ‘An intellectual adventure’
"Imagine that the pieces of a puzzle are independently designed and created, and that, when retrieved and assembled, they then reveal a pattern – undesigned, unintended, and never before seen, yet a pattern that commands interest and invites interpretation. So it is, I claim, that independently created pieces of knowledge can harbor an unseen, unknown, and unintended pattern. And so it is that the world of recorded knowledge can yield genuinely new discoveries"