Literature-based discovery

An example diagram of Swanson linking, using the ABC paradigm

Literature-based discovery (LBD), also called literature-related discovery (LRD), is a form of knowledge extraction and automated hypothesis generation that uses papers and other academic publications (the "literature") to find new relationships between existing knowledge (the "discovery"). Literature-based discovery aims to discover new knowledge by connecting information that has been explicitly stated in the literature to deduce connections that have not been explicitly stated. [1]

LBD can help researchers to quickly discover and explore hypotheses as well as gain information on relevant advances inside and outside of their niches and increase interdisciplinary information sharing. [1]

The most basic and widespread type of LBD is called the ABC paradigm because it centers around three concepts called A, B and C. [2] [3] [4] It states that if there is a connection between A and B and one between B and C, then there is one between A and C which, if not explicitly stated, is yet to be explored. [1]

History

The LBD technique was pioneered by Don R. Swanson in the 1980s. [5] He hypothesized that the combination of two separately published results indicating an A-B relationship and a B-C relationship is evidence of an A-C relationship that is unknown or unexplored. He used this to propose fish oil as a treatment for Raynaud syndrome due to their shared relationship with blood viscosity. [6] This hypothesis was later shown to have merit in a prospective study, [7] and he went on to propose other discoveries using similar methods. [8] [9] [10] [1]

Swanson linking

Swanson linking is a term proposed in 2003 [11] that refers to connecting two pieces of knowledge previously thought to be unrelated. [12] For example, it may be known that illness A is caused by chemical B, and that drug C is known to reduce the amount of chemical B in the body. However, because the respective articles were published separately from one another (called "disjoint data"), the relationship between illness A and drug C may be unknown. Swanson linking aims to find these relationships and report them.

Although the ABC paradigm is widely used, critics have argued that much of science is not captured by simple assertions and is instead built from analogies and images at a higher level of abstraction. [13]

Systems

LBD comes generally in two flavours: open and closed discovery. In open discovery, only A is given. The approach finds Bs and uses them to return possibly interesting Cs to the user, thus generating hypotheses from A. With closed discovery, the A and C are given to the approach which seeks to find the Bs which can link the two, thus testing a hypothesis about A and C. [1]
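The two modes described above can be illustrated with a minimal sketch. It assumes a toy, hand-written list of concept pairs standing in for relations extracted from the literature; the names `LINKS`, `build_neighbours`, `open_discovery` and `closed_discovery` are hypothetical and do not correspond to any published system.

```python
from collections import defaultdict

# Toy, hand-written relations standing in for pairs extracted from papers.
# Each pair means "these two concepts are linked in at least one article".
LINKS = [
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("blood viscosity", "Raynaud syndrome"),
    ("platelet aggregation", "Raynaud syndrome"),
    ("blood viscosity", "hypertension"),
]


def build_neighbours(links):
    """Return an undirected adjacency map: concept -> set of linked concepts."""
    neighbours = defaultdict(set)
    for x, y in links:
        neighbours[x].add(y)
        neighbours[y].add(x)
    return neighbours


def open_discovery(a, neighbours):
    """Given only A, return candidate Cs mapped to the Bs that connect them.

    Cs that are already directly linked to A are skipped, since they are known.
    """
    candidates = defaultdict(set)
    for b in neighbours[a]:
        for c in neighbours[b]:
            if c != a and c not in neighbours[a]:
                candidates[c].add(b)
    return dict(candidates)


def closed_discovery(a, c, neighbours):
    """Given A and C, return the Bs that are linked to both."""
    return neighbours[a] & neighbours[c]


neighbours = build_neighbours(LINKS)
print(open_discovery("fish oil", neighbours))
print(closed_discovery("fish oil", "Raynaud syndrome", neighbours))
```

In this sketch, open discovery returns every C reachable through some B, whereas closed discovery only returns the linking Bs for a pair chosen in advance. Real systems would additionally rank the candidates rather than return them as unordered sets.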

A number of systems to perform literature-based discovery have been developed over the years, extending the original idea of Don Swanson, and the evaluation of the quality of such systems is an active area of research. [14] Some systems include web versions for increased user-friendliness. [15] A common approach in many systems is the use of MeSH terms to represent scientific articles. This approach is used by the systems Manjal, BITOLA and LitLinker. [16]

One well-known system within the field is called Arrowsmith and is tailored to find connections between two disjoint sets of articles, an approach labeled "two-node" search. [17] [18]

Another well-known system, LION LBD, [19] uses PubTator [20] for annotating PubMed scientific articles with concepts such as chemicals, genes/proteins, mutations, diseases and species; as well as sentence-level annotation of cancer hallmarks that describe fundamental cancer processes and behaviour. [21] It uses co-occurrence metrics to rank relations between concepts and performs both open and closed discovery. [1]

While many LBD systems are based on traditional statistical methods, [16] others leverage more sophisticated machine learning methods, such as neural networks. [1] Some LBD systems represent the connections between concepts as a knowledge graph and thus employ techniques from graph theory. [22] The graph-based representation is also the foundation for LBD systems that employ graph databases like Neo4j, enabling discovery via graph query languages such as Cypher. [23]

Graph-based LBD systems represent the relations between concepts using different relation types, such as those in the UMLS Semantic Network. [24] Some approaches go further and try to apply contextualized relations, [25] an approach also used by the Gene Ontology for its Causal Activity Modeling (GO-CAM). [26]

Use of databases

Besides extracting information from the body of scientific articles, LBD systems often employ structured knowledge from biocurated biological resources, like the Online Mendelian Inheritance in Man (OMIM). [27]

List of systems

The Anni 2.0 literature-based discovery system, employing a workflow similar to other LBD systems.

These are the published LBD systems, ordered by date of publication: [29]

Semantic typing

A common task in literature-based discovery is assigning words or concepts to semantic types. A concept might be classified under one type or under multiple types. For example, in the Unified Medical Language System (UMLS) the term migraine is classified under the type disease or syndrome, while the term magnesium is classified under two types: biologically active substance and element, ion, or isotope. [16] Typing concepts allows the discovery process to focus on connections between particular classes of concepts, e.g. disease-gene or disease-drug pairs. [16]
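As a rough illustration of how semantic types can narrow a candidate list, the sketch below filters concepts by type. The mapping `SEMANTIC_TYPES` and the function `filter_by_type` are hypothetical; the migraine and magnesium entries follow the example above, and the remaining entries are assumptions made only for illustration, not taken from the UMLS.

```python
# Hypothetical concept -> semantic types mapping, written as plain strings.
# Only the migraine and magnesium entries come from the text above.
SEMANTIC_TYPES = {
    "migraine": {"disease or syndrome"},
    "magnesium": {"biologically active substance", "element, ion, or isotope"},
    "serotonin": {"biologically active substance"},  # assumed for illustration
    "epilepsy": {"disease or syndrome"},  # assumed for illustration
}


def filter_by_type(candidates, wanted_type, types=SEMANTIC_TYPES):
    """Keep only the candidates that carry the wanted semantic type.

    Concepts missing from the mapping are treated as having no type.
    """
    return [c for c in candidates if wanted_type in types.get(c, set())]


candidates = ["magnesium", "epilepsy", "serotonin", "unknown term"]
print(filter_by_type(candidates, "biologically active substance"))
```

A concept with several types, such as magnesium here, passes the filter for any one of them.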

System evaluation

The evaluation of literature-based discoveries is challenging and includes both experimental and in silico methods. [45] Some methods try to quantify the knowledge generated by systems, which should be provided in an amount and richness that is useful for scientists. [46]

Evaluation is difficult in LBD for several reasons: disagreement about the role of LBD systems in research and thus what makes a successful one; difficulty in determining how useful, interesting or actionable a discovery is; and difficulty in objectively defining a ‘discovery’, which hinders the creation of a standard evaluation set which quantifies when a discovery has been replicated or found. [1]

A popular method used in LBD is to replicate previous discoveries. [4] [47] [48] These are usually LBD-based discoveries as they are relatively easy to quantify compared to other discoveries. There are only a handful of such discoveries and approaches tuned to perform well on these discoveries might not generalise. In this type of evaluation, the literature before the discovery to be replicated is used to generate a ranked list of discovery candidates as target or linking terms. Success is measured by reporting the rank of the term(s) of interest; the higher the rank, the better the approach.

Literature- or time-slicing involves splitting the existing literature at a point in time. The LBD system is then exposed to the literature before the split and is evaluated by how many of the discoveries in the later period it can discover. LBD systems have used term co-occurrences, [49] relationships from external biomedical resources (e.g. SemMedDB) [50] and semantic relationships [51] to generate the gold standards. A high-precision approach is to use expert opinion to generate the gold standard, [52] but this is time-consuming, expensive and tends to produce low recall rates. [1]
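Time-slicing with a co-occurrence gold standard can be sketched as follows. The corpus `ARTICLES`, the cutoff year and the helper names are all hypothetical, and the "system output" is hard-coded rather than produced by a real LBD method.

```python
from itertools import combinations

# Hypothetical corpus: (publication year, set of concepts mentioned together).
ARTICLES = [
    (1980, {"fish oil", "blood viscosity"}),
    (1983, {"blood viscosity", "Raynaud syndrome"}),
    (1984, {"fish oil", "platelet aggregation"}),
    (1989, {"fish oil", "Raynaud syndrome"}),
    (1991, {"blood viscosity", "platelet aggregation"}),
]


def cooccurring_pairs(articles):
    """Return the set of unordered concept pairs that co-occur in any article.

    Pairs are stored as sorted tuples so that (x, y) and (y, x) coincide.
    """
    pairs = set()
    for _, concepts in articles:
        for x, y in combinations(sorted(concepts), 2):
            pairs.add((x, y))
    return pairs


def time_slice(articles, cutoff_year):
    """Return (pairs known before the cutoff, pairs first seen from the cutoff on)."""
    before = cooccurring_pairs(a for a in articles if a[0] < cutoff_year)
    after = cooccurring_pairs(a for a in articles if a[0] >= cutoff_year)
    gold = after - before  # connections that only appear in the later period
    return before, gold


def recall(predicted, gold):
    """Fraction of gold-standard pairs that the system proposed."""
    return len(predicted & gold) / len(gold) if gold else 0.0


known, gold = time_slice(ARTICLES, cutoff_year=1986)
predicted = {("Raynaud syndrome", "fish oil")}  # a hypothetical system's output
print(sorted(gold), recall(predicted, gold))
```

In a real evaluation the system would see only the literature before the cutoff, and its ranked predictions would then be scored against `gold`.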

The advantage of time-slicing in comparison to the replication of previous discoveries is the evaluation on a large number of test instances. This raises the need for evaluation metrics which can quantify performance on large, ranked lists. [1] LBD works have used metrics popular in Information Retrieval [53] which include Precision, Recall, Area Under the Curve (AUC), Precision at k, Mean Average Precision (MAP) and others. [1]
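Two of the ranked-list metrics named above can be written down in a few lines. This is a minimal sketch using the common textbook definitions of Precision at k and (Mean) Average Precision; individual LBD papers may define or normalise them slightly differently, and the function names are illustrative.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant (assumes k > 0)."""
    top_k = ranked[:k]
    return sum(1 for item in top_k if item in relevant) / k


def average_precision(ranked, relevant):
    """Sum of precision@i at each rank i holding a relevant item,
    divided by the total number of relevant items."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Average the per-query average precision over (ranked, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranked = ["c1", "c2", "c3", "c4"]  # hypothetical ranked candidate list
relevant = {"c1", "c3"}            # hypothetical gold-standard items
print(precision_at_k(ranked, relevant, 2))
print(average_precision(ranked, relevant))
```

Dividing by the total number of relevant items means that a relevant item the system never ranks lowers the average precision for that query.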

Proposing new discoveries or treatments goes beyond replicating past discoveries or predicting time-sliced instances of a particular relationship, and shows that a system is capable of being used in realistic situations. [54] [47] [55] [56] This is usually accompanied by peer-reviewed publication in the domain or vetting by a domain expert. [1]

Text mining

Gene name normalization, an important step in LBD when dealing with genes

The automation of literature-based discovery relies heavily on text mining. [58]

The language in scientific articles often includes ambiguities, and an important step for coherent parsing of the literature is extracting the sense of each term in the context in which it is used, a task called word-sense disambiguation (WSD). [59] For example, gene symbols such as CT (PCYT1A) and MR (NR3C2) can be confused with the acronyms for computed tomography and magnetic resonance, requiring sophisticated disambiguation systems. [60] Terms are often reconciled to ontologies or other sources of unique identifiers, such as the Unified Medical Language System (UMLS). [61] This process of mapping multiple different utterances to a single name or identifier is called normalization. [57]
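The interplay of disambiguation and normalization can be sketched with a naive cue-word overlap heuristic. This is a deliberately simple stand-in, not the method of any cited system: the lexicon `LEXICON`, its cue words and the identifiers are hypothetical placeholders rather than real UMLS or gene database entries, and whitespace tokenization is a simplification.

```python
# Hypothetical lexicon: surface form -> list of (identifier, context cue words).
# The identifiers are placeholders, not real UMLS or gene database IDs.
LEXICON = {
    "ct": [
        ("GENE:PCYT1A", {"gene", "expression", "protein", "enzyme"}),
        ("PROC:computed_tomography", {"scan", "imaging", "patient", "radiology"}),
    ],
    "mr": [
        ("GENE:NR3C2", {"gene", "receptor", "expression", "aldosterone"}),
        ("PROC:magnetic_resonance", {"scan", "imaging", "patient", "radiology"}),
    ],
}


def normalize(term, sentence, lexicon=LEXICON):
    """Map a term to the identifier whose cue words overlap most with the sentence.

    Returns None for unknown terms or when no cue word matches at all.
    """
    senses = lexicon.get(term.lower())
    if not senses:
        return None
    words = set(sentence.lower().split())  # naive whitespace tokenization
    best_id, best_overlap = None, 0
    for identifier, cues in senses:
        overlap = len(words & cues)
        if overlap > best_overlap:
            best_id, best_overlap = identifier, overlap
    return best_id


print(normalize("CT", "The patient underwent a CT scan"))
print(normalize("MR", "MR gene expression in the kidney"))
```

Returning `None` when no cue matches keeps ambiguous mentions out of the downstream analysis instead of guessing a sense.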

Usage

Life sciences

LBD has already been used in different ways to identify new connections between biomedical entities and new candidate genes and treatments for illnesses. [62] [1]

Drug discovery

LBD has seen use in drug development and repurposing [54] [63] as well as predicting adverse drug reactions. [64] [65] [1]

The method of literature-based discovery has been used to search for treatments for a number of human diseases, including:

Gene and protein function discovery

The approach has also been used to propose relations of genes with particular diseases, [70] like breast cancer. [71]

In the context of systems vaccinology, it was used to identify proteins related to interferon gamma and that play a role in the response to vaccines. [57]

It has also been used to propose mechanisms for currently used drugs. [72]

Biomarker discovery

LBD has been explored as a tool to identify biomarkers for the diagnosis and prognosis of diseases, e.g. for the risk of type 2 diabetes. [73]

Other uses

Besides providing scientific hypotheses about the world, LBD has also been used to improve data analysis, via the automatic identification of possible confounding factors using the medical literature. [74]

It has also been used to better understand disease etiology and the relations between different diseases, for example by looking for the genes connecting myocardial infarction and depression, [75] and for connections between psychiatric and somatic diseases. [76]

Beyond life sciences

LBD has mostly been deployed in the biomedical domain, but it has also been applied outside of it, in research on developing water purification systems, accelerating the development of developing countries and identifying promising research collaborations. [77] [78] [79]

References

  1. Crichton, Gamal; Baker, Simon; Guo, Yufan; Korhonen, Anna (2020-05-15). "Neural networks for open and closed Literature-based Discovery". PLOS ONE. 15 (5): e0232891. Bibcode:2020PLoSO..1532891C. doi: 10.1371/JOURNAL.PONE.0232891 . PMC   7228051 . PMID   32413059. This article incorporates text available under the CC BY 4.0 license.
  2. Smalheiser, Neil R; Swanson, Don R (November 1998). "Using Arrowsmith: a computer-assisted approach to formulating and assessing scientific hypotheses". Computer Methods and Programs in Biomedicine. 57 (3): 149–153. doi:10.1016/s0169-2607(98)00033-9. ISSN   0169-2607. PMID   9822851.
  3. Gordon, Michael D.; Lindsay, Robert K. (February 1996). "Toward discovery support systems: A replication, re-examination, and extension of Swanson's work on literature-based discovery of a connection between Raynaud's and fish oil". Journal of the American Society for Information Science. 47 (2): 116–128. doi:10.1002/(sici)1097-4571(199602)47:2<116::aid-asi3>3.0.co;2-1. ISSN   0002-8231.
  4. Cohen, Trevor; Schvaneveldt, Roger; Widdows, Dominic (April 2010). "Reflective Random Indexing and indirect inference: A scalable method for discovery of implicit connections". Journal of Biomedical Informatics. 43 (2): 240–256. doi: 10.1016/j.jbi.2009.09.003 . ISSN   1532-0464. PMID   19761870.
  5. Smalheiser, Neil R. (2017-12-01). "Rediscovering Don Swanson: The Past, Present and Future of Literature-based Discovery". Journal of Data and Information Science. 2 (4): 43–64. doi:10.1515/jdis-2017-0019. PMC   5771422 . PMID   29355246.
  6. Swanson, Don R. (1986). "Fish Oil, Raynaud's Syndrome, and Undiscovered Public Knowledge". Perspectives in Biology and Medicine. 30 (1): 7–18. doi:10.1353/pbm.1986.0087. ISSN   1529-8795. PMID   3797213. S2CID   33675760.
  7. Ricco, Jean Baptiste (May 1990). "Fish-oil dietary supplementation in patients with Raynaud's phenomenon: a double blind, controlled, prospective study". Journal of Vascular Surgery. 11 (5): 733–734. doi: 10.1016/0741-5214(90)90229-4 . ISSN   0741-5214.
  8. Swanson, Don R. (1988). "Migraine and Magnesium: Eleven Neglected Connections". Perspectives in Biology and Medicine. 31 (4): 526–557. doi:10.1353/pbm.1988.0009. ISSN   1529-8795. PMID   3075738. S2CID   12482481.
  9. Swanson, Don R. (1990). "Somatomedin C and Arginine: Implicit Connections between Mutually Isolated Literatures". Perspectives in Biology and Medicine. 33 (2): 157–186. doi:10.1353/pbm.1990.0031. ISSN   1529-8795. PMID   2406696. S2CID   41205674.
  10. Smalheiser, Neil R.; Swanson, Don R. (September 1996). "Linking estrogen to Alzheimer's disease". Neurology. 47 (3): 809–810. doi:10.1212/wnl.47.3.809. ISSN   0028-3878. PMID   8797484. S2CID   9636182.
  11. Stegmann, J.; Grohmann, G. (2003). "Hypothesis generation guided by co-word clustering". Scientometrics. 56: 111–135. As quoted by Bekhuis.
  12. Bekhuis, Tanja (2006). "Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy". Biomedical Digital Libraries. 3: 2. doi: 10.1186/1742-5581-3-2 . PMC   1459187 . PMID   16584552.
  13. Smalheiser, Neil R. (2011-07-26). "Literature-based discovery: Beyond the ABCs". Journal of the Association for Information Science and Technology. 63 (2): 218–224. doi:10.1002/ASI.21599.
  14. Yetisgen-Yildiz, Meliha; Pratt, Wanda (2008-12-16). "A new evaluation methodology for literature-based discovery systems". Journal of Biomedical Informatics. 42 (4): 633–643. doi: 10.1016/J.JBI.2008.12.001 . PMID   19124086.
  15. Hur, Junguk; Schuyler, Adam D.; States, David J.; Feldman, Eva L. (2009-02-02). "SciMiner: web-based literature mining tool for target identification and functional enrichment analysis". Bioinformatics. 25 (6): 838–840. doi:10.1093/bioinformatics/btp049. ISSN   1460-2059. PMC   2654801 . PMID   19188191.
  16. Yetisgen-Yildiz, Meliha; Pratt, Wanda (2006-01-04). "Using statistical and knowledge-based approaches for literature-based discovery". Journal of Biomedical Informatics. 39 (6): 600–611. doi: 10.1016/J.JBI.2005.11.010 . PMID   16442852.
  17. Smalheiser, Neil R.; Torvik, Vetle I. (2008), Bruza, Peter; Weeber, Marc (eds.), "The Place of Literature-Based Discovery in Contemporary Scientific Practice", Literature-based Discovery, Information Science and Knowledge Management, Berlin, Heidelberg: Springer, pp. 13–22, Bibcode:2008lbd..book...13S, doi:10.1007/978-3-540-68690-3_2, ISBN   978-3-540-68690-3
  18. "ARROWSMITH: Start". arrowsmith.psych.uic.edu. Retrieved 2022-03-04.
  19. Pyysalo, Sampo; Baker, Simon; Ali, Imran; Haselwimmer, Stefan; Shah, Tejas; Young, Andrew; Guo, Yufan; Högberg, Johan; Stenius, Ulla; Narita, Masashi; Korhonen, Anna (2018-10-09). "LION LBD: a literature-based discovery system for cancer biology". Bioinformatics. 35 (9): 1553–1561. doi:10.1093/bioinformatics/bty845. ISSN   1367-4803. PMC   6499247 . PMID   30304355.
  20. Wei, Chih-Hsuan; Kao, Hung-Yu; Lu, Zhiyong (2013-05-22). "PubTator: a web-based text mining tool for assisting biocuration". Nucleic Acids Research. 41 (W1): W518–W522. doi:10.1093/nar/gkt441. ISSN   1362-4962. PMC   3692066 . PMID   23703206.
  21. Baker, Simon; Ali, Imran; Silins, Ilona; Pyysalo, Sampo; Guo, Yufan; Högberg, Johan; Stenius, Ulla; Korhonen, Anna (2017-07-14). "Cancer Hallmarks Analytics Tool (CHAT): a text mining approach to organize and evaluate scientific literature on cancer". Bioinformatics. 33 (24): 3973–3981. doi:10.1093/bioinformatics/btx454. ISSN   1367-4803. PMC   5860084 . PMID   29036271.
  22. Cameron, Delroy; Kavuluru, Ramakanth; Rindflesch, Thomas C.; Sheth, Amit P.; Thirunarayan, Krishnaprasad; Bodenreider, Olivier (2015-02-07). "Context-driven automatic subgraph creation for literature-based discovery". Journal of Biomedical Informatics. 54: 141–157. doi:10.1016/J.JBI.2015.01.014. PMC   4888806 . PMID   25661592.
  23. Hristovski, Dimitar; Kastrin, Andrej; Dinevski, Dejan; Rindflesch, Thomas C. (2015-01-01). "Constructing a Graph Database for Semantic Literature-Based Discovery". Studies in Health Technology and Informatics. 216: 1094. PMID   26262393.
  24. Preiss, Judita; Stevenson, Mark; Gaizauskas, Robert (2015-05-13). "Exploring relation types for literature-based discovery". Journal of the American Medical Informatics Association. 22 (5): 987–992. doi:10.1093/JAMIA/OCV002. PMC   4986660 . PMID   25971437.
  25. Kim, Yong Hwan; Song, Min (2019-04-24). "A context-based ABC model for literature-based discovery". PLOS ONE. 14 (4): e0215313. Bibcode:2019PLoSO..1415313K. doi: 10.1371/JOURNAL.PONE.0215313 . PMC   6481912 . PMID   31017923.
  26. Thomas, Paul D.; Hill, David P.; Mi, Huaiyu; Osumi-Sutherland, David; Auken, Kimberly Van; Carbon, Seth J.; Balhoff, James P.; Albou, Laurent-Philippe; Good, Benjamin M.; Gaudet, Pascale; Lewis, Suzanna (2019-10-01). "Gene Ontology Causal Activity Modeling (GO-CAM) moves beyond GO annotations to structured descriptions of biological functions and systems". Nature Genetics. 51 (10): 1429–1433. doi:10.1038/S41588-019-0500-1. PMC   7012280 . PMID   31548717.
  27. Hristovski, Dimitar; Peterlin, Borut; Mitchell, Joyce A.; Humphrey, Susanne M. (2003-01-01). "Improving literature based discovery support by genetic knowledge integration". Studies in Health Technology and Informatics. 95: 68–73. PMID   14663965.
  28. Jelier, Rob; Schuemie, Martijn J.; Veldhoven, Antoine; Dorssers, Lambert C. J.; Jenster, Guido; Kors, Jan A. (2008-06-12). "Anni 2.0: a multipurpose text-mining tool for the life sciences". Genome Biology. 9 (6): R96. doi: 10.1186/GB-2008-9-6-R96 . PMC   2481428 . PMID   18549479.
  29. Gopalakrishnan, Vishrawas; Jha, Kishlay; Jin, Wei; Zhang, Aidong (2019-05-01). "A survey on literature based discovery approaches in biomedical domain". Journal of Biomedical Informatics. 93: 103141. doi: 10.1016/j.jbi.2019.103141 . ISSN   1532-0464. PMID   30857950.
  30. Hristovski, Dimitar; Džeroski, Sašo; Peterlin, Borut; Rožić, Anamajirja (2000), "Supporting Discovery in Medicine by Association Rule Mining of Bibliographic Databases", Principles of Data Mining and Knowledge Discovery, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 446–451, doi: 10.1007/3-540-45372-5_49 , ISBN   978-3-540-41066-9
  31. Weeber, Marc; Klein, Henny; de Jong-van den Berg, Lolkje T.W.; Vos, Rein (2001). "Using concepts in literature-based discovery: Simulating Swanson's Raynaud-fish oil and migraine-magnesium discoveries". Journal of the American Society for Information Science and Technology. 52 (7): 548–557. doi:10.1002/asi.1104. ISSN   1532-2882.
  32. Pratt, Wanda; Yetisgen-Yildiz, Meliha (2003). "LitLinker". Proceedings of the 2nd international conference on Knowledge capture. New York, New York, USA: ACM Press. p. 105. doi:10.1145/945645.945662. ISBN   1581135831. S2CID   2221335.
  33. van der Eijk, C. Christiaan; van Mulligen, Erik M.; Kors, Jan A.; Mons, Barend; van den Berg, Jan (2004). "Constructing an associative concept space for literature-based discovery". Journal of the American Society for Information Science and Technology. 55 (5): 436–444. doi:10.1002/asi.10392. ISSN   1532-2882.
  34. Srinivasan, P.; Libbus, B. (2004-07-19). "Mining MEDLINE for implicit links between dietary substances and diseases". Bioinformatics. 20 (Suppl 1): i290–i296. doi:10.1093/bioinformatics/bth914. ISSN   1367-4803. PMID   15262811.
  35. Wren, Jonathan D (2004). "Extending the mutual information measure to rank inferred literature relationships". BMC Bioinformatics. 5 (1): 145. doi: 10.1186/1471-2105-5-145 . PMC   526381 . PMID   15471547.
  36. Hristovski, Dimitar; Peterlin, Borut; Mitchell, Joyce A.; Humphrey, Susanne M. (March 2005). "Using literature-based discovery to identify disease candidate genes". International Journal of Medical Informatics. 74 (2–4): 289–298. doi:10.1016/j.ijmedinf.2004.04.024. ISSN   1386-5056. PMID   15694635.
  37. Yetisgen-Yildiz, Meliha; Pratt, Wanda (December 2006). "Using statistical and knowledge-based approaches for literature-based discovery". Journal of Biomedical Informatics. 39 (6): 600–611. doi: 10.1016/j.jbi.2005.11.010 . ISSN   1532-0464. PMID   16442852.
  38. Torvik, Vetle I.; Smalheiser, Neil R. (2007-04-26). "A quantitative model for linking two disparate sets of articles in MEDLINE". Bioinformatics. 23 (13): 1658–1665. doi: 10.1093/bioinformatics/btm161 . ISSN   1460-2059. PMID   17463015.
  39. Frijters, R.; Heupers, B.; van Beek, P.; Bouwhuis, M.; van Schaik, R.; de Vlieg, J.; Polman, J.; Alkema, W. (2008-05-19). "CoPub: a literature-based keyword enrichment tool for microarray data analysis". Nucleic Acids Research. 36 (Web Server): W406–W410. doi:10.1093/nar/gkn215. ISSN   0305-1048. PMC   2447728 . PMID   18442992.
  40. Petrič, Ingrid; Urbančič, Tanja; Cestnik, Bojan; Macedoni-Lukšič, Marta (April 2009). "Literature mining method RaJoLink for uncovering relations between biomedical concepts". Journal of Biomedical Informatics. 42 (2): 219–227. doi: 10.1016/j.jbi.2008.08.004 . ISSN   1532-0464. PMID   18771753.
  41. Hristovski, Dimitar; Kastrin, Andrej; Peterlin, Borut; Rindflesch, Thomas C. (2010), "Combining Semantic Relations and DNA Microarray Data for Novel Hypotheses Generation", Linking Literature, Information, and Knowledge for Biology, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 53–61, doi:10.1007/978-3-642-13131-8_7, ISBN   978-3-642-13130-1, S2CID   8957416
  42. Cameron, Delroy; Kavuluru, Ramakanth; Rindflesch, Thomas C.; Sheth, Amit P.; Thirunarayan, Krishnaprasad; Bodenreider, Olivier (April 2015). "Context-driven automatic subgraph creation for literature-based discovery". Journal of Biomedical Informatics. 54: 141–157. doi:10.1016/j.jbi.2015.01.014. ISSN   1532-0464. PMC   4888806 . PMID   25661592.
  43. Workman, T. Elizabeth; Fiszman, Marcelo; Cairelli, Michael J.; Nahl, Diane; Rindflesch, Thomas C. (2016-04-01). "Spark, an application based on Serendipitous Knowledge Discovery". Journal of Biomedical Informatics. 60: 23–37. doi: 10.1016/j.jbi.2015.12.014 . ISSN   1532-0464. PMID   26732995.
  44. Peng, Yufang; Bonifield, Gary; Smalheiser, Neil R. (2017-05-22). "Gaps within the Biomedical Literature: Initial Characterization and Assessment of Strategies for Discovery". Frontiers in Research Metrics and Analytics. 2. doi: 10.3389/frma.2017.00003 . ISSN   2504-0537. PMC   5736374 . PMID   29271976.
  45. Henry, M. S. Sam; McInnes, Bridget T. (2017-08-21). "Literature Based Discovery: models, methods, and trends". Journal of Biomedical Informatics. 74: 20–32. doi: 10.1016/J.JBI.2017.08.011 . PMID   28838802.
  46. Preiss, Judita; Stevenson, Mark (2017-05-31). "Quantifying and filtering knowledge generated by literature based discovery". BMC Bioinformatics. 18 (Suppl 7): 249. doi: 10.1186/S12859-017-1641-9 . PMC   5471938 . PMID   28617217.
  47. Swanson, Don R.; Smalheiser, Neil R. (April 1997). "An interactive system for finding complementary literatures: a stimulus to scientific discovery". Artificial Intelligence. 91 (2): 183–203. doi: 10.1016/s0004-3702(97)00008-8 . ISSN   0004-3702.
  48. Weeber, M.; Klein, H.; Aronson, A. R.; Mork, J. G.; de Jong-van den Berg, L. T.; Vos, R. (2000). "Text-based discovery in biomedicine: the architecture of the DAD-system". Proceedings. AMIA Symposium. American Medical Informatics Association: 903–907. OCLC   678976989. PMC   2243779 . PMID   11080015.
  49. Hristovski, Dimitar; Džeroski, Sašo; Peterlin, Borut; Rožić, Anamarija (2000), "Supporting Discovery in Medicine by Association Rule Mining of Bibliographic Databases", Principles of Data Mining and Knowledge Discovery, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 446–451, doi: 10.1007/3-540-45372-5_49 , ISBN   978-3-540-41066-9
  50. Eronen, Lauri; Hintsanen, Petteri; Toivonen, Hannu (2012), "Biomine: A Network-Structured Resource of Biological Entities for Link Prediction", Bisociative Knowledge Discovery, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 364–378, doi: 10.1007/978-3-642-31830-6_26 , ISBN   978-3-642-31829-0
  51. Preiss, Judita; Stevenson, Mark; Gaizauskas, Robert (2015-05-12). "Exploring relation types for literature-based discovery". Journal of the American Medical Informatics Association. 22 (5): 987–992. doi:10.1093/jamia/ocv002. ISSN   1527-974X. PMC   4986660 . PMID   25971437.
  52. Yetisgen-Yildiz, Meliha; Pratt, Wanda (August 2009). "A new evaluation methodology for literature-based discovery systems". Journal of Biomedical Informatics. 42 (4): 633–643. doi: 10.1016/j.jbi.2008.12.001 . ISSN   1532-0464. PMID   19124086.
  53. Yetisgen-Yildiz, M.; Pratt, W. (2008), "Evaluation of Literature-Based Discovery Systems", Literature-based Discovery, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 101–113, Bibcode:2008lbd..book..101Y, doi:10.1007/978-3-540-68690-3_7, ISBN   978-3-540-68685-9
  54. Hristovski, Dimitar; Kastrin, Andrej; Peterlin, Borut; Rindflesch, Thomas C. (2010), "Combining Semantic Relations and DNA Microarray Data for Novel Hypotheses Generation", Linking Literature, Information, and Knowledge for Biology, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 53–61, doi:10.1007/978-3-642-13131-8_7, ISBN   978-3-642-13130-1, S2CID   8957416
  55. Stegmann, Johannes; Grohmann, Guenter (2003). "Hypothesis generation guided by co-word clustering". Scientometrics. 56 (1): 111–135. doi:10.1023/A:1021954808804. S2CID   14362816.
  56. Wren, J. D.; Bekeredjian, R.; Stewart, J. A.; Shohet, R. V.; Garner, H. R. (2004-01-22). "Knowledge discovery by automated identification and ranking of implicit relationships". Bioinformatics. 20 (3): 389–398. doi: 10.1093/bioinformatics/btg421 . ISSN   1367-4803. PMID   14960466.
  57. Özgür, Arzucan; Xiang, Zuoshuang; Radev, Dragomir R.; He, Yongqun (2010-06-03). "Literature-based discovery of IFN-gamma and vaccine-mediated gene interaction networks". Journal of Biomedicine and Biotechnology. 2010: 426479. doi: 10.1155/2010/426479 . PMC   2896678 . PMID   20625487.
  58. Korhonen, Anna; Guo, Yufan; Baker, Simon; Yetisgen-Yildiz, Meliha; Stenius, Ulla; Narita, Masashi; Liò, Pietro (2015-01-01). "Improving Literature-Based Discovery with Advanced Text Mining". Computational Intelligence Methods for Bioinformatics and Biostatistics. Lecture Notes in Computer Science. Vol. 8623. pp. 89–98. doi:10.1007/978-3-319-24462-4_8. ISBN   978-3-319-24461-7.
  59. Preiss, Judita; Stevenson, Mark (July 2016). "The effect of word sense disambiguation accuracy on literature based discovery". BMC Medical Informatics and Decision Making. 16 (S1): 57. doi: 10.1186/s12911-016-0296-1 . ISSN   1472-6947. PMC   4959388 . PMID   27455071. S2CID   45296293.
  60. Kastrin, Andrej; Hristovski, Dimitar (2008-11-06). "A fast document classification algorithm for gene symbol disambiguation in the BITOLA literature-based discovery support system". AMIA Annual Symposium Proceedings. 2008: 358–362. PMC   2655979 . PMID   18998999.
  61. Gabetta, Matteo; Larizza, Cristiana; Bellazzi, Riccardo (2013-01-01). "A Unified Medical Language System (UMLS) based system for Literature-Based Discovery in medicine". Studies in Health Technology and Informatics. 192: 412–416. PMID   23920587.
  62. Hristovski, Dimitar; Rindflesch, Thomas; Peterlin, Borut (2013-01-01). "Using Literature-based Discovery to Identify Novel Therapeutic Approaches". Cardiovascular & Hematological Agents in Medicinal Chemistry. 11 (1): 14–24. doi:10.2174/1871525711311010005. ISSN   1871-5257. PMID   22845900.
  63. Zhang, Rui; Cairelli, Michael J.; Fiszman, Marcelo; Kilicoglu, Halil; Rindflesch, Thomas C.; Pakhomov, Serguei V.; Melton, Genevieve B. (January 2014). "Exploiting Literature-derived Knowledge and Semantics to Identify Potential Prostate Cancer Drugs". Cancer Informatics. 13s1 (Suppl 1): 103–111. doi:10.4137/cin.s13889. ISSN   1176-9351. PMC   4216049 . PMID   25392688.
  64. Benzschawel, Eric (2016). "Identifying Potential Adverse Drug Events in Tweets Using Bootstrapped Lexicons". Proceedings of the ACL 2016 Student Research Workshop. Stroudsburg, PA, USA: Association for Computational Linguistics: 15–21. doi: 10.18653/v1/p16-3003 . S2CID   3008644.
  65. Shang, Ning; Xu, Hua; Rindflesch, Thomas C.; Cohen, Trevor (December 2014). "Identifying plausible adverse drug reactions using knowledge extracted from the literature". Journal of Biomedical Informatics. 52: 293–310. doi:10.1016/j.jbi.2014.07.011. ISSN   1532-0464. PMC   4261011 . PMID   25046831.
  66. Maver, Ales; Hristovski, Dimitar; Rindflesch, Thomas C.; Peterlin, Borut (2013-11-24). "Integration of Data from Omic Studies with the Literature-Based Discovery towards Identification of Novel Treatments for Neovascularization in Diabetic Retinopathy". BioMed Research International. 2013: e848952. doi: 10.1155/2013/848952 . ISSN   2314-6133. PMC   3857903 . PMID   24350292.
  67. Kostoff, Ronald N.; Briggs, Michael B. (February 2008). "Literature-Related Discovery (LRD): Potential treatments for Parkinson's Disease". Technological Forecasting and Social Change. 75 (2): 226–238. doi:10.1016/j.techfore.2007.11.007. ISSN   0040-1625.
  68. Dong, Weiwei; Liu, Yixuan; Zhu, Weijie; Mou, Quan; Wang, Jinliang; Hu, Yi (2014-06-20). "Simulation of Swanson's literature-based discovery: anandamide treatment inhibits growth of gastric cancer cells in vitro and in silico". PLOS ONE. 9 (6): e100436. Bibcode:2014PLoSO...9j0436D. doi: 10.1371/JOURNAL.PONE.0100436 . PMC   4065097 . PMID   24949851.
  69. Kostoff, Ronald N.; Briggs, Michael B.; Lyons, Terence J. (February 2008). "Literature-related discovery (LRD): Potential treatments for Multiple Sclerosis". Technological Forecasting and Social Change. 75 (2): 239–255. doi:10.1016/j.techfore.2007.11.002. ISSN   0040-1625.
  70. Hristovski, Dimitar; Peterlin, B.; Dzeroski, S. (2001-01-01). "Literature-based Discovery Support System and Its Application to Disease Gene Identification". Proceedings. AMIA Annual Symposium: 928. PMC   2243305 .
  71. Sarkar, Indra Neil; Agrawal, Abha (2006). "Literature based discovery of gene clusters using phylogenetic methods". AMIA ... Annual Symposium Proceedings. AMIA Symposium. 2006: 689–693. ISSN   1942-597X. PMC   1839645 . PMID   17238429.
  72. Ahlers, Caroline B.; Hristovski, Dimitar; Kilicoglu, Halil; Rindflesch, Thomas C. (2007-10-11). "Using the literature-based discovery paradigm to investigate drug mechanisms". AMIA ... Annual Symposium Proceedings. AMIA Symposium. 2007: 6–10. ISSN   1942-597X. PMC   2655783 . PMID   18693787.
  73. Srinivasan, Mythily; Blackburn, Corinne; Mohamed, Mohamed; Sivagami, A. V.; Blum, Janice S. (2015-05-14). "Literature-based discovery of salivary biomarkers for type 2 diabetes mellitus". Biomarker Insights. 10: 39–45. doi:10.4137/BMI.S22177. PMC   4433061 . PMID   26005324.
  74. Malec, Scott A.; Wei, Peng; Xu, Hua; Bernstam, Elmer V.; Myneni, Sahiti; Cohen, Trevor (2016-01-01). "Literature-Based Discovery of Confounding in Observational Clinical Data". AMIA Annual Symposium Proceedings. 2016: 1920–1929. PMC   5333204 . PMID   28269951.
  75. Dai, Zhenguo; Li, Qian; Yang, Guang; Wang, Yini; Liu, Yang; Zheng, Zhilei; Tu, Yingfeng; Yang, Shuang; Yu, Bo (2019-06-11). "Using literature-based discovery to identify candidate genes for the interaction between myocardial infarction and depression". BMC Medical Genetics. 20 (1): 104. doi: 10.1186/S12881-019-0841-8 . PMC   6560897 . PMID   31185929.
  76. Vos, Rein; Aarts, Sil; Mulligen, Erik M. van; Metsemakers, Job; Boxtel, Martin P. van; Verhey, Frans RJ; Akker, Marjan van den (2013-06-17). "Finding potentially new multimorbidity patterns of psychiatric and somatic diseases: exploring the use of literature-based discovery in primary care research". Journal of the American Medical Informatics Association. 21 (1): 139–145. doi:10.1136/AMIAJNL-2012-001448. PMC   3912726 . PMID   23775174.
  77. Kostoff, Ronald N.; Solka, Jeffrey L.; Rushenberg, Robert L.; Wyatt, Jeffrey A. (February 2008). "Literature-related discovery (LRD): Water purification". Technological Forecasting and Social Change. 75 (2): 256–275. doi:10.1016/j.techfore.2007.11.009. ISSN   0040-1625.
  78. Gordon, M. D.; Awad, N. F. (2008), "The Tip of the Iceberg: The Quest for Innovation at the Base of the Pyramid", Literature-based Discovery, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 23–37, Bibcode:2008lbd..book...23G, doi:10.1007/978-3-540-68690-3_3, ISBN   978-3-540-68685-9
  79. Hristovski, Dimitar; Kastrin, Andrej; Rindflesch, Thomas C. (2015-08-25). "Semantics-Based Cross-domain Collaboration Recommendation in the Life Sciences". Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015. New York, NY, USA: ACM. pp. 805–806. doi:10.1145/2808797.2809300. ISBN   9781450338547. S2CID   8079114.