Materials informatics

Materials informatics is a field of study that applies the principles of informatics and data science to materials science and engineering to improve the understanding, use, selection, development, and discovery of materials. The community frequently uses the term "materials informatics" interchangeably with "data science", "machine learning", and "artificial intelligence". It is an emerging field that aims to achieve high-speed, robust acquisition, management, analysis, and dissemination of diverse materials data, greatly reducing the time and risk required to develop, produce, and deploy new materials, a process that generally takes longer than 20 years. [1] [2] [3] The field is not limited to traditional understandings of the relationship between materials and information. Narrower interpretations include combinatorial chemistry, process modeling, materials databases, materials data management, and product life cycle management. Materials informatics sits at the convergence of these concepts but also transcends them: by applying lessons learned from data gathered on one type of material to others, it has the potential to achieve greater insight and deeper understanding. Gathering appropriate metadata greatly expands the value of each individual data point.

Databases

Databases are essential for any informatics research and application. In materials informatics, many databases exist, containing both empirical data obtained experimentally and theoretical data obtained computationally. Big data suitable for machine learning is particularly difficult to obtain from experiments because there is no standard for reporting data and because experimental environments vary. This lack of big data has led to a growing effort to develop machine learning techniques that can make use of very small data sets. On the other hand, large, uniform databases of theoretical density functional theory (DFT) calculations exist. These databases have proven their utility in high-throughput materials screening and discovery, and several DFT databases and high-throughput tools are in common use.
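As a rough illustration of what high-throughput screening over such a database can look like, the following minimal sketch filters a small table of computed records against target property windows. Everything in it is an assumption made for illustration: the `Record` fields, the `screen` function, the thresholds, and the four placeholder entries are hypothetical and are not taken from any real database or from the sources cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One computed entry; field names are hypothetical."""
    formula: str              # placeholder label, not a real compound
    band_gap_ev: float        # computed band gap in eV
    e_above_hull_ev: float    # energy above the convex hull in eV/atom


def screen(records, gap_range=(1.0, 2.0), max_e_above_hull=0.05):
    """Keep records whose band gap lies inside gap_range and whose
    energy above hull does not exceed max_e_above_hull, ordered from
    most to least stable."""
    low, high = gap_range
    hits = [
        r for r in records
        if low <= r.band_gap_ev <= high
        and r.e_above_hull_ev <= max_e_above_hull
    ]
    return sorted(hits, key=lambda r: r.e_above_hull_ev)


# Placeholder values chosen only to exercise the filter.
records = [
    Record("A2B", 1.4, 0.00),
    Record("AB3", 0.2, 0.01),   # rejected: band gap too small
    Record("A3C", 1.8, 0.12),   # rejected: too far above the hull
    Record("BC2", 1.1, 0.03),
]

candidates = screen(records)
for r in candidates:
    print(r.formula, r.band_gap_ev, r.e_above_hull_ev)
```

The uniformity of computed databases is what makes a filter this simple workable: every record carries the same fields in the same units, which is precisely what experimental data often lacks.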

Beyond computational methods?

The concept of materials informatics is addressed by the Materials Research Society. For example, materials informatics was the theme of the December 2006 issue of the MRS Bulletin. The issue was guest-edited by John Rodgers of Innovative Materials, Inc., and David Cebon of Cambridge University, who described the "high payoff for developing methodologies that will accelerate the insertion of materials, thereby saving millions of investment dollars."

The editors adopted a limited definition of materials informatics, centered on computational methods for processing and interpreting data. They stated that "specialized informatics tools for data capture, management, analysis, and dissemination" and "advances in computing power, coupled with computational modeling and simulation and materials properties databases" will enable such accelerated insertion of materials.

A broader definition of materials informatics goes beyond the use of computational methods to carry out the same experimentation, [4] viewing materials informatics as a framework in which a measurement or computation is one step in an information-based learning process that uses the power of a collective to achieve greater efficiency in exploration. When properly organized, this framework crosses materials boundaries to uncover fundamental knowledge of the basis of physical, mechanical, and engineering properties. [5]
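One way to picture a measurement as a single step in a learning process is a loop in which every result informs the choice of the next experiment. The sketch below is only an illustration under stated assumptions and is not the method of any cited source: `measure` is a made-up stand-in for an experiment or simulation, compositions are integer percentages, and the selection rule is a simple optimistic upper bound that assumes the property changes by at most `slope_bound` per percentage point.

```python
def measure(x):
    """Hypothetical stand-in for an experiment or simulation.
    The property peaks at a composition of 70 percent."""
    return -((x - 70) / 100) ** 2


def upper_bound(x, measured, slope_bound):
    """Best value x could still have, given the results so far and the
    assumption that the property changes by at most slope_bound per
    percentage point of composition."""
    return min(v + slope_bound * abs(x - m) for m, v in measured.items())


def propose_next(candidates, measured, slope_bound):
    """Pick the untested candidate with the highest upper bound."""
    untested = [x for x in candidates if x not in measured]
    return max(untested, key=lambda x: upper_bound(x, measured, slope_bound))


def run_campaign(candidates, initial, budget, slope_bound=0.02):
    """Measure the initial points, then let each new result guide the
    next choice until the budget is spent."""
    measured = {x: measure(x) for x in initial}
    for _ in range(budget):
        x = propose_next(candidates, measured, slope_bound)
        measured[x] = measure(x)
    return measured


candidates = list(range(0, 101, 10))   # compositions in percent
measured = run_campaign(candidates, initial=[0, 100], budget=4)
best_x = max(measured, key=measured.get)
print(best_x, sorted(measured))
```

In this toy setting the loop measures six of the eleven candidate compositions instead of sweeping the whole grid. The benefit depends entirely on the smoothness assumption holding, which is itself a piece of prior knowledge about the material.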

Challenges

While many believe in the future of informatics in the materials development and scaling process, many challenges remain. Hill et al. write that "Today, the materials community faces serious challenges to bringing about this data-accelerated research paradigm, including diversity of research areas within materials, lack of data standards, and missing incentives for sharing, among others. Nonetheless, the landscape is rapidly changing in ways that should benefit the entire materials research enterprise." [6] The remaining tension between traditional materials development methodologies and more computational, machine-learning, and analytics-based approaches will likely persist for some time as the materials industry overcomes some of the cultural barriers to fully embracing such new ways of thinking.

Analogy from biology

The overarching goals of bioinformatics and systems biology may provide a useful analogy. Andrew Murray of Harvard University expresses the hope that such an approach "will save us from the era of 'one graduate student, one gene, one PhD'". [7] Similarly, the goal of materials informatics is to save us from one graduate student, one alloy, one PhD. Such goals will require more sophisticated strategies and research paradigms than simply applying data-science methods to the same set of tasks currently undertaken by students.

References

  1. Mulholland, Gregory; Paradiso, Sean (23 March 2016). "Perspective: Materials informatics across the product lifecycle: Selection, manufacturing, and certification". APL Materials. 4 (5): 053207. Bibcode:2016APLM....4e3207M. doi:10.1063/1.4945422.
  2. Rickman, J.M.; Lookman, T.; Kalinin, S.V. (15 April 2019). "Materials informatics: From the atomic-level to the continuum". Acta Materialia. 168: 473–510. Bibcode:2019AcMat.168..473R. doi:10.1016/j.actamat.2019.01.051. OSTI 1875378. S2CID 127078420.
  3. Frydrych, K.; Karimi, K.; Pecelerowicz, M.; Alvarez, R.; Dominguez-Gutiérrez, F.J.; Rovaris, F.; Papanikolaou, S. (2 October 2021). "Materials Informatics for Mechanical Deformation: A Review of Applications and Challenges". Materials. 14 (19): 5764. Bibcode:2021Mate...14.5764F. doi:10.3390/ma14195764. PMC 8510221. PMID 34640157.
  4. "informaticsresearch.net". Archived from the original on 2007-04-29. Retrieved 2007-03-10.
  5. Papanikolaou, S. (27 May 2019). "Microstructural inelastic fingerprints and data-rich predictions of plasticity and damage in solids". Computational Mechanics. 66: 141–154. arXiv:1905.11289. doi:10.1007/s00466-020-01845-x. S2CID 254038042.
  6. Hill, Joanne; Mulholland, Gregory; Persson, Kristin; Seshadri, Ram; Wolverton, Chris; Meredig, Bryce (4 May 2016). "Materials science with large-scale data and informatics: Unlocking new opportunities". MRS Bulletin. 41 (5): 399–409. Bibcode:2016MRSBu..41..399H. doi:10.1557/mrs.2016.93.
  7. "Stories of Cells: The American Society for Cell Biology San Francisco (Basic Medicine)".