Ecoinformatics

Ecoinformatics, or ecological informatics, is the science of information in ecology and environmental science. It integrates environmental and information sciences to define entities and natural processes with language common to both humans and computers. However, this is a rapidly developing area in ecology and there are alternative perspectives on what constitutes ecoinformatics.

A few definitions have been circulating, mostly centered on the creation of tools to access and analyze natural system data. However, the scope and aims of ecoinformatics are broader than the development of metadata standards for documenting datasets. Ecoinformatics aims to facilitate environmental research and management by developing ways to access and integrate databases of environmental information, and by developing new algorithms that allow different environmental datasets to be combined to test ecological hypotheses. Ecoinformatics is related to the concept of ecosystem services. [1]

Ecoinformatics characterizes the semantics of natural system knowledge. For this reason, much of today's ecoinformatics research relates to the branch of computer science known as knowledge representation, and active ecoinformatics projects are developing links to activities such as the Semantic Web.

Current initiatives to effectively manage, share, and reuse ecological data are indicative of the increasing importance of fields like ecoinformatics to develop the foundations for effectively managing ecological information. Examples of these initiatives are National Science Foundation Datanet projects, DataONE, Data Conservancy, and Artificial Intelligence for Environment & Sustainability. [1]

Software Development Lifecycle

Central to the concept of ecoinformatics is the Software Development Lifecycle (SDLC), a systematic framework for writing, implementing, and maintaining software products. In ecoinformatics projects, the development pipeline typically includes collecting data, usually from several different environmental data sources, integrating these sources, and then analyzing the data. Here, each step of the SDLC is described in the context of ecoinformatics, per Michener et al. [2] The plan, collect, assure, describe, and preserve steps refer to the data collection entity, which can be an individual researcher or a large data-collection network, while the discover, integrate, and analyze steps typically refer to the individual researcher.

Plan: Ecoinformatics projects require data from several databases. Each database holds different data, and therefore researchers should identify what types of environmental or ecological data they will need to answer their research question.

Collect: Data is collected in several different ways. In ecoinformatics, this is usually restricted to manually entering data into a spreadsheet or parsing data from an existing database. The growth of relational databases has made it easier for ecologists to download relevant data and integrate datasets.

Assure: Data entries should be checked thoroughly to validate their accuracy and usability, for example by checking for outliers and erroneous points. The same principle applies to data downloaded from existing databases. This responsibility falls on both the ecologist downloading the data and the entity that sets up the data collection system.

Describe: An accurate description of the metadata of a dataset used in a study should include enough information to deduce the data collection and processing methodology, when the data were collected, why the data were collected, and how the data were stored. This is important for reproducibility, especially for projects that build on each other and may reuse data.

Preserve: After data is collected by an institutional entity, it should be archived such that it is easily accessible. Ideally, this is in databases that are maintained and not at risk of deprecation.

Discover: While there are good practices for discovering data to start a research project, this process is often marred by a lack of usable, published data, as researchers may collect data specific to their study, but may not publish this data for wider use. On the data collection end, this can be addressed by better data-sharing practices, such as by linking datasets when publishing papers or studies. On the data procurement end, this can be addressed by more precise data searching, such as using key words to find relevant datasets.

Integrate: Synthesizing datasets can be difficult and labor-intensive, largely due to methodological differences in data collection. There are several approaches to this, but the best practices typically involve computational approaches, namely using R or Python, to automate the processes and prevent errors.

Analyze: Data analysis can take several forms, and should be tailored to the specific ecological project. However, all data analysis methods should be well-documented, including the procedure for analysis, justification for analysis methods, and any shortcomings in a specific approach.
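
The assure and integrate steps above can be illustrated with a minimal Python sketch. It is not taken from Michener et al.; the function names, record fields (`site`, `date`, `temp_f`, `count`), and the choice of Tukey's interquartile-range rule for outliers are assumptions made for illustration only. It assumes one dataset reports temperature in Fahrenheit and another reports species counts, both keyed by site and date.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Assure: return indices of values outside the Tukey fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR. This is one common
    screening rule, chosen here as an assumption; flagged points
    still need manual review before being discarded.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def fahrenheit_to_celsius(temp_f):
    """Harmonize units so both datasets use degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def integrate(temps_f, counts):
    """Integrate: inner-join two record lists on (site, date).

    Both inputs are lists of dicts with hypothetical field names.
    Records without a match in the other dataset are dropped.
    """
    temp_by_key = {
        (r["site"], r["date"]): fahrenheit_to_celsius(r["temp_f"])
        for r in temps_f
    }
    merged = []
    for r in counts:
        key = (r["site"], r["date"])
        if key in temp_by_key:
            merged.append({
                "site": r["site"],
                "date": r["date"],
                "temp_c": round(temp_by_key[key], 2),
                "count": r["count"],
            })
    return merged


# Hypothetical example data
readings = [12.1, 11.8, 12.4, 55.0, 12.0, 11.9, 12.3, 12.2]
suspect = flag_outliers(readings)  # index of the 55.0 reading

temps = [
    {"site": "A", "date": "2020-06-01", "temp_f": 68.0},
    {"site": "B", "date": "2020-06-01", "temp_f": 50.0},
]
species = [
    {"site": "A", "date": "2020-06-01", "count": 14},
    {"site": "C", "date": "2020-06-01", "count": 3},
]
combined = integrate(temps, species)
```

Scripting both steps, rather than editing spreadsheets by hand, keeps the procedure repeatable and documents exactly which records were flagged or dropped.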

Applications of Ecoinformatics Across Ecology

Ecosystem Ecology [3]

Ecosystem studies, by definition, encompass interactions across the entire life sciences spectrum, from microscopic biochemical reactions to large-scale geological phenomena. As a result, big databases may not be designed specifically for any particular research question, but should be inclusive enough to support most studies. Since ecosystem-level questions require a broad perspective, data-related ecosystem projects would likely incorporate data from several databases.

A common framework for incorporating data into ecosystem-level studies is the network science model, in which data collection mechanisms and resources are treated as a large, interconnected network instead of individual entities. The network may include several data collection stations within one database, or may span multiple databases. Currently there are several large-scale networks, but they do not generate data on a scale that would make ecology a big data science.
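
One way to picture the network framing is as a graph whose nodes are data collection stations and whose edges are links between them. The sketch below is an assumption-laden illustration, not a method from the cited source: the station names, database names, and the meaning of a link (for example, shared protocols or co-located measurements) are all hypothetical. It groups stations into connected networks and reports which databases each network spans.

```python
from collections import defaultdict


def connected_networks(stations, links):
    """Group stations into connected components of the link graph.

    stations: dict mapping station id -> database name (hypothetical).
    links: iterable of (station, station) pairs treated as undirected edges.
    """
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    networks = []
    for start in stations:
        if start in seen:
            continue
        # Depth-first traversal collecting every station reachable from start
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        networks.append(component)
    return networks


def databases_spanned(component, stations):
    """Return the set of databases that a network's stations belong to."""
    return {stations[s] for s in component}


# Hypothetical stations in two databases
stations = {"s1": "db_a", "s2": "db_a", "s3": "db_b", "s4": "db_b"}
links = [("s1", "s2"), ("s2", "s3")]

networks = connected_networks(stations, links)
spans = [databases_spanned(n, stations) for n in networks]
```

Here the first network contains stations from both databases, while `s4` forms a network of its own inside a single database.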

A current challenge for ecoinformatics in ecosystem ecology is that most funding is prioritized for generating new data rather than maintaining existing data infrastructures. Integrating data across the different spatial scales can also be difficult, since each dataset may hold different types of data.

Urban Ecology [4]

The current push for smart cities, and for integrating sensor networks into infrastructure, has positioned cities as a major source of data for ecological studies. Typical urban ecology questions address the effects of urbanization on the local ecosystem, and how to steer future development to promote urban biodiversity.

While sensor networks in cities typically collect environmental data to optimize city processes, they may also be used for ecological initiatives, especially for understanding the complex, multi-layered relationship between cities and their local ecosystems. These data can also be used to better understand the current landscape of cities and to identify avenues for rewilding cities. For example, analyzing mobility patterns can identify areas that may lend themselves well to building parks and green spaces. Bird watching data can also be used to identify the bird species present in a local area.

Infectious Disease [5]

Like other disciplines of ecology, emerging infectious disease and epidemiology span multiple scales, from understanding the genetics that drive disease trends to large-scale spatiotemporal analyses. As a result, infectious disease studies can incorporate everything from bioinformatics data, such as genetic and amino acid sequences, to environmental observation data.

On the micro-scale, these data can be used to predict infectivity and transmissibility, drug resistance, drug candidates, and mutation sites. On the macro-scale, they can be used to identify societal trends or environmental factors that lend themselves to spillover, locations of infection, and practices that cause disease transmission.

Databases [6]

References

  1. Villa, Ferdinando; Ceroni, Marta; Bagstad, Ken; Johnson, Gary; Krivov, Sergey (2009-01-01). "ARIES (ARtificial Intelligence for Ecosystem Services): A new tool for ecosystem services assessment, planning, and valuation". ResearchGate. Retrieved 2022-01-23.
  2. Michener, William K.; Jones, Matthew B. (February 2012). "Ecoinformatics: supporting ecology as a data-intensive science". Trends in Ecology & Evolution. 27 (2): 85–93. doi:10.1016/j.tree.2011.11.016. ISSN 0169-5347. PMID 22240191. S2CID 12268743.
  3. LaDeau, S. L.; Han, B. A.; Rosi-Marshall, E. J.; Weathers, K. C. (2017-03-01). "The Next Decade of Big Data in Ecosystem Science". Ecosystems. 20 (2): 274–283. Bibcode:2017Ecosy..20..274L. doi:10.1007/s10021-016-0075-y. ISSN 1435-0629.
  4. Yang, Jun (2020-10-01). "Big data and the future of urban ecology: From the concept to results". Science China Earth Sciences. 63 (10): 1443–1456. Bibcode:2020ScChD..63.1443Y. doi:10.1007/s11430-020-9666-3. ISSN 1869-1897. S2CID 221285047.
  5. Kasson, Peter M. (2020-07-20). "Infectious Disease Research in the Era of Big Data". Annual Review of Biomedical Data Science. 3 (1): 43–59. doi:10.1146/annurev-biodatasci-121219-025722. ISSN 2574-3414.
  6. Farley, Scott S; Dawson, Andria; Goring, Simon J; Williams, John W (2018-07-18). "Situating Ecology as a Big-Data Science: Current Advances, Challenges, and Solutions". BioScience. 68 (8): 563–576. doi:10.1093/biosci/biy068. ISSN 0006-3568.