E-Science

Last updated

E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology in 1999 and was used to describe a large funding initiative starting in November 2000. E-science has been more broadly interpreted since then, as "the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints, and print and/or electronic publications." [1] In 2014, IEEE eScience Conference Series condensed the definition to "eScience promotes innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle" in one of the working definitions used by the organizers. [2] E-science encompasses "what is often referred to as big data [which] has revolutionized science... [such as] the Large Hadron Collider (LHC) at CERN... [that] generates around 780 terabytes per year... highly data intensive modern fields of science...that generate large amounts of E-science data include: computational biology, bioinformatics, genomics" [1] and the human digital footprint for the social sciences. [3]

Contents

Turing Award winner Jim Gray imagined "data-intensive science" or "e-science" as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge. [4] [5]

E-Science revolutionizes both fundamental legs of the scientific method: empirical research, especially through digital big data; and scientific theory, especially through computer simulation model building. [6] [7] These ideas were reflected by The White House's Office and Science Technology Policy in February 2013, which slated many of the aforementioned e-Science output products for preservation and access requirements under the memorandum's directive. [8] E-sciences include particle physics, earth sciences and social simulations.

Characteristics and examples

Most of the research activities into e-Science have focused on the development of new computational tools and infrastructures to support scientific discovery. Due to the complexity of the software and the backend infrastructural requirements, e-Science projects usually involve large teams managed and developed by research laboratories, large universities or governments. Currently[ when? ] there is a large focus in e-Science in the United Kingdom, where the UK e-Science programme provides significant funding. In Europe the development of computing capabilities to support the CERN Large Hadron Collider has led to the development of e-Science and Grid infrastructures which are also used by other disciplines.

Consortiums

Example e-Science infrastructures include the Worldwide LHC Computing Grid, a federation with various partners including the European Grid Infrastructure, the Open Science Grid and the Nordic DataGrid Facility.

To support e-Science applications, Open Science Grid combines interfaces to more than 100 nationwide clusters, 50 interfaces to geographically distributed storage caches, and 8 campus grids (Purdue, Wisconsin-Madison, Clemson, Nebraska-Lincoln, FermiGrid at FNAL, SUNY-Buffalo, and Oklahoma in the United States; and UNESP in Brazil). Areas of science benefiting from Open Science Grid include:

UK programme

After his appointment as Director General of the Research Councils in 1999 John Taylor, with the support of the Science Minister David Sainsbury and the Chancellor of the Exchequer Gordon Brown, bid to HM Treasury to fund a programme of e-infrastructure development for science which would provide the foundation for UK science and industry to be a world leader in the knowledge economy which motivated the Lisbon Strategy for sustainable economic growth that the UK government committed to in March 2000.

In November 2000 John Taylor announced £98 million for a national UK e-Science programme. An additional £20 million contribution was planned from UK industry in matching funds to projects that they participated in. From this budget of £120 million over three years, £75 million was to be spent on grid application pilots in all areas of science, administered by the Research Council responsible for each area, while £35 million was to be administered by the EPSRC as a Core Programme to develop "industrial strength" Grid middleware. Phase 2 of the programme for 2004-2006 was supported by a further £96 million for application projects, and £27 million for the EPSRC core programme. Phase 3 of the programme for 2007-2009 was supported by a further £14 million for the EPSRC core programme and a further sum for applications. Additional funding for UK e-Science activities was provided from European Union funding, from university funding council SRIF funding for hardware, and from Jisc for networking and other infrastructure.

The UK e-Science programme comprised a wide range of resources, centres and people including the National e-Science Centre (NeSC) which is managed by the Universities of Glasgow and Edinburgh, with facilities in both cities. [9] Tony Hey led the core programme from 2001 to 2005. [10]

Within the UK regional e-Science centres support their local universities and projects, including:

There are also various centres of excellence and research centres.

In addition to centres, the grid application pilot projects were funded by the Research Council responsible for each area of UK science funding.

The EPSRC funded 11 pilot e-Science projects in three phases (for about £3 million each in the first phase):

The PPARC/STFC funded two projects: GridPP (phase 1 for £17 million, phase 2 for £5.9 million, phase 3 for £30 million and a 4th phase running from 2011 to 2014) and Astrogrid (£14 million over 3 phases).

The remaining £23 million of phase one funding was divided between the application projects funded by BBSRC, MRC and NERC:

The funded UK e-Science programme was reviewed on its completion in 2009 by an international panel led by Daniel E. Atkins, director of the Office of Cyberinfrastructure of the US NSF. The report concluded that the programme had developed a skilled pool of expertise, some services, and had led to cooperation between academia and industry, but that these achievements were at a project level rather than by generating infrastructure or transforming disciplines to adopt e-Science as a normal method of work, and that they were not self-sustainable without further investment.

United States

United States-based initiatives, where the term cyberinfrastructure is typically used to define e-Science projects, are primarily funded by the National Science Foundation office of cyberinfrastructure (NSF OCI) [11] and Department of Energy (in particular the Office of Science). After the conclusion of TeraGrid in 2011, the ACCESS program was established and funded by the National Science Foundation to help researchers and educators, with or without supporting grants, to utilize the nation’s advanced computing systems and services.

The Netherlands

Dutch eScience research is coordinated by the Netherlands eScience Center in Amsterdam, an initiative founded by NWO and SURF.

Europe

Plan-Europe is a Platform of National e-Science/Data Research Centers in Europe, as established during the constituting meeting 29–30 October 2014 in Amsterdam, the Netherlands, and which is based on agreed Terms of Reference. PLAN-E has a kernel group of active members and convenes twice annually. More can be found on PLAN-E.

Sweden

Two academic research projects have been carried out in Sweden by two different groups of universities, to help researches share and access scientific computing resources and knowledge:

Comparison with traditional science

Traditional science is representative of two distinct philosophical traditions within the history of science, but e-Science, it is being argued, requires a paradigm shift, and the addition of a third branch of the sciences. "The idea of open data is not a new one; indeed, when studying the history and philosophy of science, Robert Boyle is credited with stressing the concepts of skepticism, transparency, and reproducibility for independent verification in scholarly publishing in the 1660s. The scientific method later was divided into two major branches, deductive and empirical approaches. Today, a theoretical revision in the scientific method should include a new branch, Victoria Stodden advocate[s], that of the computational approach, where like the other two methods, all of the computational steps by which scientists draw conclusions are revealed. This is because within the last 20 years, people have been grappling with how to handle changes in high performance computing and simulation." [1] As such, e-science aims at combining both empirical and theoretical traditions, [3] while computer simulations can create artificial data, and real-time big data can be used to calibrate theoretical simulation models. [7] Conceptually, e-Science revolves around developing new methods to support scientists in conducting scientific research with the aim of making new scientific discoveries by analyzing vast amounts of data accessible over the internet using vast amounts of computational resources. However, discoveries of value cannot be made simply by providing computational tools, a cyberinfrastructure or by performing a pre-defined set of steps to produce a result. Rather, there needs to be an original, creative aspect to the activity that by its nature cannot be automated. This has led to various research that attempts to define the properties that e-Science platforms should provide in order to support a new paradigm of doing science, and new rules to fulfill the requirements of preserving and making computational data results available in a manner such that they are reproducible in traceable, logical steps, as an intrinsic requirement for the maintenance of modern scientific integrity that allows an extenuation of "Boyle's tradition in the computational age". [1]

Modelling e-Science processes

One view [14] argues that since a modern discovery process instance serves a similar purpose to a mathematical proof it should have similar properties, namely it allows results to be deterministically reproduced when re-executed and that intermediate results can be viewed to aid examination and comprehension. In this case, simply modelling the provenance of data is not sufficient. One has to model the provenance of the hypotheses and results generated from analyzing the data as well so as to provide evidence that support new discoveries. Scientific workflows have thus been proposed and developed to assist scientists to track the evolution of their data, intermediate results and final results as a means to document and track the evolution of discoveries within a piece of scientific research.

Science 2.0

Other views include Science 2.0 where e-Science is considered to be a shift from the publication of final results by well-defined collaborative groups towards a more open approach, which includes the public sharing of raw data, preliminary experimental results, and related information. To facilitate this shift, the Science 2.0 view is on providing tools that simplify communication, cooperation and collaboration between interested parties. Such an approach has the potential to: speed up the process of scientific discovery; overcome problems associated with academic publishing and peer review; and remove time and cost barriers, limiting the process of generating new knowledge.

See also

Related Research Articles

<span class="mw-page-title-main">National Center for Supercomputing Applications</span> Illinois-based applied supercomputing research organization

The National Center for Supercomputing Applications (NCSA) is a state-federal partnership to develop and deploy national-scale cyberinfrastructure that advances research, science and engineering based in the United States. NCSA operates as a unit of the University of Illinois Urbana-Champaign, and provides high-performance computing resources to researchers across the country. Support for NCSA comes from the National Science Foundation, the state of Illinois, the University of Illinois, business and industry partners, and other federal agencies.

Grid computing is the use of widely distributed computer resources to reach a common goal. A computing grid can be thought of as a distributed system with non-interactive workloads that involve many files. Grid computing is distinguished from conventional high-performance computing systems such as cluster computing in that grid computers have each node set to perform a different task/application. Grid computers also tend to be more heterogeneous and geographically dispersed than cluster computers. Although a single grid can be dedicated to a particular application, commonly a grid is used for a variety of purposes. Grids are often constructed with general-purpose grid middleware software libraries. Grid sizes can be quite large.

<span class="mw-page-title-main">San Diego Supercomputer Center</span> Supercomputer at UC San Diego.

The San Diego Supercomputer Center (SDSC) is an organized research unit of the University of California, San Diego (UCSD). SDSC is located at the UCSD campus' Eleanor Roosevelt College east end, immediately north the Hopkins Parking Structure.

United States federal research funders use the term cyberinfrastructure to describe research environments that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services distributed over the Internet beyond the scope of a single institution. In scientific usage, cyberinfrastructure is a technological and sociological solution to the problem of efficiently connecting laboratories, data, computers, and people with the goal of enabling derivation of novel scientific theories and knowledge.

<span class="mw-page-title-main">TeraGrid</span>

TeraGrid was an e-Science grid computing infrastructure combining resources at eleven partner sites. The project started in 2001 and operated from 2004 through 2011.

<span class="mw-page-title-main">Charlie Catlett</span> American computer scientist

Charlie Catlett is a senior computer scientist at Argonne National Laboratory and a visiting senior fellow at the Mansueto Institute for Urban Innovation at the University of Chicago. From 2020 to 2022 he was a senior research scientist at the University of Illinois Discovery Partners Institute. He was previously a senior computer scientist at Argonne National Laboratory and a senior fellow in the Computation Institute, a joint institute of Argonne National Laboratory and The University of Chicago, and a senior fellow at the University of Chicago's Harris School of Public Policy.

<span class="mw-page-title-main">European Grid Infrastructure</span> Effort to provide access to high-throughput computing resources across Europe

European Grid Infrastructure (EGI) is a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The EGI links centres in different European countries to support international research in many scientific disciplines. Following a series of research projects such as DataGrid and Enabling Grids for E-sciencE, the EGI Foundation was formed in 2010 to sustain the services of EGI.

The Texas Advanced Computing Center (TACC) at the University of Texas at Austin, United States, is an advanced computing research center that is based on comprehensive advanced computing resources and supports services to researchers in Texas and across the U.S. The mission of TACC is to enable discoveries that advance science and society through the application of advanced computing technologies. Specializing in high performance computing, scientific visualization, data analysis & storage systems, software, research & development and portal interfaces, TACC deploys and operates advanced computational infrastructure to enable the research activities of faculty, staff, and students of UT Austin. TACC also provides consulting, technical documentation, and training to support researchers who use these resources. TACC staff members conduct research and development in applications and algorithms, computing systems design/architecture, and programming tools and environments.

The George E. Brown, Jr. Network for Earthquake Engineering Simulation (NEES) was created by the National Science Foundation (NSF) to improve infrastructure design and construction practices to prevent or minimize damage during an earthquake or tsunami. Its headquarters were at Purdue University in West Lafayette, Indiana as part of cooperative agreement #CMMI-0927178, and it ran from 2009 till 2014. The mission of NEES is to accelerate improvements in seismic design and performance by serving as a collaboratory for discovery and innovation.

<span class="mw-page-title-main">Renaissance Computing Institute</span>

Renaissance Computing Institute (RENCI) was launched in 2004 as a collaboration involving the State of North Carolina, University of North Carolina at Chapel Hill (UNC-CH), Duke University, and North Carolina State University. RENCI is organizationally structured as a research institute within UNC-CH, and its main campus is located in Chapel Hill, NC, a few miles from the UNC-CH campus. RENCI has engagement centers at UNC-CH, Duke University (Durham), and North Carolina State University (Raleigh).

The National E-Infrastructure Service (NES), formerly the National Grid Service, was an organisation for UK academics and researchers from 2004 through 2011. It was funded by two governmental bodies, Engineering and Physical Sciences Research Council (EPSRC) and the Joint Information Systems Committee (JISC).

nanoHUB

nanoHUB.org is a science and engineering gateway comprising community-contributed resources and geared toward education, professional networking, and interactive simulation tools for nanotechnology. Funded by the United States National Science Foundation (NSF), it is a product of the Network for Computational Nanotechnology (NCN). NCN supports research efforts in nanoelectronics; nanomaterials; nanoelectromechanical systems (NEMS); nanofluidics; nanomedicine, nanobiology; and nanophotonics.

The National Institute for Computational Sciences (NICS) is funded by the National Science Foundation and managed by the University of Tennessee. NICS was home to Kraken, the most powerful computer in the world managed by academia. The NICS petascale scientific computing environment is housed at Oak Ridge National Laboratory (ORNL), home to the world's most powerful computing complex. The mission of NICS, a member of the Extreme Science and Engineering Discovery Environment (XSEDE - formerly TeraGrid), is to enable the scientific discoveries of researchers nationwide by providing leading-edge computational resources, together with support for their effective use, and leveraging extensive partnership opportunities.

Discovery Net is one of the earliest examples of a scientific workflow system allowing users to coordinate the execution of remote services based on Web service and Grid Services standards. The system was designed and implemented at Imperial College London as part of the Discovery Net pilot project funded by the UK e-Science Programme. Many of the concepts pioneered by Discovery Net have been later incorporated into a variety of other scientific workflow systems.

Integrated computational materials engineering (ICME) involves the integration of experimental results, design models, simulations, and other computational data related to a variety of materials used in multiscale engineering and design. Central to the achievement of ICME goals has been the creation of a cyberinfrastructure, a Web-based, collaborative platform which provides the ability to accumulate, organize and disseminate knowledge pertaining to materials science and engineering to facilitate this information being broadly utilized, enhanced, and expanded.

Polish Grid Infrastructure PL-Grid, a nationwide computing structure, built in 2009-2011, under the scientific project PL-Grid – Polish Infrastructure for Supporting Computational Science in the European Research Space. Its purpose was to enable scientific research based on advanced computer simulations and large-scale computations using the computer clusters, and to provide convenient access to the computer resources for research teams, also outside the communities, in which the high performance computing centers operate.

iPlant Collaborative

The iPlant Collaborative, renamed Cyverse in 2017, is a virtual organization created by a cooperative agreement funded by the US National Science Foundation (NSF) to create cyberinfrastructure for the plant sciences (botany). The NSF compared cyberinfrastructure to physical infrastructure, "... the distributed computer, information and communication technologies combined with the personnel and integrating components that provide a long-term platform to empower the modern scientific research endeavor". In September 2013 it was announced that the National Science Foundation had renewed iPlant's funding for a second 5-year term with an expansion of scope to all non-human life science research.

A data infrastructure is a digital infrastructure promoting data sharing and consumption.

Data Infrastructure Building Blocks (DIBBs) is a U.S. National Science Foundation program.

Science gateways provide access to advanced resources for science and engineering researchers, educators, and students. Through streamlined, online, user-friendly interfaces, gateways combine a variety of cyberinfrastructure (CI) components in support of a community-specific set of tools, applications, and data collections.: In general, these specialized, shared resources are integrated as a Web portal, mobile app, or a suite of applications. Through science gateways, broad communities of researchers can access diverse resources which can save both time and money for themselves and their institutions. As listed below, functions and resources offered by science gateways include shared equipment and instruments, computational services, advanced software applications, collaboration capabilities, data repositories, and networks.

References

  1. 1 2 3 4 Bohle, S. "What is E-science and How Should it Be Managed?" Nature.com, Spektrum der Wissenschaft (Scientific American), http://www.scilogs.com/scientific_and_medical_libraries/what-is-e-science-and-how-should-it-be-managed/.
  2. IEEE International Conference on eScience, homepage, accessed December 18, 2014, https://escience-conference.org/
  3. 1 2 DT&SC 7-2: Computational Social Science. https://www.youtube.com/watch?v=TEo0Au1brHs From the DT&SC online course at the University of California: https://canvas.instructure.com/courses/949415
  4. Stewart Tansley; Kristin Michele Tolle (2009). The Fourth Paradigm: Data-intensive Scientific Discovery. Microsoft Research. ISBN   978-0-9825442-0-4.
  5. Bell, G.; Hey, T.; Szalay, A. (2009). "COMPUTER SCIENCE: Beyond the Data Deluge". Science. 323 (5919): 1297–1298. doi:10.1126/science.1170411. ISSN   0036-8075. PMID   19265007. S2CID   9743327.
  6. DT&SC 7-1: Introduction to e-Science: https://www.youtube.com/watch?v=9x3d75ZMuYU . From the DT&SC online course at the University of California: https://canvas.instructure.com/courses/949415
  7. 1 2 Hilbert, M. (2015). e-Science for Digital Development: ICT4ICT4D. Centre for Development Informatics, SEED, University of Manchester. "Archived copy" (PDF). Archived from the original (PDF) on 2015-09-24. Retrieved 2015-08-13.{{cite web}}: CS1 maint: archived copy as title (link)
  8. Executive Office of the President, Office of Science and Technology Policy, "Memorandum for the Heads of Executive Departments and Agencies: Increasing Access to the Results of Federally Funded Scientific Research." February 22, 2013, accessed July 7, 2013, https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf.
  9. "National e-Science Centre". official website. Archived from the original on 16 December 2008. Retrieved 29 September 2011.
  10. Richard Poynder (12 December 2006). "A Conversation with Microsoft's Tony Hey". Open and Shut? blog. Retrieved 20 September 2011. It just happens that in the US they chose another name. Personally, I think e-Science is a much better name than cyberinfrastructure. Full transcript Archived March 25, 2012, at the Wayback Machine updated 15 December 2006.
  11. "Office of Cyberinfrastructure (OCI)" . Retrieved 19 September 2011.
  12. "Swedish e-Science Research Center(SeRC)".
  13. "eSSENCE, The e-Science Collaboration".
  14. Syed, J.; Ghanem, M.; Guo, Y. (2007). "Supporting scientific discovery processes in Discovery Net". Concurrency and Computation: Practice and Experience. 19 (2): 167. doi:10.1002/cpe.1049. S2CID   16212949.