Science gateway

Science gateways provide access to advanced resources for science and engineering researchers, educators, and students. Through streamlined, online, user-friendly interfaces, gateways combine a variety of cyberinfrastructure (CI) components in support of a community-specific set of tools, applications, and data collections. [1] In general, these specialized, shared resources are integrated as a Web portal, mobile app, or a suite of applications. [2] Through science gateways, broad communities of researchers can access diverse resources, saving time and money for themselves and their institutions. [3] Functions and resources offered by science gateways include the following: [2]

- shared equipment and instruments
- computational services (illustrated in the sketch after this list)
- advanced software applications
- collaboration capabilities
- data repositories
- networks
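
Of these functions, computational services illustrate the general pattern most directly: the gateway translates a simple web request into a job on a shared cluster, hiding schedulers, queues, and file systems from the user. The following minimal sketch in Python assumes a Slurm scheduler (sbatch on the PATH); the ./simulate executable and the mesh_size parameter are hypothetical stand-ins for a domain application, not part of any particular gateway.

    # Minimal gateway sketch: an HTTP request becomes a batch job.
    # Assumes a Slurm cluster ("sbatch" on PATH); "./simulate" and
    # "mesh_size" are hypothetical stand-ins for a domain application.
    import subprocess
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import parse_qs, urlparse

    class GatewayHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            params = parse_qs(urlparse(self.path).query)
            mesh_size = int(params.get("mesh_size", ["100"])[0])
            # The batch script is generated behind the portal; the user
            # only ever sees the web form parameter.
            script = (
                "#!/bin/bash\n"
                "#SBATCH --ntasks=4 --time=00:30:00\n"
                f"srun ./simulate --mesh {mesh_size}\n"
            )
            result = subprocess.run(["sbatch"], input=script,
                                    text=True, capture_output=True)
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            # e.g. "Submitted batch job 12345"
            self.wfile.write((result.stdout or result.stderr).encode())

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), GatewayHandler).serve_forever()

Production gateways layer authentication, job monitoring, and data staging on top of this basic request-to-job translation.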

History

Science gateways have existed in various forms for decades, though they would not have been called science gateways at the time; in the last decade, more projects have coalesced around the term. For example, the Protein Data Bank [4] started in 1971 and continues to provide a crucial service for its community.

Science gateways are often labeled with other names, depending on the community or region of the world. Alternative names include virtual research environment and virtual laboratory. [5]

Some of the earliest gateways to provide simplified interfaces to high-performance grid computing used the US-based TeraGrid computing infrastructure, funded by the National Science Foundation. TeraGrid (now continued under the auspices of the Extreme Science and Engineering Discovery Environment, or XSEDE) brought together a diverse community of software developers, who were otherwise isolated from each other by disparate fields of application. In Australia, the e-Research body NeCTAR provides similar resources and support for gateways. With the growth of this community in the US, Europe, and Australasia, several workshop series helped gateway creators and users coalesce around the concept and form a community of practice: Gateway Computing Environments (US, started in 2005), the International Workshop on Science Gateways (Europe, started in 2009), and the International Workshop on Science Gateways - Australia (started in 2016).

Additionally, middleware to support gateways has proliferated, including Apache Airavata, [6] CitSci.org, [7] the iPlant Collaborative, [8] [9] Galaxy, [10] WS-PGRADE/gUSE, [11] and HUBzero. [12]

As some of the earliest gateways (and other digital resources) reached the end of their first funding cycle, users began to see some gateways shut down for lack of funding or insufficient progress toward a sustainable tool; this prompted further investigation into the keys to sustainability for such projects. [13] [14] Recognizing a need for software reuse and community exchange, in 2016 the US National Science Foundation funded the Science Gateways Community Institute, [2] which provides subsidized services and resources to the developers and users of science gateways. On an international level, the International Coalition on Science Gateways brings together organizations from multiple countries and continents to share best practices and future directions in the field.

Related Research Articles

Grid computing is the use of widely distributed computer resources to reach a common goal. A computing grid can be thought of as a distributed system with non-interactive workloads that involve many files. Grid computing is distinguished from conventional high-performance computing systems such as cluster computing in that grid computers have each node set to perform a different task/application. Grid computers also tend to be more heterogeneous and geographically dispersed than cluster computers. Although a single grid can be dedicated to a particular application, commonly a grid is used for a variety of purposes. Grids are often constructed with general-purpose grid middleware software libraries. Grid sizes can be quite large.

E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created in 1999 by John Taylor, the Director General of the United Kingdom's Office of Science and Technology, and was used to describe a large funding initiative starting in November 2000. E-science has been more broadly interpreted since then, as "the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints, and print and/or electronic publications." In 2014, the IEEE eScience Conference Series condensed the definition, in one of the working definitions used by the organizers, to "eScience promotes innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle". E-science encompasses "what is often referred to as big data [which] has revolutionized science... [such as] the Large Hadron Collider (LHC) at CERN... [that] generates around 780 terabytes per year... highly data intensive modern fields of science...that generate large amounts of E-science data include: computational biology, bioinformatics, genomics" and the human digital footprint for the social sciences.

United States federal research funders use the term cyberinfrastructure to describe research environments that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services distributed over the Internet beyond the scope of a single institution. In scientific usage, cyberinfrastructure is a technological and sociological solution to the problem of efficiently connecting laboratories, data, computers, and people with the goal of enabling derivation of novel scientific theories and knowledge.

TeraGrid

TeraGrid was an e-Science grid computing infrastructure combining resources at eleven partner sites. The project started in 2001 and operated from 2004 through 2011.

NorduGrid

NorduGrid is a collaboration aiming at the development, maintenance and support of the free grid middleware known as the Advanced Resource Connector (ARC).

The Texas Advanced Computing Center (TACC) at the University of Texas at Austin, United States, is an advanced computing research center that provides comprehensive advanced computing resources and support services to researchers in Texas and across the USA. The mission of TACC is to enable discoveries that advance science and society through the application of advanced computing technologies. Specializing in high performance computing, scientific visualization, data analysis & storage systems, software, research & development and portal interfaces, TACC deploys and operates advanced computational infrastructure to enable computational research activities of faculty, staff, and students of UT Austin. TACC also provides consulting, technical documentation, and training to support researchers who use these resources. TACC staff members conduct research and development in applications and algorithms, computing systems design/architecture, and programming tools and environments.

nanoHUB

nanoHUB.org is a science and engineering gateway comprising community-contributed resources and geared toward education, professional networking, and interactive simulation tools for nanotechnology. Funded by the United States National Science Foundation (NSF), it is a product of the Network for Computational Nanotechnology (NCN). NCN supports research efforts in nanoelectronics; nanomaterials; nanoelectromechanical systems (NEMS); nanofluidics; nanomedicine; nanobiology; and nanophotonics.

Cloud computing

Cloud computing is the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user. The term is generally used to describe data centers available to many users over the Internet. Large clouds, predominant today, often have functions distributed over multiple locations from central servers; a server located relatively close to the user may be designated an edge server.

The Open Science Grid Consortium is an organization that administers a worldwide grid of technological resources called the Open Science Grid, which facilitates distributed computing for scientific research. Founded in 2004, the consortium is composed of service and resource providers, researchers from universities and national laboratories, as well as computing centers across the United States. Members independently own and manage the resources which make up the distributed facility, and consortium agreements provide the framework for technological and organizational integration.

Techila Distributed Computing Engine is a commercial grid computing software product. It speeds up simulation, analysis and other computational applications by enabling scalability across the IT resources in the user's on-premises data center and in the user's own cloud account. Techila Distributed Computing Engine is developed and licensed by Techila Technologies Ltd, a privately held company headquartered in Tampere, Finland. The product is also available as an on-demand solution in Google Cloud Launcher, the online marketplace created and operated by Google. According to IDC, the solution enables organizations to create HPC infrastructure without the major capital investments and operating expenses required by new HPC hardware.

MTA SZTAKI Laboratory of Parallel and Distributed Systems

The Laboratory of Parallel and Distributed Systems (LPDS), as a department of MTA SZTAKI, is a research laboratory in distributed grid and cloud technologies. LPDS is a founding member of the Hungarian Grid Competence Centre, the Hungarian National Grid Initiative and the Hungarian OpenNebula Community and also coordinates several European grid/cloud projects.

gUSE

The Grid and Cloud User Support Environment (gUSE), also known as WS-PGRADE/gUSE, is an open source science gateway framework that enables users to access grid and cloud infrastructures. gUSE is developed by the Laboratory of Parallel and Distributed Systems (LPDS) at the Institute for Computer Science and Control (SZTAKI) of the Hungarian Academy of Sciences.

A scientific workflow system is a specialized form of a workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or workflow, in a scientific application.
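
As a concrete illustration of this compose-and-execute idea, the sketch below orders steps by their declared data dependencies and runs them in sequence. The step names and functions are hypothetical; real systems such as Taverna or Galaxy add distributed execution, provenance tracking, and fault tolerance on top of this core.

    # Core of a scientific workflow system, reduced to its essence:
    # steps declare which other steps' outputs they consume, and the
    # engine executes them in a dependency-respecting order.
    from graphlib import TopologicalSorter  # standard library, Python 3.9+

    def fetch():      return [3.0, 1.0, 2.0]       # data acquisition stand-in
    def clean(xs):    return sorted(xs)            # preprocessing stand-in
    def analyze(xs):  return sum(xs) / len(xs)     # computation stand-in

    workflow = {                    # step name -> (callable, dependencies)
        "fetch":   (fetch,   []),
        "clean":   (clean,   ["fetch"]),
        "analyze": (analyze, ["clean"]),
    }

    sorter = TopologicalSorter(
        {name: deps for name, (_, deps) in workflow.items()})
    results = {}
    for name in sorter.static_order():             # topological order
        func, deps = workflow[name]
        results[name] = func(*(results[d] for d in deps))

    print(results["analyze"])                      # prints 2.0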

iPlant Collaborative

The iPlant Collaborative, since renamed CyVerse, is a virtual organization created by a cooperative agreement funded by the US National Science Foundation (NSF) to create cyberinfrastructure for the plant sciences (botany). The NSF compared cyberinfrastructure to physical infrastructure, "... the distributed computer, information and communication technologies combined with the personnel and integrating components that provide a long-term platform to empower the modern scientific research endeavor". In September 2013 it was announced that the National Science Foundation had renewed iPlant's funding for a second 5-year term with an expansion of scope to all non-human life science research.

Airavata is a software suite to compose, manage, execute, and monitor large-scale applications and workflows on computational resources ranging from local clusters to national grids and computing clouds. Airavata consists of four components:

  1. a workflow suite, enabling a user to compose and monitor workflows. These can be run on an Apache environment or exported to other workflow programming languages such as BPEL and Java.
  2. an application wrapper service to convert command line programs into services that can be used reliably on a network (this pattern is sketched after the list).
  3. a registry service that records how workflows and wrapped programs have been deployed.
  4. a message brokering service to enable communication over possibly unreliable networks to clients behind organisations' firewalls.
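
The wrapper service in item 2 is the component most easily shown in miniature: a command-line program becomes callable over the network. The sketch below uses Python's standard XML-RPC server and wraps echo as a stand-in for a scientific code; it illustrates the pattern only and is not Apache Airavata's actual API.

    # Illustration of the application-wrapper idea (not Airavata's API):
    # expose a command-line program as a network-callable service.
    import subprocess
    from xmlrpc.server import SimpleXMLRPCServer

    def run_app(args):
        """Run the wrapped command-line program; return its stdout."""
        # "echo" is a stand-in for a real scientific executable.
        result = subprocess.run(["echo", *args],
                                capture_output=True, text=True)
        if result.returncode != 0:        # surface failures to the caller
            raise RuntimeError(result.stderr)
        return result.stdout

    server = SimpleXMLRPCServer(("localhost", 9000), allow_none=True)
    server.register_function(run_app, "run_app")
    server.serve_forever()

A client would then invoke the wrapped program remotely, for example with xmlrpc.client.ServerProxy("http://localhost:9000").run_app(["hello"]); in a full system, the registry and message-brokering components (items 3 and 4) would record how the wrapped program was deployed and carry results back across firewalls.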
Francine Berman

Francine Berman is an American computer scientist, and a leader in digital data preservation and cyberinfrastructure. In 2009, she was the inaugural recipient of the ACM/IEEE-CS Ken Kennedy Award "for her influential leadership in the design, development and deployment of national-scale cyberinfrastructure, her inspiring work as a teacher and mentor, and her exemplary service to the high performance community". In 2004, Business Week called her the "reigning teraflop queen".

HUBzero is an open source software platform for building websites that support scientific activities.

OnlineHPC

The OnlineHPC was a free public web service that supplied an online workflow editor and tools for working with high-performance computers. OnlineHPC allowed users to design and execute workflows using the online workflow designer and to work with high-performance resources, both clusters and clouds. Access to these resources was available both directly from the service user interface and from workflow components. The workflow engine of the OnlineHPC service was Taverna, traditionally used for scientific workflow execution in domains such as bioinformatics, cheminformatics, medicine, astronomy, social science, music, and digital preservation.

The High-performance Integrated Virtual Environment (HIVE) is a distributed computing environment used for healthcare-IT and biological research, including analysis of Next Generation Sequencing (NGS) data, preclinical, clinical and post-market data, adverse events, metagenomic data, etc. Currently it is supported and continuously developed by the US Food and Drug Administration, George Washington University, and by DNA-HIVE, WHISE-Global and Embleema. HIVE currently operates fully functionally within the US FDA, supporting a wide variety (more than 60) of regulatory research and regulatory review projects as well as supporting MDEpiNet medical device postmarket registries. Academic deployments of HIVE are used for research activities and publications in NGS analytics, cancer research, microbiome research and in educational programs for students at GWU. Commercial enterprises use HIVE for oncology, microbiology, vaccine manufacturing, gene editing, healthcare-IT, harmonization of real-world data, in preclinical research and clinical studies.

CyberGIS, or cyber geographic information science and systems, is an interdisciplinary field combining cyberinfrastructure, e-science, and geographic information science and systems (GIS). CyberGIS has a particular focus on computational and data-intensive geospatial problem-solving within various research and education domains. The need for GIS has extended beyond traditional forms of geographic analysis and study, which includes adapting to new sources and kinds of data, high-performance computing resources, and online platforms based on existing and emerging information networks. The name cyberGIS first appeared in Geographic Information Science literature in 2010. CyberGIS systems are characterized as digital geospatial ecosystems; they are developed and have evolved through heterogeneous computing environments, as well as human communication and information environments. CyberGIS can be considered a new generation of geographic information systems (GIS), based on advanced computing and information infrastructure that analyzes and models geospatial data, providing computationally intensive spatial analysis, modeling, and collaborative geospatial problem-solving at unprecedented scales.

References

  1. Wilkins‐Diehr, Nancy. "Special issue: science gateways—common community interfaces to grid resources." Concurrency and Computation: Practice and Experience 19, no. 6 (2007): 743-749.
  2. Lawrence, Katherine A., Michael Zentner, Nancy Wilkins‐Diehr, Julie A. Wernert, Marlon Pierce, Suresh Marru, and Scott Michael. "Science gateways today and tomorrow: positive perspectives of nearly 5000 members of the research community," Concurrency and Computation: Practice and Experience 27, no. 16 (2015): 4252-4268.
  3. Kiss, Tamas. "Science gateways for the broader take-up of distributed computing infrastructures." Journal of Grid Computing (2012): 1-2
  4. Berman, H. M. (January 2008). "The Protein Data Bank: a historical perspective" (PDF). Acta Crystallographica Section A. A64 (1): 88–95. PMID 18156675. doi:10.1107/S0108767307035623.
  5. Shahand, S. (2015). Science gateways for biomedical big data analysis (Doctoral dissertation). Retrieved from University of Amsterdam Digital Academic Repository
  6. Pierce, Marlon E., Suresh Marru, Lahiru Gunathilake, Don Kushan Wijeratne, Raminder Singh, Chathuri Wimalasena, Shameera Ratnayaka, and Sudhakar Pamidighantam. "Apache Airavata: design and directions of a science gateway framework." Concurrency and Computation: Practice and Experience 27, no. 16 (2015): 4282-4291. http://onlinelibrary.wiley.com/doi/10.1002/cpe.3534/full
  7. Wang, Y., Kaplan, N., Newman, G., & Scarpino, R. "CitSci.org: A New Model for Managing, Documenting, and Sharing Citizen Science Data," PLoS Biology, vol. 13, (10), 2015. DOI: https://dx.doi.org/10.1371/journal.pbio.1002280.
  8. Merchant, Nirav, et al., "The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences," PLOS Biology (2016), doi: 10.1371/journal.pbio.1002342.
  9. Goff, Stephen A., et al., "The iPlant Collaborative: Cyberinfrastructure for Plant Biology," Frontiers in Plant Science 2 (2011), doi: 10.3389/fpls.2011.00034.
  10. Enis Afgan, Dannon Baker, Marius van den Beek, Daniel Blankenberg, Dave Bouvier, Martin Čech, John Chilton, Dave Clements, Nate Coraor, Carl Eberhard, Björn Grüning, Aysam Guerler, Jennifer Hillman-Jackson, Greg Von Kuster, Eric Rasche, Nicola Soranzo, Nitesh Turaga, James Taylor, Anton Nekrutenko, and Jeremy Goecks. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Research (2016) 44(W1): W3-W10 doi:10.1093/nar/gkw343
  11. Kacsuk, Péter (Ed.) 2014. Science Gateways for Distributed Computing Infrastructures: Development Framework and Exploitation by Scientific User Communities, Switzerland: Springer International.
  12. M. McLennan, R. Kennell, "HUBzero: A Platform for Dissemination and Collaboration in Computational Science and Engineering," Computing in Science and Engineering, 12(2), pp. 48-52, March/April, 2010.
  13. Maron, Nancy L., K. Kirby Smith, and Matthew Loy. "Sustaining Digital Resources: An On-the-Ground View of Projects Today." Ithaka S+R. Last Modified 14 July 2009. https://doi.org/10.18665/sr.22408.
  14. Wilkins-Diehr, N. and K. A. Lawrence (2010) “Opening Science Gateways to Future Success: The Challenges of Gateway Sustainability,” Gateway Computing Environments Workshop (GCE), 2010, pp.1-10; November 14, 2010. IEEE Computer Society (Xplore Digital Library). doi: 10.1109/GCE.2010.5676121.