HUBzero is an open source software platform for building websites that support scientific activities. [1]
HUBzero was created by researchers at Purdue University in conjunction with the NSF-sponsored Network for Computational Nanotechnology. It was based on the Purdue University Network Computing Hubs (PUNCH) project, which had begun in the 1990s under Mark Lundstrom, José Fortes, and Nirav Kapadia. [2]
HUBzero allows individuals to create websites that connect a community in scientific research and educational activities. HUBzero sites combine Web 2.0 concepts with middleware that provides access to interactive simulation tools, including access to TeraGrid, [3] the Open Science Grid, and other national grid computing resources.
The software was later supported by a consortium and adopted by other projects. [4] [5] HUBzero is released under various open source licenses. [6]
HUBzero provides free preconfigured virtual machine images that contain the full version of the HUBzero platform. [7] The HUBzero Essential instance is also available through Amazon Web Services. [7] HUBzero also offers two paid services, the HUBzero Foundation [8] and a No Hassle Hosting service. [9] The HUBzero Foundation is a community-based, non-profit organization that promotes the use of HUBzero and ensures the ongoing sustainability of the core software. The HUBzero No Hassle Hosting service offers hosting solutions for other non-profit institutions. Through No Hassle Hosting, sites based on the HUBzero platform are maintained and supported by the HUBzero Development Team at Purdue University with better than 99% uptime.
A HUBzero site is built from open-source software: the Linux operating system, the Apache web server, the MySQL database, the Joomla content management system, and the PHP web scripting language. The HUBzero software allows individuals to access simulation tools and share information. Sites using the hub infrastructure are standardized on a common set of modules.
Grid computing is the use of widely distributed computer resources to reach a common goal. A computing grid can be thought of as a distributed system with non-interactive workloads that involve many files. Grid computing is distinguished from conventional high-performance computing systems such as cluster computing in that grid computers have each node set to perform a different task/application. Grid computers also tend to be more heterogeneous and geographically dispersed than cluster computers. Although a single grid can be dedicated to a particular application, commonly a grid is used for a variety of purposes. Grids are often constructed with general-purpose grid middleware software libraries. Grid sizes can be quite large.
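The pattern described above can be illustrated with a short sketch: independent, non-interactive work units dispatched to a pool of workers standing in for heterogeneous nodes, with results arriving in whatever order the "nodes" finish. This is a toy model under assumed names (WORK_UNITS, run_on_node), not real grid middleware.

```python
# Toy model of grid-style scheduling: independent work units are dispatched
# to workers standing in for heterogeneous, geographically dispersed nodes.
# All names here are hypothetical; real grids use dedicated middleware.
from concurrent.futures import ThreadPoolExecutor, as_completed

WORK_UNITS = [("render", 1), ("simulate", 2), ("analyze", 3), ("archive", 4)]

def run_on_node(task, unit_id):
    # Stand-in for submitting a non-interactive batch job to a remote node.
    return f"finished {task} (work unit {unit_id})"

with ThreadPoolExecutor(max_workers=4) as pool:  # one worker per "node"
    futures = [pool.submit(run_on_node, task, uid) for task, uid in WORK_UNITS]
    for future in as_completed(futures):         # completion order varies,
        print(future.result())                   # as on a real grid
```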
The Berkeley Open Infrastructure for Network Computing is an open-source middleware system for volunteer computing and grid computing. Originally developed to support the SETI@home project, it became generalized as a platform for other distributed applications in areas as diverse as mathematics, linguistics, medicine, molecular biology, climatology, environmental science, and astrophysics, among others. BOINC aims to enable researchers to tap into the enormous processing resources of many personal computers around the world.
Wolfram Research, Inc. is an American multinational company that creates computational technology. Wolfram's flagship product is the technical computing program Wolfram Mathematica, first released on June 23, 1988. Other products include WolframAlpha, Wolfram SystemModeler, Wolfram Workbench, gridMathematica, Wolfram Finance Platform, webMathematica, the Wolfram Cloud, and the Wolfram Programming Lab. Wolfram Research founder Stephen Wolfram is the CEO. The company is headquartered in Champaign, Illinois, United States.
The San Diego Supercomputer Center (SDSC) is an organized research unit of the University of California, San Diego (UCSD). SDSC is located at the east end of the UCSD campus' Eleanor Roosevelt College, immediately north of the Hopkins Parking Structure.
E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology, in 1999, and was used to describe a large funding initiative starting in November 2000. E-science has since been interpreted more broadly as "the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints, and print and/or electronic publications." In 2014, the organizers of the IEEE eScience Conference Series condensed this, in one of their working definitions, to "eScience promotes innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle". E-science encompasses "what is often referred to as big data [which] has revolutionized science... [such as] the Large Hadron Collider (LHC) at CERN... [that] generates around 780 terabytes per year... highly data intensive modern fields of science... that generate large amounts of E-science data include: computational biology, bioinformatics, genomics" and the human digital footprint for the social sciences.
United States federal research funders use the term cyberinfrastructure to describe research environments that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services distributed over the Internet beyond the scope of a single institution. In scientific usage, cyberinfrastructure is a technological and sociological solution to the problem of efficiently connecting laboratories, data, computers, and people with the goal of enabling derivation of novel scientific theories and knowledge.
TeraGrid was an e-Science grid computing infrastructure combining resources at eleven partner sites. The project started in 2001 and operated from 2004 through 2011.
The D-Grid Initiative was a German government project to fund grid computing infrastructure for education and research (e-Science). D-Grid started on September 1, 2005 with six community projects and an integration project (DGI), as well as several partner projects.
GARUDA (Global Access to Resource Using Distributed Architecture) is India's grid computing initiative, connecting 17 cities across the country. The 45 participating institutes in this nationwide project include all the IITs, C-DAC centers, and other major institutes in India.
The George E. Brown, Jr. Network for Earthquake Engineering Simulation (NEES) was created by the National Science Foundation (NSF) to improve infrastructure design and construction practices to prevent or minimize damage during an earthquake or tsunami. Its headquarters were at Purdue University in West Lafayette, Indiana as part of cooperative agreement #CMMI-0927178, and it ran from 2009 to 2014. The mission of NEES was to accelerate improvements in seismic design and performance by serving as a collaboratory for discovery and innovation.
Kepler is a free software system for designing, executing, reusing, evolving, archiving, and sharing scientific workflows. Kepler's facilities provide process and data monitoring, provenance information, and high-speed data movement. Workflows in general, and scientific workflows in particular, are directed graphs where the nodes represent discrete computational components, and the edges represent paths along which data and results can flow between components. In Kepler, the nodes are called 'Actors' and the edges are called 'channels'. Kepler includes a graphical user interface for composing workflows in a desktop environment, a runtime engine for executing workflows within the GUI and independently from a command-line, and a distributed computing option that allows workflow tasks to be distributed among compute nodes in a computer cluster or computing grid. The Kepler system principally targets the use of a workflow metaphor for organizing computational tasks that are directed towards particular scientific analysis and modeling goals. Thus, Kepler scientific workflows generally model the flow of data from one step to another in a series of computations that achieve some scientific goal.
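The actor/channel model described above can be made concrete with a small sketch: functions play the role of actors (nodes in the directed graph), and the data handed between them plays the role of channels (edges). This is a plain-Python illustration of the dataflow metaphor only, not Kepler's actual interface.

```python
# Toy dataflow workflow in the spirit of the actor/channel model described
# above: each function is an "actor" and the data passed between them acts
# as a "channel". Illustrative only; this is not Kepler's API.
def generate():             # source actor: produce raw values
    return list(range(5))

def transform(values):      # intermediate actor: square each value
    return [v * v for v in values]

def report(values):         # sink actor: consume the final results
    print("workflow output:", values)

# A three-node pipeline; real Kepler workflows can be arbitrary directed
# graphs composed in a GUI and executed by a runtime engine.
report(transform(generate()))
```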
nanoHUB.org is a science and engineering gateway comprising community-contributed resources and geared toward education, professional networking, and interactive simulation tools for nanotechnology. Funded by the United States National Science Foundation (NSF), it is a product of the Network for Computational Nanotechnology (NCN). NCN supports research efforts in nanoelectronics; nanomaterials; nanoelectromechanical systems (NEMS); nanofluidics; nanomedicine and nanobiology; and nanophotonics.
Rmetrics is a free, open-source and open-development software project for teaching computational finance. Rmetrics is based primarily on the statistical R programming language, but also contains contributions in other programming languages, such as Fortran, C, and C++. The project was started in 2001 by Diethelm Wuertz, based at the Swiss Federal Institute of Technology in Zurich.
The Polish Grid Infrastructure (PL-Grid) is a nationwide computing infrastructure built in 2009-2011 under the scientific project PL-Grid - Polish Infrastructure for Supporting Computational Science in the European Research Space. Its purpose was to enable scientific research based on advanced computer simulations and large-scale computations using computer clusters, and to provide convenient access to computing resources for research teams, including those outside the communities in which the high-performance computing centers operate.
The iPlant Collaborative, renamed CyVerse in 2017, is a virtual organization created by a cooperative agreement funded by the US National Science Foundation (NSF) to create cyberinfrastructure for the plant sciences (botany). The NSF compared cyberinfrastructure to physical infrastructure, "... the distributed computer, information and communication technologies combined with the personnel and integrating components that provide a long-term platform to empower the modern scientific research endeavor". In September 2013 it was announced that the National Science Foundation had renewed iPlant's funding for a second 5-year term with an expansion of scope to all non-human life science research.
The European Middleware Initiative (EMI) is a computer software platform for high performance distributed computing. It is developed and distributed directly by the EMI project. It is the base for other grid middleware distributions used by scientific research communities and distributed computing infrastructures all over the world, especially in Europe, South America, and Asia. EMI supports broad scientific experiments and initiatives, such as the Worldwide LHC Computing Grid.
The Neuroimaging Tools and Resources Collaboratory is a neuroimaging informatics knowledge environment for MR, PET/SPECT, CT, EEG/MEG, optical imaging, clinical neuroinformatics, imaging genomics, and computational neuroscience tools and resources.
Science gateways provide access to advanced resources for science and engineering researchers, educators, and students. Through streamlined, online, user-friendly interfaces, gateways combine a variety of cyberinfrastructure (CI) components in support of a community-specific set of tools, applications, and data collections. In general, these specialized, shared resources are integrated as a Web portal, mobile app, or a suite of applications. Through science gateways, broad communities of researchers can access diverse resources, which can save both time and money for themselves and their institutions. Functions and resources offered by science gateways include shared equipment and instruments, computational services, advanced software applications, collaboration capabilities, data repositories, and networks.
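As a rough illustration of the portal idea, the sketch below puts a single computation behind a minimal web endpoint, so a user supplies a parameter through a URL rather than touching the underlying resources directly. The endpoint, port, and computation are all invented for the example.

```python
# Minimal sketch of the science-gateway idea: a streamlined web interface
# in front of a community-specific computation, so users never interact
# with the underlying cyberinfrastructure directly. The endpoint, port,
# and computation here are hypothetical.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

def run_simulation(n):
    # Stand-in for dispatching work to clusters, repositories, instruments.
    return {"input": n, "result": n ** 0.5}

class GatewayHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /?n=42 -- read one numeric parameter from the query string
        query = self.path.partition("?")[2]
        params = dict(p.split("=", 1) for p in query.split("&") if "=" in p)
        body = json.dumps(run_simulation(float(params.get("n", "1")))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), GatewayHandler).serve_forever()
```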
Project Jupyter is a project and community whose goal is to "develop open-source software, open-standards, and services for interactive computing across dozens of programming languages". It was spun off from IPython in 2014 by Fernando Pérez and Brian Granger. Project Jupyter's name is a reference to the three core programming languages supported by Jupyter, which are Julia, Python and R, and also a homage to Galileo's notebooks recording the discovery of the moons of Jupiter. Project Jupyter has developed and supported the interactive computing products Jupyter Notebook, JupyterHub, and JupyterLab. Jupyter is financially sponsored by NumFOCUS.
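As a small, concrete view of the notebook document these products are built around, the sketch below writes a two-cell .ipynb file using nbformat, the Jupyter library that defines the notebook format; the file name and cell contents are arbitrary examples.

```python
# Build a minimal Jupyter notebook programmatically with nbformat, the
# library that defines the .ipynb document format used by Jupyter Notebook
# and JupyterLab. The file name and cell contents are arbitrary examples.
import nbformat

nb = nbformat.v4.new_notebook()
nb.cells.append(nbformat.v4.new_markdown_cell("# A generated notebook"))
nb.cells.append(nbformat.v4.new_code_cell("print('hello from Jupyter')"))

with open("example.ipynb", "w", encoding="utf-8") as f:
    nbformat.write(nb, f)  # the result opens in Jupyter Notebook or JupyterLab
```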