| Company type | Scientific support |
| --- | --- |
| Industry | |
| Founded | 2010 |
| Headquarters | Science Park 140, 1098 XG Amsterdam, The Netherlands |
| Area served | Europe |
| Key people | |
| Revenue | €6,130,989 (2023) |
| Website | egi |
EGI is a federation of computing and storage resource providers that deliver advanced computing and data analytics services for research and innovation. The Federation [1] is governed by its participants represented in the EGI Council and coordinated by the EGI Foundation.
As of 2024, the EGI Federation supports 160 scientific communities worldwide and over 95,000 users in data-intensive analysis. The most significant scientific communities supported by EGI in 2022 were Medical and Health Sciences, High Energy Physics, and Engineering and Technology. The EGI Federation provides services through over 150 data centres, of which 25 are cloud sites, in 43 countries and 64 Research Infrastructures (4 of which are members of the Federation).
Originally, EGI stood for European Grid Infrastructure. This reflected its focus on providing access to high-throughput computing resources across Europe using Grid computing techniques. However, as EGI's service offerings expanded beyond traditional grid computing, particularly with the incorporation of federated cloud services, the original meaning of the acronym became less accurate.
To emphasise the broader scope of EGI's services and avoid confusion with the outdated term "grid", the name is now used simply as "EGI", rather than as an acronym.
The EGI Federation delivers a scalable digital research infrastructure (e-infrastructure), empowering tens of thousands of researchers across diverse scientific disciplines. Through the EGI Federation, researchers gain access to advanced computing and data analytics capabilities, including large-scale data analysis, while benefiting from the collaborative efforts of hundreds of service providers from both public and private sectors, consolidating resources from Europe and beyond. Overall, the EGI Federation offers a range of services, encompassing distributed high-throughput computing and cloud computing, storage and data management capabilities, co-development of new solutions, expert support, and comprehensive training opportunities. This ecosystem propels collaboration, scientific progress and innovation.
The EGI Foundation is the coordinating body of the EGI Federation. It was established in 2010 with headquarters in Amsterdam, Netherlands. The Foundation coordinates the research and innovation efforts of its members, spanning technical areas critical to data-intensive science, including large-scale data processing and analysis, distributed AI/ML, federated Identity and access management and the application of digital twins for research. The day-to-day running of the EGI Foundation is supervised by the Executive Board. The board’s members work closely with the EGI Director on operational, technical and financial issues. The Executive Board’s members are appointed by the EGI Council for a two-year term.
The EGI Council is responsible for defining the strategic direction of the EGI Federation. The Council acts as the senior decision-making and supervisory authority of the EGI Foundation, with a mandate to define the strategic direction of the entire EGI ecosystem.
EGI offers a suite of services [2] to support data-intensive research. These services include compute resources, orchestration tools, storage and data management solutions, training programmes, security and identity services, and applications. Compute resources encompass cloud compute, cloud container compute, high-throughput compute, and software distribution. Orchestration tools include the Workload Manager and the Infrastructure Manager. Storage and data management solutions include online storage, data transfer, and DataHub. Training programmes cover FitSM, ISO 27001, and general training infrastructure. EGI Check-in and Secrets Store are key security and identity services, while applications such as Notebooks and Replay enhance research productivity. In addition to services for Research, EGI also provides services for Federation and Business. Services for Federation are designed to help resource providers and user communities collaborate and share resources. EGI also offers a range of services to support businesses in their digital transformation. Through the EGI Digital Innovation Hub (EGI DIH), [3] companies can access advanced computing resources, networking, funding and training opportunities, collaborate with research institutions, and test solutions before investing.
In 2002, the first large-scale experimental facility was successfully demonstrated by the DataGrid project, [4] led by CERN together with dozens of technical architects from the world's major High Energy Physics institutes. For the first time, distributed computing was applied to data-intensive processing. The project aimed to develop a large-scale computational grid to facilitate distributed data-intensive scientific computing across High Energy Physics, Earth Observation, and Biology applications.
On 28 February 2003, the first software release of LCG-MW was published. gLite, [5] the Lightweight Middleware for Grid Computing, and LCG, the Large Hadron Collider Computing Grid, are the cornerstones of the Worldwide LHC Computing Grid, which expanded over time into the EGI Federation.
2004 marked the launch of the first pilot infrastructure, with the participation of CERN and data centres in the United Kingdom, Spain, Germany, the Netherlands, France, Canada, Russia, Bulgaria, the Asia-Pacific region and Switzerland. Over the years, the infrastructure has grown into a federation of 128 data centres and 25 cloud providers serving more than 95,000 users worldwide. [6] Also in 2004, the first data processing tasks started being formally recorded in a central accounting system. The EGI Accounting Portal [7] provides accounting data for the Compute, Storage and Data services gathered from the data centres of the EGI Federation.
A few years later, in 2010, EGI [8] was established as the coordinating body of the EGI Federation to build an integrated pan-European infrastructure primarily supporting European research communities. In the same year, EGI launched the flagship project EGI-InSPIRE. [9] That project brought together European organisations to establish a sustainable European Grid Infrastructure for large-scale data analysis. The success of the project was due to the adoption of a distributed computing model to solve big data problems. Moreover, EGI-InSPIRE harmonised operational policies across its federation of affiliated data centres and cloud service providers worldwide, integrating e-infrastructures from 57 countries. The EGI Federation was the first to apply federation to cloud provisioning, [10] opening a new avenue in large-scale interactive data analysis; this work continued from 2015 within EGI Engage. [11] The EGI Federated Cloud is an IaaS-type cloud, incorporating academic and private clouds and virtualised resources built using open standards. Its development is driven by the needs of the scientific community, resulting in a novel research e-infrastructure that relies on well-established federated operational services, making EGI a dependable resource for scientific endeavours. In 2015, EGI, EUDAT, GÉANT, LIBER and OpenAIRE published a position paper [12] on a 'European Open Science Cloud for Research'. With the EOSC-hub project [13] in 2016, EGI started contributing in practice to shaping the services for the EOSC. The work continued with a series of projects, such as EOSC Enhance, [14] EOSC Life [15] and EOSC Synergy. [16] With EGI-ACE [17] and its contribution to EOSC Future, [18] EGI has continued developing the EOSC Core. In early 2024, EGI started providing services to the EOSC EU Node, [19] and with EOSC Beyond [20] it will provide new EOSC Core capabilities and pilot additional national and thematic nodes.
In October 2024, EUDAT, GÉANT, OpenAIRE, PRACE and EGI signed a Memorandum of Understanding establishing the European e-Infrastructures Assembly. [21] This collaboration will bolster the position and promote the services of e-Infrastructures, empowering researchers across Europe to drive innovation and advance scientific discovery.
Grid computing is the use of widely distributed computer resources to reach a common goal. A computing grid can be thought of as a distributed system with non-interactive workloads that involve many files. Grid computing is distinguished from conventional high-performance computing systems such as cluster computing in that grid computers have each node set to perform a different task/application. Grid computers also tend to be more heterogeneous and geographically dispersed than cluster computers. Although a single grid can be dedicated to a particular application, commonly a grid is used for a variety of purposes. Grids are often constructed with general-purpose grid middleware software libraries. Grid sizes can be quite large.
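The contrast drawn above with cluster computing — independent, non-interactive jobs, each potentially a different application, dispatched to whichever worker is free — can be sketched in a few lines. This is a minimal illustrative sketch (the task functions and data are invented for the example), not real grid middleware:

```python
# Illustrative only: independent, non-interactive tasks dispatched to a
# pool of workers, as in a computing grid where each node may run a
# different application on a different dataset.
from concurrent.futures import ThreadPoolExecutor

def checksum_task(data):
    # A hypothetical data-verification job.
    return sum(data) % 257

def histogram_task(data):
    # A hypothetical analysis job of a different kind.
    counts = {}
    for x in data:
        counts[x] = counts.get(x, 0) + 1
    return counts

# Each "job" pairs a different application with its own input files/data.
jobs = [(checksum_task, [1, 2, 3]), (histogram_task, [1, 1, 2])]

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(fn, arg) for fn, arg in jobs]
    results = [f.result() for f in futures]

print(results)  # results are gathered only after each job completes
```

Real grid middleware adds what this sketch omits: scheduling across geographically dispersed, heterogeneous sites, data staging, and authentication.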
E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology, in 1999 and was used to describe a large funding initiative starting in November 2000. E-science has been more broadly interpreted since then, as "the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints, and print and/or electronic publications." In 2014, the IEEE eScience Conference series condensed the definition, in one of the working definitions used by the organizers, to "eScience promotes innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle". E-science encompasses "what is often referred to as big data [which] has revolutionized science... [such as] the Large Hadron Collider (LHC) at CERN... [that] generates around 780 terabytes per year... highly data intensive modern fields of science...that generate large amounts of E-science data include: computational biology, bioinformatics, genomics" and the human digital footprint for the social sciences.
UNICORE (UNiform Interface to COmputing REsources) is a grid computing technology for resources such as supercomputers or cluster systems and information stored in databases. UNICORE was developed in two projects funded by the German Federal Ministry of Education and Research (BMBF). In European-funded projects, UNICORE evolved into a middleware system used at several supercomputing centres. UNICORE has served as a basis for other research projects. The UNICORE technology is open source under the BSD licence and available at SourceForge.
United States federal research funders use the term cyberinfrastructure to describe research environments that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services distributed over the Internet beyond the scope of a single institution. In scientific usage, cyberinfrastructure is a technological and sociological solution to the problem of efficiently connecting laboratories, data, computers, and people with the goal of enabling derivation of novel scientific theories and knowledge.
The term e-Research refers to the use of information technology to support existing and new forms of research. This extends cyber-infrastructure practices established in STEM fields such as e-Science to cover all other research areas, including HASS fields such as digital humanities.
The INFN Grid project was an initiative of the Istituto Nazionale di Fisica Nucleare (INFN) —Italy's National Institute for Nuclear Physics—for grid computing. It was intended to develop and deploy grid middleware services to allow INFN's users to transparently and securely share the computing and storage resources together with applications and technical facilities for scientific collaborations.
"Cloud computing is a paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with self-service provisioning and administration on-demand," according to ISO.
The Open Cloud Computing Interface (OCCI) is a set of specifications delivered through the Open Grid Forum, for cloud computing service providers. OCCI has a set of implementations that act as proofs of concept. It builds upon World Wide Web fundamentals by using the Representational State Transfer (REST) approach for interacting with services.
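As a rough illustration of OCCI's REST approach, the sketch below composes (without sending) an HTTP request to create a compute resource. The endpoint URL, helper name and attribute values are hypothetical; the `Category` header rendering follows the general shape of OCCI's HTTP text rendering, and a real provider's endpoint paths may differ:

```python
# Hedged sketch of an OCCI-style REST request; nothing is sent over the
# network, we only build the method, URL and headers a client would use.
def build_occi_create_request(endpoint, cores, memory_gb):
    """Return (method, url, headers) for a hypothetical compute-create call."""
    headers = {
        "Content-Type": "text/occi",
        # OCCI identifies the resource kind via a Category header.
        "Category": ('compute; '
                     'scheme="http://schemas.ogf.org/occi/infrastructure#"; '
                     'class="kind"'),
        # Resource attributes are passed as X-OCCI-Attribute pairs.
        "X-OCCI-Attribute": (f"occi.compute.cores={cores}, "
                             f"occi.compute.memory={memory_gb}"),
    }
    return ("POST", f"{endpoint}/compute/", headers)

method, url, headers = build_occi_create_request("https://cloud.example.org", 2, 4)
print(method, url)
```

The point of the example is the RESTful shape: resources are addressed by URL, created with `POST`, and described by metadata headers rather than a provider-specific RPC protocol.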
OpenNebula is an open source cloud computing platform for managing heterogeneous data center, public cloud and edge computing infrastructure resources. OpenNebula manages on-premises and remote virtual infrastructure to build private, public, or hybrid implementations of infrastructure as a service (IaaS) and multi-tenant Kubernetes deployments. The two primary uses of the OpenNebula platform are data center virtualization and cloud deployments based on the KVM hypervisor, LXD/LXC system containers, and AWS Firecracker microVMs. The platform is also capable of offering the cloud infrastructure necessary to operate a cloud on top of existing VMware infrastructure. In early June 2020, OpenNebula announced the release of a new Enterprise Edition for corporate users, along with a Community Edition. OpenNebula CE is free and open-source software, released under the Apache License version 2. OpenNebula CE comes with free access to patch releases containing critical bug fixes but with no access to the regular EE maintenance releases. Upgrades to the latest minor/major version are only available to CE users with non-commercial deployments or with significant open source contributions to the OpenNebula Community. OpenNebula EE is distributed under a closed-source license and requires a commercial subscription.
Ignacio Martín Llorente is an entrepreneur, researcher and educator in the field of cloud and distributed computing. He is the director of OpenNebula, a visiting scholar at Harvard University and a full-time professor at the Complutense University of Madrid. Dr. Llorente is an IEEE Senior Member. He holds a Ph.D. in Computer Science from UCM and an Executive MBA from IE Business School.
SHIWA was a grid computing project led by the LPDS of the MTA Computer and Automation Research Institute. The project coordinator was Prof. Dr. Peter Kacsuk. It started on 1 July 2010 and lasted two years. SHIWA was supported by a grant from the European Commission's FP7 INFRASTRUCTURES-2010-2 call under grant agreement no. 261585.
The Polish Grid Infrastructure (PL-Grid) is a nationwide computing infrastructure, built in 2009-2011 under the scientific project PL-Grid – Polish Infrastructure for Supporting Computational Science in the European Research Space. Its purpose was to enable scientific research based on advanced computer simulations and large-scale computations using computer clusters, and to provide convenient access to computing resources for research teams, including those outside the communities in which the high performance computing centres operate.
Several centers for supercomputing exist across Europe, and distributed access to them is coordinated by European initiatives to facilitate high-performance computing. One such initiative, the HPC Europa project, fits within the Distributed European Infrastructure for Supercomputing Applications (DEISA), which was formed in 2002 as a consortium of eleven supercomputing centers from seven European countries. Operating within the CORDIS framework, HPC Europa aims to provide access to supercomputers across Europe.
The European Middleware Initiative (EMI) was a computer software platform for high performance distributed computing. It was developed and distributed directly by the EMI project. It was the base for other grid middleware distributions used by scientific research communities and distributed computing infrastructures all over the world, especially in Europe, South America and Asia. EMI supported broad scientific experiments and initiatives, such as the Worldwide LHC Computing Grid.
ELIXIR is an initiative that allows life science laboratories across Europe to share and store their research data as part of an organised network. Its goal is to bring together Europe's research organisations and data centres to help coordinate the collection, quality control and storage of large amounts of biological data produced by life science experiments. ELIXIR aims to ensure that biological data is integrated into a federated system easily accessible by the scientific community.
CELAR was a research project which successfully developed an open source set of tools designed to provide automatic, multi-grained resource allocation for cloud applications. In this way, CELAR developed a solution that competes directly with Ubuntu Juju, OpenStack Heat and Amazon Web Services. CELAR was developed with funding from the European Commission under the Seventh Framework Programme for Research and Technological Development (FP7).
Many universities, vendors, institutes and government organizations are investing in cloud computing research.
The European Open Science Cloud (EOSC) is a European Commission initiative that aims to develop an infrastructure providing its users with services promoting open science practices. Besides being open-science oriented, the envisaged infrastructure is built by aggregating services from several providers following a system-of-systems approach.
D4Science is a data infrastructure offering services through community-driven virtual research environments. In particular, it supports communities of practice willing to implement open science practices. The infrastructure follows the system-of-systems approach, where the constituent systems offer "resources" that are assembled to implement the overall set of D4Science services. D4Science aggregates "domain-agnostic" service providers as well as community-specific ones to build a unifying space where the aggregated resources can be exploited via Virtual Research Environments and their services.