DOME project

Last updated August 26, 2024

DOME is a Dutch government-funded public-private partnership between IBM and ASTRON focusing on the Square Kilometre Array (SKA), the world's largest planned radio telescope. SKA will be built in Australia and South Africa. The objective of the DOME project is to develop a technology roadmap that applies to both SKA and IBM. The five-year project started in 2012 and was co-funded by the Dutch government, IBM Research in Zürich, Switzerland, and ASTRON in the Netherlands.^[1]^[2]^[3]^[4] The project officially ended on 30 September 2017.

P1 Algorithms & Machines – As traditional computing scaling has essentially hit a wall, a new set of methodologies and principles is needed for the design of future large-scale computers. This will be an umbrella project for the other six.
P2 Access Patterns – When faced with storing petabytes of data per day, new approaches to data storage tiering and storage media must be developed.
P3 Nano Photonics – Fiber optic communication over long distances and between systems is nothing new, but there is a lot to do for optic communications within computer systems and within the telescopes themselves.
P4 Microservers – New demands for higher computing density, higher performance per watt, and reduced system complexity suggest a new kind of custom-designed server.
P5 Accelerators – With the flattening of general computing performance, special architectures for reaching the next level of performance will be investigated for specialized tasks like signal processing and analysis.
P6 Compressive Sampling – Fundamental research into tailored signal processing and machine learning algorithms for the capture, processing, and analysis of the radio astronomy data. Compressive sensing, algebraic systems, machine learning and pattern recognition are focus areas.
P7 Real-Time Communication – Reduce the latency caused by redundant network operations in very large-scale systems and optimize the use of the communications bandwidth so that the correct data gets to the correct processing unit in real time.

P1 Algorithms & Machines

The design of computers has changed dramatically in recent decades, but the old paradigms still reign. Current designs stem from single computers working on small data sets in one location. SKA will face a completely different landscape: an extremely large data set, collected at a myriad of geographically separated locations and processed in real time by tens of thousands of separate computers. The fundamental principles for designing such a machine will have to be reexamined. Parameters concerning power envelope, accelerator technologies, workload distribution, memory size, CPU architecture and node intercommunication must be investigated to draw a new baseline to design from.^[5] The tools that result from this project are being open-sourced in early 2018.

This fundamental research will serve as the umbrella for the other six focus areas and help in making sound decisions regarding architectural directions.

A first step will be a retrospective analysis of the design of the LOFAR and MeerKAT telescopes and development of a design tool to use when designing very large and distributed computers.

P2 Access Patterns

This project will focus on the very large amount of data that DOME must handle. SKA will generate petabytes of data daily, and this data must be handled differently according to urgency and geographical location, whether it is near the telescope arrays or in the datacenters. A complex tiered solution must be devised using many technologies that are currently beyond the state of the art. The driving forces behind the designs will be lowest possible cost, accessibility and energy efficiency.

This multi-tier approach will combine several different kinds of software technologies to analyze, sift, distribute, store and retrieve data on hardware ranging from traditional storage media like magnetic tape and hard drives to newly developed technologies like phase-change memory. The suitability of different storage media heavily depends on the usage patterns when writing and reading data, and these patterns will change over time, so there must also be room for changes to the designs.^[6]
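As a rough illustration of how usage patterns could drive tier placement, the following Python sketch picks a storage medium from how often data is read and how long a reader can wait. The media names come from the text above; the `AccessPattern` fields, the `choose_tier` function and every threshold are hypothetical placeholders, not part of the DOME design.

```python
from dataclasses import dataclass


@dataclass
class AccessPattern:
    """Observed usage of a dataset (hypothetical fields for illustration)."""
    reads_per_day: float   # how often the data is read back
    max_latency_s: float   # longest acceptable wait for the first byte


def choose_tier(pattern: AccessPattern) -> str:
    """Pick a storage medium; all thresholds are made-up placeholders."""
    if pattern.max_latency_s < 0.001:
        return "phase-change memory"   # latency-critical data
    if pattern.reads_per_day >= 1:
        return "hard drive"            # regularly accessed data
    return "magnetic tape"             # cold archive


hot = AccessPattern(reads_per_day=1000, max_latency_s=0.0001)
warm = AccessPattern(reads_per_day=5, max_latency_s=1.0)
cold = AccessPattern(reads_per_day=0.01, max_latency_s=3600)

for pattern in (hot, warm, cold):
    print(choose_tier(pattern))
```

Because the patterns are expected to change over time, a policy like this would have to be re-evaluated periodically, so that data migrates between tiers as its usage shifts.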

P3 Nano Photonics

Data transport is a major factor that influences the design of DOME from the largest scales to the smallest. The cost of communicating electrically over copper wires will drive the adoption of low-power photonic interconnects, from connections between collecting antennas and datacenters down to connections between devices inside the computers. Both IBM and ASTRON have advanced research programs in nano photonics, beamforming and optical links, and they will combine their efforts for the new designs.^[7]

This research project is divided into four R&D sections, investigating digital optical interconnects, analog optical interconnects and analog optical signal processing.

Digital optical interconnect technology for astronomy signal processing boards.
Analog optical interconnection technology for focal-plane array front-ends.
Analog optical interconnection technology for photonic phased array receiver tiles.
Analog optical interconnection and signal processing technology for photonic focal plane arrays.

In February 2013 at the International Solid-State Circuits Conference (ISSCC), IBM and École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland showed a 100 Gbit/s analog-to-digital converter (ADC).^[8] In February 2014 at ISSCC, IBM and ASTRON demonstrated a 400 Gbit/s ADC.^[9]

P4 Microservers

In 2012 a team at IBM led by Ronald P. Luijten started pursuing a computationally dense and energy-efficient 64-bit compute server design based on commodity components and running Linux. A system-on-chip (SoC) design, in which most necessary components fit on a single chip, would suit these goals best, and a definition of "microserver" emerged in which essentially a complete motherboard (except RAM and boot flash) fits on one chip. ARM, x86 and Power ISA-based solutions were investigated, and a solution based on Freescale's Power ISA-based dual-core P5020 / quad-core P5040 processor came out on top.

Design

The resulting microserver fits in the same form factor as a standard FB-DIMM socket. The SoC chip, about 20 GB of DRAM and a few control chips (such as the PSoC 3 from Cypress, used for monitoring, debugging and booting) make up a complete compute node with physical dimensions of 133×55 mm. The card's pins are used for one SATA port, five Gbit and two 10 Gbit Ethernet ports, one SD card interface, one USB 2 interface, and power.

The compute card operates within a 35 W power envelope with headroom up to 70 W. The idea is to fit about a hundred of these compute cards within a 19" rack 2U drawer together with network switchboards for external storage and communication. Cooling will be provided via the Aquasar hot water cooling solution pioneered by the SuperMUC supercomputer in Germany.
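The figures above imply a substantial power density per drawer. The short Python calculation below works it out for the compute cards alone. It assumes exactly 100 cards, where the text says "about a hundred", and ignores the network switchboards and any cooling overhead. The function name `drawer_power_kw` is illustrative.

```python
CARDS_PER_DRAWER = 100   # "about a hundred" in the text; exact count assumed
NOMINAL_W = 35           # normal power envelope per compute card
HEADROOM_W = 70          # upper limit per compute card


def drawer_power_kw(cards: int, watts_per_card: float) -> float:
    """Total power of the compute cards in one 2U drawer, in kilowatts."""
    return cards * watts_per_card / 1000.0


print(drawer_power_kw(CARDS_PER_DRAWER, NOMINAL_W))    # 3.5
print(drawer_power_kw(CARDS_PER_DRAWER, HEADROOM_W))   # 7.0
```

Under these assumptions a single 2U drawer draws roughly 3.5 kW at the nominal envelope and up to 7 kW with full headroom.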

Future

In late 2013 a new SoC was chosen. Freescale's newer 12-core T4240 is significantly more powerful and operates within the same power envelope as the P5020. A new prototype microserver card was built and validated for larger-scale deployment in the full 2U drawer in early 2014. Later an 8-core ARMv8 board was developed using the LS2088A part from NXP (formerly Freescale). At the end of 2017, IBM is licensing the technology to a startup that plans to take it to market by mid-2018.

P5 Accelerators

Traditional high-performance processors hit a performance wall during the late 2000s, when clock speeds could no longer be increased because of rising power requirements. One solution is to offload the most common and/or compute-intensive tasks to specialized hardware called accelerators. This research area will try to identify these tasks and design algorithms and hardware to overcome the bottlenecks. There will probably be accelerators for pattern detection, parsing, data lookup and signal processing. The hardware will fall into two classes: fixed accelerators for static tasks, and programmable accelerators for a family of tasks with similar characteristics. The project will also look at massively parallel computing using commodity graphics processors.^[10]

P6 Compressive Sampling

The compressive sampling project is fundamental research into signal processing in collaboration with Delft University of Technology. In the context of radio astronomy, the capture, analysis and processing of signals is extremely compute-intensive and operates on enormous datasets. The goal is to do sampling and compression simultaneously and to use machine learning to decide what to keep and what to throw away, preferably as close to the data collectors as possible. The project aims to develop compressive sampling algorithms for capturing the signal and for calibrating the patterns to keep, in an ever-increasing number of pattern clusters. The research will also tackle the problems of degraded pattern quality, outlier detection, object classification and image formation.^[11]^[12]
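The core idea of sampling and compressing at the same time can be illustrated with a generic textbook technique. The Python sketch below takes a sparse signal of length 64, records only 16 random linear measurements of it, and recovers the signal with matching pursuit, a simple greedy algorithm. This is not the algorithm developed in DOME. The sizes, the Gaussian sensing matrix and all names are assumptions chosen for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(columns, y, iterations):
    """Greedy sparse recovery; columns[k] is column k of the sensing matrix."""
    residual = list(y)
    x = [0.0] * len(columns)
    for _ in range(iterations):
        # Select the column best aligned with what is still unexplained.
        best = max(
            range(len(columns)),
            key=lambda k: abs(dot(residual, columns[k]))
            / dot(columns[k], columns[k]) ** 0.5,
        )
        coef = dot(residual, columns[best]) / dot(columns[best], columns[best])
        x[best] += coef
        residual = [r - coef * c for r, c in zip(residual, columns[best])]
    return x


rng = random.Random(0)   # fixed seed so the run is repeatable
n, m = 64, 16            # signal length and number of measurements (m < n)
columns = [[rng.gauss(0, 1) for _ in range(m)] for _ in range(n)]

signal = [0.0] * n
signal[10] = 3.0         # a 1-sparse signal: a single non-zero entry

# Compressed measurement y = A * signal: m numbers instead of n.
y = [sum(columns[k][i] * signal[k] for k in range(n)) for i in range(m)]

recovered = matching_pursuit(columns, y, iterations=1)
print(max(range(n), key=lambda k: abs(recovered[k])))   # index of the recovered spike
```

With a single non-zero entry, one greedy step suffices to find its position and amplitude. Signals with more non-zero entries need more iterations and are not guaranteed to be recovered exactly by this simple method.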

P7 Real-Time Communication

Moving data from the collectors to the processing facilities is traditionally bogged down by high-latency I/O and low-bandwidth connections, and data is often duplicated along the way because the communication network was not purposefully designed. This research project will try to reduce latency to a minimum and design the I/O systems so that data is written directly into the processing engines of an exascale computer design. The first phase will identify system bottlenecks and investigate remote direct memory access (RDMA). The second phase will investigate applying standard RDMA technology to interconnect networks. Phase three includes the development of functional prototypes.^[13]

References

[1] DOME: IBM and ASTRON’s Exascale Computer for SKA Radio Telescope
[2] "NLeSC signs DOME agreement with IBM and ASTRON". Archived from the original on 2014-07-14. Retrieved 2014-07-02.
[3] IBM looks to new technologies for unprecedented data processing challenge
[4] From Big Bang to Big Data: ASTRON and IBM Collaborate to Explore Origins of the Universe
[5] "ASTRON & IBM Center for Exascale Technology – Algorithms & Machines". Archived from the original on 2014-07-14. Retrieved 2014-07-02.
[6] "ASTRON & IBM Center for Exascale Technology – Access Patterns". Archived from the original on 2014-07-14. Retrieved 2014-07-02.
[7] "ASTRON & IBM Center for Exascale Technology – Nano Photonics". Archived from the original on 2014-07-14. Retrieved 2014-07-02.
[8] Ultra-Fast Ethernet Research Improves Internet Speeds to 100 Gb/second
[9] IBM opens the door to 400 Gbps internet
[10] "ASTRON & IBM Center for Exascale Technology – Accelerators". Archived from the original on 2014-07-14. Retrieved 2014-07-02.
[11] "ASTRON & IBM Center for Exascale Technology – Compressive Sampling". Archived from the original on 2014-07-14. Retrieved 2014-07-02.
[12] "Data reduction and image formation for future radio telescopes (DRIFT)". Archived from the original on 2014-07-14. Retrieved 2014-07-02.
[13] "ASTRON & IBM Center for Exascale Technology – RT Communication". Archived from the original on 2014-07-14. Retrieved 2014-07-02.

