Science DMZ Network Architecture

The term Science DMZ refers to a computer subnetwork that is structured to be secure, but without the performance limits that would otherwise result from passing data through a stateful firewall. [1] [2] The Science DMZ is designed to handle the high-volume data transfers typical of scientific and high-performance computing by creating a special DMZ to accommodate those transfers. [3] It is typically deployed at or near the local network perimeter, and is optimized for a moderate number of high-speed flows rather than for general-purpose business systems or enterprise computing. [4]

History

The term Science DMZ was coined by collaborators at the US Department of Energy's ESnet in 2010. [5] A number of universities and laboratories have deployed or are deploying a Science DMZ. In 2012 the National Science Foundation funded the creation or improvement of Science DMZs on several university campuses in the United States. [6] [7] [8]

The Science DMZ [9] is a network architecture to support Big Data. The so-called information explosion has been discussed since the mid-1960s, and more recently the term data deluge [10] has been used to describe the exponential growth in many types of data sets. These huge data sets often need to be copied from one location to another over the Internet. Moving data sets of this magnitude in a reasonable amount of time should be possible on modern networks: for example, transferring 10 terabytes over a 10 Gigabit Ethernet path should take less than four hours, assuming disk performance is adequate. [11] The problem is that this requires network paths that are free of packet loss and of middleboxes, such as traffic shapers or firewalls, that slow network performance.
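To make the arithmetic behind the four-hour figure concrete, the following is a minimal sketch of the transfer-time estimate; the 80% efficiency factor is an illustrative assumption, not a figure from the sources.

```python
# Back-of-the-envelope transfer-time estimate for the figures above.
def transfer_hours(data_terabytes: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Hours to move `data_terabytes` over a `link_gbps` path,
    assuming the path sustains `efficiency` of its nominal rate."""
    bits = data_terabytes * 1e12 * 8              # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

# 10 TB over 10 Gigabit Ethernet: ~2.2 hours at line rate,
# ~2.8 hours at 80% efficiency -- comfortably under four hours.
print(f"{transfer_hours(10, 10, efficiency=1.0):.1f} h at line rate")
print(f"{transfer_hours(10, 10):.1f} h at 80% efficiency")
```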

Stateful firewalls

Most businesses and other institutions use a firewall to protect their internal network from malicious attacks originating from outside. All traffic between the internal network and the external Internet must pass through a firewall, which discards traffic likely to be harmful.

A stateful firewall tracks the state of each logical connection passing through it and rejects data packets inappropriate for the state of the connection. For example, a website would not be allowed to send a page to a computer on the internal network unless the computer had requested it. This requires the firewall to keep track of recently requested pages and to match requests with responses.
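The bookkeeping this implies can be illustrated with a small sketch. The connection four-tuples and allow/deny decisions below are a simplification of what real stateful firewalls track, which also includes TCP flags, timeouts, and protocol details:

```python
# Minimal sketch of stateful filtering: outbound connections create
# state; inbound packets are accepted only if they match a connection
# initiated from inside. Simplified for illustration.
class StatefulFilter:
    def __init__(self):
        self.connections = set()   # (src_ip, src_port, dst_ip, dst_port)

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        # Record state for a connection initiated from inside.
        self.connections.add((src_ip, src_port, dst_ip, dst_port))
        return True  # outbound traffic is allowed

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        # Allow only replies to a connection we already have state for.
        return (dst_ip, dst_port, src_ip, src_port) in self.connections

fw = StatefulFilter()
fw.outbound("10.0.0.5", 51234, "198.51.100.7", 443)        # inside host opens a connection
print(fw.inbound("198.51.100.7", 443, "10.0.0.5", 51234))  # True: reply matches state
print(fw.inbound("203.0.113.9", 80, "10.0.0.5", 51234))    # False: unsolicited packet
```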

A firewall must also analyze network traffic in much more detail than other networking components such as routers and switches. Routers only have to deal with the network layer, but firewalls must process the transport and application layers as well. All this additional processing takes time and limits network throughput. While routers and most other networking components can handle speeds of 100 gigabits per second (Gbit/s), many firewalls limit traffic to about 1 Gbit/s, [12] which is unacceptable for passing large amounts of scientific data.

Modern firewalls can leverage custom hardware (ASICs) to accelerate traffic forwarding and inspection in order to achieve higher throughput. This can present an alternative to a Science DMZ and allows in-place inspection through existing firewalls, as long as unified threat management (UTM) inspection is disabled.

While a stateful firewall may be necessary for critical business data, such as financial records, credit card numbers, employment data, student grades, and trade secrets, science data requires less protection, because copies usually exist in multiple locations and there is less economic incentive to tamper with it. [4]

DMZ

Diagram of a conventional DMZ that would be used by a business. All traffic to the DMZ must pass through a firewall, which limits throughput.

A firewall must restrict access to the internal network but allow external access to services offered to the public, such as web servers on the internal network. This is usually accomplished by creating a separate subnetwork called a DMZ, from the military term "demilitarized zone." External devices are allowed to access devices in the DMZ. Devices in the DMZ are usually maintained more carefully to reduce their vulnerability to malware. Such hardened devices are sometimes called bastion hosts.

The Science DMZ takes the DMZ idea one step further by moving high performance computing into its own DMZ. [13] Specially configured routers pass science data directly to or from designated devices on an internal network, thereby creating a virtual DMZ. Security is maintained by setting access control lists (ACLs) in the routers to only allow traffic to/from particular sources and destinations. Security is further enhanced by using an intrusion detection system (IDS) to monitor traffic and look for indications of attack. When an attack is detected, the IDS can automatically update router tables, resulting in what some call a remotely triggered black hole (RTBH). [1]
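As an illustration of the triggering mechanism, here is a sketch that installs a Linux null route when a hypothetical, pre-parsed IDS alert crosses a severity threshold. The alert format is an assumption for illustration; a production RTBH deployment would typically distribute the blackhole route via BGP rather than configure a single host:

```python
# Illustrative RTBH sketch: on a sufficiently severe IDS alert,
# install a null ("blackhole") route for the offending prefix using
# the standard Linux iproute2 command (requires root privileges).
import subprocess

def blackhole(prefix: str) -> None:
    """Drop all traffic to `prefix` by installing a null route."""
    subprocess.run(["ip", "route", "add", "blackhole", prefix], check=True)

def handle_ids_alert(alert: dict) -> None:
    # `alert` is a hypothetical parsed IDS event (e.g., from Zeek or Suricata).
    if alert.get("severity", 0) >= 8:
        blackhole(alert["source_prefix"])

handle_ids_alert({"severity": 9, "source_prefix": "203.0.113.0/24"})
```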

Justification

The Science DMZ provides a well-configured location for the networking, systems, and security infrastructure that supports high-performance data movement. In data-intensive science environments, data sets have outgrown portable media, and the default configurations used by many equipment and software vendors are inadequate for high performance applications. The components of the Science DMZ are specifically configured to support high performance applications, and to facilitate the rapid diagnosis of performance problems. Without the deployment of dedicated infrastructure, it is often impossible to achieve acceptable performance. Simply increasing network bandwidth is usually not good enough, as performance problems are caused by many factors, ranging from underpowered firewalls to dirty fiber optics to untuned operating systems.

The Science DMZ is the codification of a set of best practices developed over the years by the scientific networking and systems community. The Science DMZ model describes the essential components of high-performance data transfer infrastructure in a way that is accessible to non-experts and scalable across any size of institution or experiment.

Components

The primary components of a Science DMZ are:

- Dedicated, high-performance network equipment at or near the site perimeter, engineered for loss-free forwarding of high-speed flows
- Dedicated data transfer nodes (DTNs) running high-performance parallel transfer tools such as GridFTP [14]
- A performance measurement and test node, typically running perfSONAR, used to verify the path and quickly locate problems (see the sketch after this list)
- Security policy and enforcement mechanisms, such as router ACLs and intrusion detection systems, tailored to high-performance science traffic

Optional Science DMZ components include:

- Virtual circuit services, such as ESnet's OSCARS, that provide bandwidth guarantees for scheduled transfers
- Software-defined networking (SDN) support for automating configuration of the Science DMZ path
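As an example of the measurement component in action, the following sketch runs an iperf3 throughput test of the kind a perfSONAR host schedules automatically. The host name and the four-stream choice are placeholders, and iperf3 must be running in server mode on the far end:

```python
# Sketch of an active throughput measurement between test endpoints,
# using iperf3's JSON output mode to report the achieved rate.
import json
import subprocess

def measure_gbps(host: str, streams: int = 4, seconds: int = 10) -> float:
    out = subprocess.run(
        ["iperf3", "-c", host, "-P", str(streams), "-t", str(seconds), "-J"],
        capture_output=True, text=True, check=True,
    )
    result = json.loads(out.stdout)
    # Sum over all parallel streams, sender side, in bits per second.
    return result["end"]["sum_sent"]["bits_per_second"] / 1e9

# Hypothetical test endpoint; substitute a real DTN or test host.
print(f"{measure_gbps('dtn.example.edu'):.2f} Gbit/s")
```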

See also

Bastion host
Data center security
Deep packet inspection
DMZ (computing)
Energy Sciences Network
Firewall (computing)
Middlebox
Network address translation
Stateful firewall

References

  1. Dan Goodin (June 26, 2012). "Scientists experience life outside the firewall with 'Science DMZs'". Retrieved 2013-05-12.
  2. Eli Dart; Brian Tierney; Eric Pouyoul; Joe Breen (January 2012). "Achieving the Science DMZ" (PDF). Retrieved 2015-12-31.
  3. Dart, E.; Rotman, L.; Tierney, B.; Hester, M.; Zurawski, J. (2013). "The Science DMZ". Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '13. p. 1. doi:10.1145/2503210.2503245. ISBN   978-1-4503-2378-9. S2CID   52861484.
  4. 1 2 "Why Science DMZ?" . Retrieved 2013-05-12.
  5. Dart, Eli; Metzger, Joe (June 13, 2011). "The Science DMZ". CERN LHCOPN/LHCONE workshop. Retrieved 2013-05-26. This is the earliest cite-able reference to the Science DMZ. Work on the concept had been going on for several years prior to this.
  6. "Implementation of a Science DMZ at San Diego State University to Facilitate High-Performance Data Transfer for Scientific Applications". National Science Foundation. September 10, 2012. Retrieved 2013-05-13.
  7. "SDNX - Enabling End-to-End Dynamic Science DMZ". National Science Foundation. September 7, 2012. Retrieved 2013-05-13.
  8. "Improving an existing science DMZ". National Science Foundation. September 12, 2012. Retrieved 2013-05-13.
  9. Dart, Eli; Rotman, Lauren (Aug 2012). "The Science DMZ: A Network Architecture for Big Data". LBNL-report.
  10. Brett Ryder (Feb 25, 2010). "The Data Deluge". The Economist.
  11. . "Network Requirements and Expectations". Lawrence Berkeley National Laboratory.
  12. "Firewall Performance Comparison" (PDF).
  13. pmoyer (Dec 13, 2012). "Research & Education Network (REN) Architecture: Science-DMZ". Retrieved 2013-05-12.
  14. "Science DMZ: Data Transfer Nodes". Lawrence Berkeley Laboratory. 2013-04-04. Retrieved 2013-05-13.