Trace-based simulation

In computer science, trace-based simulation refers to system simulation performed by looking at traces of program execution or system component access with the purpose of performance prediction. [1]
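
As a minimal illustration of the idea (not drawn from the cited sources), the sketch below replays a pre-recorded memory-address trace through a toy direct-mapped cache model and converts the resulting hit and miss counts into a rough cycle estimate. The trace contents, cache geometry, and latency figures are all assumptions chosen for the example.

    # Toy trace-driven cache simulation: replay a recorded list of memory
    # addresses through a direct-mapped cache model and estimate run time.
    # Trace contents, cache geometry, and latencies are illustrative only.

    def simulate_cache(trace, num_lines=256, line_size=64,
                       hit_cycles=1, miss_cycles=100):
        tags = [None] * num_lines          # one tag slot per cache line
        hits = misses = 0
        for address in trace:
            block = address // line_size   # which memory block is touched
            index = block % num_lines      # direct-mapped placement
            if tags[index] == block:
                hits += 1
            else:
                misses += 1
                tags[index] = block        # fill the line on a miss
        cycles = hits * hit_cycles + misses * miss_cycles
        return hits, misses, cycles

    if __name__ == "__main__":
        # A hand-made trace standing in for one captured from a real program.
        trace = [0x1000, 0x1008, 0x1040, 0x1000, 0x9000, 0x1008]
        print(simulate_cache(trace))

Because the trace is recorded once and then replayed, different cache parameters can be evaluated without re-running the original program.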

Trace-based simulation may be used in a variety of applications, from the analysis of solid-state disks to message-passing performance on very large computer clusters. [1] [2]

Trace-based simulators usually have two components: one that executes actions and stores the results (i.e. the traces), and another which reads the trace log files and interpolates them to new (and often more complex) scenarios. [2]
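
A hedged sketch of that two-part structure is given below, with an invented line-per-event trace format: the first component runs the real actions and appends one timed record per action to a log file, and the second reads the log back and accumulates the recorded costs without re-executing anything.

    # Sketch of the two components of a trace-based simulator; the trace
    # format and cost model are invented for illustration.
    import json
    import time

    def record(actions, trace_path="trace.log"):
        """Component 1: execute real actions and log one event per line."""
        with open(trace_path, "w") as log:
            for name, fn in actions:
                start = time.perf_counter()
                fn()                                   # the real work
                elapsed = time.perf_counter() - start
                log.write(json.dumps({"event": name, "seconds": elapsed}) + "\n")

    def replay(trace_path="trace.log"):
        """Component 2: read the trace and reproduce its timing, no re-execution."""
        total = 0.0
        with open(trace_path) as log:
            for line in log:
                total += json.loads(line)["seconds"]
        return total

    if __name__ == "__main__":
        record([("small_sum", lambda: sum(range(100000))),
                ("large_sum", lambda: sum(range(200000)))])
        print("simulated total time:", replay())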

For instance, in the case of large computer cluster design, the execution takes place on a small number of nodes and leaves traces in log files. The simulator reads those log files and simulates performance on a much larger number of nodes, thus providing a view of how a very large application would perform based on execution traces from a much smaller number of nodes. [2] [3]
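
The fragment below gestures at that extrapolation step. It assumes a trace of per-message records gathered on a small run and a deliberately naive scaling model in which every node keeps the same per-node message volume and per-message cost; real simulators of this kind use far more detailed network, topology, and contention models.

    # Naive extrapolation from a small-run message trace to a larger cluster.
    # The record fields and the linear scaling assumption are illustrative.

    def extrapolate(trace_records, measured_nodes, target_nodes):
        # Average cost of one recorded message on the small system.
        avg_message_time = (sum(r["seconds"] for r in trace_records)
                            / len(trace_records))
        messages_per_node = len(trace_records) / measured_nodes
        # Assume each node sends the same number of messages at the same
        # average cost as on the small run (no contention modelled).
        total_messages = messages_per_node * target_nodes
        return total_messages * avg_message_time

    if __name__ == "__main__":
        small_run = [{"bytes": 4096, "seconds": 0.0002} for _ in range(400)]
        print("estimated communication time on 1024 nodes:",
              extrapolate(small_run, measured_nodes=4, target_nodes=1024))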

Related Research Articles

Beowulf cluster

A Beowulf cluster is a computer cluster of what are normally identical, commodity-grade computers networked into a small local area network with libraries and programs installed which allow processing to be shared among them. The result is a high-performance parallel computing cluster from inexpensive personal computer hardware.

Earth Simulator

The Earth Simulator (ES), developed by the Japanese government's initiative "Earth Simulator Project", was a highly parallel vector supercomputer system for running global climate models to evaluate the effects of global warming and problems in solid earth geophysics. The system was developed for Japan Aerospace Exploration Agency, Japan Atomic Energy Research Institute, and Japan Marine Science and Technology Center (JAMSTEC) in 1997. Construction started in October 1999, and the site officially opened on 11 March 2002. The project cost 60 billion yen.

Scalability

Scalability is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth. For example, a system is considered scalable if it is capable of increasing its total output under an increased load when resources are added. An analogous meaning is implied when the word is used in an economic context, where a company's scalability implies that the underlying business model offers the potential for economic growth within the company.

Google File System is a proprietary distributed file system developed by Google to provide efficient, reliable access to data using large clusters of commodity hardware. A new version of Google File System code named Colossus was released in 2010.

In computer network research, network simulation is a technique whereby a software program models the behavior of a network by calculating the interaction between the different network entities. Most simulators use discrete event simulation: the modeling of systems in which state variables change at discrete points in time. The behavior of the network and the various applications and services it supports can then be observed in a test lab; various attributes of the environment can also be modified in a controlled manner to assess how the network and its protocols would behave under different conditions.
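
As a minimal illustration of discrete event simulation in general (not of any particular network simulator), the sketch below keeps a priority queue of timestamped events and advances the simulated clock from one event to the next; the packet list and link delay are assumptions made for the example.

    # Minimal discrete event simulation loop: the clock jumps between the
    # timestamps of scheduled events rather than advancing in fixed steps.
    import heapq

    def simulate(sends, link_delay=0.05):
        """sends: list of (send_time, packet_id) pairs, illustrative only."""
        queue = [(t, "send", pid) for t, pid in sends]
        heapq.heapify(queue)
        clock = 0.0
        while queue:
            clock, kind, pid = heapq.heappop(queue)
            if kind == "send":
                # Delivery is a future event, scheduled after the link delay.
                heapq.heappush(queue, (clock + link_delay, "recv", pid))
            else:
                print(f"t={clock:.2f}s packet {pid} delivered")
        return clock

    if __name__ == "__main__":
        simulate([(0.00, 1), (0.01, 2), (0.20, 3)])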

The Parallel Virtual File System (PVFS) is an open-source parallel file system. A parallel file system is a type of distributed file system that distributes file data across multiple servers and provides for concurrent access by multiple tasks of a parallel application. PVFS was designed for use in large-scale cluster computing and focuses on high-performance access to large data sets. It consists of a server process and a client library, both of which are written entirely in user-level code. A Linux kernel module and pvfs-client process allow the file system to be mounted and used with standard utilities. The client library provides for high-performance access via the message passing interface (MPI). PVFS is jointly developed by the Parallel Architecture Research Laboratory at Clemson University, the Mathematics and Computer Science Division at Argonne National Laboratory, and the Ohio Supercomputer Center. PVFS development has been funded by NASA Goddard Space Flight Center, Argonne, the NSF PACI and HECURA programs, and other government and private agencies. PVFS is now known as OrangeFS in its newest development branch.

Apache Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Originally designed for computer clusters built from commodity hardware—still the common use—it has also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.
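
The word-count sketch below illustrates the shape of the MapReduce programming model in plain Python; it is not Hadoop code and uses none of Hadoop's APIs, only the map, shuffle, and reduce phases that the model is named for.

    # Word count expressed in the MapReduce style (plain Python, not Hadoop).
    from collections import defaultdict

    def map_phase(document):
        # Emit one (word, 1) pair per word, as a mapper would.
        return [(word, 1) for word in document.split()]

    def shuffle(pairs):
        # Group intermediate pairs by key, as the framework's shuffle step does.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(groups):
        # Sum the counts for each word, as a reducer would.
        return {key: sum(values) for key, values in groups.items()}

    if __name__ == "__main__":
        documents = ["trace based simulation", "trace driven cache simulation"]
        pairs = [pair for doc in documents for pair in map_phase(doc)]
        print(reduce_phase(shuffle(pairs)))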

A computer architecture simulator is a program that simulates the execution of computer architecture.

Distributed networking is a distributed computing network system where components of the program and data depend on multiple sources.

The National Center for Computational Sciences (NCCS) is a United States Department of Energy Leadership Computing Facility. The NCCS provides resources for calculation and simulation in fields including astrophysics, materials, and climate research. This research is intended to enhance American competitiveness in industry. Founded in 1992 and located at Oak Ridge National Laboratory (ORNL), the NCCS currently manages a 2.33-petaflop Cray XT5 supercomputer named Jaguar for use in open research by academic and corporate researchers. Jaguar was named the world's fastest computer at SC09, a position it held until October 2010. The NCCS is a managed activity of the Advanced Scientific Computing Research program of the Department of Energy Office of Science (DOE-SC).

Data-intensive computing is a class of parallel computing applications which use a data-parallel approach to process large volumes of data, typically terabytes or petabytes in size and commonly referred to as big data. Computing applications which devote most of their execution time to computational requirements are deemed compute-intensive, whereas computing applications which require large volumes of data and devote most of their processing time to I/O and manipulation of data are deemed data-intensive.

HPCC

HPCC, also known as DAS, is an open source, data-intensive computing system platform developed by LexisNexis Risk Solutions. The HPCC platform incorporates a software architecture implemented on commodity computing clusters to provide high-performance, data-parallel processing for applications utilizing big data. The HPCC platform includes system configurations to support both parallel batch data processing (Thor) and high-performance online query applications using indexed data files (Roxie). The HPCC platform also includes a data-centric declarative programming language for parallel data processing called ECL.

Quasi-opportunistic supercomputing

Quasi-opportunistic supercomputing is a computational paradigm for supercomputing on a large number of geographically disperse computers. Quasi-opportunistic supercomputing aims to provide a higher quality of service than opportunistic resource sharing.

Supercomputer architecture

Approaches to supercomputer architecture have taken dramatic turns since the earliest systems were introduced in the 1960s. Early supercomputer architectures pioneered by Seymour Cray relied on compact innovative designs and local parallelism to achieve superior computational peak performance. However, in time the demand for increased computational power ushered in the age of massively parallel systems.

BIGSIM is a computer simulation and performance modeling system for parallel computing, typically used for very large computer clusters. BIGSIM was developed at the University of Illinois.

Message passing in computer clusters

Message passing is an inherent element of all computer clusters. All computer clusters, ranging from homemade Beowulfs to some of the fastest supercomputers in the world, rely on message passing to coordinate the activities of the many nodes they encompass. Message passing in computer clusters built with commodity servers and switches is used by virtually every internet service.

Microarchitecture simulation is an important technique in computer architecture research and computer science education. It is a tool for modeling the design and behavior of a microprocessor and its components, such as the ALU, cache memory, control unit, and data path, among others. Simulation allows researchers to explore the design space and to evaluate the performance and efficiency of novel microarchitecture features. For example, several microarchitecture components, such as branch predictors, re-order buffers, and trace caches, went through numerous simulation cycles before they became common components in contemporary microprocessors. In addition, simulation enables educators to teach computer organization and architecture courses with hands-on experience.
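
In the same trace-driven spirit as the rest of this article, the sketch below replays a recorded branch-outcome trace through a two-bit saturating-counter predictor and reports its accuracy; the trace contents and table size are invented for illustration.

    # Trace-driven simulation of a 2-bit saturating-counter branch predictor.
    # The branch trace and table size are illustrative, not from a real run.

    def simulate_predictor(trace, table_size=1024):
        counters = [1] * table_size            # start weakly not-taken
        correct = 0
        for pc, taken in trace:
            index = pc % table_size
            prediction = counters[index] >= 2  # counter of 2 or 3 predicts taken
            if prediction == taken:
                correct += 1
            # Move the saturating counter toward the actual outcome.
            if taken:
                counters[index] = min(3, counters[index] + 1)
            else:
                counters[index] = max(0, counters[index] - 1)
        return correct / len(trace)

    if __name__ == "__main__":
        # (program counter, branch taken?) pairs standing in for a real trace.
        trace = [(0x400, True), (0x400, True), (0x400, False), (0x800, True)]
        print("prediction accuracy:", simulate_predictor(trace))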

A distributed file system for cloud is a file system that allows many clients to have access to data and supports operations on that data. Each data file may be partitioned into several parts called chunks. Each chunk may be stored on different remote machines, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured. Confidentiality, availability and integrity are the main keys for a secure system.
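
As a toy illustration of the chunking idea only (not the design of any particular cloud file system), the sketch below splits a file's bytes into fixed-size chunks and assigns each chunk, with replicas, to distinct machines; chunk size, replica count, and the round-robin placement policy are all assumptions.

    # Toy chunk placement: split data into fixed-size chunks and assign each
    # chunk to several distinct servers. All parameters are illustrative.

    def place_chunks(data, servers, chunk_size=8, replicas=2):
        placement = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            # Round-robin placement of each chunk onto `replicas` servers.
            targets = [servers[(i // chunk_size + r) % len(servers)]
                       for r in range(replicas)]
            placement.append((chunk, targets))
        return placement

    if __name__ == "__main__":
        layout = place_chunks(b"hello distributed file systems",
                              servers=["node-a", "node-b", "node-c"])
        for chunk, targets in layout:
            print(chunk, "->", targets)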

References

  1. Software Technologies for Embedded and Ubiquitous Systems, edited by Sunggu Lee and Priya Narasimhan, 2009, ISBN 3642102646, page 28.
  2. Languages and Compilers for Parallel Computing, edited by Keith Cooper, John Mellor-Crummey and Vivek Sarkar, 2011, ISBN 3642195946, pages 202-203.
  3. Petascale Computing: Algorithms and Applications, by David A. Bader, 2007, ISBN 1584889098, page 435.