Java Heterogeneous Distributed Computing

Last updated

Java Heterogeneous Distributed Computing refers to a programmable Java distributed system which was developed at the National University of Ireland in Maynooth. It allows researchers to access the spare clock cycles of a large number of semi-idle desktop PCs. It also allows for multiple problems to be processed in parallel with sophisticated scheduling mechanisms controlling the system. It has been successful when used for tackling problems in the areas of Bioinformatics, Biomedical engineering and cryptography.

It is an Open Source project licensed under the GPL.

See also


Related Research Articles

A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another. Distributed computing is a field of computer science that studies distributed systems.

In distributed computing, a remote procedure call (RPC) is when a computer program causes a procedure (subroutine) to execute in a different address space, which is written as if it were a normal (local) procedure call, without the programmer explicitly writing the details for the remote interaction. That is, the programmer writes essentially the same code whether the subroutine is local to the executing program, or remote. This is a form of client–server interaction, typically implemented via a request–response message-passing system. In the object-oriented programming paradigm, RPCs are represented by remote method invocation (RMI). The RPC model implies a level of location transparency, namely that calling procedures are largely the same whether they are local or remote, but usually, they are not identical, so local calls can be distinguished from remote calls. Remote calls are usually orders of magnitude slower and less reliable than local calls, so distinguishing them is important.

Grid computing is the use of widely distributed computer resources to reach a common goal. A computing grid can be thought of as a distributed system with non-interactive workloads that involve many files. Grid computing is distinguished from conventional high-performance computing systems such as cluster computing in that grid computers have each node set to perform a different task/application. Grid computers also tend to be more heterogeneous and geographically dispersed than cluster computers. Although a single grid can be dedicated to a particular application, commonly a grid is used for a variety of purposes. Grids are often constructed with general-purpose grid middleware software libraries. Grid sizes can be quite large.

Parallel Virtual Machine (PVM) is a software tool for parallel networking of computers. It is designed to allow a network of heterogeneous Unix and/or Windows machines to be used as a single distributed parallel processor. Thus large computational problems can be solved more cost effectively by using the aggregate power and memory of many computers. The software is very portable; the source code, available free through netlib, has been compiled on everything from laptops to Crays.

Message-oriented middleware (MOM) is software or hardware infrastructure supporting sending and receiving messages between distributed systems. MOM allows application modules to be distributed over heterogeneous platforms and reduces the complexity of developing applications that span multiple operating systems and network protocols. The middleware creates a distributed communications layer that insulates the application developer from the details of the various operating systems and network interfaces. APIs that extend across diverse platforms and networks are typically provided by MOM.

<span class="mw-page-title-main">Concurrency (computer science)</span> Ability to execute a task in a non-serial manner

In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the outcome. This allows for parallel execution of the concurrent units, which can significantly improve overall speed of the execution in multi-processor and multi-core systems. In more technical terms, concurrency refers to the decomposability of a program, algorithm, or problem into order-independent or partially-ordered components or units of computation.

Distributed Objects Everywhere (DOE) was a long-running Sun Microsystems project to build a distributed computing environment based on the CORBA system in the 'back end' and OpenStep as the user interface. First started in 1990 and announced soon thereafter, it remained vaporware for many years before it was finally released as NEO in 1995. It was sold for only a short period before being dropped in 1996. In its place is what is today known as Enterprise JavaBeans.

Concurrent computing is a form of computing in which several computations are executed concurrently—during overlapping time periods—instead of sequentially—with one completing before the next starts.

In computer science, stream processing is a programming paradigm which views streams, or sequences of events in time, as the central input and output objects of computation. Stream processing encompasses dataflow programming, reactive programming, and distributed data processing. Stream processing systems aim to expose parallel processing for data streams and rely on streaming algorithms for efficient implementation. The software stack for these systems includes components such as programming models and query languages, for expressing computation; stream management systems, for distribution and scheduling; and hardware components for acceleration including floating-point units, graphics processing units, and field-programmable gate arrays.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

<span class="mw-page-title-main">Volunteer computing</span> System where users donate computer resources to contribute to research

Volunteer computing is a type of distributed computing in which people donate their computers' unused resources to a research-oriented project, and sometimes in exchange for credit points. The fundamental idea behind it is that a modern desktop computer is sufficiently powerful to perform billions of operations a second, but for most users only between 10–15% of its capacity is used. Common tasks such as word processing or web browsing leave the computer mostly idle.

CNGrid is the Chinese national high performance computing network supported by 863 Program.

Middleware is a type of computer software program that provides services to software applications beyond those available from the operating system. It can be described as "software glue".

openQRM is a free and open-source cloud-computing management platform for managing heterogeneous data centre infrastructures.

Heterogeneous System Architecture (HSA) is a cross-vendor set of specifications that allow for the integration of central processing units and graphics processors on the same bus, with shared memory and tasks. The HSA is being developed by the HSA Foundation, which includes AMD and ARM. The platform's stated aim is to reduce communication latency between CPUs, GPUs and other compute devices, and make these various devices more compatible from a programmer's perspective, relieving the programmer of the task of planning the moving of data between devices' disjoint memories.

Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks.

The High-performance Integrated Virtual Environment (HIVE) is a distributed computing environment used for healthcare-IT and biological research, including analysis of Next Generation Sequencing (NGS) data, preclinical, clinical and post market data, adverse events, metagenomic data, etc. Currently it is supported and continuously developed by US Food and Drug Administration, George Washington University, and by DNA-HIVE, WHISE-Global and Embleema. HIVE currently operates fully functionally within the US FDA supporting wide variety (+60) of regulatory research and regulatory review projects as well as for supporting MDEpiNet medical device postmarket registries. Academic deployments of HIVE are used for research activities and publications in NGS analytics, cancer research, microbiome research and in educational programs for students at GWU. Commercial enterprises use HIVE for oncology, microbiology, vaccine manufacturing, gene editing, healthcare-IT, harmonization of real-world data, in preclinical research and clinical studies.

Sanjay Ghemawat is an Indian American computer scientist and software engineer. He is currently a Senior Fellow at Google in the Systems Infrastructure Group. Ghemawat's work at Google, much of it in close collaboration with Jeff Dean, has included big data processing model MapReduce, the Google File System, and databases Bigtable and Spanner. Wired have described him as one of the "most important software engineers of the internet age".