Plurality (company)

Last updated
Plurality Ltd.
Company typePrivate
Industry Semiconductors
Founded2004 [1]
Headquarters Israel
Products Multi-core
Website Plurality.com

Plurality Ltd. is an Israeli semiconductor company, the developer of the HyperCore technology and the HAL (HyperCore Architecture Line) multi-core processor. The company is a member of the Multicore Association. [2]

Contents

HyperCore

Plurality develops the HyperCore CPU technology, which is a MIMD 32-bit RISC based multi-processor on a single chip, [3] and contains from 16 to 256 cores. [4] HyperCore technology supports executing both fine-grained and coarse-grain parallelism due to its special hardware Synchronizer/Scheduler, shared memory and task based programming model.

The HyperCore technology's synchronizer/scheduler (patented, [5] see below also) eliminates the need of repeatedly executing a special kernel program controlling and deciding which task (or thread) to currently assign and execute on a given processor. The ability to synchronize tasks in hardware allows the processor to support fine-grained programs and to achieve almost a linear speedup. Fine grained programs can only be executed when tasks' duration is significantly shorter than the overhead time introduced by the scheduler. The HyperCore's shared memory (patent pending) avoid the coherency problem and keep a single memory space for all cores in the system thus simplifying the programming model significantly.

Patents

Synchronizer/scheduler

Dr. Nimrod Bayer and Dr. Ran Ginosar, two of Plurality's founders, received United States Patent 5202987 (“A High Flow-Rate Synchronizer/Scheduler for Multiprocessors”) for the company's core technology on April 13, 1993. The patent has been cited by more than 30 subsequent patents. The patent abstract is as follows:

"A high flow rate synchronizer/scheduler apparatus for a multiprocessor system during program run-time, comprises a connection matrix for monitoring and detecting computational tasks which are allowed for execution containing a task map and a network of nodes for distributing to the processors information or computational tasks detected to be enabled by the connection matrix. The network of nodes possesses the capability of decomposing information on a pack of allocated computational tasks into messages of finer sub-packs to be sent towards the processors, as well as the capability of unifying packs of information on termination of computational tasks into a more comprehensive pack. A method of performing the synchronization/scheduling in the multiprocessor system of this apparatus is also described."

See also

Related Research Articles

<span class="mw-page-title-main">Amdahl's law</span> Formula in computer architecture

In computer architecture, Amdahl's law is a formula which gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved. It states that "the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used". It is named after computer scientist Gene Amdahl, and was presented at the American Federation of Information Processing Societies (AFIPS) Spring Joint Computer Conference in 1967.

<span class="mw-page-title-main">Thread (computing)</span> Smallest sequence of programmed instructions that can be managed independently by a scheduler

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. In many cases, a thread is a component of a process.

<span class="mw-page-title-main">Superscalar processor</span> CPU that implements instruction-level parallelism within a single processor

A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows more throughput than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor, but an execution resource within a single CPU such as an arithmetic logic unit.

<span class="mw-page-title-main">Parallel computing</span> Programming paradigm in which many processes are executed simultaneously

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.

Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better use the resources provided by modern processor architectures.

<span class="mw-page-title-main">OpenMP</span> Open standard for parallelizing

OpenMP is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating systems, including Solaris, AIX, FreeBSD, HP-UX, Linux, macOS, and Windows. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.

In computing, single program, multiple data (SPMD) is a term that has been used to refer to computational models for exploiting parallelism where-by multiple processors cooperate in the execution of a program in order to obtain results faster.

Automatic parallelization, also auto parallelization, or autoparallelization refers to converting sequential code into multi-threaded and/or vectorized code in order to use multiple processors simultaneously in a shared-memory multiprocessor (SMP) machine. Fully automatic parallelization of sequential programs is a challenge because it requires complex program analysis and the best approach may depend upon parameter values that are not known at compilation time.

Concurrent computing is a form of computing in which several computations are executed concurrently—during overlapping time periods—instead of sequentially—with one completing before the next starts.

<span class="mw-page-title-main">Binary Modular Dataflow Machine</span>

Binary Modular Dataflow Machine (BMDFM) is a software package that enables running an application in parallel on shared memory symmetric multiprocessing (SMP) computers using the multiple processors to speed up the execution of single applications. BMDFM automatically identifies and exploits parallelism due to the static and mainly dynamic scheduling of the dataflow instruction sequences derived from the formerly sequential program.

<span class="mw-page-title-main">Multi-core processor</span> Microprocessor with more than one processing unit

A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions but the single processor can run instructions on separate cores at the same time, increasing overall speed for programs that support multithreading or other parallel computing techniques. Manufacturers typically integrate the cores onto a single integrated circuit die or onto multiple dies in a single chip package. The microprocessors currently used in almost all personal computers are multi-core.

<span class="mw-page-title-main">Gustafson's law</span> Theoretical speedup formula in computer architecture

In computer architecture, Gustafson's law gives the speedup in the execution time of a task that theoretically gains from parallel computing, using a hypothetical run of the task on a single-core machine as the baseline. To put it another way, it is the theoretical "slowdown" of an already parallelized task if running on a serial machine. It is named after computer scientist John L. Gustafson and his colleague Edwin H. Barsis, and was presented in the article Reevaluating Amdahl's Law in 1988.

<span class="mw-page-title-main">Data parallelism</span> Parallelization across multiple processors in parallel computing environments

Data parallelism is parallelization across multiple processors in parallel computing environments. It focuses on distributing the data across different nodes, which operate on the data in parallel. It can be applied on regular data structures like arrays and matrices by working on each element in parallel. It contrasts to task parallelism as another form of parallelism.

Task parallelism is a form of parallelization of computer code across multiple processors in parallel computing environments. Task parallelism focuses on distributing tasks—concurrently performed by processes or threads—across different processors. In contrast to data parallelism which involves running the same task on different components of data, task parallelism is distinguished by running many different tasks at the same time on the same data. A common type of task parallelism is pipelining, which consists of moving a single set of data through a series of separate tasks where each task can execute independently of the others.

oneAPI Threading Building Blocks, is a C++ template library developed by Intel for parallel programming on multi-core processors. Using TBB, a computation is broken down into tasks that can run in parallel. The library manages and schedules threads to execute these tasks.

Explicit Multi-Threading (XMT) is a computer science paradigm for building and programming parallel computers designed around the parallel random-access machine (PRAM) parallel computational model. A more direct explanation of XMT starts with the rudimentary abstraction that made serial computing simple: that any single instruction available for execution in a serial program executes immediately. A consequence of this abstraction is a step-by-step (inductive) explication of the instruction available next for execution. The rudimentary parallel abstraction behind XMT, dubbed Immediate Concurrent Execution (ICE) in Vishkin (2011), is that indefinitely many instructions available for concurrent execution execute immediately. A consequence of ICE is a step-by-step (inductive) explication of the instructions available next for concurrent execution. Moving beyond the serial von Neumann computer, the aspiration of XMT is that computer science will again be able to augment mathematical induction with a simple one-line computing abstraction.

<span class="mw-page-title-main">Kunle Olukotun</span> British-born Nigerian computer scientist

Oyekunle Ayinde "Kunle" Olukotun is a British-born Nigerian computer scientist who is the Cadence Design Systems Professor of the Stanford School of Engineering, Professor of Electrical Engineering and Computer Science at Stanford University and the director of the Stanford Pervasive Parallelism Lab. Olukotun is known as the “father of the multi-core processor”, and the leader of the Stanford Hydra Chip Multiprocessor research project. Olukotun's achievements include designing the first general-purpose multi-core CPU, innovating single-chip multiprocessor and multi-threaded processor design, and pioneering multicore CPUs and GPUs, transactional memory technology and domain-specific languages programming models. Olukotun's research interests include computer architecture, parallel programming environments and scalable parallel systems, domain specific languages and high-level compilers.

For several years parallel hardware was only available for distributed computing but recently it is becoming available for the low end computers as well. Hence it has become inevitable for software programmers to start writing parallel applications. It is quite natural for programmers to think sequentially and hence they are less acquainted with writing multi-threaded or parallel processing applications. Parallel programming requires handling various issues such as synchronization and deadlock avoidance. Programmers require added expertise for writing such applications apart from their expertise in the application domain. Hence programmers prefer to write sequential code and most of the popular programming languages support it. This allows them to concentrate more on the application. Therefore, there is a need to convert such sequential applications to parallel applications with the help of automated tools. The need is also non-trivial because large amount of legacy code written over the past few decades needs to be reused and parallelized.

In parallel computing, work stealing is a scheduling strategy for multithreaded computer programs. It solves the problem of executing a dynamically multithreaded computation, one that can "spawn" new threads of execution, on a statically multithreaded computer, with a fixed number of processors. It does so efficiently in terms of execution time, memory usage, and inter-processor communication.

In parallel computing, granularity of a task is a measure of the amount of work which is performed by that task.

References

  1. Plurality's Profile
  2. "Multicore Association Members List". multicore-association.org.{{cite web}}: CS1 maint: url-status (link)
  3. "Massively parallel processors for DSP". www.informa.com.{{cite web}}: CS1 maint: url-status (link)
  4. Gilad, Assaf (2008-07-29). "בפלורליטי מבטיחים לשבור את חוק מור" [Plurality promises to break Moore's Law]. www.calcalist.co.il (in Hebrew). Retrieved 2024-03-27.
  5. "Plurality's Hypercore Joins the Multi-Core Fray". www.insidedsp.com.{{cite web}}: CS1 maint: url-status (link)