Processor technology has moved from single-processor to multiprocessor systems, which come in two variants: many individual processors connected together, or many processors embedded in a single chip, popularly known as a Chip Multiprocessor (CMP). I am sure the system on which you are viewing this blog has a CMP architecture. The former variant, generally found in supercomputers, connects many individual processors through an interconnection network and makes them communicate to execute a particular task. These two classes of processors form the base for High Performance Computing. The two architectures differ greatly in how they perform and also in the view they present to the programmer, giving rise to a form of computing known as parallel computing or parallel programming, which is the real power behind High Performance Systems. The way you have programmed so far, in C, C++, Java, etc., follows the sequential programming model: you think of a single processing unit with memory around it, feeding data to and getting data back from that unit. The shocking news is that the world is slowly moving to parallel models of programming, in which the programmer must think of many processing units either sharing one memory system or having the memory distributed among all the processing units, corresponding to the two variants of processor collaboration mentioned above. So get ready to face this shift.
Parallel Programming:
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel"), since there are many processing units available to work on the many sub-problems. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has been employed for many years, mainly in high-performance computing, but interest in it has grown lately due to the physical constraints preventing frequency scaling. As power consumption (and consequently heat generation) by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multicore processors.
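To make this concrete, here is a minimal sketch in C using OpenMP (OpenMP is my choice of tool for illustration; nothing above prescribes it). The large problem of summing an array is divided into smaller chunks that the available processing units work on concurrently, and the partial results are combined at the end. Compile with an OpenMP-capable compiler, e.g. gcc -fopenmp.

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)   /* prepare some data */
        a[i] = 1.0;

    /* The large problem (summing N values) is divided into smaller
       chunks, one per thread; the partial sums are combined at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.0f, computed by up to %d threads\n",
           sum, omp_get_max_threads());
    return 0;
}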
Types of Parallelism:
Bit-level parallelism:
Word size (this is what your processor being labelled 32-bit or 64-bit refers to) is the amount of information the processor can manipulate per cycle, and it has a great bearing on the speed of the processor. Increasing the word size reduces the number of instructions the processor must execute to complete a task. For example, where an 8-bit processor
must add two 16-bit integers, the processor must first add the 8 lower-order bits
from each integer using the standard addition instruction, then add the
8 higher-order bits using an add-with-carry instruction and the carry bit from the lower order addition; thus, an 8-bit processor
requires two instructions to complete a single operation, where a 16-bit
processor would be able to complete the operation with a single instruction.
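This two-instruction sequence can be sketched in C. The helper below is purely illustrative (its name is invented here): it emulates a 16-bit addition the way an 8-bit processor would, adding the low-order bytes with a standard add and then the high-order bytes with the carry from the first step.

#include <stdint.h>
#include <stdio.h>

/* Illustrative only: emulate a 16-bit addition using 8-bit pieces, with an
   explicit carry between the low-order and high-order halves. */
static uint16_t add16_with_8bit_ops(uint16_t x, uint16_t y) {
    uint8_t xl = x & 0xFF, xh = x >> 8;
    uint8_t yl = y & 0xFF, yh = y >> 8;

    uint16_t low  = (uint16_t)xl + yl;        /* standard addition         */
    uint8_t carry = (low > 0xFF) ? 1 : 0;     /* carry out of the low byte */
    uint8_t high  = xh + yh + carry;          /* add-with-carry            */

    return ((uint16_t)high << 8) | (low & 0xFF);
}

int main(void) {
    printf("%u\n", add16_with_8bit_ops(300, 500));   /* prints 800 */
    return 0;
}

A 16-bit (or wider) processor does the same work in a single add instruction.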
Instruction-level parallelism:
A computer program is, in essence, a stream of
instructions executed by a processor. These instructions can be re-ordered and combined into groups which are then executed in
parallel without changing the result of the program. This is known as
instruction-level parallelism. Advances in instruction-level parallelism
dominated computer architecture from the mid-1980s until the mid-1990s.
Instruction-level parallelism is realized by pipelines in the processor
architecture, much like the assembly line at an automobile manufacturing plant. At any given instant, the number of instructions in flight on a single-issue processor (one that issues one instruction per cycle) equals the number of pipeline stages; on a double-issue processor it is twice the number of pipeline stages, and so on. Processors having the
capability of issuing more than one instruction per clock cycle are known as
superscalar processors.
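A tiny C fragment (the variables are invented for illustration) shows the kind of independence such hardware exploits: the first group of statements has no data dependences, so a pipelined or superscalar processor may overlap or co-issue them, while the second group forms a chain in which each statement must wait for the previous result.

#include <stdio.h>

int main(void) {
    int a = 1, b = 2, c = 3, d = 4;

    /* No data dependences between these four statements, so the
       processor is free to overlap them in its pipeline or issue
       several of them in the same clock cycle. */
    int e = a + b;
    int f = c + d;
    int g = a * c;
    int h = b * d;

    /* Each statement below depends on the previous result, so the
       hardware cannot overlap them; they execute as a serial chain. */
    int x = e + f;
    int y = x + g;
    int z = y + h;

    printf("%d\n", z);
    return 0;
}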
Data parallelism:
Data parallelism is parallelism inherent in program loops, which focuses on distributing the data across
different computing nodes to be processed in parallel. Parallelizing loops
often leads to similar (not necessarily identical) operation sequences or
functions being performed on elements of a large data structure. Many
scientific and engineering applications exhibit data parallelism.
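As an illustration in C with OpenMP (again one tool among many; the array and operation are made up), the loop below performs the same operation on every element, and because the iterations are independent they can be distributed across cores, or in principle across the nodes of a larger machine. Compile with gcc -fopenmp -lm.

#include <math.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double in[N], out[N];

    for (int i = 0; i < N; i++)
        in[i] = (double)i;

    /* Data parallelism: the same operation is applied to each element
       and the iterations are independent, so the loop is split across
       the available processing units. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        out[i] = sqrt(in[i]) * 2.0;

    printf("out[%d] = %f\n", N - 1, out[N - 1]);
    return 0;
}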
Task parallelism:
Task parallelism is the characteristic of a parallel program
that "entirely different calculations can be performed on either the same
or different sets of data". This contrasts with data parallelism, where
the same calculation is performed on the same or different sets of data. Task
parallelism does not usually scale with the size of a problem.
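A small C/OpenMP sketch (the functions and data are invented for illustration) shows the contrast: two entirely different calculations, a sum and a maximum, run as separate tasks over the same data.

#include <stdio.h>

/* Two entirely different calculations over the same data. */
static long sum_of(const int *v, int n) {
    long s = 0;
    for (int i = 0; i < n; i++) s += v[i];
    return s;
}

static int max_of(const int *v, int n) {
    int m = v[0];
    for (int i = 1; i < n; i++) if (v[i] > m) m = v[i];
    return m;
}

int main(void) {
    static int data[1000];
    for (int i = 0; i < 1000; i++) data[i] = i;

    long s = 0;
    int m = 0;

    /* Task parallelism: each section is a different computation, and
       the two sections may run on different cores at the same time. */
    #pragma omp parallel sections
    {
        #pragma omp section
        s = sum_of(data, 1000);

        #pragma omp section
        m = max_of(data, 1000);
    }

    printf("sum = %ld, max = %d\n", s, m);
    return 0;
}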
Parallel Computer Classes:
Keep in mind that the classes given below are not mutually exclusive.
Multicore computing:
A multicore processor is a processor that includes multiple execution units ("cores") on the same chip. These processors
differ from superscalar processors, which can issue multiple instructions per
cycle from one instruction stream (thread); in contrast, a multicore processor
can issue multiple instructions per cycle from multiple instruction streams.
Each core in a multicore processor can potentially be superscalar as well—that
is, on every cycle, each core can issue multiple instructions from one
instruction stream. Simultaneous multithreading (of which Intel's HyperThreading is the best known) was an early form of
pseudo-multicoreism. A processor capable of simultaneous multithreading has only one execution unit ("core"), but when that execution unit would otherwise be idling (such as during a cache miss), it is used to process a second thread. IBM's Cell microprocessor, designed for use in the Sony PlayStation 3, is another prominent multicore processor.
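As a minimal sketch of "multiple instruction streams" in C with OpenMP (assuming an OpenMP runtime; the original text does not name one), each thread started below is an independent instruction stream, and on a multicore chip the runtime will typically place one on each core so that several streams really do execute at once.

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Each OpenMP thread is an independent instruction stream; on a
       CMP these streams run on different cores at the same time. */
    #pragma omp parallel
    {
        printf("thread %d of %d running\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}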
Symmetric multiprocessing:
A symmetric multiprocessor (SMP) is a computer system with multiple
identical processors that share memory and connect via a bus. Bus contention
prevents bus architectures from scaling. As a result, SMPs generally do not
comprise more than 32 processors. "Because of the small size of the
processors and the significant reduction in the requirements for bus bandwidth
achieved by large caches, such symmetric multiprocessors are extremely
cost-effective, provided that a sufficient amount of memory bandwidth exists."
Distributed computing:
A distributed computer (also known as a distributed memory multiprocessor) is a computer system in which each processing element has its own private memory and the elements are connected by a network. Distributed computers are highly scalable.
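Programs for such machines usually move data by explicit message passing; MPI is a common choice, though the text above does not name one. The sketch below (built with mpicc and run with mpirun -np 2) sends one integer from the private memory of process 0 to process 1 over the network.

#include <stdio.h>
#include <mpi.h>

/* Distributed memory: each process has its own address space, so data
   must be moved explicitly between processes with messages. */
int main(int argc, char **argv) {
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* to process 1 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process 1 received %d from process 0\n", value);
    }

    MPI_Finalize();
    return 0;
}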
Cluster computing:
A cluster is a group of loosely coupled computers that work together
closely, so that in some respects they can be regarded as a single computer. Clusters
are composed of multiple standalone machines connected by a network. While
machines in a cluster do not have to be symmetric, load balancing is more difficult if
they are not. The most common type of cluster is the Beowulf cluster, which is a cluster implemented
on multiple identical commercial off-the-shelf computers
connected with a TCP/IP Ethernet local area network. Beowulf technology was
originally developed by Thomas Sterling and Donald Becker.
The vast majority of the TOP500 supercomputers are clusters.
Massively parallel processing:
A massively parallel processor (MPP) is a single computer with many
networked processors. MPPs have many of the same characteristics as clusters,
but MPPs have specialized interconnect networks (whereas clusters use commodity
hardware for networking). MPPs also tend to be larger than clusters, typically
having "far more" than 100 processors. In an MPP, "each CPU
contains its own memory and copy of the operating system and application. Each
subsystem communicates with the others via a high-speed interconnect."
Blue Gene/L, the fifth fastest
supercomputer in the world according to the June 2009 TOP500 ranking, is an MPP.
Grid computing:
Grid computing is the most distributed form of parallel computing. It makes use of computers communicating over the Internet to work on a given problem. Because of the low bandwidth and extremely high latency of the Internet, grid computing typically deals only with embarrassingly parallel problems. Many grid computing applications have been created, of which SETI@home and Folding@home are the best-known examples. Most grid computing applications use middleware, software that sits between the operating system and the application to manage network resources and standardize the software interface. The most common grid computing middleware is the Berkeley Open Infrastructure for Network Computing (BOINC). Often, grid computing software makes use of "spare cycles", performing computations at times when a computer is idling.