Today supercomputing is considered an important backbone of almost all scientific domains; without it, much of the world we see around us would still be just a dream. Where did the roots of such exemplary power come from?
To trace the history of supercomputing, one has to go back to the 1960s, when a legendary man named Seymour Cray lived with an unquenchable thirst for designing extremely powerful computers. He is affectionately known as the ‘Father of Supercomputing’. The CDC (Control Data Corporation) 6600, released in 1964 and generally considered the first supercomputer, was one of his innovations.
The Beginning:
As said above, the supercomputing era began around 1960, and in 1964 the world saw the release of one of its biggest dreams, ‘the CDC 6600’. Before that, in 1960, Cray completed the CDC 1604, one of the first solid state computers and the fastest computer in the world at a time when vacuum tubes were found in most large computers.
The term solid state means that the computer is built from semiconductor devices. It marks the transition of computing systems from vacuum tubes to semiconductor materials.
Around 1960 Cray decided to design a computer that would be the fastest in the world by a wide margin, well beyond the 1604. After four years of experimentation together with Jim Thornton, Dean Roush and about 30 other engineers, Cray completed the CDC 6600 in 1964. Because the 6600 outran all computers of the time by about 10 times, it was dubbed a supercomputer, and it defined the supercomputing market when one hundred computers were sold at $8 million each.
The 6600 gained speed by "farming out" work to peripheral computing elements, freeing the CPU (Central Processing Unit) to process actual data. The Minnesota FORTRAN compiler for the machine was developed by Liddiard and Mundstock at the University of Minnesota, and with it the 6600 could sustain 500 kiloflops on standard mathematical operations. In 1968 Cray completed the CDC 7600, again the fastest computer in the world. At 36 MHz, the 7600 had about three and a half times the clock speed of the 6600, but its overall speed advantage was significantly larger due to other technical innovations. Cray left CDC in 1972 to form his own company. Two years after his departure CDC delivered the STAR-100, which at 100 megaflops was three times the speed of the 7600. Along with the Texas Instruments ASC, the STAR-100 was one of the first machines to use vector processing, an idea inspired around 1964 by the APL programming language.
The CRAY Era:
In 1976, Cray delivered the 80 MHz Cray-1, and it became one of the most successful supercomputers in history. The Cray-1 was a vector processor which introduced a number of innovations, such as chaining.
Chaining is a technique used in computer architecture in which scalar and vector registers generate intermediate results that can be used immediately, without the additional memory references that reduce computational speed.
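The benefit of chaining can be illustrated with a very simplified timing model. The sketch below is not a description of the actual Cray-1 pipelines: the function names, the latencies and the vector length are hypothetical values chosen only for illustration, and the model assumes each pipelined unit delivers one result per cycle after its start-up latency.

```python
def unchained_cycles(n, mul_latency, add_latency):
    """Cycles for a multiply followed by an add when the add has to
    wait until the multiply has produced all n results."""
    return (mul_latency + n) + (add_latency + n)


def chained_cycles(n, mul_latency, add_latency):
    """Cycles when each product is forwarded to the adder as soon as
    it is ready, so the two pipelines overlap."""
    return mul_latency + add_latency + n


# Hypothetical numbers, for illustration only.
n = 64            # elements in the vector
mul_latency = 7   # start-up cycles of the multiply unit
add_latency = 6   # start-up cycles of the add unit

print("unchained:", unchained_cycles(n, mul_latency, add_latency))
print("chained:  ", chained_cycles(n, mul_latency, add_latency))
```

Under these assumptions the chained version needs a little over half the cycles, because the second operation no longer waits for the first one to finish.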
A vector processor, also known as an array processor, is a central processing unit (CPU) that implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors. This is in contrast to a scalar processor, whose instructions operate on single data items.
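The difference between scalar and vector instructions can be sketched by counting how many "instructions" each style needs to add two arrays. This is only a toy model in Python, not real machine code: the function names are made up for this example, and the vector length of 64 elements per instruction is an assumption.

```python
VECTOR_LENGTH = 64  # assumed number of elements one vector instruction handles


def scalar_add(a, b):
    """Add two lists one element per instruction, as a scalar CPU would."""
    result = []
    instructions = 0
    for x, y in zip(a, b):
        result.append(x + y)
        instructions += 1
    return result, instructions


def vector_add(a, b, vector_length=VECTOR_LENGTH):
    """Add two lists one whole chunk per instruction, as a vector CPU would."""
    result = []
    instructions = 0
    for start in range(0, len(a), vector_length):
        chunk_a = a[start:start + vector_length]
        chunk_b = b[start:start + vector_length]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        instructions += 1
    return result, instructions


a = list(range(200))
b = list(range(200, 400))
scalar_result, scalar_count = scalar_add(a, b)
vector_result, vector_count = vector_add(a, b)
print(scalar_count, vector_count)
```

Both functions produce the same sums; the vector version simply issues far fewer instructions for the same amount of data.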
In 1982 the Cray X-MP, a 105 MHz shared-memory parallel vector processor, was released, with better chaining support and multiple memory pipelines (pipelining being the concept of overlapping the execution of instructions). All three floating point pipelines on the X-MP could operate simultaneously. The Cray-2, released in 1985, was a 4 processor liquid cooled computer; Fluorinert was pumped through it as it operated. It could perform at up to 1.9 gigaflops and was the world's fastest until 1990, when the ETA-10G from CDC overtook it.
The Cray-2 was a totally new design: it did not use chaining and had high memory latency, but it used much pipelining and was ideal for problems that required large amounts of memory. The software costs of developing a supercomputer should not be underestimated, as evidenced by the fact that in the 1980s the cost of software development at Cray came to equal what was spent on hardware. That trend was partly responsible for the move away from the in-house Cray Operating System to the Unix-based UNICOS (UNIx based Cray Operating System). The Cray Y-MP, designed, like the X-MP, by Steve Chen, was released in 1988 as an improvement of the X-MP and could have eight vector processors at 167 MHz, with a peak performance of 333 megaflops per processor.
In the late 1980s, Cray's experiment with gallium arsenide semiconductors in the Cray-3 did not succeed. Cray began to work on a massively parallel computer in the early 1990s, but died in a car accident in 1996 before it could be completed.
The Massive Processing Era:
The Cray-2, which set the frontiers of supercomputing in the mid to late 1980s, had no more than 8 processors. In the 1990s, supercomputers with thousands of processors began to appear. Another development at the end of the 1980s was the arrival of Japanese supercomputers, some of which were modelled after the Cray-1. The SX-3/44R was announced by NEC Corporation in 1989 and a year later earned the fastest-in-the-world title with a 4 processor model. However, Fujitsu's Numerical Wind Tunnel supercomputer used 166 vector processors to gain the top spot in 1994. It had a peak speed of 1.7 gigaflops per processor. The Hitachi SR2201, on the other hand, obtained a peak performance of 600 gigaflops in 1996 by using 2048 processors connected via a fast three dimensional crossbar network. In the same timeframe the Intel Paragon could have 1000 to 4000 Intel i860 processors in various configurations, and was ranked the fastest in the world in 1993. The Paragon was a MIMD (Multiple Instruction Multiple Data) machine which connected processors via a high speed two dimensional mesh, allowing processes to execute on separate nodes and communicate via the Message Passing Interface (a standard way of passing data between processors). By 1995 Cray was also shipping massively parallel systems, e.g. the Cray T3E with over 2,000 processors, using a three dimensional torus interconnect.
An interconnect is the way a large number of processors are connected into a network so that they can communicate with one another; mesh and torus are two of its common forms.
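The difference between a mesh and a torus can be made concrete by listing the direct neighbours of a node. In the minimal sketch below, nodes are identified by coordinate tuples; the function names and the 4 x 4 and 4 x 4 x 4 grid sizes are assumptions made for this example and do not describe any particular machine. In a mesh, nodes on the edge have fewer neighbours, while in a torus the links wrap around so every node has the same number.

```python
def mesh_neighbors(node, dims):
    """Direct neighbours of `node` in a mesh: no wrap-around at the edges."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = node[axis] + step
            if 0 <= coord < size:
                neighbors.append(node[:axis] + (coord,) + node[axis + 1:])
    return neighbors


def torus_neighbors(node, dims):
    """Direct neighbours of `node` in a torus: edges wrap around.
    Assumes every dimension has more than 2 nodes, so neighbours are distinct."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = (node[axis] + step) % size
            neighbors.append(node[:axis] + (coord,) + node[axis + 1:])
    return neighbors


# A corner node of a 4 x 4 two-dimensional mesh has only 2 neighbours.
print(mesh_neighbors((0, 0), (4, 4)))

# The same corner in a 4 x 4 x 4 three-dimensional torus has 6 neighbours.
print(torus_neighbors((0, 0, 0), (4, 4, 4)))
```

The wrap-around links are what keep the worst-case distance between two nodes shorter in a torus than in a mesh of the same size.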
The Paragon architecture soon led to the Intel ASCI Red supercomputer, which held the top supercomputing spot to the end of the 20th century as part of the Advanced Simulation and Computing Initiative. This was also a mesh-based MIMD massively-parallel system, with over 9,000 compute nodes and well over 12 terabytes of disk storage, but it used off-the-shelf Pentium Pro processors that could be found in everyday personal computers. ASCI Red was the first system ever to break through the 1 teraflop barrier on the MP-Linpack benchmark, in 1996; it eventually reached 2 teraflops.
The PETAFLOP Computing Era:
The 21st century saw significant progress, and it was shown that the power of a large number of small processors can be harnessed to achieve high performance, as in System X's use of 1,100 Apple Power Mac G5 computers, quickly assembled in the summer of 2003, to reach 12.25 teraflops. The efficiency of supercomputers continued to increase, but not dramatically so. The Cray C90 used 500 kilowatts of power in 1991, while by 2003 the ASCI Q used 3,000 kW while being 2,000 times faster, increasing the performance per watt roughly 300 fold. In 2004 the Earth Simulator supercomputer built by NEC at the Japan Agency for Marine-Earth Science and Technology (JAMSTEC) reached 131 teraflops, using 640 nodes, each with eight proprietary vector processing chips. The IBM Blue Gene supercomputer architecture found widespread use in the early part of the 21st century, and 27 of the computers on the TOP500 list used that architecture. The Blue Gene approach is somewhat different in that it trades processor speed for low power consumption, so that a larger number of processors can be used at air cooled temperatures. It can use over 60,000 processors, with 2048 processors "per rack", and connects them via a three-dimensional torus interconnect. Progress in China has been rapid: China placed 51st on the TOP500 list in June 2003, then 14th in November 2003, 10th in June 2004 and 5th during 2005, before gaining the top spot in 2010 with the 2.5 petaflop Tianhe-I supercomputer. In July 2011 the 8.1 petaflop Japanese K computer became the fastest in the world, using over 60,000 commercial scalar SPARC64 VIIIfx processors housed in over 600 cabinets. The fact that the K computer is over 60 times faster than the Earth Simulator, and that the Earth Simulator ranks as the 68th system in the world 7 years after holding the top spot, demonstrates both the rapid increase in top performance and the widespread growth of supercomputing technology worldwide.
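The two comparisons quoted in this section, the performance-per-watt gain from the Cray C90 to the ASCI Q and the speed ratio between the K computer and the Earth Simulator, can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text above; the function name is made up for this example.

```python
def efficiency_gain(speedup, old_power_kw, new_power_kw):
    """Performance-per-watt improvement: the speedup divided by
    the factor by which the power consumption grew."""
    return speedup / (new_power_kw / old_power_kw)


# Cray C90 (500 kW) versus ASCI Q (3,000 kW, 2,000 times faster).
c90_to_asci_q = efficiency_gain(2000, 500, 3000)

# K computer (8.1 petaflops) versus Earth Simulator (131 teraflops).
k_vs_earth_simulator = 8.1e15 / 131e12

print(f"performance per watt gain: about {c90_to_asci_q:.0f} fold")
print(f"K computer speed ratio: about {k_vs_earth_simulator:.1f} times")
```

The power draw grew six fold while the speed grew 2,000 fold, which gives a gain of about 333, in line with the rounded "300 fold" figure; the speed ratio comes out just under 62, consistent with "over 60 times faster".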
This is a list of the computers which have appeared at the top of the TOP500 list since 1993.
| Year | Supercomputer | Peak speed | Location |
|------|---------------|------------|----------|
| 1993 | Fujitsu Numerical Wind Tunnel | 124.50 GFLOPS | National Aerospace Laboratory, Tokyo, Japan |
| 1993 | Intel Paragon XP/S 140 | 143.40 GFLOPS | DoE-Sandia National Laboratories, New Mexico, USA |
| 1994 | Fujitsu Numerical Wind Tunnel | 170.40 GFLOPS | National Aerospace Laboratory, Tokyo, Japan |
| 1996 | Hitachi SR2201/1024 | 220.4 GFLOPS | University of Tokyo, Japan |
| 1996 | Hitachi CP-PACS/2048 | 368.2 GFLOPS | University of Tsukuba, Tsukuba, Japan |
| 1997 | Intel ASCI Red/9152 | 1.338 TFLOPS | DoE-Sandia National Laboratories, New Mexico, USA |
| 1999 | Intel ASCI Red/9632 | 2.3796 TFLOPS | DoE-Sandia National Laboratories, New Mexico, USA |
| 2000 | IBM ASCI White | 7.226 TFLOPS | DoE-Lawrence Livermore National Laboratory, California, USA |
| 2002 | NEC Earth Simulator | 35.86 TFLOPS | Earth Simulator Center, Yokohama, Japan |
| 2004 | IBM Blue Gene/L | 70.72 TFLOPS | |
| 2005 | IBM Blue Gene/L | 136.8 TFLOPS | DoE/U.S. National Nuclear Security Administration, Lawrence Livermore National Laboratory, California, USA |
| 2005 | IBM Blue Gene/L | 280.6 TFLOPS | DoE/U.S. National Nuclear Security Administration, Lawrence Livermore National Laboratory, California, USA |
| 2007 | IBM Blue Gene/L | 478.2 TFLOPS | DoE/U.S. National Nuclear Security Administration, Lawrence Livermore National Laboratory, California, USA |
| 2008 | IBM Roadrunner | 1.026 PFLOPS | DoE-Los Alamos National Laboratory, New Mexico, USA |
| 2008 | IBM Roadrunner | 1.105 PFLOPS | DoE-Los Alamos National Laboratory, New Mexico, USA |
| 2009 | Cray Jaguar | 1.759 PFLOPS | DoE-Oak Ridge National Laboratory, Tennessee, USA |
| 2010 | Tianhe-IA | 2.566 PFLOPS | National Supercomputing Center, Tianjin, China |
| 2011 | Fujitsu K computer | 10.51 PFLOPS | RIKEN, Kobe, Japan |
| 2012 | IBM Sequoia | 16.32 PFLOPS | Lawrence Livermore National Laboratory, California, USA |