Connection Machine - University of Virginia

Connection Machine Architecture

Greg Faust, Mike Gibson, Sal Valente

CS-6354 Computer Architecture

Fall 2009



Historic Timeline


1981: MIT AI Lab Technical Memo on CM


1982: Thinking Machines Inc. Founded


1985: Danny Hillis wins ACM “Best PhD” Award


1986: CM-1 Ships


1987: CM-2 Ships


1991: CM-5 Announced


1991: CM-5 Ships


1994: TMI Chapter 11


Sun/Oracle pick bones


Heavily DARPA funded/backed

$16M+ Direct Contracts plus subsidized CM sales





Involved Notables


Danny Hillis


CM inventor and TMI Founder


Charles Leiserson


Fat tree inventor


Richard Feynman


Nobel Prize winning Physicist


Marvin Minsky


MIT AI Lab “Visionary”


Guy Steele


Common Lisp, Grace Hopper Award


Stephen Wolfram


Mathematica inventor


Doug Lenat


AI researcher (Eurisko, Cyc)


Greg Papadopoulos


MIT Media lab, Sun CTO


Various others


CM-1 and CM-2 Architecture


Original design goal: support neuron-like simulations


Up to 64K single-bit processors (actually 3 bits in and 2 out)


16 processors/chip, 32 chips/PCB, 16 PCBs/cube, 8 cubes/hypercube


Hypercube architecture


Each 16-proc chip is a hypernode


Each proc has 4K bits of bit-addressable RAM


Distributed Physical Memory


Global Memory Addresses


Up to 4 front-end computers talk to sequencers via 4x4 crossbar


“Sequencers” issue SIMD instructions over a Broadcast Network


Bit procs communicate via 2D local HW grid connections (“NEWS”)


Bit procs communicate via hypercube network using MSG passing


Lots of Twinkling Lights!!
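
The packaging numbers above multiply out to 64K processors. With each 16-processor chip acting as one hypercube node, that gives 4096 nodes, which is a 12-dimensional hypercube. The Python sketch below works through that arithmetic and shows how node addresses define the links. The routing function assumes plain dimension-order routing, which is an illustrative choice only (the slides do not say how the real router picked paths), and the names `neighbors` and `route` are hypothetical.

```python
# Packaging figures taken from the slide.
PROCS_PER_CHIP = 16
CHIPS_PER_PCB = 32
PCBS_PER_CUBE = 16
CUBES = 8

total_procs = PROCS_PER_CHIP * CHIPS_PER_PCB * PCBS_PER_CUBE * CUBES
total_chips = total_procs // PROCS_PER_CHIP   # one hypercube node per chip
DIMENSIONS = total_chips.bit_length() - 1     # log2 of a power of two


def neighbors(node, dims=DIMENSIONS):
    """Nodes whose address differs from `node` in exactly one bit."""
    return [node ^ (1 << d) for d in range(dims)]


def route(src, dst, dims=DIMENSIONS):
    """Dimension-order path (assumed policy): fix differing bits, lowest first."""
    path = [src]
    current = src
    for d in range(dims):
        bit = 1 << d
        if (current ^ dst) & bit:
            current ^= bit
            path.append(current)
    return path


if __name__ == "__main__":
    print(total_procs, total_chips, DIMENSIONS)
    print(route(0b000000000000, 0b000000000101))
```

The number of hops on such a path equals the number of bits in which the source and destination addresses differ, so no two nodes are more than 12 hops apart in this model.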



CM-1 and CM-2 Architecture (figure)


CM-1 and CM-2 Programming



ISA supports:


Bit-oriented operations


Arbitrary precision multi-bit scalar Ops using bit-serial implementation on bit procs


Full Multi-Dimensional Vector Ops


“Virtual Processor” idea similar to CUDA threads, but they are statically allocated


OS and Programming Tools run on front-ends


*Lisp as the initial programming language


Later C* and CM-Fortran
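
The bit-serial idea behind the multi-bit scalar ops can be sketched in Python. Each step consumes one bit of each operand plus a carry and produces a sum bit plus a new carry, which is one plausible reading of the "3 bits in and 2 out" note on the architecture slide. This is a software model under that assumption, not the actual CM instruction set, and the helper names `to_bits`, `from_bits`, and `bit_serial_add` are hypothetical.

```python
def to_bits(value, width):
    """Little-endian list of `width` bits for a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits: rebuild the integer from little-endian bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(a_bits, b_bits):
    """Add two equal-length bit lists one bit position per step."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        # Three input bits (a, b, carry) produce two output bits.
        out.append(a ^ b ^ carry)
        carry = (a & b) | (a & carry) | (b & carry)
    out.append(carry)  # final carry becomes the top bit
    return out


if __name__ == "__main__":
    total = bit_serial_add(to_bits(13, 8), to_bits(250, 8))
    print(from_bits(total))
```

Because the loop length is just the operand width, the same routine handles any precision, at a cost of one step per bit.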



CM-2 Improvements


1 Weitek IEEE FP coprocessor per 32 1-bit procs


Up to 256K bits of memory per processor


Added ECC to Memory


Implemented the IO subsystem


Up to 80 GByte RAID array called “Data Vault”, using 39 striped disks with ECC, plus spare disks on standby


High Speed Graphics Output


En-route MSG combining in H-Cube router


New implementation of Multi-Dimensional NEWS on top of H-Cube (special addressing mode)



CM-1 Photo


CM-5 vs CM-1 and CM-2


Significant departure from CM-1 and CM-2


Targeted at more scientific and business applications


More Commercial Off-The-Shelf components (“COTS”)


Large Array of SPARC Processing Nodes


1-bit processors are abandoned


Abandoned “NEWS” Grid and Hyper-Cube Networks


Delivered a 1024-node machine, with claims that 16K nodes were possible


Even More Twinkling Lights!


CM-5 Photo


Watch it Blink


CM-5 Overall Architecture


“Coordinated Homogeneous Array of RISC Microprocessors” or “CHARM”


Asymmetric CoProcessors Model


Large Array of Processor Nodes


Small Collection of Control Nodes


2 Separate scalable networks


One for data


One for control and synchronization


Still uses striped RAID for high disk bandwidth


Division of Labor


Processor Nodes can be assigned to a “Partition”


One Control Node per Partition


Control Node runs scalar code, then broadcasts parallel work to Processor Nodes


Processor Nodes receive a program, not an instruction stream, and have their own Program Counter


Processor Nodes can access other nodes' memory by reading or writing a global memory address


Processor Nodes also communicate via MSG passing


Processor Nodes cannot issue system calls


Control Nodes


Full Sun Workstations


Running UNIX


Connected to the “Outside World”


Handles Partition Time Sharing


Connected to both data and control networks


Performs System Diagnostics


Processor Nodes


Nodes are a 5-chip microprocessor


Off the Shelf SPARC processor @ 40 MHz


32 MBytes local node memory


Multi-port memory controller for added BW


“Caching techniques do not perform as well on large parallel machines”


Proprietary 4-FPU Vector coprocessor


Proprietary network controller


CM-5 Processor Node Diagram


Data Network Architecture


Point to Point Inter-node communication and I/O


Implemented as a Fat Tree


Fat Trees invented by TMI employee Charles Leiserson


Claim: Onsite Bandwidth Expandable


Delivering 5GB/sec Bisection BW on 1024 node machine


Data router chip is an 8x8 crossbar switch


Faulty nodes are mapped out of network


Programs cannot assume a network topology


Network can be flushed when Time Share swaps occur


Network, not processors, guarantees end to end delivery
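
A rough sketch of how distance works in a tree-shaped data network: a message climbs until it reaches a switch that covers both endpoints, then descends. The Python below counts those levels for leaves numbered 0 to 1023 and also divides the quoted 5 GB/sec bisection figure evenly over 1024 nodes. The tree arity of 4 is an assumption made for illustration (the slide only states that the router chip is an 8x8 crossbar), 1 GB is taken as 10^9 bytes, and the function names are hypothetical.

```python
ARITY = 4                 # assumed children per switch level, not from the slide
NODES = 1024              # machine size quoted on the slide
BISECTION_BYTES_PER_SEC = 5e9   # 5 GB/sec, taking 1 GB as 10**9 bytes


def levels_to_common_ancestor(src, dst, arity=ARITY):
    """How many levels up two leaves must go before their subtrees merge."""
    level = 0
    while src != dst:
        src //= arity
        dst //= arity
        level += 1
    return level


def hop_count(src, dst, arity=ARITY):
    """Up to the common ancestor and back down again."""
    return 2 * levels_to_common_ancestor(src, dst, arity)


# Even split of the bisection figure over all nodes (a simplification).
per_node_bytes_per_sec = BISECTION_BYTES_PER_SEC / NODES

if __name__ == "__main__":
    print(hop_count(0, 3), hop_count(0, 4), hop_count(0, 1023))
    print(per_node_bytes_per_sec)
```

In a plain tree the links near the root would be the bottleneck; the point of a fat tree is that those upper links are given more capacity, which is what lets the bisection bandwidth scale with machine size.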


Fat Tree Structure


Separate Control Network


Synchronization & control network


Complete Binary Tree organization


Provides broadcast capability


Implements barrier operations


Implements interrupts for timesharing


Performs reduction operators (Sum, Max, AND, OR, Count, etc.)
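
The reduction and barrier operations listed above fit a binary tree naturally: each level combines pairs of values, so N inputs need about log2(N) combining steps. The Python sketch below models that in software. It is only an illustration of the idea, not the control network's actual protocol, and `tree_reduce` is a hypothetical name. A barrier is modeled here as an AND reduction over per-node "arrived" flags, which is an assumption about one way to express it.

```python
import operator


def tree_reduce(values, op):
    """Combine values pairwise, level by level, as a binary tree would.

    Returns (result, number_of_levels_used).
    """
    if not values:
        raise ValueError("need at least one value")
    level = list(values)
    steps = 0
    while len(level) > 1:
        # Pair up neighbours; an unpaired last element moves up unchanged.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        steps += 1
    return level[0], steps


if __name__ == "__main__":
    print(tree_reduce([1, 2, 3, 4, 5, 6, 7, 8], operator.add))   # Sum
    print(tree_reduce([3, 9, 2], max))                           # Max
    print(tree_reduce([True, True, False, True], operator.and_)) # barrier-style AND
```

The same function covers Sum, Max, AND, and OR simply by swapping the operator; Count can be expressed as a Sum over 0/1 flags.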




CM-5 Programming


Supports multiple Parallel High Level Languages and Programming Styles


Including Data Parallel Model from CM-1 and CM-2


Goal: Hide many decisions from programmers


CM-1, CM-2 vs CM-5 ISA changes


Use of Processor Node CPU vs Vector CoProcessors


Partition Wide Synchronizations generated by Compiler


Is it MIMD, SPMD, SIMD?


“Globally Synchronized MIMD”


Sample CM Apps


Machine Learning


Neural Nets, concept clustering, genetic algorithms


VLSI Design


Geophysics (Oil Exploration), Plate Tectonics


Particle Simulation


Fluid Flow Simulation


Computer Vision


Computer Graphics, Animation


Protein Sequence Matching


Global Climate Model Simulation




References


Danny Hillis PhD: The Connection Machine


Inc: The Rise and Fall of Thinking Machines


Wiki: Connection Machine


ACM: The CM-5 Connection Machine


ACM: The Network Architecture of the CM-5


IEEE: Architecture and Applications of the Connection Machine


IEEE: Fat-trees: universal networks for hardware-efficient supercomputing


Encyclopedia of Computer Science and Technology

