Intel Itanium Architecture


Nov 18, 2013

Satya P. Vedula

Intel® Itanium™ Architecture

Agenda

1. History
2. Introduction
3. Block Diagram
4. Pipeline
5. Register Set
6. Instruction Set
7. EPIC
8. x86 Compatibility
9. Database on Itanium
10. Security & Itanium
11. Itanium and Java
12. Itanium and Win64

History

Generation  Processor     Year     Transistors  FPU            Cache
1           8086/8088     1978-81  29k          8087
2           80286         1984     134k         80287
3           80386 DX/SX   1987-88  275k         80387
4           80486 SX/DX   1990-92  1.2M         None/built-in  8k L1
5           Pentium       1993-95  3.1M         built-in       16k L1
5+          Pentium MMX   1997     4.5M         built-in       32k L1

History contd..

Processor       Year
Pentium Pro     1995
Pentium II      1997
Itanium         2001
Pentium III     1999
Pentium 4       2001
Mobile Pentium  1997

Generation: 6, 6+, 7, 8
Transistors: 5.5M, 7.5M, 27.4M, 9.3M, 42M, 25M
Cache: 16k L1, 512k L2, 32k L1, 96k L2, 4M L3, 32k L1

Introduction - Itanium

The Intel® Itanium™ processor is the first in a family of processors based on the new Itanium architecture.

Product Highlights

- Explicitly Parallel Instruction Computing (EPIC) technology enables up to 20 operations/clock.
- Three levels of cache reduce memory latency: 2MB or 4MB Level 3 cache, 96K Level 2 cache, and 32K Level 1 cache.
- Operating frequencies of 733MHz and 800MHz.
- 266MHz data bus enables fast system bus transactions with 2.1 GB/sec bandwidth.
- Advanced error detection, correction and containment provided by Machine Check Architecture (MCA), comprehensive error logging, and Error Correcting Code (ECC) on caches and the system bus.
- IA-32 instruction binary compatibility in hardware.
- 6.4 gigaflops at peak performance.
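
The bus bandwidth and peak floating-point figures in the highlights can be cross-checked with a little arithmetic. The sketch below is only illustrative: the 8-byte (64-bit) bus width is an assumption that is not stated on the slide, and the function names are made up for this example.

```python
# Cross-check of the quoted bus bandwidth and peak FLOPS figures.
BUS_TRANSFERS_PER_SEC = 266e6  # 266MHz data bus (from the slide)
BUS_WIDTH_BYTES = 8            # ASSUMPTION: 64-bit wide data bus
CLOCK_HZ = 800e6               # top operating frequency (from the slide)
PEAK_FLOPS = 6.4e9             # 6.4 gigaflops (from the slide)


def bus_bandwidth_gb_per_sec(transfers_per_sec, width_bytes):
    """Bandwidth in decimal GB/sec for a bus moving width_bytes per transfer."""
    return transfers_per_sec * width_bytes / 1e9


def flops_per_clock(peak_flops, clock_hz):
    """Floating-point operations per clock implied by a peak FLOPS figure."""
    return peak_flops / clock_hz


if __name__ == "__main__":
    print(bus_bandwidth_gb_per_sec(BUS_TRANSFERS_PER_SEC, BUS_WIDTH_BYTES))
    print(flops_per_clock(PEAK_FLOPS, CLOCK_HZ))
```

Under that assumption, 266 million transfers per second at 8 bytes each comes to roughly 2.1 GB/sec, and 6.4 gigaflops at 800MHz works out to 8 floating-point operations per clock.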

Block Diagram

[Figure: complex block diagram]

[Figure: simple block diagram]

Pipeline

10-stage in-order pipeline

Comparison with others

- Itanium: 10 stages
- Pentium III: 12 stages
- Alpha 21264: 8 stages
- Pentium 4: 20 stages
- Athlon: 10 stages

Register Set

- General-purpose integer registers (each 64 bits wide): 128
- Floating-point registers (each 82 bits wide): 128
- 1-bit predicate registers: 64
- Branch registers: 8

Each task can have its own set of registers.

Instruction Set

- Instructions are 41 bits long.
- It takes 7 bits to specify one of 128 GPRs.
- Two source-operand fields and a destination field = 21 bits.
- Predication = 6 bits (selects one of 64 predicate registers).
- One bundle = 128 bits (instructions are issued in bundles): three 41-bit instructions (making 123 bits), plus one 5-bit template.
- Instruction categories = 4: integer, load/store, floating-point, and branch operations.
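
The bundle arithmetic above (3 x 41 + 5 = 128) can be sketched as a pack/unpack pair. This is a minimal illustration, not an encoder for real instructions: the placement of the template in the low-order bits and the names pack_bundle and unpack_bundle are assumptions made for this example.

```python
SLOT_BITS = 41       # one instruction slot
TEMPLATE_BITS = 5    # template field
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into one 128-bit integer.

    ASSUMPTION: the template occupies the lowest 5 bits and the slots
    follow in order of increasing bit position.
    """
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instruction slots")
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template must fit in 5 bits")
    bundle = template
    for index, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each slot must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & TEMPLATE_MASK
    slots = [
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(3)
    ]
    return template, slots


if __name__ == "__main__":
    packed = pack_bundle(0x10, [1, 2, 3])
    print(hex(packed), unpack_bundle(packed))
```

The same bit counting explains the operand fields: three register numbers at 7 bits each use 21 bits, and the 6-bit predicate field leaves the remainder of the 41 bits for the opcode and other fields.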

EPIC

EPIC: Explicitly Parallel Instruction Computing

It is a combination of features from RISC and VLIW.

Advantages

- Conditional (predicated) execution
- Hinted and speculative loads (LD.A: Load Advanced, uses special buffer ALAT)
- 64 free-form predicate bits (earlier chips have Z (zero), V (overflow), S (sign), and N (negative) flags)
- One conditional branch with 64 predicate bits
- VLIW features
- Groups of independent instructions
- Simple hardware
- Exploit Instruction Level Parallelism (ILP) with compiler

Disadvantages

- Large increase in code size
- Blocking caches

EPIC - Power to Compilers

C source code:

if (x == 4) z = 9;
else z = 0;

Compiled on Pentium (32-bit compiled code):

1. Compare x to 4
2. If not equal go to line 5
3. z = 9
4. go to line 6
5. z = 0
6. // Program continues from here

Compiled on Itanium (64-bit compiled code):

1. Compare x to 4 and store result in a predicate bit (we'll call it A)
2. If A==1; z = 9
3. If A==0; z = 0
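
A rough model of the two compilation strategies above, written in Python for readability. The helper predicated_write is hypothetical; it stands in for an instruction that is always issued but only takes effect when its predicate bit is set.

```python
def branching(x):
    """Branch-style code: control flow jumps around one of the assignments."""
    if x == 4:
        z = 9
    else:
        z = 0
    return z


def predicated_write(predicate, new_value, old_value):
    """Model of a predicated instruction: commit new_value only if predicate is set."""
    return new_value if predicate else old_value


def predicated(x):
    """Predicated code: one compare, then two straight-line guarded assignments."""
    a = (x == 4)                       # step 1: compare and set predicate bit A
    z = None                           # prior value of z does not matter here
    z = predicated_write(a, 9, z)      # step 2: takes effect when A == 1
    z = predicated_write(not a, 0, z)  # step 3: takes effect when A == 0
    return z


if __name__ == "__main__":
    for x in (3, 4, 5):
        print(x, branching(x), predicated(x))
```

Both functions return the same result for every input; the difference is that the predicated form has no jump, so the two guarded assignments are independent of each other and could be issued in the same group.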

EPIC Features

Data speculation

A sequence of instructions consisting of an advanced load, zero or more instructions dependent on the value of that load, and a check instruction.

Code speculation

It is a compiler concept. An instruction or a sequence of instructions is executed before it is known that the dynamic control flow of the program will actually reach the point in the program where the sequence of instructions is needed.

Preprocessing

1) Register use, 2) loop optimization, 3) instruction execution order, and 4) logical program layout

Prediction

Branch prediction is now given to programmers, for dynamic runtime branch prediction.
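
The data speculation sequence (advanced load, dependent instructions, check) can be sketched with a toy memory model. Everything here is an assumption made for illustration: the class SpeculativeMemory, its method names, and the use of a plain set to stand in for the ALAT buffer; real hardware tracks entries differently.

```python
class SpeculativeMemory:
    """Toy memory with a set of addresses standing in for the ALAT."""

    def __init__(self, contents):
        self.mem = dict(contents)
        self.alat = set()  # addresses that have a live advanced load

    def load_advanced(self, addr):
        """Advanced load: read the value and remember the address."""
        self.alat.add(addr)
        return self.mem[addr]

    def store(self, addr, value):
        """Store: write the value and invalidate any matching advanced load."""
        self.mem[addr] = value
        self.alat.discard(addr)

    def check(self, addr):
        """Check: True if no store has hit this address since the advanced load."""
        return addr in self.alat


def increment_with_speculation(memory, load_addr, store_addr, store_value):
    """Return mem[load_addr] + 1 as seen after the store, loading before it."""
    value = memory.load_advanced(load_addr)   # load hoisted above the store
    result = value + 1                        # instruction dependent on the load
    memory.store(store_addr, store_value)     # may or may not alias load_addr
    if not memory.check(load_addr):           # check instruction
        result = memory.mem[load_addr] + 1    # recovery: redo load and dependents
    return result


if __name__ == "__main__":
    print(increment_with_speculation(SpeculativeMemory({0: 10, 8: 20}), 0, 8, 99))
    print(increment_with_speculation(SpeculativeMemory({0: 10, 8: 20}), 0, 0, 41))
```

When the store does not touch the loaded address, the early result stands; when it does, the check fails and the recovery path recomputes the value, so the outcome matches the original program order in both cases.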

EPIC Features contd..

Compiler advantages

- Complexity shifts to compilers
- Methods to express compile-time information
- Optimized FPUs for multimedia applications
- Reliability and performance (server side)

x86 compatibility

- Supports all x86 instructions including MMX and SSE (not SSE2), as well as Protected, Virtual 8086, and Real mode features
- Run the entire OS in x86 mode, or run the applications under a new IA-64 OS
- x86-compatible registers: AR24 through AR31
- JMPE: instruction to switch between x86 and the new mode

x86 - Register compatibility

What does it look like?

Transistors: 325 million

- Processor chip: 25 million (including L1 and L2 caches)
- Each of the four L3 caches: 75 million

For comparison:

- Pentium III: 24 million
- Pentium 4: 42 million

Itanium code size: 2x Pentium (estimated), 30% more than other RISC

Itanium - anatomy

Other 64-bit processors

- Photograph of Alpha 21264 Slot B module
- UltraSPARC-III chips
- MIPS 20K processor
- IBM Power4 module


Overview of the processors

Itanium Code names

It’s just the beginning

- Merced
- McKinley
- Madison
- Deerfield


Databases

A quantum leap

Databases - Storage needs contd..

The Coming Content “Big Bang”

[Chart: stored content in gigabytes over time. Timeline marks: 40,000 BCE cave paintings, bone tools; 3500 writing; 0 C.E.; 105 paper; 1450 printing; 1870 electricity, telephone; 1947 transistor; 1950 computing; late 1960s Internet (DARPA); 1993 the web; 1999. Values: 2000: 3B; 2001: 6B; 2002: 12B; 2003: 24B.]

Source: IBM Informix Conference, 2001 Las Vegas

Databases - Storage Requirements

Data Explosion!

- We are in the midst of a data explosion: “The Big Bang”!
- Terabytes of data: a common corporate expression
- Petabytes (10^15) and exabytes (10^18) are fast approaching
- 2-3 exabytes = total volume of all information generated worldwide annually
- Storage capacities are growing
- 72 GB hard drive (HD) becoming industry standard
- 180 GB high-density HD in production

Source: IBM Informix Conference, 2001 Las Vegas

Databases contd..

The Need for Speed

- Memory access speeds desired: long term
- Memory latency averaging 235-360 nanoseconds
- Max = 256 GB of RAM
- 64 bit => roughly 20 exabytes of addressing capability
- Disk access speeds are the reality: near term
- Disk latency averaging 3-4 milliseconds
- 4 “orders of magnitude slower”
- DW tables contain billions of rows
- Light table scan
- 100-byte row @ 1 GB/s
- ~ 9 million rows/sec
- ~ 540 million rows/minute
- 5.4 billion rows (500GB) ~ 10 minutes

Source: IBM Informix Conference, 2001 Las Vegas
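
The scan-rate and latency figures above follow from straightforward arithmetic, sketched below. The function names are made up for this example, decimal units (1 GB = 10^9 bytes) are assumed, and the mid-range latencies of 300 ns and 3.5 ms are picked only for illustration.

```python
import math

ROW_BYTES = 100             # row size (from the slide)
SCAN_BYTES_PER_SEC = 1e9    # 1 GB/s scan rate, ASSUMING decimal gigabytes


def rows_per_second(scan_rate, row_bytes):
    """Idealized rows scanned per second, ignoring any overhead."""
    return scan_rate / row_bytes


def scan_minutes(rows, scan_rate, row_bytes):
    """Idealized time in minutes to scan the given number of rows."""
    return rows * row_bytes / scan_rate / 60


def orders_of_magnitude(slow_seconds, fast_seconds):
    """Base-10 logarithm of the ratio between two latencies."""
    return math.log10(slow_seconds / fast_seconds)


ADDRESSABLE_BYTES_64_BIT = 2 ** 64  # bytes reachable with a 64-bit address

if __name__ == "__main__":
    print(rows_per_second(SCAN_BYTES_PER_SEC, ROW_BYTES))
    print(scan_minutes(5.4e9, SCAN_BYTES_PER_SEC, ROW_BYTES))
    print(orders_of_magnitude(3.5e-3, 300e-9))
    print(ADDRESSABLE_BYTES_64_BIT)
```

The idealized rate is 10 million rows per second and 9 minutes for 5.4 billion rows, slightly better than the slide's rounded ~9 million rows/sec and ~10 minutes. The latency ratio comes out at about four orders of magnitude, and 2^64 bytes is about 18.4 x 10^18 bytes.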

Databases - Itanium advantages

64-bit addressing

Tens of gigabytes to thousands of terabytes stored in nanosecond-access main memory eliminates millisecond disk access times, thus improving application response time.

Large number of registers and innovative register model

Data and intermediate calculations stored in on-chip registers reduce the repetitive load and store of intermediate data values, thus improving the response time of an application’s database request.

Instruction set parallelism

The ability to execute instructions in parallel allows quick, simultaneous access to and manipulation of data derived from multiple rows and columns of a large in-memory database table or tables.

Predication

Predication allows the conditional execution of instructions before it is known whether the execution is needed. Predication allows more code to execute in parallel, the performance penalty of branch-dependent code is lower, and applications with heavy branching speed up.

Databases - Itanium advantages contd..

Control/Data Speculation

Control speculation allows certain load instructions to be scheduled before conditional branch instructions, rather than after. Data speculation is similar to control speculation but allows loads to be scheduled above stores. Both allow a reduction in the CPU wait states generated by branch-intensive code with high-latency RAM accesses, thus speeding application performance.

Instruction/Data Prefetch

Instruction prefetches can be signaled on branch instructions. Data can be prefetched with explicit prefetch instructions. Both prefetches speed application performance by reducing wait states.

Advantages

Big databases such as:

- Data warehousing
- Decision support
- Web-enabled ERP


Security

- Common encryption algorithms run 3-5 times faster
- EPIC parallelism with register rotation makes algorithms faster
- Performance boost to CAD/CAE applications due to the increased number of floating-point registers
- Performance boost to 3D applications
- 82-bit floating-point unit offers high precision
- RSA computations are 512 bits to 1024 bits in length
- New multiply-add instruction comes to aid
- Parallelism comes to aid (two 128-bit computations are performed in parallel)
- Predication eliminates branches (if) from RSA computations
- RSA, AES, SHA-1 algorithms are improved, as they use only counted loops utilizing register rotation
- Vast number of registers
- Large physical memory for security cache: directory services can be stored in memory
- Network traffic can be encrypted
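
The role of a multiply-add instruction in RSA-sized arithmetic can be illustrated with schoolbook multi-precision multiplication on 64-bit limbs. This is a sketch under assumptions: mul_add is a hypothetical helper modeling a 64 x 64 + 64-bit multiply-add that yields both halves of the 128-bit result, and the slide does not say which instruction it refers to.

```python
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(value, count):
    """Split a non-negative integer into `count` 64-bit limbs, least significant first."""
    return [(value >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def from_limbs(limbs):
    """Reassemble an integer from 64-bit limbs, least significant first."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def mul_add(a, b, c):
    """Hypothetical multiply-add: return (low, high) 64-bit halves of a * b + c."""
    total = a * b + c
    return total & LIMB_MASK, total >> LIMB_BITS


def multiply(x_limbs, y_limbs):
    """Schoolbook product of two limb arrays, one mul_add per limb pair."""
    result = [0] * (len(x_limbs) + len(y_limbs))
    for i, x_limb in enumerate(x_limbs):
        carry = 0
        for j, y_limb in enumerate(y_limbs):
            # Multiply, add the partial result already in place, then the carry.
            low, high = mul_add(x_limb, y_limb, result[i + j])
            low += carry
            result[i + j] = low & LIMB_MASK
            carry = high + (low >> LIMB_BITS)
        result[i + len(y_limbs)] = carry
    return result


if __name__ == "__main__":
    x = (1 << 512) - 12345
    y = (1 << 511) + 987654321
    product = from_limbs(multiply(to_limbs(x, 8), to_limbs(y, 8)))
    print(product == x * y)
```

A 512-bit operand is eight 64-bit limbs, so one product takes 64 multiply-add steps in counted loops with no data-dependent branches, which is the pattern the slide's points on counted loops and predication describe.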

Security contd..

Performance statistics

Encryption algorithms              RSA  ECC  AES  DES  RC6  SHA
Multi-precision arithmetic                   X    X    X    X
Multi-precision logical operation  X         X    X    X    X
Fixed data rotate                  X         X
Variable data rotate               X         X    X         X
Integer multiplication             X         X    X         X
Sbox lookup                        X    X              X
Logical operation                  X    X









Java

Common Java Limitations (J2SE 1.3)

- Garbage collection
- Object-oriented programming (OOP)
- Byte code vs. native machine code
- Variability of performance because of interpretation
- Multithreaded applications
- Java Native Interface vs. Native Method Interface
- Network performance
- Limitations with current architectures
- EJB involves frequent invocation of method calls
- Java needs dynamic bounds checking, null checking, exception handling
- Java has a 64-bit integer data type: long
- Java object handles (ObjId) are 64-bit

Java contd..

Advantages using IBM Java2

- Streamlined garbage collection reduces pause time
- OOP: IBM Java uses thread-local heaps, allowing variable-sized thread-local heaps
- Just-In-Time compiler translates to optimized native code
- Mixed-mode interpreter does selective compilation
- Multi-threading now has lightweight and full-power modes
- JNI enhanced and NMI removed in Java 2
- Network performance: Java Socket API overhead removed

Java contd..

Advantages using Itanium

- Predication: branching caused by Java technology’s bounds checking benefits
- Speculation: multiway branching allows address locations and data needed for Java’s bounds and null checks to be prefetched, increasing performance
- Instruction parallelism: multiple execution units run instructions concurrently, increasing performance
- Register set: smaller methods need not contend for registers, as more registers are available

Win64

Win64 data types

Type Name                       What it is
LONG32, INT32                   32-bit signed
LONG64, INT64                   64-bit signed
ULONG32, UINT32, DWORD32        32-bit unsigned
ULONG64, UINT64, DWORD64        64-bit unsigned

Type Name                       What it is
INT_PTR, LONG_PTR               Signed int, pointer precision
UINT_PTR, ULONG_PTR, DWORD_PTR  Unsigned int, pointer precision
SIZE_T                          Unsigned count, pointer precision
SSIZE_T                         Signed count, pointer precision

Win64 contd..

Win64 Issues

- LLP64 issues
- Porting issues (32-bit to 64-bit)
- Polymorphic data usage
- Pointer/length combinations
- RPC and COM
- Supports RPC between IA-32 and IA-64
- Supports LocalServer-style (out-of-proc) COM between IA-32 and IA-64 processes
- An IA-32 DLL cannot be loaded into a 64-bit process
- An IA-64 DLL cannot be loaded into a 32-bit process
- Use COM out-of-proc (solves the previous two problems)
- PnP should be RPC-enabled
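
The LLP64 porting issue can be illustrated by modeling integer widths. Under LLP64, int and long stay 32 bits wide while pointers grow to 64 bits, so code that stores a pointer in a 32-bit integer loses the upper half; the pointer-precision types listed for Win64 avoid this. The address value and the helper store_in below are made up for the example.

```python
# Widths in bytes under the LLP64 data model.
LLP64_SIZES = {"int": 4, "long": 4, "long long": 8, "pointer": 8}


def store_in(width_bytes, value):
    """Model storing an unsigned value in an integer field of the given width."""
    return value & ((1 << (8 * width_bytes)) - 1)


def survives_round_trip(type_name, pointer_value):
    """True if a pointer value is unchanged after being stored in type_name."""
    return store_in(LLP64_SIZES[type_name], pointer_value) == pointer_value


if __name__ == "__main__":
    address = 0x0000000123456780  # hypothetical address above the 4 GB line
    print(hex(store_in(LLP64_SIZES["long"], address)))     # upper 32 bits lost
    print(hex(store_in(LLP64_SIZES["pointer"], address)))  # value preserved
    print(survives_round_trip("long", address))
    print(survives_round_trip("pointer", address))
```

Addresses below 4 GB survive the round trip through a 32-bit field, which is why such bugs tend to stay hidden until a 64-bit process actually allocates memory above that line.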


Questions?