IO Memory Management Hardware Goes Mainstream - Microsoft ...

IO Memory Management
Hardware Goes Mainstream

Mark Hummel

AMD Fellow

Computation Products Group, AMD


Mark.Hummel@amd.com

Session Overview

Benefits and Function

Topology and Features

Translation Data Structures

Software Interface


Function Of An IOMMU

What does it do?

Translates requests that come from all devices regardless of target

Enforces access rights of devices to system
address space

Page granular protection

Separate read and write access rights

Maintains cache of translations

Root of distributed caching hierarchy of address translations
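
Page granular protection with separate read and write rights can be pictured as a per-page permission lookup. The sketch below is only an illustration: the `access_allowed` helper and the dictionary layout are assumptions made for this example, not part of any IOMMU interface.

```python
PAGE_SHIFT = 12  # 4K pages


def access_allowed(page_rights, address, is_write):
    """Check one device access against per-page rights.

    page_rights maps a page number to (readable, writable).
    Pages that are not listed deny every access.
    """
    readable, writable = page_rights.get(address >> PAGE_SHIFT, (False, False))
    return writable if is_write else readable


# Hypothetical rights: page 0x10 is read-only, page 0x11 is read-write.
rights = {0x10: (True, False), 0x11: (True, True)}
print(access_allowed(rights, 0x10FFF, is_write=True))  # write to a read-only page
```

Keeping the two rights separate lets a buffer be exposed to a device for reads without also letting the device overwrite it.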

Function Of An IOMMU

What does it not do?

Does not translate CPU originated traffic

The processor’s traffic is translated by the CPU’s MMU

Does not directly support demand paged IO

Devices and drivers are not designed to deal with
arbitrary delays

Devices and drivers don’t understand concept of an
“IO Page Fault”

Support for remote address translation extensions enables
indirect, device-specific demand paging

System Topology

Where is the IOMMU?

[Figure: system topology. CPUs with DRAM are connected by HT links to a tunnel and an IO hub. IOMMUs with ATCs sit in front of the PCIe bridges, the PCI Express devices and switches, and PCI, LPC, etc. A device may hold an optional remote ATC.]

ATC = Address Translation Cache

HT = HyperTransport

PCIe = PCI Express

System Topology

IOMMUs are at the edge of the system interconnection fabric

Full Source Identification is available

IOMMUs are distributed and independent

Creates scalable caching structures

IOMMU supports remote address
translation caching extensions

Allow tuning of caching hierarchy


Benefits Of An IOMMU

Why have one?

Enhanced virtualization capabilities

Direct device assignment to Guest OS

Improved performance and scalability

Enables direct device access by user
mode applications

Benefits Of An IOMMU

Direct Device Assignment Example

[Figure: direct device assignment example. Without an IOMMU, each Guest OS runs a Virtual Device Driver, and the Virtual Machine Monitor hosts the Virtual Device Emulator and the Device Driver for each Device Controller. With an IOMMU, each Guest OS runs the Device Driver itself and reaches its Device Controller through the IOMMU. Overhead is reduced in the path between Guest and Device.]

Benefits Of An IOMMU

Why have one? (continued)

Enhanced security capabilities

Adds precise device access control of
address space

Creates IO protection domains

Enhanced system reliability

Isolation between devices

Protects system memory from errant
device writes

Benefits Of An IOMMU

Security and Isolation Example

[Figure: security and isolation example. Without an IOMMU, a malicious or errant write from one Device Controller can reach another device's I/O Buffer in System Memory. With an IOMMU, each Device Controller and its I/O Buffer belong to a separate protection domain (Protection Domain 1, Protection Domain 2) and the write is blocked.]

Benefits Of An IOMMU

Why have one?

Support for Trusted Input and Output

Creates a protected channel between a
device and driver

Benefits Of An IOMMU

Trusted I/O Example

[Figure: trusted I/O example. Without an IOMMU, content moving between the Application, the Device Drivers, the I/O Buffers in System Memory, the Disk Controller and the Graphics Controller is captured by a 3rd party. With an IOMMU, Protected Channels link each device to its driver and 3rd party access is blocked.]

Benefits Of An IOMMU

Why have one?

Support legacy 32-bit devices in large-memory systems

Eliminates bounce buffers

Benefits Of An IOMMU

Bounce Buffer Example

[Figure: bounce buffer example. Without an IOMMU, a Disk Controller limited to 32 bit addressing transfers into a Bounce Buffer in the 0-4 GB range of System Memory, and the CPU must move the data to the I/O Buffer in the 4 GB+ range. With an IOMMU, the IOMMU translates the address so data can be directly placed in the I/O Buffer in the 4 GB+ range.]
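
The bounce buffer case comes down to simple address arithmetic: the device still issues a 32 bit address, and a translation maps it to a page above 4 GB. The following is a minimal sketch under stated assumptions; `dma_target` and the single-level dictionary used as the mapping are hypothetical stand-ins for the real translation tables.

```python
PAGE_SHIFT = 12
FOUR_GB = 1 << 32


def dma_target(device_address, io_page_map):
    """Return the physical address reached by a 32 bit device address.

    io_page_map is a hypothetical {device page number: physical page number}
    dictionary standing in for the IOMMU translation tables.
    """
    if not 0 <= device_address < FOUR_GB:
        raise ValueError("a 32 bit device cannot generate this address")
    page = device_address >> PAGE_SHIFT
    offset = device_address & ((1 << PAGE_SHIFT) - 1)
    return (io_page_map[page] << PAGE_SHIFT) | offset


# The I/O buffer lives at 6 GB; the device sees it at 1 MB.
io_page_map = {0x100: 0x180000}
physical = dma_target(0x0010_0ABC, io_page_map)
print(hex(physical), physical >= FOUR_GB)
```

Because the data lands in its final buffer, the CPU copy from the figure's left-hand side disappears.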

Benefits Of An IOMMU

Why have one?

Synergy with PCI-SIG virtualization efforts

Address translation service (ATS)

Single root device virtualization

Multi-root shared I/O fabric

IOMMU Features

Variable per-device virtual address range

Variable per-device physical page size

Flexible virtual address space sharing options

Devices can have their own virtual address space

Devices can share a virtual address space

Can be utilized natively by an enhanced OS

Can be utilized by a virtual machine monitor

Translation Data Structures

Definitions

Requester ID (RID)

Label identifying the source of a transaction

Address Translation Cache (ATC)

Local or remote coherent copy of address
translations

I/O Translation Lookaside Buffer (IOTLB)

A remote ATC that exists in a device associated with an IOMMU

Address Translation Services (ATS)

Extensions supporting remote caching of address translations

Translation Data Structures

Definitions

Page Directory Entry (PDE)

Translation table entry that points at a table

Page Table Entry (PTE)

Translation table entry that contains a
translation

Root translation table

Translation table at the top of translation hierarchy

Device Table

Maps Requester ID to root translation table

Translation Data Structures

Device Requests

Contain a Requester ID

Bus/Device/Function (BDF) used for PCI Express

Unit ID or BDF (with SRC ID extension) for
HyperTransport

Extensions to support remote ATC

Untranslated (device virtual address) read or write

Default case

IOMMU will translate the address of the request

Translated (system physical address) read or write

IOMMU uses the address provided without translation

Translation request
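
For PCI Express, the Requester ID is the 16 bit Bus/Device/Function number: 8 bits of bus, 5 bits of device and 3 bits of function. The helpers below are a small sketch of that packing; the function names are made up for this example, and the HyperTransport Unit ID case is not covered.

```python
def make_rid(bus, device, function):
    """Pack a PCI Express Bus/Device/Function triple into a 16 bit Requester ID."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus, device or function out of range")
    return (bus << 8) | (device << 3) | function


def split_rid(rid):
    """Unpack a 16 bit Requester ID into (bus, device, function)."""
    return (rid >> 8) & 0xFF, (rid >> 3) & 0x1F, rid & 0x7


rid = make_rid(bus=1, device=2, function=3)
print(hex(rid), split_rid(rid))
```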

Translation Data Structures

Device Table

Single contiguous block of system memory

Maps Requester ID to a root translation table

Per device virtual address space supported

Many to one mappings supported

Each device is assigned a Domain ID

Devices may share a Domain ID

IOMMU Invalidations managed on a per
Domain basis

Translation Data Structures

Device Table

Device Table entry format (128 bits)

  Bits      Field
  127:104   Reserved
  103:96    Control Bits
  95:80     Reserved
  79:64     Domain ID [15:0]
  63        Reserved
  62:61     IR and IW (one bit each)
  60:52     Res
  51:32     Page Table Root Pointer [51:32]
  31:12     Page Table Root Pointer [31:12]
  11:9      NL
  8:1       Reserved
  0         V

V = valid bit

IW = I/O Write protection

IR = I/O Read protection

NL = next Level

Res = reserved
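
The bit fields of a Device Table entry can be exercised with a few shifts and masks. This is a sketch, not a driver-ready encoder: the helper names are invented, the Control Bits are left at zero, and placing IR at bit 61 and IW at bit 62 is an assumption, since the slide only shows that the two flags occupy bits 62:61.

```python
ROOT_MASK = ((1 << 52) - 1) & ~0xFFF  # Page Table Root Pointer, bits 51:12
IR_BIT = 61  # assumed position of IR within bits 62:61
IW_BIT = 62  # assumed position of IW within bits 62:61


def pack_dte(root_pointer, next_level, domain_id, ir, iw, valid=True):
    """Build a 128 bit Device Table entry as a Python integer."""
    if root_pointer & ~ROOT_MASK:
        raise ValueError("root pointer must be 4K aligned and fit in 52 bits")
    if not 0 <= next_level < 8 or not 0 <= domain_id < (1 << 16):
        raise ValueError("field out of range")
    entry = int(bool(valid))           # bit 0: V
    entry |= next_level << 9           # bits 11:9: NL
    entry |= root_pointer              # bits 51:12
    entry |= int(bool(ir)) << IR_BIT
    entry |= int(bool(iw)) << IW_BIT
    entry |= domain_id << 64           # bits 79:64: Domain ID
    return entry


def unpack_dte(entry):
    """Split a 128 bit Device Table entry back into its fields."""
    return {
        "valid": bool(entry & 1),
        "next_level": (entry >> 9) & 0x7,
        "root_pointer": entry & ROOT_MASK,
        "ir": bool((entry >> IR_BIT) & 1),
        "iw": bool((entry >> IW_BIT) & 1),
        "domain_id": (entry >> 64) & 0xFFFF,
    }


entry = pack_dte(0x1234_5000, next_level=4, domain_id=0x2A, ir=True, iw=False)
print(entry.to_bytes(16, "little").hex())
```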

Translation Data Structures

Page tables

Translation tables are always 4K byte blocks in system memory

Root translation table base address
comes from the Device Table

May point to either a table of PDE or PTE

Intermediate translation tables

Point to either a table of PDE or PTE

Translation Data Structures

Page tables

PDE Format

  Bits     Field
  63       Res
  62:61    IR and IW (one bit each)
  60:52    Res
  51:32    Next Table Addr [51:32]
  31:12    Next Table Addr [31:12]
  11:9     NL
  8:1      Reserved
  0        P

PTE Format

  Bits     Field
  63       Res
  62:61    IR and IW (one bit each)
  60:52    NS, U, S and reserved bits
  51:32    Page Address [51:32]
  31:12    Page Address [31:12]
  11:9     000 (NL)
  8:1      Reserved
  0        P

IW = I/O Write protection

IR = I/O Read protection

NL = next Level

P = present

S = size

U = ATS attribute bit

NS = ATS attribute bit

Res = reserved



Translation Data Structures
Simplified View

1) IOMMU receives a request, but the translation is not cached in the ATC.

2) The Requester ID from the device request is used to select the root translation table.

3) The address from the device request is used to walk the page tables.

4) The result refills the ATC and satisfies the device request.

[Figure: the Requester ID of the device request indexes the Device Table, located through the IOMMU's Device Table Base, and yields a pointer to the page tables. The “Virtual” address is looked up in the ATC or walked through the page tables to produce the “Translated” address.]
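
The four steps can be traced in a toy model. The sketch below is illustrative only and makes several assumptions: the `Iommu` class and its dictionaries are invented, the walk is fixed at two levels, and the ATC is keyed by Requester ID and page number purely for simplicity.

```python
PAGE_SHIFT = 12


class Iommu:
    def __init__(self, device_table, page_tables):
        self.device_table = device_table  # Requester ID -> root table name
        self.page_tables = page_tables    # table name -> {index: entry}
        self.atc = {}                     # (Requester ID, page number) -> physical page
        self.walks = 0                    # counts page table walks, for illustration

    def _walk(self, root, page):
        self.walks += 1
        leaf_table = self.page_tables[root][page >> 9]    # PDE: points at a table
        return self.page_tables[leaf_table][page & 0x1FF]  # PTE: holds the translation

    def translate(self, requester_id, address):
        page = address >> PAGE_SHIFT
        offset = address & ((1 << PAGE_SHIFT) - 1)
        key = (requester_id, page)
        if key not in self.atc:                      # step 1: ATC miss
            root = self.device_table[requester_id]   # step 2: select root table
            self.atc[key] = self._walk(root, page)   # steps 3 and 4: walk, refill
        return (self.atc[key] << PAGE_SHIFT) | offset


iommu = Iommu(
    device_table={0x0100: "root_a"},
    page_tables={"root_a": {0: "leaf"}, "leaf": {2: 0x80000}},
)
print(hex(iommu.translate(0x0100, 0x2ABC)), iommu.walks)
```

A second request to the same page is served from the ATC, so the walk counter does not move.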

Translation Data Structures

Advanced capabilities

Support for 64-bit device virtual address

Requires 6 level lookup

Support for variable page sizes

All power of 2 from 4K up

IOMMU tables may be shared with CPU MMU

Can be efficiently virtualized

Translation Data Structures

Advanced capabilities

Configurable maximum table depth

If virtual address has group of leading zeros the lookup
depth may be reduced

Level Skipping

If virtual address has interior groups of zeros, lookup
levels may be skipped

Early Exit

Exit is possible at any level with remaining untranslated address bits used as an offset within a “super page”

Base is 4K, super pages are at 2M (2^21), 1G (2^30), 512G (2^39), 2^48
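
Both the 6 level figure and the super page sizes follow from the table geometry: a 4K table of 8 byte entries holds 512 entries, so each level resolves 9 address bits on top of the 12 bit page offset. A short arithmetic sketch, with helper names chosen for this example:

```python
PAGE_SHIFT = 12      # 4K base page
BITS_PER_LEVEL = 9   # 4K table / 8 byte entries = 512 entries


def levels_needed(address_bits):
    """Number of table levels needed to cover an address of the given width."""
    return -(-(address_bits - PAGE_SHIFT) // BITS_PER_LEVEL)  # ceiling division


def page_size_at_exit(level):
    """Bytes mapped by one entry when the walk exits at `level` (1 = base page)."""
    return 1 << (PAGE_SHIFT + BITS_PER_LEVEL * (level - 1))


print(levels_needed(64))
print([page_size_at_exit(level).bit_length() - 1 for level in range(1, 6)])
```

The second line prints the exponents of the page sizes, which can be compared with the list on the slide.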

Translation Data Structures
Example with level skipping

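[Figure: example translation of a virtual address with level skipping and a super page.]

Virtual address fields, from high bits to low bits:

  Upper bits (down to 48)   0000000b and 000000000b (1), levels skipped
  47:39                     Level-4 Page Table Offset
  38:30                     000000000b (1), level skipped
  29:21                     Level-2 Page Table Offset
  20:0                      Physical Page Offset (super page)

(1) The Virtual Address bits associated with all skipped levels must be zero

Walk shown in the figure:

  Starting Level 4h with the Level 4 Page Table Address selects the Level-4 Table
  The PDE in the Level-4 Table has next level 2h and selects the Level-2 Table
  The PTE in the Level-2 Table has next level 0h and gives the Physical Address of a 2 MB Page
  Final Level 1 is skipped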
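
The walk in this example can be reproduced in a few lines. The sketch below is a simplified model under stated assumptions: tables are dictionaries of `(next_level, address)` pairs rather than 64 bit entries, `walk` and `IoPageFault` are invented names, and a next level of 0 is treated as a PTE whose remaining address bits form the page offset.

```python
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9


class IoPageFault(Exception):
    pass


def level_shift(level):
    """Lowest virtual address bit indexed by a table at `level` (level 1 -> bit 12)."""
    return PAGE_SHIFT + BITS_PER_LEVEL * (level - 1)


def walk(tables, root, start_level, address):
    """Translate `address`, allowing skipped levels and early exit."""
    if address >> level_shift(start_level + 1):
        raise IoPageFault("address bits above the starting level must be zero")
    table, level = root, start_level
    while True:
        index = (address >> level_shift(level)) & 0x1FF
        entry = tables.get(table, {}).get(index)
        if entry is None:
            raise IoPageFault("entry not present")
        next_level, target = entry
        low_bits = address & ((1 << level_shift(level)) - 1)
        if next_level == 0:              # PTE: remaining bits are the page offset
            return target + low_bits
        if next_level >= level:
            raise IoPageFault("next level must be lower than the current level")
        if low_bits >> level_shift(next_level + 1):
            raise IoPageFault("address bits of skipped levels must be zero")
        table, level = target, next_level


# Level-4 table at 0x1000 skips to a level-2 table at 0x2000,
# whose entry maps a 2 MB page at 0x4000_0000.
tables = {0x1000: {5: (2, 0x2000)}, 0x2000: {3: (0, 0x4000_0000)}}
address = (5 << 39) | (3 << 21) | 0x12345
print(hex(walk(tables, 0x1000, 4, address)))
```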

Software Interface

Control Structures


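[Figure: control structures. The IOMMU holds a Device Table base register, a Cmd Buffer base register and an Event Log base register. They locate the Device Table, the Command Queue and the Event Log (Event Queue) in system memory, alongside the I/O Page Tables.]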

Software Interface

Control Structures

Command Queue

Circular ring buffer in system memory

Low insertion overhead

Processed at IOMMU service rate

16 byte command entries

Maximum size is 512 KB

Event Log

Circular ring buffer in system memory

Low removal overhead

Processed at CPU service rate

16 byte log entries

Maximum size 512 KB

Software Interface

Command queue

Tail Pointer is incremented by the CPU after writing a command

Tail Pointer write signals IOMMU that new command is ready

Head Pointer is incremented by the IOMMU after reading a command

[Figure: command queue. System Software (producer) writes 16 byte commands at offsets +0, +16, +32, ... +112 into the circular command buffer in system memory, and the IOMMU (consumer) reads them. The IOMMU registers shown are the status register, tail pointer, head pointer, buffer base and buffer size, at MMIO Offsets 0008h, 2000h, 2008h and 2020h.]
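
The head and tail pointer protocol can be modelled as an ordinary ring buffer. This is a sketch with assumptions: the `CommandQueue` class is invented, pointers are byte offsets, and one slot is kept empty to tell a full queue from an empty one, which is a common ring convention rather than something the slides state.

```python
ENTRY_SIZE = 16               # bytes per command
MAX_QUEUE_BYTES = 512 * 1024  # maximum buffer size


class CommandQueue:
    def __init__(self, size_bytes):
        if size_bytes % ENTRY_SIZE or not 2 * ENTRY_SIZE <= size_bytes <= MAX_QUEUE_BYTES:
            raise ValueError("bad queue size")
        self.buffer = bytearray(size_bytes)
        self.head = 0  # advanced by the IOMMU (consumer)
        self.tail = 0  # advanced by system software (producer)

    def submit(self, command):
        """CPU side: write one command, then move the tail pointer."""
        if len(command) != ENTRY_SIZE:
            raise ValueError("commands are 16 bytes")
        next_tail = (self.tail + ENTRY_SIZE) % len(self.buffer)
        if next_tail == self.head:
            return False  # queue full, caller retries later
        self.buffer[self.tail:self.tail + ENTRY_SIZE] = command
        self.tail = next_tail  # in hardware, this write signals the IOMMU
        return True

    def iommu_fetch(self):
        """IOMMU side: read one command, then move the head pointer."""
        if self.head == self.tail:
            return None  # queue empty
        command = bytes(self.buffer[self.head:self.head + ENTRY_SIZE])
        self.head = (self.head + ENTRY_SIZE) % len(self.buffer)
        return command


queue = CommandQueue(64)
queue.submit(bytes([1]) * ENTRY_SIZE)
print(queue.iommu_fetch().hex(), MAX_QUEUE_BYTES // ENTRY_SIZE)
```

At the maximum size the buffer holds 32768 command slots.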

Software Interface

Commands

Invalidate Device Table Entry

Indexed by Device ID

Invalidate IOMMU Pages

Power of 2 naturally aligned number of 4K pages

Indexed by Domain

Invalidate IOTLB Pages

Power of 2 naturally aligned number of 4K pages

Indexed by Device ID

Completion Wait

May be used as a fence

May be used to signal an interrupt

May be used to write a flag in system memory
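
Because the page invalidation commands take a power of 2, naturally aligned number of 4K pages, software has to round an arbitrary range up to such a region. One way to compute it is sketched below; `covering_region` is a hypothetical helper, and a real driver might prefer to issue several smaller invalidations instead.

```python
PAGE = 4096


def covering_region(start, length):
    """Smallest naturally aligned power of 2 region of 4K pages covering the range.

    Returns (base_address, size_in_bytes) for [start, start + length).
    """
    if length <= 0:
        raise ValueError("length must be positive")
    first = start // PAGE
    last = (start + length - 1) // PAGE
    pages = 1
    # Grow the block until both ends fall inside the same aligned block.
    while first // pages != last // pages:
        pages *= 2
    return (first // pages) * pages * PAGE, pages * PAGE


base, size = covering_region(0x3000, 0x2000)
print(hex(base), hex(size))
```

Pages 3 and 4 straddle an alignment boundary, so the region grows to eight pages starting at address 0.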

Software Interface

Command ordering and semantics

IOMMU manages ordering interlocks

Invalidate Device Table commands will complete before
subsequent Invalidate IOMMU Pages commands

Invalidate IOMMU Pages commands will complete before
subsequent Invalidate IOTLB Pages commands

Completion semantics

Invalidation commands are complete when all overlapping DMA
transactions that are in flight to system memory are either
complete or visible

Completion signaled when Completion Wait command is executed

Interrupt

Memory based flag

Software Interface

Event log

Tail Pointer is incremented by the IOMMU after writing an event

IOMMU can be configured to signal an interrupt when event log is written

Head Pointer is incremented by the CPU after reading an event

Head Pointer write signals IOMMU that event has been consumed


[Figure: event log. The IOMMU (producer) writes 16 byte entries at offsets +0, +16, +32, ... +112 into the circular event log in system memory, and System Software (consumer) reads them. The IOMMU registers shown are the status register, tail pointer, head pointer, buffer base and buffer size, at MMIO Offsets 0010h, 2010h and 2018h.]
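
On the event log the roles are reversed: the IOMMU moves the tail and software moves the head. A minimal consumer loop, assuming byte offset pointers and a hypothetical `drain_event_log` helper, could look like this:

```python
ENTRY_SIZE = 16  # bytes per log entry


def drain_event_log(log, head, tail):
    """Read every pending entry between head and tail.

    Returns (events, new_head). The caller would then write new_head
    back to the head pointer to tell the IOMMU the entries were consumed.
    """
    events = []
    while head != tail:
        events.append(bytes(log[head:head + ENTRY_SIZE]))
        head = (head + ENTRY_SIZE) % len(log)
    return events, head


# A 4 entry log where the pending entries wrap around the end of the buffer.
log = bytearray(64)
log[48:64] = bytes([0xAA]) * ENTRY_SIZE
log[0:16] = bytes([0xBB]) * ENTRY_SIZE
events, new_head = drain_event_log(log, head=48, tail=16)
print(len(events), new_head)
```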

Software Interface

Events

Translation events

Invalid Device Table Entry

IO Page Fault

Device Table HW Error

Page Table HW Error

Invalid Device Request

Command processing events

Command HW Error

Illegal Command

IOTLB Invalidate Timeout

Software Interface

Exception Handling

Translation failure for any reason

(e.g. errors due to I/O page faults, memory errors
during page table walks)

Request is aborted

Completer Abort (CA) returned to device where possible

Details logged

Interrupt is optionally generated

Command queue failure

Processing is halted

Details logged

Interrupt is optionally generated

Software Interface

OS/Hypervisor Interactions

Initialization

Done via configuration and MMIO transactions

Clear caches, set base address and size of domain tables, etc

Runtime operations

Device table updates, translation cache invalidations

Combination of MMIO and DRAM accesses

MP support requires software-managed sharing of command buffer

Each IOMMU has separate command and event queue

Virtualization of IOMMU

Intercept MMIO pointer writes to virtual IOMMU

Process virtual IOMMU command queue and update shadow tables

Forward Invalidate commands to real IOMMU


Call To Action

Read the “AMD I/O Virtualization (IOMMU)
Technology” specification to understand
hardware assisted virtualization, available at
http://developer.amd.com/documentation.aspx

Driver writers should consider the effects of the
change from physical to virtual address
assignment

Device vendors should consider the impact on
their devices when used with I/O memory
management hardware

Sign up for AMD’s development center at
http://devcenter.amd.com


Additional Resources

Web Resources

Main Page
http://www.amd.com


Developer Center
http://devcenter.amd.com


PCI-SIG
http://www.pcisig.com

Related Sessions

PCIe Address Translation Services and I/O Virtualization

Windows Virtualization Best Practices and
Future Hardware Directions

Questions?