pptx - Department of Computer Science - University of Toronto

Dec 14, 2013 (3 years and 8 months ago)

Professor Yashar Ganjali

Department of Computer Science

University of Toronto


yganjali@cs.toronto.edu

http://www.cs.toronto.edu/~yganjali

The Problem


All packet switches (e.g. Internet routers, ATM
switches) require packet buffers for periods of
congestion.


Size: For TCP to work well, the buffers need to hold one RTT (about 250 ms) of data.

Speed: Clearly, the buffer needs to store (retrieve) packets as fast as they arrive (depart).

CSC 2203


Packet Switch and Network Architectures

2

University of Toronto


Fall 2012

[Figure: packet buffer memories on lines 1 to N, each line running at line rate R.]


An Example

Packet Buffers for a 40Gb/s Router Linecard

[Figure: a buffer manager in front of a 10Gbit buffer memory. Write rate R: one 40B packet every 8ns. Read rate R: one 40B packet every 8ns. Reads are driven by unpredictable scheduler requests.]
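
The figures on this slide follow from simple arithmetic, sketched below. The 40Gb/s line rate, the 40B packet size and the RTT of about 250 ms are taken from the slides; the function names are hypothetical and exist only for illustration.

```python
# Illustrative arithmetic only; function names are hypothetical.
LINE_RATE_BPS = 40e9    # 40 Gb/s linecard
PACKET_BYTES = 40       # packet size used on the slide
RTT_SECONDS = 0.250     # "about 250 ms"


def packet_time_ns(packet_bytes: int, line_rate_bps: float) -> float:
    """Time to write (or read) one packet at the line rate, in nanoseconds."""
    return packet_bytes * 8 / line_rate_bps * 1e9


def buffer_bits(rtt_seconds: float, line_rate_bps: float) -> float:
    """Buffer needed to hold one RTT of data at the line rate, in bits."""
    return rtt_seconds * line_rate_bps


print(f"{packet_time_ns(PACKET_BYTES, LINE_RATE_BPS):.1f} ns per packet")
print(f"{buffer_bits(RTT_SECONDS, LINE_RATE_BPS) / 1e9:.1f} Gbit of buffer")
```

Under these inputs a 40B packet takes 8 ns and one RTT of data is 10 Gbit, which matches the numbers in the figure.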


Memory Technologies


Use SRAM?
+ Fast enough random access time, but
- Too low density to store 10Gbits of data.

Use DRAM?
+ High density means we can store data, but
- Can’t meet random access time.


Lots of DRAMs in Parallel?

[Figure: the buffer manager in front of eight buffer memories in parallel, holding bytes 0-39, 40-79, ..., 280-319 of each 320B block. Write rate R: one 40B packet every 8ns. Read rate R: one 40B packet every 8ns. The memories read/write 320B every 32ns.]
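
A minimal sketch of the striping shown in the figure, assuming each 320B block is cut into eight consecutive 40B stripes, one per DRAM. The function names are hypothetical.

```python
NUM_MEMORIES = 8                            # eight DRAMs in the figure
STRIPE_BYTES = 40                           # bytes of each block held by one DRAM
BLOCK_BYTES = NUM_MEMORIES * STRIPE_BYTES   # 320


def stripe(block: bytes) -> list:
    """Split one 320B block into eight 40B stripes, one per DRAM."""
    if len(block) != BLOCK_BYTES:
        raise ValueError(f"expected {BLOCK_BYTES} bytes, got {len(block)}")
    return [block[i:i + STRIPE_BYTES] for i in range(0, BLOCK_BYTES, STRIPE_BYTES)]


def memory_for_byte(offset: int) -> int:
    """Index of the DRAM holding byte `offset` of a block (0-39 -> 0, 40-79 -> 1, ...)."""
    return offset // STRIPE_BYTES


def aggregate_rate_gbps(block_bytes: int, access_ns: float) -> float:
    """Memory bandwidth in Gb/s (one bit per nanosecond equals 1 Gb/s)."""
    return block_bytes * 8 / access_ns


print(aggregate_rate_gbps(BLOCK_BYTES, 32))  # 80.0
```

Moving 320B every 32ns gives 80Gb/s, which is the 40Gb/s write rate plus the 40Gb/s read rate.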


Works Fine If There is Only One FIFO

[Figure: with a single FIFO, the buffer manager gathers arriving 40B packets (write rate R, one 40B packet every 8ns) into 320B blocks, writes each block across the buffer memory (bytes 0-39, 40-79, ..., 280-319), and reads 320B blocks back to send 40B packets at read rate R (one 40B packet every 8ns).]


Variable Length Packets

What if Packets Have Variable Lengths?

[Figure: the same single-FIFO arrangement (write rate R and read rate R, one 40B packet every 8ns; buffer memory striped as bytes 0-39, 40-79, ..., 280-319; 320B blocks), but with packets of unknown length (?B) arriving and departing.]


In Practice, Many FIFOs

[Figure: Q FIFOs (1, 2, ..., Q) stored as 320B blocks across the buffer memory (bytes 0-39, 40-79, ..., 280-319). The buffer manager writes at rate R and reads at rate R (one 40B packet every 8ns), with packets of unknown length (?B) mixed into the 320B blocks.]

E.g.
In an IP Router, Q might be 200.
In an ATM switch, Q might be $10^6$.

How can we write multiple variable-length packets into different queues?


Problems


A 320B block will contain packets for different
queues, which can’t be written to, or read from, the
same location.



If instead a different address is used for each
memory, and packets in the 320B block are written to
different locations, how do we know the memory will
be available for reading when we need to retrieve the
packet?


Hybrid Memory Hierarchy

[Figure: arriving packets enter at rate R into a small tail SRAM (cache for FIFO tails, queues 1 to Q). Data is written b bytes at a time into a large DRAM memory that holds the body of the FIFOs, and read b bytes at a time into a small head SRAM (cache for FIFO heads, queues 1 to Q). Departing packets leave the head SRAM at rate R in response to unpredictable scheduler requests.]
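
The hierarchy in the figure can be sketched for a single FIFO as follows. This is a simplified illustration, not the design from the lecture: the class name `HybridFifo` and its methods are hypothetical, bytes that have not yet filled a b-byte block stay in the tail cache, and no timing is modelled.

```python
from collections import deque


class HybridFifo:
    """One FIFO split across tail SRAM, DRAM body and head SRAM (illustrative)."""

    def __init__(self, b: int):
        self.b = b                # DRAM transfer size in bytes
        self.tail = bytearray()   # tail SRAM: most recently arrived bytes
        self.dram = deque()       # DRAM: body of the FIFO, in b-byte blocks
        self.head = bytearray()   # head SRAM: bytes about to depart

    def write(self, data: bytes) -> None:
        """Arriving bytes land in the tail cache; full b-byte blocks move to DRAM."""
        self.tail += data
        while len(self.tail) >= self.b:
            self.dram.append(bytes(self.tail[:self.b]))
            del self.tail[:self.b]

    def replenish(self) -> bool:
        """Move one b-byte block from DRAM into the head cache, if one exists."""
        if not self.dram:
            return False
        self.head += self.dram.popleft()
        return True

    def read(self, n: int) -> bytes:
        """Serve a scheduler request from the head cache only."""
        if n > len(self.head):
            raise LookupError("head SRAM miss: bytes not yet replenished")
        out = bytes(self.head[:n])
        del self.head[:n]
        return out


fifo = HybridFifo(b=4)
fifo.write(b"abcdefghij")   # two full blocks go to DRAM, "ij" stays in the tail
fifo.replenish()
print(fifo.read(3))         # b'abc'
```

The `LookupError` in `read` is the event the rest of the lecture tries to rule out: a request for a byte that is still in DRAM.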


Some Thoughts


What is the minimum SRAM needed to guarantee
that a byte is always available in SRAM when
requested?



What algorithm should we use to manage the
replenishment of the SRAM “cache” memory?


An Example

Q = 5, w = 9+, b = 6

[Figure: bytes held in the SRAM cache at timeslots t = 0 through t = 7, with two replenish events.]


An Example


Q = 5, w = 9+, b = 6

[Figure: bytes held in the SRAM cache at timeslots t = 8 through t = 13, t = 19 and t = 23, with replenish events and a read.]


The Size of the SRAM Cache


Necessity:

How large does the SRAM cache need to be under any memory management algorithm (MMA)?

Theorem: $wQ > Q(b - 1)(2 + \ln Q)$

Sufficiency:

For a specific MMA, and for any pattern of arrivals, what is the smallest SRAM cache needed so that a byte is always available when requested?

For one particular algorithm: $wQ = Qb(2 + \ln Q)$

[Figure: Q queues, each w bytes deep, managed by the Memory Management Algorithm.]
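
The two expressions can be evaluated directly to see how close they are. In the sketch below the function names are hypothetical; Q = 200 echoes the IP router figure mentioned earlier, and b = 320 is an assumed value chosen only for illustration.

```python
import math


def necessary_sram_bytes(Q: int, b: int) -> float:
    """Lower bound from the necessity theorem: Q(b - 1)(2 + ln Q)."""
    return Q * (b - 1) * (2 + math.log(Q))


def sufficient_sram_bytes(Q: int, b: int) -> float:
    """Size quoted for one particular algorithm: Qb(2 + ln Q)."""
    return Q * b * (2 + math.log(Q))


Q, b = 200, 320  # assumed example values
lower = necessary_sram_bytes(Q, b)
upper = sufficient_sram_bytes(Q, b)
print(f"necessary : {lower / 1e3:.0f} kB")
print(f"sufficient: {upper / 1e3:.0f} kB")
print(f"ratio     : {lower / upper:.4f}")  # equals (b - 1) / b
```

The two sizes differ only by the factor (b - 1)/b, so for large b the quoted algorithm is close to the lower bound.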


Some Definitions


Occupancy: $X(q,t)$

The number of bytes in FIFO q (in SRAM) at time t.

Deficit: $D(q,t) = w - X(q,t)$

[Figure: Q queues, each w bytes deep, with the occupancy and the deficit of a queue marked.]
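
As a small illustration of the two definitions, the sketch below computes the deficit of each queue from an occupancy snapshot. The queue names and the snapshot values are made up for the example.

```python
def deficit(occupancy: int, w: int) -> int:
    """D(q,t) = w - X(q,t), where `occupancy` is X(q,t) in bytes."""
    if not 0 <= occupancy <= w:
        raise ValueError("occupancy must lie between 0 and w")
    return w - occupancy


# Hypothetical snapshot of X(q,t) for three queues with w = 9 bytes each.
w = 9
X = {"q1": 9, "q2": 4, "q3": 0}
D = {q: deficit(x, w) for q, x in X.items()}
print(D)  # {'q1': 0, 'q2': 5, 'q3': 9}
```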


Smallest SRAM cache

Necessity




In addition, each queue needs to hold (b - 1) bytes in case it is replenished with b bytes when only 1 byte has been removed.

Therefore, SRAM size must be at least:

$Qw > Q(b - 1)(2 + \ln Q)$.


Most Deficit Queue First MMA

Sufficiency

Algorithm: Every b timeslots, MDQF-MMA replenishes the queue with the largest deficit.

Theorem: With MDQF-MMA, an SRAM cache of size $Qw > Qb(2 + \ln Q)$ is sufficient.

Examples:

1. 40Gb/s linecard, b=640, Q=128: SRAM = 560kBytes
2. 160Gb/s linecard, b=2560, Q=512: SRAM = 10MBytes
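
The replenishment rule can be sketched as a small simulation. Several details here are assumptions made for illustration and are not stated on the slide: one byte is requested per timeslot, ties go to the lowest queue index, the DRAM always has data for the chosen queue, and a replenishment is skipped when the chosen queue has less than b bytes of room. The function names are hypothetical.

```python
def mdqf_pick(occupancy: list, w: int) -> int:
    """Index of the queue with the largest deficit w - X(q,t) (lowest index on ties)."""
    deficits = [w - x for x in occupancy]
    return max(range(len(deficits)), key=deficits.__getitem__)


def mdqf_run(requests: list, occupancy: list, w: int, b: int) -> list:
    """Serve one byte per timeslot; every b timeslots replenish the most-deficit queue."""
    occ = list(occupancy)
    for t, q in enumerate(requests, start=1):
        if occ[q] == 0:
            raise RuntimeError(f"cache miss on queue {q} at timeslot {t}")
        occ[q] -= 1
        if t % b == 0:
            target = mdqf_pick(occ, w)
            if w - occ[target] >= b:   # assumption: skip if b bytes would not fit
                occ[target] += b
    return occ


# Three queues, w = 6 bytes each, b = 2; requests name the queue read in each timeslot.
print(mdqf_run([0, 0, 1, 0], [6, 6, 6], w=6, b=2))  # [5, 5, 6]
```

In the run above, queue 0 is topped up after the second timeslot, and the replenishment after the fourth timeslot is skipped because no queue has room for a full block.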


Reducing the size of the SRAM

Intuition:

If we use a lookahead buffer to peek at the requests “in advance”, we can replenish the SRAM cache only when needed.

This increases the latency from when a request is made until the byte is available.

But because it is a pipeline, the issue rate is the same.


The ECQF-MMA Algorithm

1. Lookahead: Next $Q(b - 1) + 1$ arbiter requests are known.
2. Compute: Determine which queue will run into “trouble” soonest.
3. Replenish: Fetch b bytes for the “troubled” queue.

[Figure: Q queues, each holding b - 1 bytes, next to a lookahead buffer holding $Q(b - 1) + 1$ requests. In the pictured example the green queue is the one that runs into trouble soonest.]
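
The three steps can be sketched as follows, treating a queue as being in trouble when the lookahead buffer holds more requests for it than it has bytes cached. That reading of “trouble”, and the function names, are assumptions for illustration.

```python
from typing import Optional, Sequence


def earliest_critical_queue(lookahead: Sequence[int],
                            occupancy: Sequence[int]) -> Optional[int]:
    """Return the queue that first runs out of cached bytes, or None.

    lookahead: queue indices of the upcoming requests, oldest first.
    occupancy: bytes currently cached in SRAM for each queue.
    """
    remaining = list(occupancy)
    for q in lookahead:
        remaining[q] -= 1
        if remaining[q] < 0:   # this request cannot be served from the cache
            return q
    return None


def ecqf_replenish(lookahead: Sequence[int], occupancy: Sequence[int], b: int) -> list:
    """Fetch b bytes for the earliest critical queue, if there is one."""
    occ = list(occupancy)
    q = earliest_critical_queue(lookahead, occ)
    if q is not None:
        occ[q] += b
    return occ


# Q = 4 queues holding b - 1 = 3 bytes each, and Q(b - 1) + 1 = 13 known requests.
requests = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 2]
print(earliest_critical_queue(requests, [3, 3, 3, 3]))  # 2
print(ecqf_replenish(requests, [3, 3, 3, 3], b=4))      # [3, 3, 7, 3]
```

With $Q(b - 1) + 1$ requests spread over Q queues that each hold b - 1 bytes, at least one queue must receive b or more requests, so a full lookahead buffer of this length always exposes a critical queue in this setting.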


Example of ECQF-MMA: Q=4, b=4

[Figure: the queues and the requests in the lookahead buffer at timeslots t = 0 through t = 8. The green queue is critical at t = 0, the blue queue at t = 4, and the red queue at t = 8.]


Theorem

Patient Arbiter: An SRAM cache of size $Q(b - 1)$ bytes is sufficient to guarantee that a requested byte is available within $Q(b - 1) + 1$ request times. The algorithm is called ECQF-MMA (Earliest Critical Queue First).

Example:

160Gb/s linecard, b=2560, Q=512: SRAM = 1.3MBytes, delay bound is 65μs (equivalent to 13 miles of fiber).
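
The example can be checked numerically. The sketch below assumes that one request time equals the time to transfer one byte at the line rate; the slide does not define the request time, so this is an assumption, and the function names are hypothetical.

```python
def ecqf_sram_bytes(Q: int, b: int) -> int:
    """SRAM cache size from the theorem: Q(b - 1) bytes."""
    return Q * (b - 1)


def ecqf_delay_seconds(Q: int, b: int, line_rate_bps: float) -> float:
    """Delay bound of Q(b - 1) + 1 request times, assuming one byte time per request."""
    byte_time = 8 / line_rate_bps
    return (Q * (b - 1) + 1) * byte_time


Q, b, rate = 512, 2560, 160e9
print(f"SRAM : {ecqf_sram_bytes(Q, b) / 1e6:.2f} MB")
print(f"delay: {ecqf_delay_seconds(Q, b, rate) * 1e6:.1f} microseconds")
```

Under that assumption the cache is about 1.31 MB and the delay bound is about 65.5 microseconds.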




Maximum Deficit Queue First with Latency (MDQFL-MMA)

What if the application can only tolerate a latency $l_{max} < Q(b - 1) + 1$ timeslots?

Algorithm: Maximum Deficit Queue First with Latency (MDQFL-MMA) services a queue once every b timeslots, in the following order:

1. If there is an earliest critical queue, replenish it.
2. If not, then replenish the queue that will have the most deficit $l_{max}$ timeslots in the future.
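
A minimal sketch of the two-step choice, assuming the lookahead buffer holds exactly the next $l_{max}$ requests and that the future deficit of a queue is w minus its occupancy after those requests are served (ignoring replenishments that happen in between). These assumptions and the function name are illustrative only.

```python
from typing import Sequence


def mdqfl_pick(lookahead: Sequence[int], occupancy: Sequence[int], w: int) -> int:
    """Pick the queue to replenish: earliest critical if any, else most future deficit."""
    remaining = list(occupancy)
    for q in lookahead:            # the next l_max requests, oldest first
        remaining[q] -= 1
        if remaining[q] < 0:
            return q               # step 1: earliest critical queue
    # Step 2: no critical queue; `remaining` is the occupancy l_max timeslots ahead.
    deficits = [w - x for x in remaining]
    return max(range(len(deficits)), key=deficits.__getitem__)


print(mdqfl_pick([0, 0, 0], [2, 5, 5], w=5))  # 0: queue 0 runs dry on its third request
print(mdqfl_pick([0, 1, 1], [5, 5, 5], w=5))  # 1: no critical queue, queue 1 has most deficit
```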