IPTR Presentation Template - SIGOPS

1

Differentiated Storage Services

Michael Mesnier, Jason Akers, Feng Chen

Intel Corporation

Tian Luo

The Ohio State University

23rd ACM Symposium on Operating Systems Principles (SOSP)
October 23-26, 2011, Cascais, Portugal

2

An analogy: moving & shipping

Why should computer storage be any different?

Technology overview

Classification

Policy assignment

Policy enforcement

3

Differentiated Storage Services

(offline)

Classifier     QoS Policy
Metadata       Low latency
Boot files     Low latency
Small files    High throughput
Media files    High bandwidth





[Diagram: end-to-end architecture]
Computer system: Applications or DB, File system, Operating system (I/O classification at each layer)
Storage system: Management firmware, Storage controller, QoS Policies, QoS Mechanisms
Storage Pool A, Storage Pool B, Storage Pool C
Legend: highlighted components = current & future research
Technology overview: Classification, Policy assignment, Policy enforcement

Classify each I/O in-band

4

The SCSI CDB

5 bits = 32 classes
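As a rough illustration of how a 5-bit classifier fits in a command block, the sketch below packs a class into the low five bits of byte 6 of a 10-byte WRITE CDB, which is where the SCSI block command set places the GROUP NUMBER field. The helper names `build_write10` and `cdb10_class` are hypothetical, and this is a sketch under those assumptions rather than the prototype's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CDB10_LEN        10
#define WRITE_10_OPCODE  0x2A
#define GROUP_BYTE       6      /* byte 6 of a 10-byte READ/WRITE CDB */
#define GROUP_MASK       0x1F   /* 5 bits, so 32 possible classes */

/* Hypothetical helper: build a WRITE(10) CDB carrying an I/O class. */
static void build_write10(uint8_t cdb[CDB10_LEN], uint32_t lba,
                          uint16_t nblocks, uint8_t io_class)
{
    assert(io_class <= GROUP_MASK);
    memset(cdb, 0, CDB10_LEN);
    cdb[0] = WRITE_10_OPCODE;
    cdb[2] = (uint8_t)(lba >> 24);    /* LBA is big-endian in bytes 2..5 */
    cdb[3] = (uint8_t)(lba >> 16);
    cdb[4] = (uint8_t)(lba >> 8);
    cdb[5] = (uint8_t)lba;
    cdb[GROUP_BYTE] = io_class & GROUP_MASK;
    cdb[7] = (uint8_t)(nblocks >> 8); /* transfer length in bytes 7..8 */
    cdb[8] = (uint8_t)nblocks;
}

/* Hypothetical helper: what a storage target could do on receipt. */
static uint8_t cdb10_class(const uint8_t cdb[CDB10_LEN])
{
    return cdb[GROUP_BYTE] & GROUP_MASK;
}
```

Because the class travels inside the command itself, the storage system sees it in-band with every request and needs no side channel.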

5

Motivation: disk caching with SSDs


Universal challenges in the industry


Keeping the right data cached


Avoiding thrash under cache pressure


Conventional approaches


Cache bypass for large/sequential requests


Evict cold data (LRU commonly used)


How I/O classification can help


Identify cacheable I/O classes


Assign relative caching priorities


6

Filesystem prototypes (Ext3 & NTFS)

Classify each I/O in-band

Classifier       Cache priority
Metadata         0
Journal          0
Directories      0
Files <= 4KB     1
Files <= 16KB    2
Files <= 64KB    3
...              ...
Files > 1GB      Lowest

[Diagram: architecture for the filesystem prototypes]
Computer system: Applications or DB, File system, Operating system (I/O classification at each layer)
Storage system: Management firmware, Storage controller, QoS Policies, QoS Mechanisms, Disk, SSD
Legend: highlighted components = current & future research
Technology overview: FS classification, FS policy assignment, FS policy enforcement

7

Database prototype (PostgreSQL)

Classify each I/O in-band

Classifier                 Cache priority
System tables              0
Temp. tables (on write)    1
Random tables              2
Temp. tables (on read)     3
Sequential tables          Bypass
Index files                Bypass

[Diagram: architecture for the database prototype]
Computer system: Applications or DB, File system, Operating system (I/O classification at each layer)
Storage system: Management firmware, Storage controller, QoS Policies, QoS Mechanisms, Disk, SSD
Legend: highlighted components = current & future research
Technology overview: DB classification, DB policy assignment, DB policy enforcement

8

Selective cache algorithms

Selective allocation
Always allocate high-priority classes
E.g., FS metadata and DB system tables always allocated
Conditionally allocate low-priority classes
Depends on cache pressure, cache contents, etc.
High/low cutoff is a tunable parameter

Selective eviction
Evict in priority order (lowest priority first)
E.g., temporary DB tables evicted before system tables
Trivially implemented by managing one LRU per class
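The two policies above can be sketched in a few dozen lines of C. This is a minimal illustration, not the prototype: the cache size, the number of priority levels, the `HIGH_CUTOFF` value, and every function name are assumptions, and "cache pressure" is simplified to "the cache is full".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCLASSES    4   /* assumed number of priorities; 0 is the highest */
#define NSLOTS      4   /* assumed cache capacity in blocks */
#define HIGH_CUTOFF 1   /* tunable: priorities <= cutoff are always allocated */

struct slot { uint64_t lba; int prio; int used; struct slot *prev, *next; };
struct lru  { struct slot *head, *tail; };   /* head = MRU, tail = LRU */

static struct slot slots[NSLOTS];
static struct lru  lists[NCLASSES];          /* one LRU list per class */
static int nused;

static void lru_unlink(struct slot *s) {
    struct lru *l = &lists[s->prio];
    if (s->prev) s->prev->next = s->next; else l->head = s->next;
    if (s->next) s->next->prev = s->prev; else l->tail = s->prev;
    s->prev = s->next = NULL;
}

static void lru_push_front(struct slot *s) {
    struct lru *l = &lists[s->prio];
    s->prev = NULL;
    s->next = l->head;
    if (l->head) l->head->prev = s; else l->tail = s;
    l->head = s;
}

/* Selective eviction: take the LRU block of the lowest-priority non-empty class. */
static struct slot *evict_one(void) {
    for (int p = NCLASSES - 1; p >= 0; p--) {
        struct slot *victim = lists[p].tail;
        if (victim) {
            lru_unlink(victim);
            victim->used = 0;
            nused--;
            return victim;
        }
    }
    return NULL;
}

/* Linear scan for brevity; a real cache would use a hash table. */
static struct slot *lookup(uint64_t lba) {
    for (int i = 0; i < NSLOTS; i++)
        if (slots[i].used && slots[i].lba == lba) return &slots[i];
    return NULL;
}

/* Returns 1 if the block is cached after the call, 0 if it bypassed the cache. */
static int cache_access(uint64_t lba, int prio) {
    assert(prio >= 0 && prio < NCLASSES);
    struct slot *s = lookup(lba);
    if (s) {                                  /* hit: move to the MRU position */
        lru_unlink(s);
        lru_push_front(s);
        return 1;
    }
    if (nused == NSLOTS) {
        /* Selective allocation: under pressure, low-priority classes bypass. */
        if (prio > HIGH_CUTOFF) return 0;
        s = evict_one();
    } else {
        for (int i = 0; i < NSLOTS; i++)
            if (!slots[i].used) { s = &slots[i]; break; }
    }
    s->lba = lba;
    s->prio = prio;
    s->used = 1;
    nused++;
    lru_push_front(s);
    return 1;
}
```

Keeping a separate list per class means eviction never has to search: the victim is always the tail of the lowest-priority list that is not empty.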


9

Technology development

10

Ext3 prototype

OS changes (block layer)
Add classifier to I/O requests
Only coalesce like-class requests
Copy classifier into SCSI CDB

Ext3 changes
18 classes identified
Optimized for a file server
Small files & metadata

A small kernel patch
A one-time change to the FS
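The "only coalesce like-class requests" rule can be pictured as one extra condition in a merge check. The sketch below uses a simplified, hypothetical `io_req` structure rather than the real Linux block-layer request, so it shows the idea only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a block-layer request. */
struct io_req {
    uint64_t lba;       /* starting sector */
    uint32_t nsectors;  /* length in sectors */
    bool     is_write;
    uint8_t  io_class;  /* classifier attached by the file system */
};

/* Two requests may be coalesced only if they are adjacent, go in the
 * same direction, and carry the same class. */
static bool can_merge(const struct io_req *a, const struct io_req *b)
{
    return a->is_write == b->is_write &&
           a->io_class == b->io_class &&
           a->lba + a->nsectors == b->lba;
}

/* Back-merge b into a when allowed; returns true on success. */
static bool try_merge(struct io_req *a, const struct io_req *b)
{
    if (!can_merge(a, b))
        return false;
    a->nsectors += b->nsectors;
    return true;
}
```

Without the class comparison, a merged request could carry only one classifier even though it spans blocks of two classes, so part of it would reach the storage system mislabeled.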


Ext3 class        Group number    Cache priority
Unclassified      0               12
Superblock        1               0
Group desc.       2               0
Bitmap            3               0
Inode             4               0
Indirect block    5               0
Directories       6               0
Journal           7               0
File <= 4KB       8               1
File <= 16KB      9               2
File <= 64KB      10              3
...               ...             ...
File > 1GB        18              11


11

Ext3 classification illustrated

echo 'Hello, world!' >> foo; sync

READ_10(lba 231495 len 8 grp 9)      <=4KB
WRITE_10(lba 231495 len 8 grp 9)     <=4KB
WRITE_10(lba 16519223 len 8 grp 8)   Journal
WRITE_10(lba 16519231 len 8 grp 8)   Journal
WRITE_10(lba 16519239 len 8 grp 8)   Journal
WRITE_10(lba 16519247 len 8 grp 8)   Journal
WRITE_10(lba 8279 len 8 grp 5)       Inode

7 I/Os (28KB) to write 13 bytes
Metadata accounts for most of the overhead
I/O classification shows read-modify-write and metadata updates


NTFS classification is implemented with Windows filter drivers

12

PostgreSQL prototype

Classification API: scatter/gather I/O

OS changes (block layer)
Add O_CLASSIFIED file flag
Extract classifier from SG I/O

A small OS & DB patch
A one-time change to the OS & DB

PostgreSQL class    Group number
Unclassified        0
Transaction log     19
System table        20
Free space map      21
Temporary table     22
Random table        23
Sequential table    24
Index file          25
Reserved            26-31

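/* O_CLASSIFIED is the file flag added by the prototype's OS patch. */
int fd = open("foo", O_RDWR | O_CLASSIFIED, 0666);
unsigned char class = 19;              /* 19 = transaction log */
struct iovec myiov[2];
myiov[0].iov_base = &class;            /* element 0 carries the classifier */
myiov[0].iov_len = 1;
myiov[1].iov_base = "Hello, world!";   /* element 1 carries the data */
myiov[1].iov_len = 13;
writev(fd, myiov, 2);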
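To make the "extract classifier from SG I/O" step concrete, here is a user-space sketch of what the receiving side might do with such a vector. The function `split_classified_iov` is hypothetical and only models the convention shown above (a one-byte class in element 0, data in the remaining elements); the actual kernel patch is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical helper modelling the receiving side of the API: for a file
 * opened with the classified flag, element 0 of the vector is a one-byte
 * class and the remaining elements are the data. Returns 0 on success. */
static int split_classified_iov(const struct iovec *iov, int iovcnt,
                                uint8_t *io_class,
                                const struct iovec **data, int *datacnt)
{
    if (iovcnt < 2 || iov[0].iov_len != 1 || iov[0].iov_base == NULL)
        return -1;                       /* malformed classified write */
    *io_class = *(const uint8_t *)iov[0].iov_base;
    if (*io_class > 31)
        return -1;                       /* class must fit in 5 bits */
    *data = &iov[1];                     /* payload starts at element 1 */
    *datacnt = iovcnt - 1;
    return 0;
}
```

Carrying the class in the vector leaves the `writev` signature untouched, which is one reason the change to the database can stay small.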

Preliminary DB classes


13

Cache implementations

Fully associative read/write LRU cache
Insert(), Lookup(), Delete(), etc.
Hash table maps disk LBA to SSD LBA
Syncer daemon asynchronously cleans cache
Monitors cache pressure for selective allocate
Maintains multiple LRU lists for selective evict

Front-ends: iSCSI (OS independent) and Linux MD
MD cache module (RAID-9)
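The disk-LBA-to-SSD-LBA mapping mentioned above could look roughly like the chained hash table below. The bucket count, the hash function, and the names `map_insert`, `map_lookup`, and `map_delete` are all assumptions made for illustration; the slides do not describe the actual data structure beyond "hash table".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 1024   /* assumed size; a real cache would scale with the SSD */

struct map_entry {
    uint64_t disk_lba;
    uint64_t ssd_lba;
    struct map_entry *next;
};

static struct map_entry *buckets[NBUCKETS];

/* Multiplicative hash; the top 10 bits select one of 1024 buckets. */
static size_t bucket_of(uint64_t disk_lba) {
    return (size_t)((disk_lba * 0x9E3779B97F4A7C15ull) >> 54);
}

/* Insert or update a mapping; returns 0 on success, -1 if out of memory. */
static int map_insert(uint64_t disk_lba, uint64_t ssd_lba) {
    size_t b = bucket_of(disk_lba);
    for (struct map_entry *e = buckets[b]; e; e = e->next)
        if (e->disk_lba == disk_lba) { e->ssd_lba = ssd_lba; return 0; }
    struct map_entry *e = malloc(sizeof *e);
    if (!e) return -1;
    e->disk_lba = disk_lba;
    e->ssd_lba = ssd_lba;
    e->next = buckets[b];
    buckets[b] = e;
    return 0;
}

/* Returns 1 and fills *ssd_lba on a hit, 0 on a miss. */
static int map_lookup(uint64_t disk_lba, uint64_t *ssd_lba) {
    for (struct map_entry *e = buckets[bucket_of(disk_lba)]; e; e = e->next)
        if (e->disk_lba == disk_lba) { *ssd_lba = e->ssd_lba; return 1; }
    return 0;
}

/* Returns 1 if a mapping was removed, 0 if none existed. */
static int map_delete(uint64_t disk_lba) {
    struct map_entry **pp = &buckets[bucket_of(disk_lba)];
    for (; *pp; pp = &(*pp)->next)
        if ((*pp)->disk_lba == disk_lba) {
            struct map_entry *dead = *pp;
            *pp = dead->next;
            free(dead);
            return 1;
        }
    return 0;
}
```

A lookup miss here corresponds to a cache miss, at which point the allocation policy decides whether the block is inserted or bypasses the SSD.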



Striping: mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdd /dev/sde
Mirroring: mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdd /dev/sde
RAID-9: mdadm --create /dev/md0 --level=9 --raid-devices=2 <cache> <base>

14

Evaluation

15

Experimental setup

Host OS (Xeon, 2-way, quad-core, 12GB RAM)
Linux 2.6.34 (patched as described)

Target storage system
HW RAID array + X25-E cache

Workloads and cache sizes
SPECsfs: 18GB (10% of 184GB working set)
TPC-H: 8GB (28% of 29GB working set)

Comparison
LRU versus LRU-S (LRU with selective caching)


16

SPECsfs I/O breakdown

[Figure: SPECsfs I/O breakdown, panels LRU and LRU-S]
LRU: Large files pollute LRU cache (metadata and small files evicted)
LRU-S: LRU-S fences off large file I/O

17

SPECsfs performance metrics

[Figure: four panels (Syncer overhead, I/O Throughput, Hit rate, Running time); series: HDD, LRU, LRU-S]
1.8x speedup

18

SPECsfs file latencies

[Figure: Reduction in write latency over HDD, LRU vs. LRU-S]
LRU suffers from write outliers (from eviction overheads)

[Figure: Reduction in read latency over HDD, LRU vs. LRU-S]
LRU-S reduces read latency (most small files are cached)

19

TPC-H I/O breakdown

[Figure: TPC-H I/O breakdown, panels LRU and LRU-S]
LRU: Indexes pollute LRU cache (user tables evicted)
LRU-S: LRU-S fences off index files

20

TPC-H performance metrics

[Figure: four panels (Syncer overhead, I/O Throughput, Hit rate, Running time); series: HDD, LRU, LRU-S]
1.2x speedup


21

Conclusion & future work


Intelligent caching is just the beginning


Other types of performance differentiation


Security, reliability, retention, …


Other applications we’re looking at


Databases


Hypervisors


Cloud storage


Big Data (NoSQL DB)


Work already underway in T10


Open source coming soon…

Thank you!



Questions?