Differentiated Storage Services
Michael Mesnier, Jason Akers, Feng Chen
Intel Corporation
Tian Luo
The Ohio State University
23rd ACM Symposium on Operating Systems Principles (SOSP)
October 23-26, 2011, Cascais, Portugal
An analogy: moving & shipping
Why should computer storage be any different?
Technology overview
Classification
Policy assignment
Policy enforcement
Differentiated Storage Services
(offline)
Classifier     QoS policy
Metadata       Low latency
Boot files     Low latency
Small files    High throughput
Media files    High bandwidth
…              …
[Figure: a computer system (operating system, applications or DB, file system, with three I/O classification boxes) on top of a storage system (management firmware; storage controller with QoS policies and QoS mechanisms; storage pools A, B, and C). The legend marks current & future research.]
Classification
Policy assignment
Policy enforcement
Classify each I/O in-band
The SCSI CDB
5 bits
32 classes
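A minimal sketch of how a 5-bit classifier can travel inside a 10-byte SCSI command. It assumes the class is carried in the group number field (the low 5 bits of byte 6) of READ(10)/WRITE(10), which matches the `grp` values shown in the traces elsewhere in this deck. The helper names `build_cdb10` and `group_number` are hypothetical and are not taken from the prototype.

```python
import struct

READ_10 = 0x28     # SCSI READ(10) opcode
WRITE_10 = 0x2A    # SCSI WRITE(10) opcode
GROUP_MASK = 0x1F  # 5 bits, so 32 possible classes (0..31)


def build_cdb10(opcode: int, lba: int, length: int, group: int) -> bytes:
    """Build a 10-byte CDB with the I/O class in the group number field."""
    if not 0 <= group <= GROUP_MASK:
        raise ValueError("class must fit in 5 bits (0..31)")
    # Layout: opcode, flags, 32-bit LBA, group number, 16-bit length, control.
    return struct.pack(">BBIBHB", opcode, 0, lba, group, length, 0)


def group_number(cdb: bytes) -> int:
    """Extract the 5-bit class from byte 6 of a 10-byte CDB."""
    return cdb[6] & GROUP_MASK


cdb = build_cdb10(WRITE_10, lba=231495, length=8, group=9)
print(len(cdb), group_number(cdb))
```

Because the field is only 5 bits wide, any richer policy information has to live in the storage system, keyed by the class number.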
Motivation: disk caching with SSDs
Universal challenges in the industry
– Keeping the right data cached
– Avoiding thrash under cache pressure
Conventional approaches
– Cache bypass for large/sequential requests
– Evict cold data (LRU commonly used)
How I/O classification can help
– Identify cacheable I/O classes
– Assign relative caching priorities
Filesystem prototypes (Ext3 & NTFS)
Classify each I/O in-band
Classifier        Cache priority
Metadata          0
Journal           0
Directories       0
Files <= 4KB      1
Files <= 16KB     2
Files <= 64KB     3
…                 …
Files > 1GB       Lowest
[Figure: a computer system (operating system, applications or DB, file system, with three I/O classification boxes) on top of a storage system (management firmware; storage controller with QoS policies and QoS mechanisms) backed by a disk and an SSD. The legend marks current & future research.]
FS classification
FS policy assignment
FS policy enforcement
Database prototype (PostgreSQL)
Classify each I/O in-band
Classifier                 Cache priority
System tables              0
Temp. tables (on write)    1
Random tables              2
Temp. tables (on read)     3
Sequential tables          Bypass
Index files                Bypass
[Figure: a computer system (operating system, applications or DB, file system, with three I/O classification boxes) on top of a storage system (management firmware; storage controller with QoS policies and QoS mechanisms) backed by a disk and an SSD. The legend marks current & future research.]
DB classification
DB policy assignment
DB policy enforcement
Selective cache algorithms
Selective allocation
– Always allocate high-priority classes
– E.g., FS metadata and DB system tables are always allocated
– Conditionally allocate low-priority classes
– Depends on cache pressure, cache contents, etc.
– High/low cutoff is a tunable parameter
Selective eviction
– Evict in priority order (lowest priority first)
– E.g., temporary DB tables are evicted before system tables
– Trivially implemented by managing one LRU per class
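The two policies above can be sketched as a toy cache with one LRU list per priority. This is an illustration under simplifying assumptions, not the prototype's code: cache pressure is modeled as "the cache is full", the cutoff is a plain integer, and the class name `SelectiveCache` and its methods are hypothetical.

```python
from collections import OrderedDict


class SelectiveCache:
    """Toy selective cache: one LRU list per priority (0 is highest)."""

    def __init__(self, capacity, num_priorities, cutoff):
        self.capacity = capacity  # total number of cached blocks allowed
        self.cutoff = cutoff      # priorities <= cutoff count as "high"
        self.lrus = [OrderedDict() for _ in range(num_priorities)]

    def __len__(self):
        return sum(len(lru) for lru in self.lrus)

    def lookup(self, lba):
        """Return True on a hit and refresh the block's LRU position."""
        for lru in self.lrus:
            if lba in lru:
                lru.move_to_end(lba)
                return True
        return False

    def _evict_one(self):
        """Selective eviction: drop the LRU block of the lowest priority."""
        for lru in reversed(self.lrus):
            if lru:
                return lru.popitem(last=False)[0]
        return None

    def insert(self, lba, prio):
        """Selective allocation; returns True if the block ends up cached."""
        if self.lookup(lba):
            return True
        if len(self) >= self.capacity:
            # Under pressure, low-priority classes are not allocated.
            if prio > self.cutoff:
                return False
            # High-priority classes always allocate by evicting first.
            if self._evict_one() is None:
                return False
        self.lrus[prio][lba] = True
        return True


cache = SelectiveCache(capacity=2, num_priorities=4, cutoff=1)
results = [
    cache.insert(100, 3),  # low priority, cache has room
    cache.insert(200, 0),  # high priority, cache has room
    cache.insert(300, 3),  # low priority, cache full
    cache.insert(400, 0),  # high priority, cache full
]
print(results, cache.lookup(100), cache.lookup(400))
```

In this run the third insert is refused and the fourth displaces block 100, the only priority-3 block, so the priority-0 blocks stay resident.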
Technology development
Ext3 prototype
OS changes (block layer)
– Add classifier to I/O requests
– Only coalesce like-class requests
– Copy classifier into SCSI CDB
Ext3 changes
– 18 classes identified
– Optimized for a file server (small files & metadata)
A small kernel patch
A one-time change to the FS
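The "only coalesce like-class requests" rule can be illustrated with a small merge routine. It is a sketch rather than the kernel patch: requests are modeled as hypothetical `(lba, length, io_class)` tuples, and the function name `coalesce` is made up for this example. The likely motivation is that a merged request can carry only one classifier.

```python
def coalesce(requests):
    """Merge contiguous requests only when they share an I/O class.

    Each request is an (lba, length, io_class) tuple, with lba and length
    expressed in blocks.
    """
    merged = []
    for lba, length, io_class in sorted(requests):
        if merged:
            last_lba, last_len, last_class = merged[-1]
            contiguous = last_lba + last_len == lba
            if contiguous and last_class == io_class:
                # Same class and adjacent on disk: extend the previous request.
                merged[-1] = (last_lba, last_len + length, last_class)
                continue
        merged.append((lba, length, io_class))
    return merged


requests = [
    (16519223, 8, 8),
    (16519231, 8, 8),
    (16519239, 8, 8),
    (16519247, 8, 8),
    (16519255, 8, 9),  # contiguous, but a different class: kept separate
]
print(coalesce(requests))
```

Here the four same-class requests collapse into one 32-block request, while the adjacent request of another class stays on its own.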
Ext3 class        Group number    Cache priority
Unclassified      0               12
Superblock        1               0
Group desc.       2               0
Bitmap            3               0
Inode             4               0
Indirect block    5               0
Directories       6               0
Journal           7               0
File <= 4KB       8               1
File <= 16KB      9               2
File <= 64KB      10              3
…                 …               …
File > 1GB        18              11
Ext3 classification illustrated
echo 'Hello, world!' >> foo; sync
– READ_10(lba 231495 len 8 grp 9) <=4KB
– WRITE_10(lba 231495 len 8 grp 9) <=4KB
– WRITE_10(lba 16519223 len 8 grp 8) Journal
– WRITE_10(lba 16519231 len 8 grp 8) Journal
– WRITE_10(lba 16519239 len 8 grp 8) Journal
– WRITE_10(lba 16519247 len 8 grp 8) Journal
– WRITE_10(lba 8279 len 8 grp 5) Inode
7 I/Os (28KB) to write 13 bytes
– Metadata accounts for most of the overhead
I/O classification shows read-modify-write and metadata updates
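The totals on this slide can be checked by summing the trace per class. The sketch below assumes 512-byte logical blocks, so that `len 8` is 4KB; the block size is not stated on the slide.

```python
import re
from collections import Counter

BLOCK_SIZE = 512  # bytes per logical block (assumption: makes len 8 = 4KB)

TRACE = """\
READ_10(lba 231495 len 8 grp 9) <=4KB
WRITE_10(lba 231495 len 8 grp 9) <=4KB
WRITE_10(lba 16519223 len 8 grp 8) Journal
WRITE_10(lba 16519231 len 8 grp 8) Journal
WRITE_10(lba 16519239 len 8 grp 8) Journal
WRITE_10(lba 16519247 len 8 grp 8) Journal
WRITE_10(lba 8279 len 8 grp 5) Inode
"""

PATTERN = re.compile(r"(\w+)\(lba (\d+) len (\d+) grp (\d+)\) (\S+)")

# Sum the bytes transferred for each class label in the trace.
bytes_by_class = Counter()
for line in TRACE.splitlines():
    op, lba, length, grp, label = PATTERN.match(line).groups()
    bytes_by_class[label] += int(length) * BLOCK_SIZE

total = sum(bytes_by_class.values())
metadata = bytes_by_class["Journal"] + bytes_by_class["Inode"]
print(total // 1024, "KB total;", metadata // 1024, "KB journal + inode")
```

Under that assumption, the journal and inode writes account for 20KB of the 28KB transferred, and the remaining 8KB is the read-modify-write of the file's own 4KB block.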
NTFS classification is implemented with Windows filter drivers
PostgreSQL prototype
Classification API: scatter/gather I/O
OS changes (block layer)
– Add O_CLASSIFIED file flag
– Extract classifier from SG I/O
A small OS & DB patch
A one-time change to the OS & DB
Preliminary DB classes
PostgreSQL class     Group number
Unclassified         0
Transaction log      19
System table         20
Free space map       21
Temporary table      22
Random table         23
Sequential table     24
Index file           25
Reserved             26-31

/* O_CLASSIFIED comes from the OS patch; it is not in stock headers. */
int fd = open("foo", O_RDWR | O_CLASSIFIED, 0666);
unsigned char class = 19;              /* Transaction log */
struct iovec myiov[2];
myiov[0].iov_base = &class;            /* first element: 1-byte classifier */
myiov[0].iov_len = 1;
myiov[1].iov_base = "Hello, world!";   /* second element: the data */
myiov[1].iov_len = 13;
writev(fd, myiov, 2);
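The "extract classifier from SG I/O" step can be pictured as peeling the first element off the scatter/gather list. How the patched block layer actually does this is not shown in the slides. The sketch below is a user-space illustration, and the function name `split_classified` is hypothetical.

```python
def split_classified(iov):
    """Split a classified scatter/gather list into (class, payload).

    Follows the convention of the writev() example: element 0 is a single
    byte holding the class, and the remaining elements are the data.
    """
    if not iov or len(iov[0]) != 1:
        raise ValueError("first element must be a 1-byte classifier")
    io_class = iov[0][0]
    if io_class > 31:
        raise ValueError("class must fit in 5 bits (0..31)")
    return io_class, b"".join(iov[1:])


TRANSACTION_LOG = 19  # group number from the DB class table

io_class, payload = split_classified(
    [bytes([TRANSACTION_LOG]), b"Hello, world!"]
)
print(io_class, len(payload))
```

Carrying the class in the I/O vector keeps the data path unchanged: only files opened with the classification flag need their first element interpreted this way.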
Cache implementations
Fully associative read/write LRU cache
– Insert(), Lookup(), Delete(), etc.
– Hash table maps disk LBA to SSD LBA
– Syncer daemon asynchronously cleans cache
Monitors cache pressure for selective allocate
Maintains multiple LRU lists for selective evict
Front-ends: iSCSI (OS independent) and Linux MD
MD cache module (RAID-9)
Striping: mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdd /dev/sde
Mirroring: mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdd /dev/sde
RAID-9: mdadm --create /dev/md0 --level=9 --raid-devices=2 <cache> <base>
Evaluation
Experimental setup
Host OS (Xeon, 2-way, quad-core, 12GB RAM)
– Linux 2.6.34 (patched as described)
Target storage system
– HW RAID array + X25-E cache
Workloads and cache sizes
– SPECsfs: 18GB (10% of 184GB working set)
– TPC-H: 8GB (28% of 29GB working set)
Comparison
– LRU versus LRU-S (LRU with selective caching)
SPECsfs I/O breakdown
[Figure, LRU: Large files pollute LRU cache (metadata and small files evicted)]
[Figure, LRU-S: LRU-S fences off large file I/O]
SPECsfs performance metrics
[Figure: panels for Syncer overhead, I/O throughput, Hit rate, and Running time, with bars for HDD, LRU, and LRU-S; annotated "1.8x speedup"]
SPECsfs file latencies
[Figure: Reduction in write latency over HDD, LRU vs. LRU-S. LRU suffers from write outliers (from eviction overheads)]
[Figure: Reduction in read latency over HDD, LRU vs. LRU-S. LRU-S reduces read latency (most small files are cached)]
TPC-H I/O breakdown
[Figure, LRU: Indexes pollute LRU cache (user tables evicted)]
[Figure, LRU-S: LRU-S fences off index files]
TPC-H performance metrics
[Figure: panels for Syncer overhead, I/O throughput, Running time, and Hit rate, with bars for HDD, LRU, and LRU-S; annotated "1.2x speedup"]
Conclusion & future work
Intelligent caching is just the beginning
– Other types of performance differentiation
– Security, reliability, retention, …
Other applications we’re looking at
– Databases
– Hypervisors
– Cloud storage
– Big Data (NoSQL DB)
Work already underway in T10
Open source coming soon…
Thank you!
Questions?