Windows Server 2008 Kernel Advances

Dec 14, 2013

Mark Russinovich

Technical Fellow

Microsoft Corporation


Content of this talk was co-developed with Dave Solomon (www.solsem.com)

Scope Of Talk

This talk covers key enhancements to the
Windows Server 2008 kernel and related core
components

Many of the enhancements were introduced in
Windows Vista

Does not cover client-focused kernel changes such as

Multimedia Class Scheduler

Protected Processes

User Account Control, Mandatory Integrity Controls

SuperFetch, ReadyBoost, ReadyDrive

I/O priorities, BitLocker

Scope of Talk (Cont.)

Does not cover important enhancements in

Networking (e.g., new TCP/IP stack, Windows Filtering
Platform)

Installation (e.g. WIM, Component Based Servicing)

Management (group policy and AD improvements)

Diagnostics and Monitoring (WDI)

Data protection (BitLocker)


Agenda

Platform Support

Processes and Threads

I/O and File System

Memory Management

Startup and Shutdown

Reliability and Recovery

Security


Kernel Variants

Windows Server 2008 has both 32-bit and 64-bit kernels

64-bit includes x64 (AMD64, Intel 64) and Itanium

This is the last 32-bit server release

Uniprocessor kernel variants are gone in Windows Server 2008

Performance impact of running MP on UP insignificant

Multiprocessor systems becoming the norm

Kernel variants therefore reduce to:

Kernel                         32-bit   64-bit
Multiprocessor                 x        x
Multiprocessor Checked         x        x
Multiprocessor PAE*            x
Multiprocessor PAE* Checked    x

* For 32-bit systems with >4 GB RAM or hardware no-execute support
Dynamic Partitioning

Before, hardware upgrades and maintenance
have required a shutdown, resulting in downtime

Windows Server 2008 reduces the need for
downtime by supporting these hardware
configuration changes without a reboot

Hot plug PCI Express

Some vendor proprietary Windows Server 2003 configurations
support hot plug PCI Express

Hot replace of memory

Windows Server 2003 supports hot add memory

Hot add and replace of processors

Adding a Processor

When a new processor is added

Drivers that requested notification are called

System-wide Plug and Play rebalance is done to
include new processor interrupts

Processes must opt in for affinity updates to have
the new processor added to their affinity mask

Some applications make decisions at initialization
based on the number of processors

The System, Smss, and Svchost processes have
affinity updates enabled
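
The opt-in rule above can be pictured with a small model. This is only an illustrative sketch, not Windows code: the `Process` class, the `affinity_updates` flag, and the `LegacyApp` process are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    affinity_mask: int      # bit i set => process may run on processor i
    affinity_updates: bool  # True if the process opted in to affinity updates


def add_processor(processes, new_cpu_index):
    """Model a hot-add: only opted-in processes gain the new processor's bit."""
    new_bit = 1 << new_cpu_index
    for proc in processes:
        if proc.affinity_updates:
            proc.affinity_mask |= new_bit
    return processes


procs = [
    Process("Smss", 0b1111, affinity_updates=True),
    Process("LegacyApp", 0b1111, affinity_updates=False),  # hypothetical app
]
add_processor(procs, new_cpu_index=4)
```

A process that sized its data structures for four processors at initialization keeps its original mask, which is the reason the update is opt-in.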

Hardware Error Reporting

Before, hardware errors were not reported on in a
uniform way

No standards for hardware error reporting

No common mechanism in kernel to collect and report
hardware errors

Windows Server 2008 introduces a new common
error reporting infrastructure called Windows
Hardware Error Architecture (WHEA)

Supports hardware error standards via plug-ins

Common error format for all error types

Error source discovery

Hypervisor

Windows Server 2008 is the basis
for Microsoft’s new virtualization
offering

Hypervisor to control low level
access to system resources

Processors, Local APICs, physical memory

Kernel enlightenments for improved
performance and scalability

Root partition implements device
support using Server Core

Unlike VMware ESX, where the
hypervisor implements drivers

[Diagram: the Windows hypervisor runs on the hardware; above it are the root partition (Server Core) and child partitions (OS 1, OS 2), each running apps]

Processes and Threads


Time Accounting

Before, Windows accounted for CPU time based on the
interval clock timer

10-15 ms resolution

Thread quantum expiration was not always fair

A thread might get almost no turn or up to three turns

Threads also were charged for interrupts that occurred while they
were running

[Timeline: Idle, T1, T2 across time slice intervals; T1 and T2 come out of wait and T1 begins]

Cycle Time Counter

Windows Server 2008 reads Time Stamp Counter
(TSC) at context switch

Actual CPU cycles consumed charged to thread

Interrupt time not charged to the interrupted thread

Allows for more accurate quantum accounting

Thread gets at least 1 turn and can get at most a turn + 1 tick

Also provides accurate time accounting for thread execution
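
The difference between the two accounting schemes can be illustrated with a toy model. This is a sketch under stated assumptions, not the scheduler's implementation: time is measured in abstract units, the timeline format `(current_thread, is_interrupt, duration)` is invented for the example, and the tick handler is assumed to charge a full tick to whichever thread is current when the tick fires.

```python
def charge_by_tick(timeline, tick_len):
    """Interval-timer accounting: the thread that is current when a tick fires
    is charged a whole tick, even if an interrupt or another thread used most of it."""
    charged = {}
    elapsed = 0
    next_tick = tick_len
    for current_thread, _is_interrupt, duration in timeline:
        elapsed += duration
        while next_tick <= elapsed:
            charged[current_thread] = charged.get(current_thread, 0) + tick_len
            next_tick += tick_len
    return charged


def charge_by_cycles(timeline):
    """Cycle accounting: charge each thread only what it consumed; interrupt time is not charged."""
    charged = {}
    for current_thread, is_interrupt, duration in timeline:
        if not is_interrupt:
            charged[current_thread] = charged.get(current_thread, 0) + duration
    return charged


# T1 runs briefly, is interrupted, runs a little more, then T2 takes over.
timeline = [
    ("T1", False, 2),
    ("T1", True, 6),   # interrupt arrives while T1 is the current thread
    ("T1", False, 1),
    ("T2", False, 11),
]
by_tick = charge_by_tick(timeline, tick_len=10)
by_cycles = charge_by_cycles(timeline)
```

In this example the tick-based scheme charges T2 for 20 units and T1 for nothing, while the cycle-based scheme charges T1 for 3 units and T2 for 11.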

[Timeline: Idle, T1, T1, T2 across time slice intervals]

Other Infrastructure Changes

Enhanced Thread Pool mechanism

Convenient library for efficient use of threads and CPU

New synchronization APIs

Allows for faster DLL loading and easier multithreading
development

Private namespaces

More secure protection of application objects such as
synchronization and shared memory objects

Hard resource quotas

Provides ability to prevent resource exhaustion of critical shared
resources including paged pool and nonpaged pool

I/O and File System


Self-Healing NTFS

Before, NTFS corruptions required running
Chkdsk, which often could only be done on the
next reboot

In Windows Server 2008, an NTFS worker thread
performs background Chkdsk-type corrections
when NTFS detects a corrupt file or directory

Minor disk errors are transparent to the user

Only the corrupted files and folders are inaccessible during repairs,
instead of the entire volume being locked

No need to reboot to repair corruptions

SMB2

SMB is the original Windows remote file system protocol

Can’t adapt to new NTFS features

Not designed for today’s large data sizes

SMB2 introduced in Windows Vista and Windows Server 2008

Supports NTFS client-side symbolic links

Operations can be batched to minimize client/server
round trips

Support for arbitrary buffer sizes for more efficient
copies results in 30-40x throughput improvement

SMB2 Performance

Performance gains are realized for full SMB2
connections between Server 2008 and Vista SP1

I/O Completion Port Improvements

I/O completion ports allow threads to wait
efficiently for completion of multiple I/O requests

Completed I/Os queue on the completion port

Before, each completion caused an unnecessary
context switch to the issuing thread

This might cause a delay since the thread might not
run immediately to process the completion

Windows Server 2008 defers I/O completion to
when the thread pulls the I/O off the completion
port

Avoids context switch, thus improving performance

Memory Management


Static System Address Space

Before, system virtual address space was divided into
fixed regions

Reason for limits on nonpaged pool, paged pool, and system page
table entries

[Diagram: 2 GB user mode (Application) and 2 GB kernel mode with fixed regions for Paged Pool, Nonpaged Pool, System PTEs, and Session Pool]

Dynamic System Address Space

In 32-bit Windows Server 2008, virtual memory is
assigned as needed

Kernel page tables allocated on demand instead of at boot

Components still cannot exceed 2 GB on 32-bit systems

Kernel stack usage is reduced through “stack jumping”

Stacks can grow and shrink as required

Benefits:

Supports more users on terminal servers

Up to 64 GB of RAM supported when booted /3GB (instead of 16 GB)

[Diagram: 2 GB kernel mode containing Paged Pool, Nonpaged Pool, System PTEs, and Session Pool]

Memory Manager Performance Improvements

Fewer and larger disk reads for page faults and
system cache readahead

64 KB limit removed; large I/Os done despite valid page
range

Readahead done directly into page cache lists

Fewer, faster, and larger disk writes for pagefile
and mapped file I/O

Larger cluster size (average 1MB), reduced
fragmentation

Elimination of zero writes

Concurrency improvements in many areas

More lock free searches, better parallelism

NUMA Enhancements

More memory allocations are NUMA aware:

Initial nonpaged pool has separate address ranges for
each node

Per-node look-asides for full pages

Page table allocation for system PTEs, the system
cache, etc. distributed across nodes

I/O system directs interrupt completion to node
that initiated I/O

NUMA Enhancements (Cont.)

Ideal node used more effectively for process
memory allocations

Page faults for threads running on a non-ideal processor
go into the ideal node’s memory

Prefetch pages to ideal node for application, instead of
ideal node for prefetch thread

Pages migrated to ideal node on soft page fault

NUMA topology used to influence locality for
memory allocations

New NUMA APIs allow applications to specify
preferred node number for memory allocations
and file mappings

NUMA Memory Allocation

Thread T gets scheduled on another node, but memory
allocations go to its ideal node
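
A minimal sketch of that rule, for illustration only. It assumes processors are numbered globally (0-7 for two four-CPU nodes), which differs from the per-node CPU 0-3 labels in the diagram; the `Thread` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    ideal_cpu: int    # processor the scheduler prefers for this thread
    current_cpu: int  # processor the thread happens to be running on


def node_of(cpu, cpus_per_node):
    """Map a global processor number to its NUMA node (assumed contiguous numbering)."""
    return cpu // cpus_per_node


def node_for_page_fault(thread, cpus_per_node):
    """Pick the node whose memory backs a faulted page: the ideal node,
    not the node the thread is currently running on."""
    return node_of(thread.ideal_cpu, cpus_per_node)


# Thread T has its ideal CPU on node 0 but was scheduled on a CPU in node 1.
t = Thread(ideal_cpu=1, current_cpu=6)
running_node = node_of(t.current_cpu, cpus_per_node=4)
memory_node = node_for_page_fault(t, cpus_per_node=4)
```

When the thread later returns to its ideal processor, its pages are already local to that node.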

[Diagram: Node 1 and Node 2, each with CPUs 0-3 and local RAM; thread T runs on a CPU in one node while its ideal CPU and ideal node's RAM are in the other]

Startup and Shutdown


Startup Processes On Server 2003

Session Manager (SMSS) created Winlogon and Csrss for each session

Session creation was done serially

Was bottleneck for Terminal Services

Winlogon, the interactive logon manager, created

Local Security Authority (Lsass.exe)

Service Control Manager (Services.exe)

Startup Processes On Server 2008

In Windows Server 2008

Initial Smss.exe creates an instance of itself to initialize each
session

Permits parallel session creation

Minimum parallel session startups is 4

Maximum is number of processors

Session 0 Smss runs Wininit.exe (new)

Wininit starts what Winlogon used to start: Services, Lsass

Also starts a new process, Local Session Manager (Lsm.exe)

Smss instances for sessions 1 through n create and initialize interactive sessions

Session-specific instance of Csrss.exe and Winlogon.exe

Clean Service Shutdown

Before, services had no way to extend the time allowed
for shutdown

After a fixed timeout (default 20 seconds), SCM was
killed and system halted (while services were running)

This was a problem for services that needed to flush
data

In Windows Server 2008, services can request
preshutdown notification

Can take as long as they want to shut down as long as
they are responsive

If the service stops responding the system gives up on
it after 3 minutes

After preshutdown services stop, the system performs
Windows XP-style shutdown for other services
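
The responsiveness rule can be modeled as a simple watchdog. This is a toy sketch, not the Service Control Manager's logic: the function name, the idea of a list of progress-report times, and the return format are assumptions made for illustration; only the 3-minute limit comes from the slide.

```python
def preshutdown_wait_ends(report_times, stopped_at=None, give_up_after=180):
    """Return (time, outcome) at which the system stops waiting for a service.

    report_times: seconds (since the preshutdown notification) at which the
                  service reported that it is still working
    stopped_at:   time at which the service finished, or None if it never did
    """
    last_heard = 0
    for t in sorted(report_times):
        if t - last_heard > give_up_after:
            # Silent for too long: the system gives up on the service.
            return last_heard + give_up_after, "gave up"
        last_heard = t
    if stopped_at is not None and stopped_at - last_heard <= give_up_after:
        return stopped_at, "stopped"
    return last_heard + give_up_after, "gave up"


# A responsive service may take longer than 3 minutes in total.
responsive = preshutdown_wait_ends([60, 200, 350], stopped_at=400)
# A service that goes silent is abandoned 3 minutes after its last report.
hung = preshutdown_wait_ends([60])
```

The first call returns `(400, "stopped")` and the second returns `(240, "gave up")`.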

Reliability and Recovery


Kernel Transaction Manager (KTM)

Before, applications had to work hard to recover from
errors during modification of files and Registry keys

Windows Server 2008 implements a generalized
transaction manager

Provide all or nothing transaction semantics

Kernel Transaction Manager coordinates between
transaction clients (applications) and Resource Managers

Builds on Common Log File System (Clfs.sys) introduced in
Windows Server 2003 R2 for efficient transaction logging facilities

Third parties can write user-mode or kernel-mode
Resource Managers
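
All-or-nothing semantics can be illustrated with a toy in-memory model. This is not the KTM API and does no logging: the `ToyTransaction` class and the sample keys are hypothetical, and a plain dictionary stands in for the files and Registry keys a real Resource Manager would protect.

```python
class ToyTransaction:
    """Stage writes privately; apply all of them on commit or none on rollback."""

    def __init__(self, store):
        self._store = store
        self._pending = {}
        self._finished = False

    def write(self, key, value):
        if self._finished:
            raise RuntimeError("transaction already finished")
        self._pending[key] = value  # not visible in the store yet

    def commit(self):
        self._store.update(self._pending)
        self._finished = True

    def rollback(self):
        self._pending.clear()
        self._finished = True


store = {"app.cfg": "v1", r"HKLM\Software\App\Version": "1"}

failed = ToyTransaction(store)
failed.write("app.cfg", "v2")
failed.rollback()  # an error occurred midway: nothing changes
after_rollback = dict(store)

ok = ToyTransaction(store)
ok.write("app.cfg", "v2")
ok.write(r"HKLM\Software\App\Version", "2")
ok.commit()        # both the file and the Registry value change together
```

The application never has to undo a half-finished update by hand, which is the recovery work the slide says applications previously had to do themselves.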

Kernel Transaction Manager (KTM) (Cont.)

Registry and NTFS enhanced to provide transaction
semantics across Registry and file system operations

Transactions can span modifications across one or
many Registry keys, files, and volumes

Using DTC and Windows Server 2008,
transactions can coordinate changes across files,
registry, SQL Server, Oracle, MSMQ

Used by Windows Update and System Protection

No more unbootable systems when there’s a system
failure during a hotfix

Better Handling of Process Crashes

Before, unhandled exception handling was executed in
context of thread incurring exception

Relied on thread stack being valid

Corrupt thread stacks resulted in “silent process death”

In Windows Server 2008, unhandled exceptions send a
message to the Windows Error Reporting (WER) service

WER launches Werfault.exe, which replaces Dwwin.exe

Permits WER to be invoked for threads whose stack is too
corrupted to invoke unhandled exception filter

Bottom line: all process crashes will get recorded (and reported if
configured to do so)


Improved Crash Dump Support

Before, crashes during early part of system startup
would not result in a crash dump

Crash dumps are written to paging file on boot partition

Paging files were not opened until the Smss process ran (after kernel
initialization)

Now, the paging file is opened before system start drivers
initialize

Crashes that might result are therefore recorded

Before, default crash dump type on servers was set to
full memory dumps

Now, default dump type on all systems is kernel dumps

Security


Address Space Load Randomization (ASLR)

Prior to Windows Server 2008

Kernel, HAL, executables and DLLs
loaded at fixed locations

Buffer overflows commonly relied on
known system function addresses to
cause specific code to execute

Windows Server 2008 loader bases
modules at one of 256 random points
in the address space

Operating System (OS) images now
include relocation information

Relocation performed once per image
and shared across processes

User stack locations are also
randomized

Drivers, Kernel and HAL also
randomized
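
The "one of 256 random points" idea can be sketched as follows. This is an illustration only, not the loader's algorithm: the 64 KB spacing between candidate bases and the `region_start` value are assumptions chosen for the example.

```python
import secrets

SLOT_COUNT = 256                  # from the slide: one of 256 possible points
ALLOCATION_GRANULARITY = 0x10000  # assumed 64 KB spacing between candidate bases


def pick_image_base(region_start):
    """Choose one of SLOT_COUNT evenly spaced load addresses at random."""
    slot = secrets.randbelow(SLOT_COUNT)
    return region_start + slot * ALLOCATION_GRANULARITY


REGION_START = 0x70000000         # hypothetical start of the image region
base = pick_image_base(REGION_START)

# An attacker who hard-codes a single address is right about 1 time in 256.
guess_success_probability = 1 / SLOT_COUNT
```

Because relocation is performed once per image and shared across processes, the random choice does not cost each process its own private copy of the image.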

[Diagram: memory layouts of Exe, User32, Kernel32, and NTDLL on two Server 2003 instances (identical) and two Server 2008 instances (different)]

Service Security Hardening

Before, service bugs allowed for privilege elevation
attacks

In Windows Server 2008, services apply principle of least
privilege to limit system exposure in case of compromise

Service-specific SIDs permit a service’s access to objects
to be limited

Only required objects give SID access

Firewall policy can be applied to service SID (and many
Windows Server 2008 services have this specified)

Write-restricted service processes further limit write
access

Can only modify objects that allow restricted write access

Service Security Hardening (Cont.)

Services can specify which privileges (e.g., shutdown,
audit) they require

Limits power of service processes

Specified in MULTI_SZ registry value under service key called
RequiredPrivileges

On service start, SCM computes union of all required
privileges for service(s) inside service process

If process token does not contain one, service start fails

Privileges not explicitly specified are removed from token

If no required privileges specified, assumes all privileges in
process token are needed
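
The privilege-filtering rules above can be sketched as a small function. This is a model of the rules as stated on the slide, not SCM code: the function name is hypothetical, and treating a service with no `RequiredPrivileges` value as needing every privilege in the token is this sketch's reading of the last rule for shared service processes.

```python
def filter_token_privileges(token_privileges, required_per_service):
    """Return the privileges left in the service process token.

    token_privileges:     privileges present in the process token
    required_per_service: one collection of required privileges per service
                          hosted in the process (empty = none specified)
    """
    token = set(token_privileges)
    union = set()
    for required in required_per_service:
        # No RequiredPrivileges value: assume everything in the token is needed.
        union |= set(required) if required else token
    missing = union - token
    if missing:
        raise PermissionError(f"service start fails; token lacks {sorted(missing)}")
    # Privileges nobody asked for are removed from the token.
    return token & union


token = {"SeShutdownPrivilege", "SeAuditPrivilege", "SeDebugPrivilege"}
trimmed = filter_token_privileges(
    token, [{"SeShutdownPrivilege"}, {"SeAuditPrivilege"}]
)
untouched = filter_token_privileges(token, [set()])
```

Here `trimmed` no longer contains `SeDebugPrivilege`, so a compromised service in that process cannot use it.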

Summary and More Information

Lots of exciting kernel changes in Windows
Server 2008 for performance, scalability,
reliability, security

Further reading:

Article on Vista Kernel Changes in February, March, April 2007 issues of TechNet Magazine

Windows Server 2008 Kernel Changes article (March 2008)

Windows Internals 5th Edition (Summer 2008)

My Other Sessions

Security Panel:

10:45 tomorrow

CLI301 The Case of the Unexplained...:

9:00 tomorrow

15:15 tomorrow


© 2007 Microsoft Corporation. All rights reserved.


This presentation is for informational purposes only.


MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.