
Using the .Net Compact Framework Remote Performance Monitor to Optimize How Your Device Application Uses Memory


In the February issue of .Net Developers Journal I described how implicit operations such as the boxing of value types can dramatically increase the amount of memory your .Net Compact Framework application uses. At the time, the tools available to help you get a picture of how your application uses memory were very limited. While version 2 of the Compact Framework did report performance statistics, it did so only when your application closed. The static nature of these counters made it very hard to locate memory usage trends in your application.
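
To make that concrete, here is a minimal C# sketch (my own illustration, not code from the earlier article) of the kind of implicit allocation in question: storing value types in a collection that accepts System.Object boxes each value, creating a new object on the GC heap for every call.

using System.Collections;
using System.Collections.Generic;

class BoxingExample
{
    static void Main()
    {
        // Each Add boxes the int into a new object on the GC heap:
        // 1,000 hidden allocations on top of the ArrayList's own storage.
        ArrayList boxed = new ArrayList();
        for (int i = 0; i < 1000; i++)
        {
            boxed.Add(i);               // implicit boxing: int -> object
        }

        // The generic equivalent stores the ints directly, with no boxing.
        List<int> unboxed = new List<int>(1000);
        for (int i = 0; i < 1000; i++)
        {
            unboxed.Add(i);             // no boxing
        }
    }
}

On a device, the difference between these two loops shows up directly in the Boxed Value Types counter discussed later in this article.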

What’s needed instead is a tool that allows you to graphically view how your application is using memory as it is running. Fortunately, Service Pack 1 of .Net Compact Framework version 2.0 provides such a tool: the .Net Compact Framework Remote Performance Monitor (RPM). In this article I’ll show you how to use the RPM to determine your application’s peak working set and to help you identify allocation trends in your application that can be optimized to use memory more efficiently.

Let’s get started by taking a look at how the .Net Compact Framework manages memory as your application is running.

.Net Compact Framework Memory Management Basics

The .Net Compact Framework CLR has been tuned over time to make optimal use of the device’s memory on behalf of the running application. A basic understanding of how the Compact Framework uses memory provides essential background that we’ll use later when we discuss how to use the RPM. As your application is running, the CLR makes numerous memory allocations to support the basic runtime services your application needs. Many of these are allocations you’d expect, such as the memory that is allocated in the garbage collection heap each time your application creates an instance of a reference type using the new keyword. The need for other types of allocations isn't always so clear. For example, the CLR allocates various "bookkeeping" data that it uses internally. These internal data structures are used to track which classes have been loaded, which methods are in those classes, whether those methods have been jit-compiled, and so on.

I’ve grouped the allocations made by the CLR into the following five categories:



Application and Class Library assemblies. The CLR loads all of the IL code and metadata for both the application itself and the .Net Compact Framework class libraries. The IL is clearly needed so the JIT compiler can generate executable native code, while the metadata is used by the CLR class loader to create some of the internal bookkeeping data structures I referred to earlier.



JIT-compiled native code. As an application is executing, the JIT compiler is called upon to generate the native code for each method that is accessed. This native code is stored in a dedicated heap that can grow and shrink depending on memory pressure.




Allocated reference types. Even the simplest managed applications allocate reference types. These types are typically created with language keywords such as new in C# or Visual Basic.NET. A basic “Hello, World” application will cause instances of types including forms, menus, controls, strings and so on to be created. In addition, instances of application-specific types are often created as well. The memory for all reference types comes from the garbage collector’s heap (see the sketch after this list).




In-memory representation of type metadata. As classes and their methods are needed during the execution of a program, the CLR reads their metadata and generates a set of data structures used to track the state of a running program.




Miscellaneous allocations. In addition to the categories of allocations described above, the CLR generates a small amount of additional data as it runs an application. Data in this category includes stubs that the JIT compiler uses to determine whether a method has been compiled.
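
To make these categories concrete, here is a small sketch (my own, assuming GC.GetTotalMemory is available on your Compact Framework version) showing how creating reference types with new grows the GC heap; this is the same growth you will watch graphically with the RPM later in this article.

using System;

class GcHeapGrowth
{
    static void Main()
    {
        long before = GC.GetTotalMemory(true);    // collect first for a stable baseline

        // Every new reference type instance is carved out of the GC heap.
        byte[][] buffers = new byte[100][];
        for (int i = 0; i < buffers.Length; i++)
        {
            buffers[i] = new byte[1024];          // roughly 100KB of GC heap growth
        }

        long after = GC.GetTotalMemory(false);
        Console.WriteLine("GC heap grew by roughly {0} bytes", after - before);

        GC.KeepAlive(buffers);                    // keep the buffers live until this point
    }
}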



To get a complete view of all the memory required to run an application, we must also consider the memory needed to hold the code for the .Net Compact Framework CLR itself. The CLR consists of two dlls: mscoree.dll and mscoree2_0.dll. These dlls, along with the native portion of the Compact Framework’s Windows Forms implementation, netcfagl2_0.dll, are loaded by Windows CE the first time a managed application is run.

Not all of the allocations described above are of equal importance when analyzing memory issues with device applications. Some categories of allocations are more critical to monitor than others because of the way the Compact Framework uses the Windows CE memory model. There are two important axes to consider: 1) whether a given category of allocations is shared among all applications or is specific to a given application, and 2) whether the memory used for a given set of allocations can be paged by Windows CE when the device is under memory pressure or not.

Windows CE provides three general memory areas, as shown in Figure 1:


Figure 1: A high-level view of the Windows CE memory model




System Code Space. The read-only code pages for all system dlls, such as coredll.dll, are loaded into this space. There is one system code space per device, so all applications share the code pages for the system dlls. Windows CE can page portions of this memory to storage and pull them back later if needed.



Per-Process Address Space. There are two primary reasons that device applications encounter memory issues on Windows CE. First, the virtual address space Windows CE provides to each application is only 32MB. Second, the data stored in this per-process space cannot be paged out under memory pressure. The stack for each thread in the application, the code pages for the application’s executable files (if it contains native dlls), and any per-application heaps are examples of data elements stored in this space.




High Memory Area. The 1GB high memory area provides virtual address space from which requests for large amounts of virtual memory can be satisfied. Large memory allocations and all memory mapped files are stored in high memory. All data stored in the high memory area is visible to applications on the device. Windows CE can swap pages from the high memory area to storage and back if needed.



Because the per-process address space is relatively small, and because its contents cannot be paged, making efficient use of the per-process space is the best way to ensure your application behaves well on memory-constrained devices. Figure 2 shows that the garbage collection heap, the jitted code heap, and the heaps used to store the CLR runtime data structures and other temporary allocations are all stored in the per-process address space.



Figure 2: The mapping between Compact Framework memory allocations and the Windows CE memory model

Now that we know where to look, let’s see how the RPM allows us to graphically see how the various CLR heaps stored in the per-process space grow and shrink as an application runs.

Using the .Net Compact Framework Remote Performance Monitor

Memory Management Counters

If you’ve spent much time debugging .Net Compact Framework applications, you may be familiar with the performance statistics contained in “.stat” files. By setting a registry key you can direct the Compact Framework to write a variety of performance data into a text file with a .stat extension after your application closes (see http://blogs.msdn.com/davidklinems/archive/2005/12/09/502125.aspx for more details). The Remote Performance Monitor tool introduced in version 2 Service Pack 1 of the Compact Framework is a GUI application that runs on your Windows desktop machine and displays performance data from an application running on a device. The data displayed by RPM is the same data contained in the .stat files. However, RPM makes performance analysis much easier because the data is updated and displayed continually while the application is running.

After installing the RPM (see the sidebar “Installing the Remote Performance Monitor”) you can view dynamic performance statistics for your application by launching netcfrpm.exe from your Windows machine and selecting the "Live Counters..." option under the File menu. Doing so displays the window shown in Figure 3.


Figure 3: The RPM “Live Counters” window


When connected via ActiveSync, your device will automatically show up in the “Device” dropdown on RPM’s main form (if you aren’t using ActiveSync you must enter the IP address of your device in the “Device” dropdown; you can get your IP address by running netcflaunch.exe from the \windows directory on your device). After selecting your device, type the fully qualified name of the application you’d like to launch in the “Application” text box and select the “Connect” button.

After RPM connects to your device and launches your application, several performance statistics will be displayed in a grid on RPM’s main form, as shown in Figure 4.



Figure 4: Performance statistics are displayed in a grid on RPM’s main form

These statistics are grouped by category. There are categories for most functional areas of the CLR, including native code interoperability, class loading, generics and so on. The counters we are most interested in are those in the “Memory” and “GC” categories. I’ll briefly describe the counters here and will discuss how to use them to perform more extensive analysis later, in the section entitled “Analyzing the Data”. The following counters in the Memory category are used to view the size of the five per-process heaps created by the CLR:



App Domain Heap. The CLR data structures that represent the loaded assemblies and classes are kept in this heap. The App Domain Heap is unique in that it never shrinks. It will grow as long as the application continues to load types and is only freed when the application exits.



GC Heap. Most of the action in a managed application occurs in the GC heap. Fortunately, the NetCF GC is optimized to shrink the heap and return memory to Windows CE when needed. Even so, the amount of activity within the GC heap is a good indicator of the overall efficiency of your application.



JIT Heap. The native machine instructions produced by the JIT compiler are stored in the JIT Heap. The JIT Heap will grow until the application experiences memory pressure or is moved to the background, at which point the CLR shrinks the heap as much as it can without disrupting the execution of the application.



Process Heap and Short Term Heap. The various other allocations made by the CLR are stored in either the Process Heap or the Short Term Heap. Both of these heaps are typically small and have little, if any, impact on performance.

The GC category includes numerous counters used to view the workings of the
GC. I’ve found the following counters particularly useful:



Garbage Collections. The number of times the garbage collector has run.



GC Latency Time. The amount of time spent in the garbage collector. Both the total time and the average time per collection are reported.



Bytes Collected by GC. The number of bytes collected by the GC. Both the total number of bytes and the average number of bytes collected each time the GC ran are reported.



Objects Finalized. The number of objects that had finalizers to run.



Managed Objects Allocated. The number of objects allocated as your application ran.



Boxed Value Types. The number of value types that were boxed. This number will always be a subset of Managed Objects Allocated.



Managed String Objects Allocated. The number of string objects allocated. This number will also always be a subset of Managed Objects Allocated.

In addition to displaying the data in the grid, RPM is also integrated with the standard Windows Performance Monitor so you can view the performance data graphically. Each performance category reported by RPM shows up as a performance object in Windows Performance Monitor, as shown in Figure 5.


Figure 5: Compact Framework performance counters shown in the Windows Performance Monitor

Individual counters can then be selected for graphing. Figure 6 shows an example graph.


Figure 6: A graphical view showing the size of the GC Heap over the lifetime of an application




Sidebar: Installing the Remote Performance Monitor

The Remote Performance Monitor includes some files that reside on the desktop machine and some files that must be present on the device. The desktop components are installed automatically by the version 2.0 Service Pack 1 setup program. After the installation completes, the RPM executable (netcfrpm.exe) is placed in the bin directory of the Compact Framework SDK. On my machine, this directory is C:\Program Files\Microsoft.NET\SDK\CompactFramework\v2.0\bin.

Installing the device-side components involves manually copying two files from the desktop machine to the device. The setup program places the device-side files in the same directory as the cab file that matches your processor type and operating system version. I have a Pocket PC 2003 SE device, so my device-side components are installed in C:\Program Files\Microsoft.NET\SDK\CompactFramework\v2.0\WindowsCE\wce400\armv4. The two files you must copy are netcfrtl.dll and netcflaunch.exe. Both of these files must be copied to the \windows directory of your device.

There are two issues you may run into when installing the device-side RPM components on Windows Mobile 5.0 devices:



Depending on the security configuration chosen by the device manufacturer, you may see a security prompt on the device the first time you launch the RPM. This prompt appears because netcfrpm.dll is not digitally signed.



An additional installation step is necessary on Windows Mobile 5.0 devices to provision the device so the RPM can run. Provisioning involves copying the following XML text into a file and using the rapiconfig utility to send the XML file to the device.


<wap-provisioningdoc>
  <characteristic type="Metabase">
    <characteristic type="RAPI\Windows\netcfrtl.dll\*">
      <parm name="rw-access" value="3" />
      <parm name="access-role" value="152" />
      <!-- 152 maps to "CARRIER_TPS | USER_AUTH | MANAGER" -->
    </characteristic>
  </characteristic>
</wap-provisioningdoc>



For example, if you paste the above XML text into a file named rpmprov.xml, you'd issue the following command from your desktop machine to provision your device:

rapiconfig /p rpmprov.xml



Analyzing the Data

Often, the hardest part of performance analysis is getting the data you need to determine what’s going wrong. Fortunately, RPM makes data gathering easy. All that’s left is to analyze the numbers. In this section I’ll describe how to use the memory and GC counters to diagnose the most common memory-related performance issues.

Per-Process Heap Sizes

When diagnosing performance issues, I typically start by looking at the size of the three largest heaps: the App Domain Heap, the JIT Heap and the GC Heap. The App Domain Heap and the JIT Heap are directly related to the number of types your application loads and the number and size of the methods it calls. As such, there generally isn’t much you can do to make these heaps smaller, outside of refactoring your entire application. Generally, both the App Domain and JIT Heaps are well under 1MB. I’ve seen the JIT Heap as large as 2.5MB for very large applications, but that’s definitely not the norm. Even though there isn’t much you can do to minimize the size of these two heaps, it’s often interesting to look at their combined sizes to get an indication of how much of the 32MB virtual address space is left over for other allocations, such as those needed to store reference types in the GC heap.

A large GC heap isn’t necessarily a concern in and of itself unless the heap is consuming so much of the 32MB space that little is left for other operations. A GC heap of 4 or 5 MB isn’t uncommon. The amount of activity, or churn, in the GC heap is often more of an issue than the size itself.

GC Activity

Taken together, the Garbage Collections, GC Latency Time, Managed Objects Allocated and Bytes Collected by GC counters will tell you if the GC is running more often, and spending more time, than it needs to. Pay particular attention to the average GC latency. A high average GC latency is often an indicator that your application is continually creating a large number of managed objects. This trend can be verified using the Managed Objects Allocated counter.

If it looks like GC is an issue, the next step is to determine where the managed objects are coming from. The ratio between either Boxed Value Types or Managed String Objects Allocated and Managed Objects Allocated will tell you if the majority of objects are being created “on your behalf” by operations such as boxing or string manipulations. If so, look for places in your code where you may be modifying instances of System.String or using collection classes that take instances of System.Object as a parameter (see my garbage collection article in the February issue of .Net Developers Journal for more details).
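
As an illustration only (not code from the article), the sketch below shows the string pattern to look for: each “modification” of a System.String actually allocates a brand-new string, while a StringBuilder reuses a single buffer. The collection-class case was shown in the boxing sketch near the start of this article.

using System.Text;

class StringChurn
{
    // Allocates a new, progressively larger string on every iteration;
    // every intermediate string immediately becomes garbage for the GC.
    static string BuildCsvSlow(int[] values)
    {
        string result = "";
        foreach (int v in values)
        {
            result += v.ToString() + ",";    // new string object per iteration
        }
        return result;
    }

    // Appends into one reusable buffer; far fewer allocations, which shows up
    // directly in the Managed String Objects Allocated counter.
    static string BuildCsvFast(int[] values)
    {
        StringBuilder sb = new StringBuilder();
        foreach (int v in values)
        {
            sb.Append(v).Append(',');
        }
        return sb.ToString();
    }
}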

Another important GC counter is the Objects Finalized counter. Objects with finalizers affect the GC because the memory occupied by the object cannot be freed until the finalizer has been run, so objects with finalizers are not freed until the next garbage collection after the object is no longer referenced. Effectively, objects with finalizers lengthen the time the memory they use is required. In extreme cases, this can put extra memory pressure on the system.
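
If the Objects Finalized counter is high, the standard dispose pattern helps. Here is a minimal sketch (my own, not from the article) of a wrapper type whose Dispose method releases its resource right away and suppresses the finalizer, so the object’s memory can be reclaimed in the next collection rather than surviving an extra one.

using System;

class NativeResourceHolder : IDisposable
{
    IntPtr handle;   // imagine this wraps some native resource

    public void Dispose()
    {
        ReleaseHandle();
        // The finalizer no longer needs to run, so the object can be
        // collected on the very next GC instead of lingering.
        GC.SuppressFinalize(this);
    }

    ~NativeResourceHolder()
    {
        // Safety net only: reached when Dispose was never called.
        ReleaseHandle();
    }

    void ReleaseHandle()
    {
        if (handle != IntPtr.Zero)
        {
            // release the native resource here
            handle = IntPtr.Zero;
        }
    }
}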

The Asterisk: Native Allocations

This article has focused on allocations made in memory controlled by the Compact Framework CLR. As we’ve seen, there are tools available to give you a pretty good idea of how your application is behaving with respect to managed memory. However, there’s another category of memory that we haven’t considered yet: memory that is not controlled by the CLR, or what I’ll call “native” memory. Native memory is of interest because under most circumstances requests for native memory are satisfied out of the same 32MB virtual address space that the CLR uses to manage the heaps it creates.

All applications use at least some native memory because Windows CE allocates various operating system and windowing objects in your per-process address space as your application runs. Your application will use much more native memory if it interoperates with native code using either P/Invoke or COM. In these cases, memory will be taken from the 32MB virtual address space both to hold the code pages that make up the native dlls and to satisfy any memory allocations made from within the native code. Because the Compact Framework is not directly aware of native memory allocations made in the process, the RPM does not report them. As a result, if all of the counters we’ve discussed in this article look fine and you’re still having memory problems, look to see if the native dlls your application is using allocate significant amounts of native memory.
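
One way to keep an eye on the native side is to ask Windows CE directly how much of the 32MB per-process address space remains. The P/Invoke sketch below is my own illustration; verify the GlobalMemoryStatus declaration against the Windows CE SDK documentation for your platform before relying on it.

using System;
using System.Runtime.InteropServices;

class VirtualSpaceCheck
{
    [StructLayout(LayoutKind.Sequential)]
    struct MEMORYSTATUS
    {
        public uint dwLength;
        public uint dwMemoryLoad;
        public uint dwTotalPhys;
        public uint dwAvailPhys;
        public uint dwTotalPageFile;
        public uint dwAvailPageFile;
        public uint dwTotalVirtual;    // the per-process virtual address space
        public uint dwAvailVirtual;    // how much of it is still free
    }

    [DllImport("coredll.dll")]
    static extern void GlobalMemoryStatus(ref MEMORYSTATUS status);

    static void Main()
    {
        MEMORYSTATUS status = new MEMORYSTATUS();
        status.dwLength = (uint)Marshal.SizeOf(typeof(MEMORYSTATUS));
        GlobalMemoryStatus(ref status);

        Console.WriteLine("Virtual address space: {0} bytes total, {1} bytes available",
                          status.dwTotalVirtual, status.dwAvailVirtual);
    }
}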

Summary

The .Net Compact Framework Remote Performance Monitor is a new diagnostic tool that can help you narrow down performance issues in your device applications. In this article I’ve focused on how RPM can help you identify areas within your application that can be tuned to make more efficient use of memory. RPM is integrated with the standard Windows Performance Monitor so you can graphically view memory allocation trends within your application. By using the various memory-related counters offered by the RPM, a developer can determine not only the extent to which services like garbage collection affect performance, but can also generally identify the source of their performance issues.