Vikram Singh Saini
“ Intel VTune is software from
We thank our parents, relatives and friends, who supported us in the documentation of Intel VTune.
We are also grateful to our faculty
for his keen guidance and
support during the preparation of the document now in your hands.
We would like to thank God, who
gives us the knowledge and ability to write this document clearly and present
it in front of you.
During the documentation we
faced many troubles, such as Windows crashes, damage
to computer hardware, and hindrances in the working of the Intel VTune software.
We would also like to
thank the person who provided his PC for
presenting this application.
Lastly, we
would like to give special thanks to NIIT, which gave us a wonderful chance to
present this document to you, the readers.
Thanks to everybody who directly or indirectly
helped in presenting this file.
Vikram Singh Saini
Difference between TBS and EBS
What happens during Sampling
Features of Sampling
Sampling Over Time
Features of Call Graph
Features of Counter Monitor
Working of Counter Monitor
Tuning Assistant Concepts
Features of Tuning Assistant
Understanding Tuning Methodology
Strategies for Improving Performance
Types of Advice
Information that Tuning Assistant provides
The VTune analyzer provides an integrated performance analysis and tuning environment
that helps you analyze your code's performance on systems with IA-32, Intel(R) 64, and IA-64
architectures. The VTune analyzer can plug into Microsoft Visual Studio and Eclipse. You
can work with the VTune analyzer using the graphical interface and command line
interface. All commands to create and run Activities must be preceded by
The VTune(TM) Performance Analyzer can analyze the performance of your Linux*
application. The VTune analyzer is installed on a controlling system and controls the run of
your Linux application on a Remote Agent system. The VTune analyzer then collects
performance data on your Linux application remotely.
When the VTune(TM) Performance Analyzer analyzes the performance of your Java*
application or applet (.class), the Virtual Machine (VM) and Just-In-Time compiler (JIT) are
enhanced to provide the VTune analyzer with specific information required to analyze the
performance of a Java application.
During sampling, the VM and JIT provide the VTune analyzer with information about JIT-compiled
Java methods being loaded into memory, such as their memory addresses, sizes,
and symbol information.
The VTune(TM) Performance Analyzer enables you to profile .NET* and ASP.NET web
services running on your machine.
The VTune analyzer will set the necessary environment variables and restart the web service
before collecting sampling or call graph data. The environment variables will be deleted and
the service restarted on completing data collection.
sampling configuration wizard
call graph configuration
ASP.NET/.NET web services.
INTEL VTUNE PERFORMANCE ANALYZER
Provides a graphical view of the application and helps you identify
critical functions and timing details in the application.
Calculates the actual performance of an application over a period of time
(time-based sampling) and for various processor events (event-based sampling).
Provides system-level performance data, such as resource
consumption, during the execution of an application.
Provides tuning advice from an analysis of the performance
data. The tuning advice helps you improve the performance of an application.
Helps identify the area of code that takes the maximum CPU time.
REQUIREMENTS OF SOFTWARE
Quad-Core Intel(R) Xeon(R) Processor 5300 Series
Dual-Core Intel(R) Xeon(R) Processor 5100 Series
Dual-Core Intel(R) Xeon(R) Processor 5000 Sequence
Dual-Core Intel(R) Xeon(R) Processor 7100 Series
Dual-Core Intel(R) Xeon(R) Processor 7000 Sequence
Dual-Core Intel(R) Xeon(R) Processor LV
Intel(R) Xeon(R) processor MP
Intel(R) Xeon(R) processor
Dual-Core Intel(R) Itanium(R) 2 processor 9000 sequence
Low Voltage Intel(R) Itanium(R) 2 Processor
Intel(R) Itanium(R) 2 processor
Intel(R) Core(TM)2 Quad processor
Intel(R) Core(TM)2 Extreme
Intel(R) Core(TM)2 Duo processor
Intel(R) Core(TM) Duo processor
Intel(R) Core(TM) Solo processor
Intel(R) Pentium(R) D processor 900
Intel(R) Pentium(R) D processor
Intel(R) Pentium(R) 4 processor Extreme Edition
Intel(R) Pentium(R) processor Extreme Edition
Intel(R) Pentium(R) 4 processor
Mobile Intel(R) Pentium(R) 4 Processor
Intel(R) Pentium(R) M processor
Intel(R) Celeron(R) M processor
Intel(R) Celeron(R) D processor
Intel(R) Celeron(R) processor
Mobile Intel(R) Celeron processor
32-bit operating systems supporting IA-32 architecture processors:
Microsoft* Windows XP Professional Service Pack 2
Microsoft* Windows Server 2003 Enterprise Edition Service Pack 1
Microsoft* Windows Server 2003 R2 Enterprise Edition
Microsoft* Windows Vista*
Microsoft* Windows Server 2008 RC0 (build 6001)
64-bit operating systems supporting Intel(R) processors with Intel(R) 64 architecture:
Microsoft* Windows XP Professional x64 Edition
Microsoft* Windows Server 2003 Enterprise x64 Edition
Microsoft* Windows Server 2003 R2 Enterprise x64 Edition
Microsoft* Windows Vista*
Microsoft* Windows Server 2008 RC0 (build 6001)
64-bit operating systems supporting Intel(R) Itanium(R) architecture processors:
Microsoft* Windows Server 2003 Enterprise Edition Service Pack 1
Microsoft* Windows Server 2008 RC0 (build 6001)
SYSTEM MEMORY REQUIREMENTS
At least 128 Megabytes of RAM
At least 105 Megabytes of available space on a local drive
20 Megabytes of disk space is required for system files on the drive containing the system
directory (for example,
additional hard disk space is needed for updating and installing the DLLs and OCXs that the VTune
analyzer requires to be in the system directory.
Sampling is the process of collecting a set of data for analysis and representing the analyzed
data in a statistical format.
Use the collected data to identify the critical processes, threads,
modules, functions, and lines of code running on the system.
During sampling, the VTune(TM) Performance Analyzer monitors all the software running
on your system, including the operating system, JIT-compiled Java* applications,
and device drivers.
Sampling does not modify binary files or executables in order to monitor the performance
of the application. The VTune analyzer
analyzes the collected samples and
helps you to identify:
Hotspot: a section of code within a module that took a long time to execute.
This results in a high amount of processor time spent executing that
section, thus generating a lot of samples for that module.
Bottleneck: an area in the code that is slowing down the execution of the application.
Bottlenecks appear as hotspots in the hotspot view. Removing
bottlenecks and hotspots optimizes the application.
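As a rough illustration of how sample counts point to hotspots, the sketch below ranks modules by their share of all collected samples. The module names and counts are made-up values, and `rank_hotspots` is a hypothetical helper, not part of the VTune analyzer.

```python
def rank_hotspots(sample_counts):
    """Return (name, percent of all samples) pairs, largest share first."""
    total = sum(sample_counts.values())
    if total == 0:
        return []
    ranked = sorted(sample_counts.items(), key=lambda item: item[1], reverse=True)
    return [(name, 100.0 * count / total) for name, count in ranked]


# Hypothetical sample counts per module (illustrative values only)
samples_per_module = {"render.dll": 600, "kernel": 150, "app.exe": 250}
hotspots = rank_hotspots(samples_per_module)
for name, share in hotspots:
    print(f"{name}: {share:.1f}% of samples")
```

The module at the top of the list received the most samples and is therefore the first candidate to inspect.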
TWO TYPES OF SAMPLING MECHANISM TO COLLECT SAMPLING DATA
TIME-BASED SAMPLING (TBS)
The VTune(TM) analyzer uses the operating system timer
to interrupt the processor and collect samples of all active instruction
addresses at a regular time
interval (1 ms by default).
The collected samples provide the performance data of all
the processes running on the system. Processes that took the longest time to
execute have the highest number of samples.
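The idea that longer-running processes receive more samples can be sketched with a small simulation. This is only a model under stated assumptions: the timeline, the process names and the function `time_based_samples` are hypothetical, and a real collector samples instruction addresses from a timer interrupt rather than walking a list.

```python
from collections import Counter


def time_based_samples(timeline, interval_ms=1):
    """Count one sample per timer tick for whichever process is running.

    timeline: list of (process_name, duration_ms) segments run back to back.
    """
    segments = []
    start = 0
    for name, duration in timeline:
        segments.append((start + duration, name))  # (end time, process)
        start += duration
    total_ms = start

    counts = Counter()
    index = 0
    tick = 0
    while tick < total_ms:
        # Advance to the segment that is active at this tick
        while segments[index][0] <= tick:
            index += 1
        counts[segments[index][1]] += 1
        tick += interval_ms
    return counts


# Hypothetical 20 ms run: the compiler is active for 12 ms, the editor for 8 ms
timeline = [("editor.exe", 3), ("compiler.exe", 12), ("editor.exe", 5)]
samples = time_based_samples(timeline)
print(samples.most_common())
```

With the default 1 ms interval, each process ends up with one sample per millisecond it was running, so the sample count is proportional to execution time.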
EVENT-BASED SAMPLING (EBS)
Event-based sampling helps you identify performance problems caused by processor events, such as cache misses.
From the EBS data, you
can determine which process, thread, module, function, and
source line in your program
generated the most processor events, and whether any of those
events impacted the performance of the program.
The VTune analyzer provides
recommended for use by performance analysts at Intel.
FIGURE 1: Event based sampling
TBS AND EBS
EBS: Data is collected using Clocktick events. When the processor executes HLT
instructions, the processor clock stops, which causes the Clocktick events to stop counting.
This results in no samples being collected while the processor is halted.
The VTune analyzer will therefore report fewer samples than you expected.
TBS: Data is collected using the OS timer. The OS timer is not affected by HLT
instructions, so the samples are collected accurately.
TBS can therefore potentially give
more accurate data.
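The effect of halted periods on the two mechanisms can be shown with a toy model. It assumes, as a simplification, that one Clocktick sample corresponds to one interval of un-halted time; in practice EBS samples are taken after a configured number of events. The function name and the timeline values are hypothetical.

```python
def count_samples(timeline, interval_ms=1):
    """Return (tbs_samples, ebs_clocktick_samples) for a run.

    timeline: list of (state, duration_ms), where state is "busy" or "halted".
    The OS timer keeps firing while the processor is halted; Clockticks do not.
    """
    tbs = 0
    ebs = 0
    for state, duration_ms in timeline:
        ticks = duration_ms // interval_ms
        tbs += ticks
        if state != "halted":
            ebs += ticks
    return tbs, ebs


# Hypothetical run: 40 ms of work followed by 60 ms in the halted state
tbs_samples, ebs_samples = count_samples([("busy", 40), ("halted", 60)])
print("TBS samples:", tbs_samples)
print("EBS Clocktick samples:", ebs_samples)
```

In this model TBS accounts for the whole 100 ms, while Clocktick-based EBS only sees the 40 ms in which the processor was not halted.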
WHAT HAPPENS DURING SAMPLING
When you run an Activity configured with the sampling collector, the VTune analyzer does
the following:
Waits for the delay time (if specified) to elapse and then starts collecting samples.
Interrupts the processor at the specified interval and collects samples of
active instruction addresses.
For every interrupt, the VTune analyzer collects one sample.
Stores the execution context of the software currently executing on the system.
FEATURES OF SAMPLING
The following are the main features of the sampling collector and views:
Multiple event sampling. Perform event-based sampling with multiple events in a single
run. Depending on the type of processor you are using, the VTune analyzer can monitor and
collect samples on two or more events in one run.
Collect sampling data for an application running on a remote
system. Your remote system can be a machine running on any operating system
supported by the VTune analyzer.
Collect sampling data for applications running on systems enabled with
The following sampling views help you analyze the data:
Thread view. View the threads running within a process and select one or more
threads to drill down to specific hotspots.
Opens by default for Clocktick events.
Process view. Display a system-wide view of all the processes running on your
system when sampling data was collected.
Module view. Display all the modules within the selected processes.
Hotspot view. Display function names associated with the selected modules. Group
hotspots by function, relative virtual address (RVA), source file, or class.
The following panels and toolbar options are available from the sampling view:
A sampling toolbar is available at the top of each sampling view.
This toolbar includes buttons labeled Process, Thread, Module, Hotspot, and Source.
Select items within a view and click one of the buttons to drill down.
When you open a specific view, a tab is created at the
bottom of the window labeled with the name of the view, for example, Process,
Thread, Module or Hotspot. If you open several views, a tab for each open view is
created at the bottom of the window. You can use the tabs to quickly move from one
view to another.
Display your sampling data in a Microsoft Excel 2000 spreadsheet.
You can customize the appearance of the spreadsheet report as needed.
Selection Summary panel. A panel displaying the events configured in an
Activity and the number of samples collected per event for the items you select in a view.
Legend. Display a detailed legend for all sampling views. Each Activity result, event,
and event ratio is color-coded.
The legend explains what each color represents.
Event summary panel.
Display the total number of events collected for items you
select in a view.
Display the workload as distributed across multiple processors.
The Sampling Over Time view displays the samples collected for a single event.
It enables you to identify which threads are running serially and which in parallel at any
point of time.
The Sampling Over Time view can be invoked from the Thread, Process and Module views.
The Sampling Over Time view
consists of two panels. The left panel displays the names of
the selected items and the right panel displays the samples collected over time.
The right panel is divided into squares, each square representing a unit of time.
The color of the squares indicates the number of samples collected for that unit of
time. A red square indicates a large number of samples, and a green square indicates
a small number of samples.
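The coloring of the squares can be sketched as a simple bucketing step. The threshold that separates "large" from "small", the treatment of empty squares, and the name `over_time_row` are assumptions made for this illustration; the source only states that red means many samples and green means few.

```python
def over_time_row(sample_times_ms, run_length_ms, unit_ms, hot_threshold):
    """Bucket sample timestamps into time units and color each unit."""
    unit_count = -(-run_length_ms // unit_ms)  # ceiling division
    counts = [0] * unit_count
    for t in sample_times_ms:
        if 0 <= t < run_length_ms:
            counts[t // unit_ms] += 1

    colors = []
    for count in counts:
        if count == 0:
            colors.append("none")   # assumption: no samples, no colored square
        elif count >= hot_threshold:
            colors.append("red")    # large number of samples
        else:
            colors.append("green")  # small number of samples
    return counts, colors


# Hypothetical sample timestamps for one thread over a 30 ms run
times = [0, 1, 2, 3, 4, 5, 12, 25, 26, 27, 28]
counts, colors = over_time_row(times, run_length_ms=30, unit_ms=10, hot_threshold=4)
print(counts)
print(colors)
```

Each entry of `colors` corresponds to one square in the right panel for the selected item.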
Sampling Over Time
The Over Time view can be used for the following:
You can determine whether there is excessive context switching.
You can see whether the processor is idle or not. If the idle
process receives samples, there is scope for
improving processor utilization at that time.
Temporal location of hotspots:
You can see the specific periods of time when a large
number of events occur.
You can view the pattern of thread behavior and thread interaction.
Viewing the footprint of each thread:
You can view the footprint of each thread on
Hyper-Threading Technology enabled processors.
The call graph collector of the VTune(TM) Performance Analyzer collects information about
the program flow of an application, that is, the number of calls each function makes to other
functions and the amount of time each function spent executing its own code and/or calling other
functions.
Relative to the current function, a function can be a:
Caller: a parent function that calls
the current function.
Callee: a child function that is called by the current function.
In many cases, the caller may call the callee from several places (sites), so call graph also
provides call information per site.
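The kind of information described above can be sketched from a list of call records. The record layout (caller, callee, call-site line), the function names and the timing values are all hypothetical, and the total-time helper assumes a call tree without recursion.

```python
from collections import Counter


def summarize_calls(calls):
    """Count calls per caller/callee pair and per individual call site."""
    per_pair = Counter((caller, callee) for caller, callee, _site in calls)
    per_site = Counter(calls)
    return per_pair, per_site


def total_time(function, self_time, callees):
    """Self time plus the total time of everything the function calls."""
    return self_time[function] + sum(
        total_time(child, self_time, callees) for child in callees[function]
    )


# Hypothetical call records: (caller, callee, source line of the call site)
calls = [
    ("main", "parse", 10),
    ("main", "parse", 42),
    ("main", "parse", 42),
    ("parse", "read_token", 7),
]
per_pair, per_site = summarize_calls(calls)
print(per_pair[("main", "parse")], "calls from main to parse")
print(per_site[("main", "parse", 42)], "of them from the site at line 42")

# Hypothetical self times in milliseconds
self_time = {"main": 2, "parse": 5, "read_token": 3}
callees = {"main": ["parse"], "parse": ["read_token"], "read_token": []}
main_total = total_time("main", self_time, callees)
print("total time of main:", main_total, "ms")
```

The per-site counter shows why site information matters: the same caller and callee pair can appear at several places with different call counts.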
FEATURES OF CALL GRAPH
The following are the main features of the call graph collector and views:
Manual launching mode.
Launch your application from the desktop and
select the required modules of interest to analyze.
Level Data Collection
Configure the call graph collector to instrument and
level DLLs even when the application itself cannot be instrumented.
Select exactly which
functions to instrument and improve
the speed of the instrumented application by using improved filtering capabilities.
Collect data for more than one process with fully
Profile COM interface methods using the call graph collector.
After you collect call graph data using the VTune analyzer, you can view the call graph
profiling information at the following levels:
Graph view. Provides a visual graphical presentation of the application execution. It
displays the selected function(s), the function's parents (callers), its child functions
(callees), and timing information.
A node in the graph represents a function.
An edge (line with an arrow)
connecting two nodes represents the call from the
parent to the child function. For every function,
you can traverse the caller and callee functions.
The call graph view uses the following conventions:
Nodes connected by thick red edges designate functions on the critical path from the root.
The thicker the edge, the greater the Edge time.
Uses of the graph view:
estimate the performance of your application
find potential performance bottlenecks
, which is a path with the maximum
Graph view of Call Graph
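One way to picture the critical path is to start at the root and always follow the outgoing edge with the greatest Edge time. This is an assumption made for illustration, since the exact rule the VTune analyzer applies is not spelled out here; the graph, the timings and the name `critical_path` are hypothetical.

```python
def critical_path(edges, root):
    """Follow the heaviest outgoing edge from the root until a leaf is reached.

    edges: dict mapping a caller to a list of (callee, edge_time) pairs.
    """
    path = [root]
    visited = {root}
    node = root
    while edges.get(node):
        callee, _edge_time = max(edges[node], key=lambda edge: edge[1])
        if callee in visited:  # stop instead of looping on recursive calls
            break
        path.append(callee)
        visited.add(callee)
        node = callee
    return path


# Hypothetical call graph with Edge times in milliseconds
edges = {
    "main": [("init", 5), ("run", 80)],
    "run": [("compute", 60), ("log", 10)],
    "compute": [],
}
path = critical_path(edges, "main")
print(" -> ".join(path))
```

In the graph view these would be the nodes joined by the thick red edges.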
The call list view provides full information on the selected
function, its callers
(parents) and callees (children) in table format.
Focus function: the function that is currently being viewed and on which the focus is set.
It shows the threads
and classes associated with it.
Caller function: a function that calls the focus function. Its
columns include contribution, Edge time, thread, class, etc.
Callee function: a function that is called by the focus function.
Its columns are almost the same as those of the caller function.
Call List view of Call Graph
The function summary view provides full information on all the application functions in
table format.
The rows in the function summary display functions with different
background colors according to their hierarchical position. The default view shows the
first four types of data as follows:
Function Summary view of Call Graph
Following are the various options available from the
call graph view:
Gain different perspectives on your data using the wide range of
filtering options available
Conveniently view detailed function information using tooltips
View Java function calls and Win32
function calls in the same
call graph results.
with an expanded collection of
wait times for functions and calls. Traverse Self Wait time, Total Wait time, Edge
time, Edge Wait time, and
from node to root and from node to bottom.
Node state indicators.
Adjust the color palette for any graph elements and control
node length settings to support long function names. Node state
three different types of
node status, facilitating orientation within the graph view.
Control a wide range of options in the
the function summary pop
contains enhanced features,
provides quick and easy access to the
most commonly used commands.
Make changes to the way you view data, then return or
advance forward through several cycles of changes.
Counter Monitor identifies system-level issues in applications. It is used to track system
activities when the application runs on the system.
Counter Monitor collects specific performance counter data, such as that of an
application, an OS, or a
hardware device, at different intervals of time.
It monitors and graphically displays the performance counter data.
It is a feature that measures and gathers performance-related data that
represents the state of
the system without affecting the performance of the program.
Counter Monitor also helps you to understand the cause-and-effect relationship between
an application and the system on which the application is running.
If you develop application-specific
counters using performance DLLs, the VTune analyzer will also monitor and display
these counter values.
FEATURES OF COUNTER MONITOR
The following are the main features of the counter monitor collector and views:
Triggers. Use triggers to monitor hardware and software counters at
predetermined intervals according to criteria that you specify.
A trigger is an event that tells the VTune™ Performance Analyzer when to collect counter
data. The VTune analyzer uses the system timer as the default trigger.
For the system timer,
performance data is collected once per second when the default interval (1000 milliseconds) is used.
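The timer-driven collection can be sketched as a loop that reads a counter once per interval. To keep the example fast and repeatable it advances simulated time instead of sleeping; `log_counter` and `fake_page_faults` are hypothetical stand-ins, not VTune or operating system APIs.

```python
def log_counter(read_counter, duration_ms, interval_ms=1000):
    """Read the counter once per interval and return (time_ms, value) pairs."""
    log = []
    for time_ms in range(0, duration_ms, interval_ms):
        log.append((time_ms, read_counter(time_ms)))
    return log


def fake_page_faults(time_ms):
    """Made-up counter: the value grows by 10 every second."""
    return (time_ms // 1000) * 10


logged = log_counter(fake_page_faults, duration_ms=3000)
print(logged)
```

With the default interval of 1000 milliseconds, a three-second run produces three logged values, one per second.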
Following are the counter monitor views to help analyze the data:
The VTune analyzer generates a graph that
shows changes as they happen. View data as you log it or review data after the run.
Logged Data view. Displays data logged during an Activity. This is the default
view, which opens on completion of an Activity.
In the Logged Data view,
data from each counter selected for logging is charted with a separate line and color.
Each line on the chart represents data for a specific performance counter. The peak
indicates the highest counter value. Moving the cursor over a counter on the chart
displays a tool tip with the value of the counter at that point in time during data
collection.
Logged Data View of Counter Monitor
The peaks of each counter indicate the highest counter activity.
For example, a peak in the
counter that measures page faults
per second indicates that the most page faults occurred
at that point in time during data collection.
Each line includes a distinct legend symbol for the corresponding counter,
representing the point at which data was taken. The vertical Y axis represents counter
values (scaled or actual), while the corresponding time is displayed on the horizontal X axis.
Summary Data view
Displays a statistical view of the counter data.
The Summary Data view provides statistical information for each counter you selected for
display in the Logged Data view. This information includes:
This enables you to determine which values were the most active, or
and drill down from a Logged Data view of those values.
Summary Data View of Counter Monitor
The summary data for each counter is represented as a bar diagram:
of the diagram is the maximum value for the counter (in the example:
% Total Processor Time counter), the
is the minimum value, and the
(violet bar in the example) is the average counter value.
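The statistics behind such a summary can be computed directly from the logged values. The sketch below derives the minimum, maximum and average of one counter, plus the time at which the peak occurred; the logged values and the name `summarize_counter` are illustrative assumptions.

```python
def summarize_counter(samples):
    """Summarize logged (time_s, value) pairs for a single counter."""
    values = [value for _time, value in samples]
    peak_time, peak_value = max(samples, key=lambda sample: sample[1])
    return {
        "min": min(values),
        "max": peak_value,
        "avg": sum(values) / len(values),
        "peak_time_s": peak_time,
    }


# Hypothetical processor-time readings (percent), logged once per second
logged_values = [(0, 10.0), (1, 35.0), (2, 90.0), (3, 25.0)]
summary = summarize_counter(logged_values)
print(summary)
```

The `peak_time_s` entry corresponds to the peak you would see on the counter's line in the Logged Data view.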
Following are some options available from the counter monitor view:
Choose a chart style best suited to the data you want to view using
the Chart FX
WORKING OF COUNTER MONITOR
When you select an Activity with the counter monitor collector and
click Run Activity to begin performance data collection, the VTune analyzer does the
following:
Launches the specified application, if any.
Starts monitoring and logging the counter values.
The VTune analyzer collects performance data for all the counters of a performance
object but displays only the counters you select.
Opens a view with a chart showing the counter data as it is being
collected, if the runtime display option is selected.
If sampling data collection was turned on, it also starts collecting sampling data.
At the end of an Activity run, if counter monitor data was logged, the VTune analyzer
does the following:
Creates an Activity result with the counter monitor data.
Displays the counter monitor Logged Data view if the counter monitor data is
the only type of data that was collected, or prompts you to pick a view to
open if multiple types of data were collected.
The Intel(R) Tuning Assistant provides advice on tuning your system resources and
application performance. Using its multiple knowledge bases, the Tuning Assistant analyzes
the data collected by the VTune(TM) Performance Analyzer, identifies performance issues, and
provides advice based on the following types of data:
Sampling data collected on supported processors
Counter monitor data collected on supported operating systems.
C, C++, Fortran, or Java* source code
Disassembled assembly code
TUNING ASSISTANT CONCEPTS
The following are some key Tuning Assistant concepts:
All the software that was executing when data was collected.
An insight is an observation about the performance of your code. It indicates
a potential performance problem that could be a bottleneck to your application’s
performance.
Advice is a possible solution or recommended workaround (usually a
suggestion to modify the code) to remove or avoid a performance problem.
A relevance score is a heuristic to indicate how relevant a
particular insight or advice is to the current context. For instance, an extremely high
relevance score for an insight may indicate a high probability of a performance
problem.
The Tuning Assistant provides tuning advice for code, processes/modules/functions, or time
ranges that you select in source, sampling, or counter monitor views. If you provide symbol
information, the Tuning Assistant window provides links from your function names directly
to the corresponding code section in Source View.
FEATURES OF TUNING ASSISTANT
The Intel(R) Tuning Assistant has the following features to enable analyzing the performance
of your application:
Provides insights and advice on potential performance problems by analyzing
sampling data collected on supported processors (see the Release Notes for a
complete list of processors for which the Tuning Assistant can provide insights and
advice). You can use the insights and advice to make algorithmic changes to your
application so the processor can execute your application more efficiently.
Contains knowledge bases to support Hyper
Enables you to compare two or three Activity results.
Provides links from function names directly to the
corresponding code section in
source view when you provide symbol information
Provides advice on performance counter data and disassembly code
Provides static assembly advice.
Guides you through the key steps of performance tuning methodology
ability to export the tuning advice report to a
values) text file for viewing and editing using a different application, such as
UNDERSTANDING TUNING METHODOLOGY
System-level tuning: The main objective
of system-level tuning is to optimize the
utilization of system resources. The tuning speeds up application performance by
improving the way the application interacts
with the system.
This tuning is effective
for I/O applications.
Application-level tuning: The purpose of application-level
tuning is to reduce
the execution time of an application.
This can be achieved by improving the
algorithms of the application, implementing threads, and using APIs.
Increases the performance
of an application by
improving the way the application runs on a processor.
This type of tuning is used
IMPROVING PERFORMANCE OF APPLICATION
Enables you to speed up the application when processor utilization
is low. Processor utilization drops when the processor is waiting for I/O to complete.
You need to make changes in the application during system-level and application-level tuning.
Improving threading model
Add multithreading to a single-threaded application.
This improves the efficiency of the application by increasing processor utilization.
Improving the efficiency of computation
Speed up the application by making changes
to the application to accomplish the same amount of work by using less
TYPES OF ADVICE
The Tuning Assistant automatically analyzes the sampling
data, identifies performance issues, and provides insights on the issues.
When you click an insight, the
window provides additional information.
that can be used to view the relevance of a
particular insight to performance issues.
The Tuning Assistant performs
counter analysis based
on all counters measured in the Activity.
After analysis, the Tuning Assistant displays insights into potential
The Tuning Assistant uses a compiler technology for source-based
advice, which
enables you to speed up the execution of code. However, it is limited to C, C++ and Java.
Static Assembly Penalties
The VTune analyzer analyzes code at the assembly language level
categories of information that TA displays are:
Indicates a specific problem and the effect of the pr
performance of code.
Indicates potential problems that might degrade the performance.
INFORMATION THAT TUNING ASSISTANT PROVIDES INCLUDES:
Indicates the problem that could be hindering the performance of the
Various categories of insights are:
That are estimated to have significant impacts on performance.
Enables to identify the maximum optimization
that one can achieve for the
Are performance issues for all mod
(See fig. 8)
Focus on performance issues for the modules in
(See fig. 8)
Insights on performance issues
based on functions that are
sorted by percentage of CPU time.
Summarizes the features of the system, such as the speed of the
processor and the name of the operating system.
View information about possible optimizations to improve
Window of Tuning Assistant Advice
Indicates the relevance of the insight or advice to a particular
For example, a high relevance score indicates that the effect of
the problem on the application is significant.
(See fig. 9)
TUNING ASSISTANT ADVICE
to remove or avoid a problem
Click on links
as shown in fig. 8
The following references have been used for documentation purposes.
Intel VTune software help file is used.
Intel VTune Performance Analyzer Essentials (Author: James Reinders)
Semester Intel VTune (By NIIT)