>> Ben Zorn: So it's a great pleasure to introduce Harry Xu from Ohio State University. Harry is a PhD student and he's going to graduate this year. So he's here to talk about his PhD work. Harry had some great internships at IBM Research, and those two internships sort of prompted his line of thinking around understanding how programs use memory and the aspect of memory bloat, which he's actually investigated over the past few years with a number of really interesting publications.


Harry has received several awards, including an IBM Research fellowship, a distinguished paper award at ICSE, and also a departmental fellowship at Ohio State University.


So with that, I'll introduce Harry. Thank you.


>> Guoqing (Harry) Xu: Thank you. Thank you, Ben, for the introduction. So it's a great pleasure to be here to talk about my research. Okay, so let's get started.


So I'm actually a program analysis person. I'm interested in both the theoretical foundations of program analysis and its applications. The important application that forms the basis of my PhD thesis is to use static and dynamic program analysis techniques to help programmers find and remove what we call runtime bloat in modern object-oriented software.


So this research is motivated entirely by real-world problems that we have regularly encountered and studied. So here I'm going to tell you the story.


All right. So as probably all of us here have already seen, the past decade has witnessed a tremendous increase in the size of software, which now contains more and more functionality and consumes more and more resources in order to solve increasingly important problems.


So this picture shows the growth of the total number of objects in the heap in a large Java server application over the course of two years, between 2002 and 2004. So the total number of objects in the heap has increased, for example, from less than half a million in the beginning all the way up to 30 million at the end, more than 60 times over this two-year period.


So what happened to this application? Why does this application consume so much memory now? Because now we have this big pile-up. So this picture shows the architecture of the SAP NetWeaver application server, where each box in this picture represents a component of the server.


So this application server has millions of lines of code and needs about 20 components to work simultaneously in order to function. So one important thing that can be seen from this picture is that this application server itself is a big pile-up, right? It's built on top of layers and layers of frameworks and libraries. So first of all, it can be extremely easy for such a large-scale, framework-intensive application to suffer from performance problems.


So for example, suppose there exists a small performance issue in one of the components here. Its effect can quickly get orders of magnitude more significant when this component becomes nested in layers, right? And second, if this application has a small performance problem, it will be extremely hard for the developers to find out why, because the problem can easily cross many, many layers of libraries and frameworks. And the libraries and frameworks can come from different vendors; in most cases their source code is not available. So as a result, significant performance degradation and scalability problems can be regularly seen in large-scale, real-world applications.


All right. So a lot of evidence has shown that, you know, most performance problems in modern object-oriented software are caused not by a lack of hardware support but instead by runtime redundancies or inefficiencies during the execution. We call these general redundancies or inefficiencies runtime bloat.


A typical example of bloat is to use a very complex, heavyweight function to achieve a very simple task which could have been achieved in a much easier way. Bloat has caused applications to consume increasingly large memory space. So for example, a typical Java application's heap has quickly grown from, you know, 500 megabytes a few years ago to 2 to 3 gigabytes today. It's very common. But it doesn't necessarily mean that we're now supporting more users or functions. Right?


So here is a list of common bloat effects that we found in real-world applications that we have studied. So for example, a large-scale application originally designed to support millions of concurrent users could eventually scale only to a thousand users in practice. And it was extremely hard for the developers to find out the reason, so they gave it to IBM Research for performance tuning.


As another example, the designers of a large-scale application initially expected only two kilobytes of memory for saving session state per user, but eventually they found 500 kilobytes, you know, for saving session state per user, more than 250 times larger than they initially expected.


So the consequence of bloat is not just excessive memory usage. Large memory consumption usually comes with the execution of a lot of redundant operations that can cause a significant slowdown of the application. So bloat can have a huge impact on the scalability, power usage and performance of large-scale, real-world applications that form the backbone of modern enterprise computing used every day by thousands of businesses.


Bloat can also have a huge impact on mobile computing, where most applications have, you know, very strict resource constraints. According to IBM Research, millions of dollars are spent every year by its customers on things like bloat detection, performance tuning and memory leak detection. So that's why IBM Research is pushing very hard to develop useful tools that can help programmers identify performance problems.


All right. So what can we do? What can we do to deal with this ever-increasing level of inefficiency? The first thing that we can probably think of immediately is to use compiler optimizations. Programmers' general feeling, you know, is like this: we don't need to worry about performance in object orientation because we have this advanced compiler technology, we have this garbage collector, right. Let's leave it entirely to the compilers and the runtime systems because they are always smarter than ourselves. That's basically what people are thinking in the real world.


However, the usefulness of this traditional compiler technology is very limited in optimizing bloat away, because a general dataflow in a large-scale application can easily cross thousands of method invocations and many, many layers of libraries and even frameworks.


This is a very large scope, way beyond what a compiler analysis would inspect. Of course, optimizing a lot of real-world problems may require a lot of developer insight, which the compiler doesn't have. On the other hand, it's not easy either for human experts to perform many optimizations, primarily because of a lack of good tools that can help them make sense of large heaps with millions of objects and long executions that can last for hours and days.


So my entire thesis here is to find a better way to identify large optimization opportunities with a small amount of developer time. How do we do that? The basic methodology is to advocate compiler-assisted manual tuning in order to take advantage of both sides, the manual optimization side and the compiler side. Specifically, I've designed, implemented and evaluated static and dynamic program analysis techniques that can, first, identify the root cause of a memory problem or performance problem; second, remove, automatically or manually, some kinds of bloat patterns that we, you know, identified with these techniques; and third, prevent bloat from occurring in the early stages of software development. So this is actually a three-step approach.


All right. So our techniques were implemented on a variety of different platforms. So for example, the dynamic analyses were implemented on, you know, the IBM J9, which is the commercial JVM of IBM; on Jikes RVM, which is an open-source JVM written in Java; and also using JVMTI, which is the general tool interface supported by all JVMs. And, you know, the [inaudible] analysis can be used to generate online bloat warnings during the execution of the program.

The static analyses were implemented on Soot, which is a popular program analysis framework for Java. The static analysis can be used either to produce bloat warnings -- not online, I mean -- or to generate refactoring suggestions during coding, when the developers write their code. So this really demonstrates that our techniques are general enough and they are not limited to any specific platform or framework.


All right. So this slide shows an overview of the set of techniques that I have developed for my thesis. For example, for dynamic analysis, we start with profiling the execution to identify certain bloat evidence, like a lot of copies, large cost-benefit ratios, or high memory leak confidences. They are all strong indicators of bloat.


And then, by analyzing the profiled activities, we generate a bloat or memory leak report.


For static analysis, we analyze the bytecode of large-scale applications to identify certain bloat patterns that we have previously found using the dynamic analyses, in order to either generate bloat warnings or to transform the program to produce optimized code.


So for example, static analysis can be used to help programmers identify bloated containers. It can also be used to help programmers find and hoist loop-invariant data structures.


So in addition to software bloat analysis, I've also contributed to some other program analysis areas. For example, I'm very interested in scalable points-to and dataflow analysis. I've done some work on control and dataflow analysis for AspectJ, which is, you know, the major aspect-oriented language. I also have some experience with language-level checkpointing and replaying.


In terms of research areas, my work goes from very high-level software engineering, across programming languages and compilers, all the way down to runtime systems. So this slide gives a detailed classification of the publications in terms of research areas. So I actually have experience and expertise in these three areas.


So now we're done with the introduction part. Now let's get a little deeper into technical problems and solutions. So here I will talk about two specific program analysis techniques to help programmers find and remove bloat: one dynamic analysis and one static analysis.



The goal of the dynamic analysis is to find low-utility data structures to help programmers do performance tuning. The second, static analysis can be used to help programmers find and hoist loop-invariant data structures.


So the goal -- I mean the motivation of the second, static analysis is based on the many loop-invariant data structures that we found using the first, dynamic analysis. That's how these two analyses are connected.


So eventually I'm going to talk about future work and conclusions.


All right. So this is the first piece of work, which was published in PLDI -- one of my PLDI 2010 papers. The goal of this analysis is to identify high-cost, low-benefit data structures that are likely to be performance bottlenecks.


So how are we going to do that? We design a runtime analysis that computes a cost measurement and a benefit measurement for each data structure in the heap. And eventually we present to the user a list of data structures that are ranked based on their cost-benefit ratios.


Intuitively, data structures with high cost-benefit ratios are likely to be closely related to performance problems. That's the motivation.


All right. So before getting deeper into the technical problems and the solutions, let's first go through this motivation part. Yeah, sure?


>>: So you're talking about the cost but, in fact, the cost of extra memory isn't -- I mean, there's sort of a subtle indirection in the sense that the cost of extra memory isn't visible unless it has some performance effect, say, for example, you know, it's bigger than the cache, et cetera.


>> Guoqing (Harry) Xu: Oh, yes.


>>: So do you get to -- when you're talking about cost, do you get to the actual cost in terms of the impact on cycles, or do you actually --


>> Guoqing (Harry) Xu: On cycles.


>>: Or is it just more memory is better [inaudible].


>> Guoqing (Harry) Xu: No, I mean --



>>: [inaudible].


>> Guoqing (Harry) Xu: Right. So I think the cost-benefit here is defined in terms of, like, the actual computation done rather than the memory needed or used. So it's more about execution bloat instead of memory bloat.


>>: Okay.


>> Guoqing (Harry) Xu: So --



>>: Okay. So I'll let you go.


>> Guoqing (Harry) Xu: Yeah, yeah.


>>: [inaudible].


>> Guoqing (Harry) Xu: So okay. I mean, this is actually -- let's first go through the motivation. As we probably already know, runtime bloat can manifest itself through many observable symptoms. So for example, a lot of temporary objects, that are really short-lived objects, right, they're not long-lived; a lot of pure data copies without any computation done; and maybe, you know, a lot of highly-stale objects, meaning objects that are not used for a long time; and many problematic containers. Maybe many, many other symptoms that we're going to observe in the future. However, in any tuning task it would probably be impossible for the developer to try all those different techniques, to look at all those different symptoms, to identify a performance problem.


So that basically doesn't make sense. The immediate question to ask here is: is there a common way, or in other words a more systematic way, beyond all those different symptoms and techniques, that can characterize different kinds of bloat even though they could exhibit different symptoms? As bloat is really about runtime inefficiencies, we found that what is really common about different kinds of bloat is that there exist operations that are very expensive to execute but do not produce data values that have a large benefit or impact on the forward progress. In other words, the cost of those operations is out of line with their benefit. This observation actually motivates us to compute these cost and benefit measurements to help programmers do performance tuning.


So how are we going to do that? Let's first look at an example. This example shows a problem that we found in the Eclipse framework using our dynamic analysis. So here is a method, called package, that takes a Java string as input and eventually returns whether or not the string represents a valid Java package name. Right? The way this method is implemented is that it first calls another method, called directory list, in order to identify and return a list of all the files in the current directory represented by S here. And then it checks whether or not this returned list equals null.


So it's easy to see that there's a much easier way, or smarter way, to implement this method, right? So for example, we can directly parse this incoming string into two parts, like "java" and "io", and check directly whether or not they correspond to valid Java directory names, right? So we don't necessarily have to find all the files in the directory.
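

Here is a minimal sketch of the pattern just described, with hypothetical names -- this is not the actual Eclipse code. The bloated version materializes a full directory listing only to test it against null, while the specialized version inspects the string directly.

    import java.io.File;

    public class PackageCheckSketch {

        // Bloated version: lists every file in the directory named by s,
        // only to test whether the listing is null.
        static boolean isPackageBloated(String s) {
            File dir = new File(s.replace('.', File.separatorChar));
            String[] files = dir.list();   // expensive: builds the whole listing
            return files != null;
        }

        // Specialized version: check each segment of the dotted name directly,
        // then test the directory without materializing its contents.
        static boolean isPackageSpecialized(String s) {
            for (String part : s.split("\\.")) {
                if (part.isEmpty() || !Character.isJavaIdentifierStart(part.charAt(0))) {
                    return false;
                }
            }
            return new File(s.replace('.', File.separatorChar)).isDirectory();
        }
    }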


So in this case, for example, this, you know, big list data structure pointed to by this variable packs has the highest cost-benefit ratio, because intuitively there's a lot of effort made to identify those files and, you know, populate this entire list, but eventually none of the elements in the list are used for any purpose. So the cost-benefit ratio is the highest.


So for this example -- I mean, this example actually shows a typical problem with current object-oriented programming practice. So for example, programmers are encouraged to pay more attention to high-level abstractions like modularity, reuse and readability, without considering performance.


So for example, in this case, think about why the programmer wants to do this, why the programmer wants to implement this method in this way. The only reason that I can see here is that the programmer just wants to reuse the implementation of this directory list method instead of creating a specialized, simplified version for it, right?


However, they're never aware that this piece of code can be executed millions of times, and when this list becomes large, then it really hurts performance. In fact, by simply creating a specialized version for this method we were able to reduce the running time of this application by eight percent, which is very impressive.


All right. Now let's go back to the definitions. So what is cost? Yeah, sure.


>>: [inaudible] eight percent easily apparent in your profile, or was it hidden by its [inaudible] too much of the cost being [inaudible] and stuff like that?


>> Guoqing (Harry) Xu: Well, I think -- generally I think it's an algorithmic problem.


>>: Yeah, but if someone looked and decided, I care about the performance of this program, I mean, looked at the profile, wouldn't they have been --


>> Guoqing (Harry) Xu: Identified the problem by looking at the simple profiles?


>>: At least would he have been told to look at this package method, or would it have been spread out by mostly secondary GC costs or something like that?


>> Guoqing (Harry) Xu: Well, we don't have sort of detailed profiles of the -- you know, like what percentage of the eight percent comes from the GC and what percentage comes from [inaudible].


>>: [inaudible] if you just used a very simple technique for just, you know,
profiling the amount of time spent in each method.


>> Guoqing (Harry) Xu: Oh, okay.


>>: Wouldn't this directory list pop up?


>> Guoqing (Harry) Xu: Well, I've -- actually, if you look at the simple profiles -- like the running time profiles -- of a large-scale application like Eclipse, there's -- I'll give you a very specific example. So for large-scale applications like these, we have done this profiling. We found that the most frequently executed method for this big application is the method HashMap.get.


So it turns out, I mean, methods like this can never be the most executed method -- the most frequently executed method.


>>: [inaudible].


[brief talking over].


>>: Would it have shown up as eight percent? Would you have seen, oh, eight percent is for this little tiny package thing, I'm going to go look at that?


>> Guoqing (Harry) Xu: No. I mean, the profile cannot -- I mean, if you look at the running time profiles, they never show like eight percent for this method. Right? Other than --



>>: [inaudible].


>> Guoqing (Harry) Xu: Because of cost, yeah.


>>: GC costs associated --


>> Guoqing (Harry) Xu: There are a lot of different things going on -- yeah. There's no -- yeah. So that's probably the answer, right.


>>: Thank you.


>> Guoqing (Harry) Xu: Okay. Yes. Yeah. Yeah. Okay. Now, let's go back to the cost-benefit thing. So what is cost and what is benefit? So the absolute cost of a heap value here is defined as the total number of instructions executed to produce the value. So each instruction here has a three-address representation, like A = B + C, and each instruction here is considered to have unit cost, right? So that part is really easy to understand.


And then, it's a little tricky to define what the benefit is, as the benefit of a value is really related to how that value is consumed, right? So there's no clear metric for that. So here I'm going to give you some intuitive definitions of benefit. Later I'll talk more about the formal definitions.


So intuitively, the benefit of a value is defined as a very large number if this heap value goes to program output, like a socket or the file system, because, you know, it's actually used for some purpose.


Of course, the benefit is zero if this heap value is never used, you know. The third case here is the most common case, where this heap value is used to produce another value, another heap value, V prime. In this case, the benefit of V is defined as the amount of work done to transform V into V prime.


So there is some kind of relationship between the cost of V prime and the benefit of V. So again, these are just intuitive definitions. I'll give you more formal definitions later during the talk.


So first let's look at the cost computation. How can we compute a cost? A natural idea to compute a cost for a program like this would be to capture a dynamic dependence graph, you know, that looks like this one here, right, where each edge represents a data dependence relationship between a pair of instructions, one writing to a memory location and the other reading from the same location. Right?


So for example, suppose now we want to compute the cost of D here. It can be computed efficiently by traversing this dependence graph backward and counting the total number of reachable nodes, which represents the total number of instructions executed, which is actually four in this case. So we call this problem a backward dynamic flow problem, because in order to solve the problem we need to record, and traverse backward, some kind of history information of the execution. So this is a backward problem as opposed to a forward problem. Right?
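

A minimal sketch of that traversal, with assumed data structures -- this is not the tool's actual implementation. Each node is one executed instruction, and edges point from an instruction to the instructions that produced the values it read.

    import java.util.*;

    public class CostSketch {

        // For each executed instruction, the instructions that produced its operands.
        static final Map<String, List<String>> producedBy = new HashMap<>();

        // Absolute cost of the value produced by 'node' = number of instruction
        // instances reachable by walking the dependence edges backward.
        static int absoluteCost(String node) {
            Set<String> visited = new HashSet<>();
            Deque<String> work = new ArrayDeque<>();
            work.push(node);
            while (!work.isEmpty()) {
                String n = work.pop();
                if (visited.add(n)) {
                    for (String pred : producedBy.getOrDefault(n, List.of())) {
                        work.push(pred);
                    }
                }
            }
            return visited.size();   // each instruction counts as unit cost
        }

        public static void main(String[] args) {
            // d depends on c; c depends on a and b: cost(d) = 4.
            producedBy.put("d = c * 2", List.of("c = a + b"));
            producedBy.put("c = a + b", List.of("a = 1", "b = 2"));
            System.out.println(absoluteCost("d = c * 2"));   // prints 4
        }
    }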


There are actually many, many other backward dynamic flow problems. This is just one of them. If you are interested in the other problems you can look at the paper, or we can talk offline.

So in general, the only way to solve this backward dynamic flow problem is to use dynamic slicing, which captures a dynamic dependence graph that looks like this. So for example -- I mean, dynamic slicing is actually a general dynamic technique that needs to record all memory accesses during the execution and their dependence relationships, right? So it's easy to see that dynamic slicing is prohibitively expensive for large-scale and long-running applications, because the trace that dynamic slicing generates is unbounded. It depends completely on the dynamic behavior of the program. So it's prohibitively expensive.


However, in order to achieve efficiency and make our analysis more scalable, we propose a new technique here called abstract dynamic slicing, which performs dynamic slicing over bounded abstract domains.


So the motivation of this new technique is as follows: if we look at a specific client analysis that uses the result of the slicing algorithm, the trace generated by dynamic slicing usually provides many more details than this specific client would possibly need. So a natural question to ask here is: is it possible to let the slicing algorithm be aware of this client analysis, so that it can capture only the part of the execution that is relevant to the client? In other words, we wonder whether or not it's possible to customize the slicing algorithm with the semantics of this client analysis.


The answer is yes. And we do this by looking at abstractions. So we found that for many backward dynamic flow problems, equivalence classes exist. So here is an example that shows a fragment of a program trace that contains different runtime instances of the same instruction A = B.M. Each instruction instance here is annotated with an integer i that represents the i-th execution of that instruction. So given a specific client analysis, it would be possible for us to divide those runtime instruction instances into two equivalence classes, like E1 and E2. So later it will be sufficient for the client analysis to look only at those equivalence classes E1 and E2 instead of looking at the individual runtime instruction instances.


So in this way, we can potentially record only one runtime instruction instance per equivalence class as its representative, which can potentially lead to a, you know, significant reduction in the amount of memory needed, right?


It's easy to see here that the dependence graph computed this way is an abstract dependence graph, where each node represents an equivalence class and each edge connects two equivalence classes as long as there exists a dependence relationship between two runtime instruction instances aggregated into those equivalence classes, right?


So we've actually generalized this idea and defined a more general theoretical framework that can be instantiated to solve many, many other backward dynamic flow problems. This talk focuses only on high-level ideas. So if you're interested -- yeah, sure.


>>: So how do you define equivalence classes?


>> Guoqing (Harry) Xu: Well, I mean, the user has to provide some kind of annotation; like, the designer of the analysis gives the semantics of the analysis, so that our analysis, our profiling framework, takes the semantics as input and defines those equivalence classes.


>>: [inaudible] one example?



>> Guoqing (Harry) Xu: Yes, I'll give you a specific example, sure. Of course.
So what was your question now?


>>: The same thing.


>> Guoqing (Harry) Xu: Okay. Sure.


>>: [inaudible] example [inaudible].


>> Guoqing (Harry) Xu: Of course. Of course. Here is an example. So let's, you know, go back to cost computation. How do we use abstract dynamic slicing to compute the cost for a specific runtime value? So recall that the absolute cost for a runtime value is defined as the total number of instructions executed, right, to produce the value.


So here, for example, the absolute cost for a specific runtime instruction instance, like this one, A = B + C annotated with 20, can be computed by traversing this concrete dependence graph backward and counting the total number of reachable nodes, which represents the total number of instructions executed, right? That's basically the absolute cost over the concrete dependence graph.


However, as we already know, the concrete dependence graph is very expensive to compute. In addition, it's very hard for the programmer to make sense of a specific runtime instruction instance, because a static instruction can potentially have millions of runtime instances. So it doesn't make sense to ask the programmer to make sense of the cost for a specific one.


To solve the problem, instead of computing this absolute cost for a specific runtime instruction instance, we propose to compute an abstract cost for an equivalence class of instances. So for example, the abstract cost for an equivalence class, like this one, A = B + C annotated with E0, can be computed by traversing this abstract dependence graph backward and calculating the sum of the sizes of the reachable equivalence classes. Right, the size of an equivalence class is essentially the total number of instruction instances aggregated into that equivalence class.
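

A minimal sketch of the abstract version, again with assumed data structures rather than the actual tool's code: each node now stands for an equivalence class, and the cost sums the sizes of the reachable classes.

    import java.util.*;

    public class AbstractCostSketch {

        // One equivalence class of runtime instruction instances.
        static class EqClass {
            final String label;
            final int size;                 // number of instances aggregated into this class
            final List<EqClass> producedBy; // classes this one depends on
            EqClass(String label, int size, List<EqClass> producedBy) {
                this.label = label;
                this.size = size;
                this.producedBy = producedBy;
            }
        }

        // Abstract cost = sum of the sizes of all equivalence classes reachable
        // by walking the dependence edges backward from the given class.
        static int abstractCost(EqClass start) {
            Set<EqClass> visited = new HashSet<>();
            Deque<EqClass> work = new ArrayDeque<>();
            work.push(start);
            int cost = 0;
            while (!work.isEmpty()) {
                EqClass c = work.pop();
                if (visited.add(c)) {
                    cost += c.size;
                    c.producedBy.forEach(work::push);
                }
            }
            return cost;
        }

        public static void main(String[] args) {
            EqClass e1 = new EqClass("a = 1 [E1]", 1, List.of());
            EqClass e2 = new EqClass("b = read() [E2]", 1000, List.of());
            EqClass e0 = new EqClass("c = a + b [E0]", 1000, List.of(e1, e2));
            System.out.println(abstractCost(e0));   // prints 2001
        }
    }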


So in this way, we find that, first of all, the abstract dependence graph is much easier to compute and, second, it's easier for the programmer to make sense of it, because the abstract cost for an equivalence class is essentially the absolute cost for several runtime instances aggregated based on some kind of abstraction, right?


Now the question is: what is the proper abstraction that we can use for computing cost here? That's basically your question, all right. All right. So here we use calling contexts to define equivalence classes. In other words, those E102, E47, E0 are object-sensitivity-based calling contexts; specifically, each E here is a chain of the receiver object allocation sites for the call sites on the call stack. In this way it will be natural for us to aggregate runtime instruction instances based on object-oriented data structures.
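

A minimal, hypothetical sketch of what such a context looks like; the class names and allocation-site labels below are made up for illustration and are not from the actual benchmarks.

    public class ContextExample {

        static class Elem {
            int data;
            void init(int v) {
                this.data = v;   // this store runs under the context of its receivers
            }
        }

        static class Container {
            Elem make(int v) {
                Elem e = new Elem();   // allocation site o_elem
                e.init(v);
                return e;
            }
        }

        public static void main(String[] args) {
            Container list = new Container();   // allocation site o_list
            Container set  = new Container();   // allocation site o_set
            // The store in init() executed via 'list' is aggregated into the
            // equivalence class [o_elem, o_list]; via 'set' into [o_elem, o_set].
            for (int i = 0; i < 1000; i++) {
                list.make(i);   // 1000 runtime instances, but one equivalence class
            }
            set.make(42);
        }
    }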


So if you're interested in object sensitivity we can definitely talk offline. That's
basically the general idea. Okay?


>>: [inaudible].


>> Guoqing (Harry) Xu: Yes.


>>: So if the instruction is in a loop, you count it only once for that calling context even if it's [inaudible] a million times [inaudible].


>> Guoqing (Harry) Xu: Right. Right. I mean, we only consider the calling context, like the chain of the objects.


>>: [inaudible] the cost can still be very skewed away from the actual cost?


>> Guoqing (Harry) Xu: Oh, yeah, they're still like different, yeah, sure.


>>: [inaudible].


>> Guoqing (Harry) Xu: This is not very accurate. I mean it's like an
approximation.


>>: [inaudible] approximation, I mean, is it always under-approximated? It could be over-approximated, too, right?


>> Guoqing (Harry) Xu: I think it's always an over-approximation.


>>: I just gave you an example where it's under-approximated, right? If it's a loop -- if it's an instruction inside a loop, you will count it only once based on the calling context even though I execute it a million times, because I go around the loop a million times.


>> Guoqing (Harry) Xu: You go around the loop a million times, and we aggregate that. I mean, we -- the cost actually considers the frequency, right. We consider the frequency of that.


>>: [inaudible] count, you will count the number of times I execute the --


>>: The trick here is he wants to limit the number -- the size of the dynamic --



>>: I understand.



>> Guoqing (Harry) Xu: Yeah.


>>: Is that right? So he's going to have -- you're iterating a million times over like a million objects, say, but the call stack remains the same. So you're not --



>> Guoqing (Harry) Xu: The equivalence class --



>>: So you're not creating --



>>: Okay. So you have one equivalence class and then you have a count --



>> Guoqing (Harry) Xu: Yeah, yeah. Right, right, sure. It considers the frequency, actually.


>>: So in the static analysis case you would use some estimate of [inaudible] get real numbers for --



>> Guoqing (Harry) Xu: This is dynamic analysis, so it's not static analysis.


>>: [inaudible] try to do this statically [inaudible].


>> Guoqing (Harry) Xu: [inaudible] statically, that's a little -- I don't know. That's not a question for me. I don't --



>>: You take this information and [inaudible] the static but you don't try to do it
statically?


>> Guoqing (Harry) Xu: No, no. This is completely a runtime analysis.


All right. So far, all the costs that we have talked about are cumulative costs, which measure the effort made from the very beginning of the execution to produce a runtime value. However, we found that cumulative cost is not useful in helping programmers understand performance problems, because it's almost certain that a value produced later during the execution has a higher cost than a value produced earlier in the execution, right? There does not exist a strong correlation between a high cumulative cost and a performance problem.


So in order to solve the problem, we propose to compute relative cost instead of, you know, cumulative cost. So the relative cost for a runtime value, like a heap location, a heap value, is defined as the amount of work done on the stack that transforms values read from other existing heap locations in order to produce this value.


I'll show this by example. Let's consider this picture, where boxes represent objects and edges represent dataflow. So F, G and H here represent object fields. Suppose now we want to compute the cost for O3. If we want to compute its cumulative cost, we basically need to consider this amount of work done, and pretty much all the work done from the beginning, in order to produce the value written into O3, right? All the work.


However, if we want to compute the relative cost, we only need to consider the amount of work done on the stack that transforms values read from other existing heap locations, like O1 and O2, in order to produce the value written into O3. So this is actually the fundamental difference between relative cost and cumulative cost. Completely symmetric to relative cost, the relative benefit for a heap value is defined as the amount of work done on the stack that transforms this value in order to produce values written into other heap locations.


Let's consider again this example. So for example, now we want to compute the relative benefit for O3. We only need to consider the amount of work done on the stack that transforms O3 into other heap locations, like O4 and O5. So it's clear to see that the relative benefit for a heap location is determined by both the frequency and the complexity of the use of its value. Right? So eventually, you know, the costs and benefits computed for individual heap locations are aggregated based on object structures in order to produce costs and benefits for high-level data structures.
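

To make the definitions concrete, here is a minimal, hypothetical code sketch -- the classes and fields are made up, not taken from the example in the slides. The relative cost of the value written into r.h is the stack-only work between the heap reads and the heap write; its relative benefit would be the stack work that later turns r.h into values written into other heap locations.

    class Pair { int f, g; }
    class Result { int h; }

    class RelativeCostExample {
        static Result combine(Pair p, Pair q) {
            int a = p.f;           // read from an existing heap location (like O1)
            int b = q.g;           // read from an existing heap location (like O2)
            int c = (a + b) * 2;   // stack-only work: counts toward the relative
            int d = c - 1;         //   cost of the value written into r.h
            Result r = new Result();
            r.h = d;               // write to a heap location (like O3)
            return r;
        }
    }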


So this slide reviews some of the key problems, challenges and ideas in this work. So for example, to solve the first problem, that dynamic slicing is too expensive, we propose to use a new technique called abstract dynamic slicing that performs dynamic slicing over bounded abstract domains.


To solve the second problem, which is how to abstract instances for object-oriented data structures, we proposed to use object-sensitivity-based calling contexts as the abstraction.


To solve the third problem, which is that cumulative cost is not correlated with performance problems, we proposed to use relative cost instead of cumulative cost.


So combining all three insights, we eventually compute the relative abstract cost and the relative abstract benefit, and use this cost-benefit ratio as an indicator of performance problems.


So this analysis was implemented in the IBM J9 virtual machine, and we performed case studies on real-world, large-scale applications. So, in fact, all those applications here except bloat have millions of lines of code. This picture shows the running time reductions that we have achieved after removing the problems that we found using this dynamic cost-benefit analysis.


So for example, for bloat, there is a 35 percent running time reduction that we achieved after removing the problems. This is actually the very first dynamic analysis targeting general bloat. All existing work targets different kinds of symptoms, like symptom-based bloat. This is actually the only piece of work that targets --



>>: Go back to the --



>> Guoqing (Harry) Xu: Sure.


>>: The original example with the package method -- can you explain how it relates to this analysis? So what -- how would the -- yeah, how do you compute the relative cost -- sorry. Sorry, man.


>> Guoqing (Harry) Xu: All the way back.


>>: So what ends up being the relative cost of whatever packs is, and the relative benefit --



>> Guoqing (Harry) Xu: Well, so first of all, consider the role of benefit. The benefit is really easy to compute in this case, because none of the elements, the heap values in this list, are used for any purpose. Right? So they're never used. Because the only -- you only use this reference value. You never retrieve the heap values from this list. Right? So the benefit for this entire list is zero.


>>: Not quite zero.


>>: Yeah, it can't be zero. Well, I mean, zero is --


[brief talking over].


>> Guoqing (Harry) Xu: Yeah, for data memory, right, exactly.


>>: Equals whatever [inaudible].


>> Guoqing (Harry) Xu: Equals. I mean, this is a reference -- I mean, this is a pointer value. This is not like the heap -- the value retrieved from the heap locations, right? So there is a large benefit -- there's a large cost associated with the list because, you know, you have -- you compute all those files, you populate the list. So every data member in the list has a large cost associated with it, right? You do all the computations in order to produce values and then write them into that heap location, the list location, right. In this way, you know, it's clear to see that there's a large cost but there's no benefit -- I mean, very little benefit. So you get a large --



>>: You can go on. I just -- I'm a little -- there's some benefit, because you have to test against the string. There is a test, and that gets encoded in the result, right, if they match.


>> Guoqing (Harry) Xu: Yeah, sure, sure. Yeah. We can definitely talk offline. I mean, there are some subtle issues here. Let's go all the way forward.


All right. So --



>>: So can you just back up --



>> Guoqing (Harry) Xu: Yeah, sure.


>>: So the bloat, so the bloat is a constructed example, or is that [inaudible].


>> Guoqing (Harry) Xu: No.


>>: [inaudible].


>> Guoqing (Harry) Xu: Well, bloat is a program analysis framework written by Purdue University, like, many years ago. A few -- a large Java program analysis framework.


>>: Why is it called bloat? I don't know actually. It was the name of the
application. It was clustered in the [inaudible] benchmark [inaudible].


>>: Interesting [inaudible].


>> Guoqing (Harry) Xu: Yeah. That was many years ago. So that's basically --



>>: [inaudible].


>>: Yeah, so did I. Okay. Thanks.


>> Guoqing (Harry) Xu: Well, the reason, I mean, why we can find such a large running time reduction is because bloat is pretty much written by graduate students. So the quality of the code is really -- I mean, really poor.


>>: Was it one issue that you found in this, or were there multiple issues?


>> Guoqing (Harry) Xu: There were multiple issues. Yeah.


>>: So there's not one that you could --



>> Guoqing (Harry) Xu: No, no. I can give you a specific example later offline. Yeah.


All right. So this is actually the very first dynamic analysis targeting general bloat. In addition, in this work we have identified a lot of interesting bloat patterns that can be regularly observed during the execution of large-scale applications.


So a further step would naturally be to develop static analyses that can identify and remove such patterns in the source code, so that the programmers can avoid such small performance problems during development, during coding, right, before these problems really pile up and become significant.


So, in fact, some of the interesting bloat patterns that we found in this work have already led to the development of new analyses and tools. So to give you specific examples: we found a lot of interesting, you know, container inefficiencies, so we developed a new static analysis that uses a context-free-language reachability formulation to help programmers identify underutilized containers and overpopulated containers. That work was published in another paper, at PLDI 2011.


And we found there were a lot of loop-invariant data structures in this work, so we used a type and effect system to help programmers identify and hoist the loop-invariant data structures. Some other patterns that we found in this work include problematic implementations of certain design patterns and anonymous classes.


So actually, the kind of work that we can do immediately is to develop static analyses to deal with those patterns; each is, like, three months of work.


All right. So now we're getting to the third part of the talk, which is actually the static analysis that can be used to help programmers identify and hoist loop-invariant data structures. The motivation of this particular analysis is based on the observation that in large-scale applications there are a lot of places where objects with the same content get created in a loop many, many times by different loop iterations, and all their instances are exactly the same. And it really hurts performance in many cases. So by pulling those objects out of loops, in other words, hoisting those objects in a semantics-preserving way, we can potentially save a lot of computation as well as garbage collection effort. Right? So that's basically the motivation.


Let's first look at an example. All right. So this piece of code shows a problem we found in multiple applications -- not only one application, multiple applications that we studied at the IBM T.J. Watson Research Center.


So the loop here iterates over a string array called date; date here is a string array. The goal is to parse each of the strings in this array into a date object, right. So the way this loop is implemented is that it creates a SimpleDateFormat object per iteration of the loop, and it uses this object to parse each of the strings. So it's easy to see that, you know, this entire data structure reachable from OSDF, which represents this allocation site, gets created many, many, many times, right, by the loop iterations. You know, and all their instances are exactly the same.
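

A minimal sketch of the pattern, with assumed variable names and date format -- this is not the code from the applications studied. The bloated version rebuilds the loop-invariant SimpleDateFormat structure in every iteration, while the hoisted version creates it once outside the loop.

    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Date;

    public class DateParsingSketch {

        // Bloated version: a SimpleDateFormat (and everything reachable from it)
        // is created in every iteration, even though each instance is identical.
        static Date[] parseAllBloated(String[] dates) throws ParseException {
            Date[] out = new Date[dates.length];
            for (int i = 0; i < dates.length; i++) {
                SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd"); // O_SDF
                out[i] = sdf.parse(dates[i]);
            }
            return out;
        }

        // Hoisted version: the loop-invariant data structure is created once,
        // outside the loop, and only the iteration-specific call stays inside.
        static Date[] parseAllHoisted(String[] dates) throws ParseException {
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
            Date[] out = new Date[dates.length];
            for (int i = 0; i < dates.length; i++) {
                out[i] = sdf.parse(dates[i]);
            }
            return out;
        }
    }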


In many cases, creating an object using the new keyword in Java is much more than allocating the space for the object. It can involve a lot of heavyweight operations to initialize a big data structure like this one here and to create a lot of other objects, like this O1 and O2, right? So specifically for this case, creating one SimpleDateFormat requires loading many resource bundles from disk, which can involve a lot of very slow disk I/O operations. So it's perfectly okay for us to pull it out of the loop and use only one SimpleDateFormat object -- data structure -- to parse all the incoming strings. Right. Yeah, sure. Yeah.


>>: You said it's perfectly okay. That depends on a lot of stuff.


>> Guoqing (Harry) Xu: Oh, yeah, sure. Yeah. Of course. We have a -- yeah. Definitely -- I mean, there are five or six different checks, like, very specific checks, over there. So it's not that simple. Not as simple as what I said.


>>: So in this particular case it's the case that parse treats the SDF as a read-only object. Are there other mutating --



>> Guoqing (Harry) Xu: No.


>>: Operations --



>> Guoqing (Harry) Xu: No. Our static analysis -- our static analysis can make sure that --



>>: On the class, on the entire class, is there any way to mutate a SimpleDateFormat?


>> Guoqing (Harry) Xu: Out of the entire class? What's your question?


>>: [inaudible] the SimpleDateFormat that's constructed there, I mean --



>> Guoqing (Harry) Xu: Yes. It's immutable, right?


>>: I don't know.


>> Guoqing (Harry) Xu: I think so. Yeah, of course. I mean -- no, no, no. I think that the only thing that is not immutable is that you have to load sort of the current data -- like the date and whatever -- the date and the time format from the local computer to initialize this -- a lot of resource bundles are loaded.


>>: Once it's constructed --



>> Guoqing (Harry) Xu: Yeah, once it's constructed it's immutable, of course.


>>: So could this be solved at the library level by having, like, a static factory method that yielded a singleton object that was only --



>> Guoqing (Harry) Xu: Of course you can definitely do that, yeah. That's a possible way of doing it, right. Yes. That's definitely one possible way of doing this, of optimizing this case.
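

A hypothetical sketch of the library-level fix the question suggests; since SimpleDateFormat is not thread-safe, a per-thread cached instance is used here instead of a true singleton, and the format pattern is assumed.

    import java.text.SimpleDateFormat;

    final class DateFormats {
        // One cached, already-initialized formatter per thread,
        // so call sites never pay the construction cost inside a loop.
        private static final ThreadLocal<SimpleDateFormat> ISO =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

        static SimpleDateFormat iso() {
            return ISO.get();
        }
    }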


>>: So is it often -- I'll talk to you later.


>> Guoqing (Harry) Xu: Well, okay.


>>: Okay?


>> Guoqing (Harry) Xu: Yeah. Sometimes library designers are not careful enough to consider these kinds of complex cases. All right. So it's clear to see that the challenge, the biggest challenge, in our work here is that we want to hoist a big data structure out of the loop instead of one single instruction or one single object. That is the fundamental difference between our work here and the traditional compiler loop optimization work.


So given the difficulty of hoisting a big data structure out of the loop, we divide this technical problem into two subproblems, one focused on the data side and one focused on the call side.


So the first subproblem here focuses on hoistable data structures. This means that, first of all, for each object created in the loop we need to identify the entire big data structure that is reachable from this object. And then we check whether or not all fields in this big data structure are loop-invariant, without considering any actual call sites that access this data structure. So if all fields in this data structure are loop-invariant, we call this data structure a hoistable data structure.


Back to the example: our static analysis can make sure that this entire data structure reachable from OSDF is a hoistable data structure, because no field in any instance of this data structure can contain an iteration-specific value that can change across iterations. Our static analysis can make sure of that. And once we have a set of hoistable data structures identified, the second step here tries to hoist the actual call sites that access those data structures. So the key idea here is that for each hoistable data structure we check each call site that is invoked on this data structure and see if it's hoistable. If this call site is indeed hoistable, we just pull it up.


Let's go back to the example. Once we make sure that this entire data structure reachable from OSDF is a hoistable data structure, we check each call site that is invoked on each object in this data structure. So first of all, we check this allocation site, because it is a call site, right, a constructor call. And in this case the allocation site itself is completely hoistable, so we just pull it out of the loop like this. And then we check the second call, which is the call to the parse method. However, the second call here is not hoistable, because the argument, the date string here, contains an iteration-specific value, right, that can change across iterations. Our static analysis can identify that.


So there's no way to hoist this second statement, I mean this call site. So this is actually what the code looks like eventually after our analysis performs the hoisting. So it's important to note that, you know, this analysis is entirely a compiler analysis. It's not a source-to-source translation or some form of refactoring. This is completely a compiler analysis.


All right. So let's first look at the first, you know, technical subproblem, which focuses on the data side, which is how to identify hoistable data structures. So we found there are three challenges in identifying hoistable data structures. The first challenge here is to understand how a data structure is built up. So for example, for each object created in the loop we need to identify what other objects are reachable from this object. This requires us to [inaudible] all points-to relationships. Right? This is very simple, straightforward.


The second challenge here is to understand where the data comes from. So for example, we have to make sure that no field in the hoistable data structure can contain an iteration-specific value that can change across iterations. That requires us to reason about dependence relationships.


And the third challenge here is to understand in which iterations the objects are created. So for example, a hoistable data structure cannot contain objects that are created in different iterations, otherwise there's no way to hoist it. To capture this information, we propose to compute an iteration count abstraction (ICA) for each allocation site, which can have three abstract values: 0, 1, and bottom. 0 here means that this allocation site must be outside the loop. In other words, all instances created by this allocation site must exist before the loop starts.


The second case here is that the ICA for the allocation site is 1. Then it's guaranteed that this allocation site is inside the loop and that the lifetime of any instance of the allocation site must be within the iteration where the instance gets created. In other words, no instance of the allocation site can escape the iteration where it's created to later iterations.


And the third case here is bottom, which means that, you know, the allocation site must be inside the loop, and some instance of this allocation site might escape the iteration where it's created to later iterations and may actually be used by those later iterations.


So it's clear to see that we're interested only in data structures where the ICAs for all their objects are 1, right, because then it's guaranteed that, you know, any instance of the data structure must be created in one single iteration and must die at the end of that iteration.


So the ICAs for objects can be computed by a general technique called abstract interpretation. This talk focuses only on the high-level analysis ideas. If you're interested in the low-level analysis details, we can definitely talk offline, or you can read the paper. There are four pages of formalism and a proof [inaudible].


All right. Sure.


>>: So you came up with an example in which you create an object that's immutable and also presumably the constructor depended functionally on --



>> Guoqing (Harry) Xu: Right.


>>: On things that weren't modified in the loop.


>> Guoqing (Harry) Xu: Correct.


>>: But you have a much more ambitious analysis here that's trying to identify -- that would handle objects that are mutable but not mutated within the loop.


>> Guoqing (Harry) Xu: It's mutable but not mutated in the loop.


>>: I mean, you're defining the concept of loop invariant.


>> Guoqing (Harry) Xu: Yeah.


>>: And there's nothing that requires the object to be deeply --



>> Guoqing (Harry) Xu: Oh, yeah, sure.


>>: Immutable.


>> Guoqing (Harry) Xu: I mean, we have this dependence analysis to identify these unmutated fields. So it's mutable but not mutated in the loop.


>>: No.


>> Guoqing (Harry) Xu: Right?


>>: Mutated.


>> Guoqing (Harry) Xu: Yeah, it's not mutated.


>>: So you could have a less ambitious analysis that would identify just completely-immutable-after-construction objects on a per-class basis.


>> Guoqing (Harry) Xu: Right.


>>: And it wouldn't matter what happened in the loop?


>> Guoqing (Harry) Xu: Uh-huh. Oh, yeah, sure. I definitely understand. Our analysis is more ambitious in terms of, you know -- like, to identify --



>>: You find actual cases in which --



>> Guoqing (Harry) Xu: The class is mutable, but the object is not mutated in the loop.


>>: Where you get extra benefit over just the plain --



>> Guoqing (Harry) Xu: Yeah, sure, sure, of course. Yeah. Yeah. All right. So how do we identify these hoistable data structures? We combine -- I mean, the interesting idea here is we combine the three abstractions in a powerful way by annotating points-to and dependence relationships with ICAs.


So here's an example. I don't know if I have time to talk about this, but -- so how much time do I have, Ben?


>> Ben Zorn: We have the room until 12.


>> Guoqing (Harry) Xu: Oh, okay. Yeah. Okay. I'll continue. So let's first look at an example. This is a very simple example. It contains one loop and four objects. So let's first look at the ICA for each object here. So the ICA, for example, for O1 here is zero, because the allocation site is outside the loop, right? That's very easy.


The ICA for O2 is one, because no instance of this allocation site can escape the iteration where it's created to later iterations, right? This is the same case for O3. But for O4 the ICA is bottom, because some instance of this allocation site might escape the current iteration where it's created to later iterations, and it may actually be used in those later iterations, you know, for example here.
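

A minimal, hypothetical sketch of the same situation in code -- the allocation sites below are stand-ins for O1 through O4, not the example from the slide.

    import java.util.ArrayList;
    import java.util.List;

    public class IcaExample {
        public static void main(String[] args) {
            List<int[]> survivors = new ArrayList<>();  // O1: ICA = 0 (allocated outside the loop)
            int[] carried = null;
            for (int i = 0; i < 10; i++) {
                StringBuilder sb = new StringBuilder(); // O2: ICA = 1 (dies within its iteration)
                int[] tmp = { i };                      // O3: ICA = 1 (dies within its iteration)
                sb.append(tmp[0]);
                int[] kept = { i * i };                 // O4: ICA = bottom (escapes to later iterations)
                if (carried != null) {
                    survivors.add(carried);             // the O4 instance from the previous iteration is used here
                }
                carried = kept;
            }
        }
    }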


So basically 0, 1, 1, bottom are the ICAs for these four allocation sites. It's pretty easy to understand. This picture shows the annotated points-to graph, where each node represents an allocation site, an object, and each edge represents an annotated points-to relationship, which contains a field name -- F, G and H are fields -- and a pair of ICAs for the two nodes, the two objects connected by the edge.


So by [inaudible] these points-to relationships, like this annotated points-to graph, we can easily conclude that this entire data structure reachable from O2 is not hoistable, because, you know, the ICA for O4 here is bottom. Because there's -- so there's no -- you know, there doesn't exist any way to hoist this entire data structure.


This picture here shows the annotated dependence relationships, where each node represents either a heap location or a stack location, and each edge represents a dependence relationship annotated with a pair of ICAs for the two nodes connected by the edge. The ICA for a heap location, like a field of O3, is actually the ICA for the object O3, which is one in this case. The ICA for a stack variable is actually determined by where the stack variable is declared. If the variable is declared outside the loop, its ICA is zero. If the variable is declared inside the loop, the ICA is one. So the ICA for a stack variable can never be bottom, because in Java any variable declared in the loop must be initialized every time the loop iterates, right? So, you know, for example, from this case we can easily see that this heap location of O3 depends on a stack variable that can depend on itself, which indicates that this heap location can contain an iteration-specific value that can change across iterations.


So any data structure that contains this location is not hoistable. So basically.


We found there are four interesting properties of a hoistable data structure. Missing any of the properties here can make a data structure not hoistable. So the four interesting properties are disjointness, confinement, non-escaping and loop invariance. So I don't think I will have time to go through all the details; I'm probably going to skip those details. So if you are interested, we can definitely talk offline.


So there's one important thing I can definitely report here, which is that the verification of those four important properties can be easily done by performing checks on the annotated points-to and dependence graphs. That's the only thing that you need to know to understand this piece of work. Okay?


All right. So once we have a set of hoistable data structures identified, it's really easy for us to identify hoistable calls -- you know, the actual call sites that access those data structures. This is actually the second step, right? I mean, the most important part of this work is the first step. As long as you understand the first part, it's really easy to understand the second part.


So the key idea again is, for each hoistable data structure identified, to check each call site that is invoked on the data structure and see if it's hoistable. This process can involve, you know, like one, two, three, four, five, six -- six different checks: control dependence checks, argument checks, external environment checks, exception throwing, thread and control flow checks. As long as all those checks are satisfied, this particular call site can be hoisted.


So I'm not going to talk about all those details as they're very specific checks and
they are defined clearly in the paper. So we can definitely talk offline if you're
interested.


So that's pretty much it for this automatic work, the sound and automatic transformation. However, we found that this completely sound and automatic transformation is not quite effective in hoisting real-world data structures. There are basically two reasons here. First of all, the real-world usage of Java data structures is very complex, and any static analysis has to be overly conservative to achieve soundness, right? And the second reason here is that hoisting a lot of real-world data structures requires developer insight.


So for example, consider a data structure with 100 fields. If there's only one field that is not loop invariant, this can make the entire data structure not hoistable. However, if this information is given to the developer, the developer might have a way of hoisting the data structure. Right? So for example, the developer might be able to split the data structure into a hoistable part and a non-hoistable part, right, and pull out only the hoistable part.
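
(A hedged sketch of that kind of manual split; the Request class and its fields are made up for illustration and are not from the evaluated benchmarks.)

    // Illustrative only: one iteration-specific field keeps the whole object
    // from being hoistable, so the developer splits the construction.
    class Request {
        final String host; // loop invariant
        final int port;    // loop invariant
        int sequence;      // changes every iteration

        Request(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    class SplitAndHoist {
        static void sendAll(int n) {
            Request request = new Request("example.org", 8080); // hoistable part built once
            for (int seq = 0; seq < n; seq++) {
                request.sequence = seq; // only the non-hoistable part stays in the loop
                send(request);
            }
        }

        static void send(Request r) {
            // stand-in for the real per-iteration work
        }
    }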


So in order to make our analysis more practical and help programmers perform this kind of manual hoisting, we proposed to compute hoistability measurements for each data structure in the loop that indicate how likely it is that the data structure can be manually hoisted.


So for example, we consider metrics like data hoistability and code hoistability. I'm not going to explain all those details again. They're defined in the paper. So we can definitely talk offline. But there's one important thing, which is that the computation of those hoistability measurements fits nicely, very nicely, with the original analysis that we proposed for transformation.


In other words, we don't need any new analysis to compute those measurements. They're computed automatically with the transformation, the actual transformation.
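
(The snippet below is a deliberately oversimplified stand-in, not the metrics defined in the paper: it just treats the fraction of a data structure's loop-accessed locations that are loop invariant as a score for ranking manual-hoisting candidates.)

    // Oversimplified illustration of a hoistability-style score, NOT the
    // paper's actual definition of data/code hoistability.
    class HoistabilityScore {
        static double dataHoistability(int invariantLocations, int totalLocations) {
            if (totalLocations == 0) {
                return 0.0;
            }
            return (double) invariantLocations / totalLocations;
        }

        public static void main(String[] args) {
            // 99 of 100 locations loop invariant -> 0.99: a strong candidate
            // for the kind of manual split described above.
            System.out.println(dataHoistability(99, 100));
        }
    }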


So the analysis was implemented using the Soot 2.3 framework and evaluated on a set of large -- 19 large Java applications. In those 19 large Java applications, a total of 981 loop data structures were considered, in which we found 155 data structures are completely hoistable data structures.


However, our completely sound and automatic transformation was able to hoist only four data structures in three programs, completely automatically, because of the two reasons that I mentioned.


So here's a list of the data structures -- of the running time that we have achieved after hoisting those data structures completely automatically. From those numbers we can easily see that there's a large optimization opportunity out there, but only a small portion of it has been captured by this completely automatic transformation. To explore the rest, we do manual hoisting with the help of the hoistability measurements. Yeah?


>>: [inaudible] javac by 10 percent?


>> Guoqing (Harry) Xu: Yes.


>>: By hoisting something out of the loop.


>> Guoqing (Harry) Xu: javac, no -- yes. Yes.


>>: So what's the -- so what was the data structure that gave you that benefit?


>> Guoqing (Harry) Xu: I don't know actually. This is a completely automatic analysis. And javac is not open source. So I just -- you know, used the analysis to produce a new bytecode, a new version of the bytecode, and ran the new version. And it caused an 11 percent running time reduction. I don't know what was actually going on there.


>>: Okay.


>> Guoqing (Harry) Xu: So --



>>: So you actually --



>> Guoqing (Harry) Xu: This is not open source. This -- you know, you can't get the source code for this application. The javac application. So. But we do have open source -- I mean, we do have sources, source code, for some other applications.


So we studied five large Java applications by inspecting the top 10 data structures ranked based on hoistability measurements. We've actually achieved much larger performance improvements than this completely automatic transformation. For example, for applications like PS, which is a -- which is a PostScript interpreter, we're able to make the application run more than six times faster after hoisting only a few data structures in its core components.


And as another example, for applications like xalan, which is a real-world XML transformation processor, there's a 10 percent running time reduction after we hoisted only one XML transformer object out of the loop. So this sort of performance bug has been confirmed by the DaCapo development team. So there's one -- I mean, for this case we hoisted only one XML transformer object.
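
(The following sketch illustrates the shape of that kind of fix; it is not the actual xalan source, just an assumed example of a loop-invariant transformer being re-created in every iteration and then created once instead.)

    import java.io.File;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    class TransformLoop {
        // Sketch of the reported pattern, not the real xalan code: the Transformer
        // is loop invariant, so it is created once instead of once per iteration.
        static void transformAll(File stylesheet, File[] inputs, File[] outputs) throws Exception {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer t = factory.newTransformer(new StreamSource(stylesheet)); // hoisted out of the loop
            for (int i = 0; i < inputs.length; i++) {
                t.transform(new StreamSource(inputs[i]), new StreamResult(outputs[i]));
            }
        }
    }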


>>: [inaudible] did in PS [inaudible].


>> Guoqing (Harry) Xu: Yes. Yes.


>>: What was hoisted?


>> Guoqing (Harry) Xu: We found there's -- the problem is actually with the usage of the data structure called the stack. The programmer seemed to understand -- I mean, they don't understand the stack is a subclass of the list. For every operation they want to do with the stack, they use push and pop. So they keep pushing to and popping from the stack. They don't understand the stack is a subclass of the list, so they can directly use something like get to retrieve a specific element. That means, you know, they use --



>>: [inaudible] so was that a change in the code beyond just pulling it out of a loop, or what was --



>> Guoqing (Harry) Xu: Yeah. There's -- yes. There's something more complex going on there. Because it's not, you know, just hoisting the data structure. But we found this problem by identifying the hoistable data structures. And none of the problems that we found in this work have ever been reported before in previous work.
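
(For reference, java.util.Stack is indeed a subclass of Vector, so indexed access is available. The sketch below is an illustrative reconstruction of the usage pattern described, not the interpreter's real code.)

    import java.util.Stack;

    class StackAccess {
        // Wasteful pattern described above (illustrative): popping elements just to
        // look at one of them, then pushing everything back.
        static int peekBelowTopWasteful(Stack<Integer> s) {
            int top = s.pop();
            int below = s.pop();
            s.push(below);
            s.push(top);
            return below;
        }

        // Because Stack extends Vector (a List), the element can be read directly
        // by index without disturbing the stack at all.
        static int peekBelowTopDirect(Stack<Integer> s) {
            return s.get(s.size() - 2);
        }
    }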


All right. So we're getting to the final part, which is future work and conclusions. Of course I will continue this line of work on software bloat analysis because I think it's a wide-open area that has so many interesting problems, challenges and potential research opportunities.


We're actually very -- one of the very few academic groups that are doing fundamental research on this real-world problem. Why do I think this is a wide-open area? Because if you look deeper into the cause of the bloat, you will see that the methodology of object-orientation itself encourages a certain level of excess or bloat. For example, the programmers are encouraged to basically do whatever they want to favor reuse, modularity or readability, leaving performance entirely to the compilers and the runtime systems.


However, if we want to advocate -- to explicitly consider performance, this can have an impact on almost the entire software development cycle, right? So this picture shows the kind of work that I'm planning to do in the future. Basically, we start with detecting performance problems. So once we find, you know, the root cause of the problems, we can either do manual tuning or we can classify interesting bloat patterns.


Once we have a set of interesting bloat patterns identified, we can do a lot of different things. We can develop a static transformation that can automatically remove such patterns in the source code. We can develop self-adjusting systems as part of feedback-directed compilation that can remove such problems online as part of the JVM, right? Or like the CLR, the runtime in the Microsoft setting.


We can advocate performance-conscious software engineering, which can require a whole lot of new things: new design principles, new modeling tools, new testing and analysis tools, new compiler passes, and so on, so forth. And we can also use a compiler to synthesize a bloat-free implementation given a set of performance specifications.


So there are a lot of interesting things we can do in the future. In addition, I'm also considering leveraging existing techniques from other fields, such as systems and architecture, in order to deal with this ever-increasing level of, you know, excess or inefficiency.


Of course I'll also look into other program analysis -- actively look into other program analysis areas. So definitely one of the things is to adapt existing techniques to solve MS -- Microsoft-specific problems. Like, you know, all the existing techniques are implemented within the JVM, so is there -- where does the -- what are the problems that are specific to Microsoft products? Is there a possible way to adapt those techniques into the CLR, you know, within this Microsoft family of languages? Is that possible? I don't know. So definitely it's interesting to see.


There are some other interesting things that I'm planning to do in the future, like parallelism, which is one of the most interesting things. Things like optimizations, data-based compilation, model checking, testing, debugging, security improvements, compilers for high-performance computing, data locality. I mean, the bottom line here is that I'm open for collaboration with any researcher whose research deals with static and dynamic program analysis. That's pretty much my goal.


So the conclusion here: I'm a program analysis person. I'm interested both in theoretical -- in static and dynamic analysis, and in both the principles and the applications of program analysis.


My dissertation here deals with an important application, which is software bloat analysis, that contains both dynamic analyses that can help programmers make more sense of heaps and executions and static analyses that can help programmers remove and prevent bloat.


So with the set of techniques that I have developed, I hope that we can lower the bar for performance tuning. In other words, with automatic -- automatic tool support, I hope that tuning is no longer a daunting task. I also hope to educate the programmers, the developers in the real world, especially the OO programmers, and raise awareness of these bloat problems in real-world object-oriented programming.


So for example developers should really be aware that performance is important.
They should do everything possible during development to avoid small
performance problems before they pile up and become significant.


So this is because compilers and runtime systems are not always smarter than human experts. Thank you very much. I'm ready to take questions.


[applause].


>> Ben Zorn: Questions?


>>: I have one question for you.


>> Guoqing (Harry) Xu: Yes, sir.


>>: In all the work you talked about, you didn't really talk at all about the hardware or the cache, you know, sort of the memory hierarchy as a source of performance. Are you -- is that something that you intentionally didn't investigate? Are you interested in [inaudible]? How does your work relate to that?


>> Guoqing (Harry) Xu: Well, actually I didn't get any chance to investigate that. I'm interested in doing that, definitely. So yeah, I mean, for example, the cost-benefit thing can be -- the current definition of cost and benefit is defined in terms of computation, right? The amount of work done to produce a value. This can naturally be extended to work for, like, the cache and other things. I don't know how to do that, but I think it's -- it's doable.


Yes, I mean, the answer to the question is that I haven't done any work specific to the cache or the memory hierarchy. So all the work that I have done is related to program analysis. That's the only thing that I have done.


>> Ben Zorn: Any other questions?


>> Guoqing (Harry) Xu: Yeah?


>>: For the first -- the first part of --



>> Guoqing (Harry) Xu: Yeah?


>>: You gave an example in which there were a series of writes that were never read. And you had a considerably more ambitious framework for measuring the cost benefit.


>> Guoqing (Harry) Xu: Right.


>>: And did you quantify -- I mean, you could add a simpler one that would just find -- identify writes that were never read.


>> Guoqing (Harry) Xu: Oh, yeah, sure.


>>: And can you quantify how much extra benefit the -- extra [inaudible].


>> Guoqing (Harry) Xu: Oh, yeah, sure, sure, sure. I understand your problem -- your question. Well, I mean, we do find sort of more complex cases where the benefit part is not just, like, no use. So I'll give a very specific example. We found that a lot of data structures in large-scale applications are just purely copied from one source to the other source. Like, if you consider these large-scale Web-based applications, they have -- you know, different components may have different representations of the same piece of data, right? So in order to be transmitted -- for example, for J2EE applications, in order to be able to be transmitted on the network, they have to be wrapped into this SOAP sort of protocol.


So some piece of data keeps being wrapping -- being wrapped, you know, in order to be transmitted between, you know, different components. So our analysis can perfectly find places like that. So for example, a piece of data is, you know, wrapped in this application -- in this component, but is unwrapped in that one without any computation done. So -- and then there's definitely a lot of optimization opportunities out there. So we did find a lot of cases, more complex cases than the simple, like, no use. Yeah. Yeah.
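
(A tiny made-up illustration of that wrap/unwrap pattern; the Envelope class and the component methods are hypothetical, not taken from any real J2EE framework.)

    // Hypothetical wrap/unwrap copy chain: the value crosses a component boundary
    // but no computation is ever performed on the wrapper.
    class Envelope {
        final String payload;

        Envelope(String payload) {
            this.payload = payload;
        }
    }

    class CopyChainExample {
        // Component A wraps the data so it can be handed to the next layer.
        static Envelope wrap(String data) {
            return new Envelope(data);
        }

        // Component B unwraps it again without doing anything with the envelope.
        static String unwrap(Envelope e) {
            return e.payload;
        }

        public static void main(String[] args) {
            String original = "customer record";
            String copy = unwrap(wrap(original)); // high cost, no benefit beyond the original value
            System.out.println(copy);
        }
    }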


>> Ben Zorn: Okay.


>> Guoqing (Harry) Xu: Thank you.


[applause].


>> Guoqing (Harry) Xu: Thank you very much.