>> Ravi Ramamurthy: Okay. Good morning, everybody. It's a pleasure to
welcome Sudarshan. Most of you already know him. He's been a visiting
researcher at MSR before.


He's a professor at IIT Bombay, which has an excellent database research
group. He's a co-author of a very popular undergrad textbook, and his research
areas include query optimization, keyword [indiscernible] and databases.
Recently, he's been working on something called holistic query optimization,
and he's going to talk about that today. So I'll hand it over to him.


>> S. Sudarshan: Thank you, Ravi. So this work was done by -- it was started
by my Ph.D. student, Ravindra Guravannavar, or Ravi, as we all call him. And it
is being continued by Karthik, who is a Ph.D. student currently working on this
topic. So if there are any really hard questions, I'm going to ask you to
e-mail them. But hopefully, it should be okay. And then several Master's
students contributed, including Mahendra Chavan and a few others.


Okay. So what is the problem? So here you have a bunch of people waiting for
a taxi. There's a long line. And in our context, the taxi is a call to the
database system. So you execute a query, you fill the taxi, start over. It
takes a long time to go there, a long time to come back, and then the next guy.
There's one taxi in town. So it's a very bad idea to live this way.


And why would you, you know, why would anyone do it this way? Because there
are many situations where people write loops with query [indiscernible].
Sometimes it's because a programmer didn't know better, and it would be easy to
change. Sometimes it's actually hard to change because the query is not
directly in the loop but it's deeply [indiscernible].


So it's useful not only in databases, but the same problem even arises with web
service requests. Our implementation has focused on databases. But the ideas
that are present here should be applicable for web services also.


So naive iterative execution of queries is obviously inefficient, as we
said. The latency is a huge factor. In addition, what we have seen is that
with multiple CPUs, the database can actually handle a lot of
queries concurrently, especially if their data is [indiscernible] in cache. So
there's not much disk IO.



And we're able to run, you know, ten queries in [indiscernible] database,
and the response time, you know, hardly changes when you go to 20, 30. These
were small queries.


Even for large queries, this kind of works, because if they share the scan, for
example, you can get much better performance.


Okay. So how do we solve this problem? So a lot of us have been working on
query optimization for many years, but unfortunately, there's nothing the
database can do. It's being sent a series of queries, and unless it can kind
of guess that this is what is happening -- but even then there's little it can
do, because at best it can precompute the query and keep it, but it's still
[indiscernible] synchronous and slow with latency.


So you really have to go to the other side of the gap here and work on the
application program. So that's something we avoided for many years, because we
thought we are database people. We are not programming people. We don't know
anything about programming languages.


But luckily, I had a Ph.D. student who not only knew databases, he also knew
program analysis. So we started filling this gap. So as we say, it's time to
think outside of the box for query optimization.


So we're actually going to give two solutions to the problem I posed. The
first one is to use a bus, which is ecologically sound. And what is a bus in
this context? It's basically [indiscernible] of execution, as you can imagine.
We had a lot of fun with these pictures.


So I'm first going to describe earlier work, which first appeared in VLDB 2008.
The problem was: given a program which makes calls on a database -- our
implementation detects database calls and works on Java programs -- how do you
automatically rewrite the program to replace a loop with calls by a single
batch-oriented execution?


So the benefits are the obvious ones, such as [indiscernible] plan sharing,
sharing of disk IO, avoiding network round trip delays -- all of those which I
told you about.


So our approach is to actually transform imperative programs, and the way we do
that transformation is there are, luckily, tools available, including some at
Microsoft which unfortunately I don't think we have access to, but we are
using an open source tool called SOOT, which analyzes Java code,
[indiscernible] an internal representation called [indiscernible], and then
builds the data dependency and control flow graphs, collectively called the
program dependency graph.


And now we can -- it also provides a bunch of APIs to detect things in this
graph. So it makes our life a lot easier to do an analysis using this tool.
And it also allows us to move segments around; at the [indiscernible] level,
statements can be moved around.


So we can actually transform these programs. So now we have basically
transformations applied to the program.


And then the second part is to actually rewrite the query to make it set
oriented. Luckily, you know, in this case, the SQL Server optimizer saved
the day here, because it's really good at decorrelating queries. So all we had
to do was give it a query which essentially used the cross [indiscernible] and
[indiscernible] version of the cross [indiscernible], and it did the rest of
the stuff for us. So that is actually on SQL Server.


Okay. So here is an example of a program, a small program, and how we rewrite
it. See if I can use this pointer.


Okay. So here is a connection which has been somehow set up, and you are
preparing a query which counts the number of partkeys from part from a
particular category, which is provided [indiscernible]. And then there's the
loop, which goes through a category list, stepping through [indiscernible]
elements using next. It binds the question mark parameter to the category,
executes it, gets the count and then adds it to the sum. So it's doing a fair
amount of work, but still a very small example.
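To make the shape of this loop concrete, here is a minimal JDBC sketch of the
kind of program being described; the table and column names (part, p_partkey,
p_category) and the categoryList variable are assumptions, not the actual
benchmark code.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.List;

    public class OriginalLoop {
        // One round trip to the database per category: the pattern being described.
        static int totalCount(Connection con, List<String> categoryList) throws SQLException {
            PreparedStatement stmt = con.prepareStatement(
                "SELECT COUNT(p_partkey) FROM part WHERE p_category = ?");
            int sum = 0;
            for (String category : categoryList) {
                stmt.setString(1, category);        // bind the ? parameter
                ResultSet rs = stmt.executeQuery(); // blocking call inside the loop
                rs.next();
                sum += rs.getInt(1);                // add the count to the running sum
                rs.close();
            }
            return sum;
        }
    }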


So here is what we do when we end up rewriting it. So this part is the same.
Now here, you will notice there are two loops where there was one. So the key
thing here is to split the loop into two parts, where the first part doesn't
actually execute the query.

Well, we'll see that in the asynchronous case, it actually does initiate
execution of the query. But in the batch oriented portion, all it does is it
steps through the category list and now, after binding the parameter, it says
add batch. And that's all that loop does.




And then this single statement does the execute batch, rewriting the query,
which as I said is very simple. And then the last part is iterating over the
results of this query and adding it up to get the sum. Okay? Of course, this
all has to be syntactic. We don't understand the semantics. And, of course, a
lot of conditions apply; you can't always do this.

>>: In this particular case, can't you just issue one query? It's probably
harder to decide that [indiscernible] [inaudible] you could push it as a union
of categories and [inaudible].


>> S. Sudarshan: Right.


>>: Could have potentially done that [indiscernible]. It's harder to do it.


>> S. Sudarshan: So there are two kinds of things that could happen. One case
is where the category list is actually a list which is in the programming
language. It is not coming from the database. And here, what we are doing is
this [indiscernible] doesn't execute the query. It's actually just collecting
all the values.


And then this
execute batch is doing the whole execution.


>>: Still execute [indiscernible].


>> S. Sudarshan: No. It's one query. It's a single query, which is a batch
oriented version of it. Basically it takes the parameters and makes a temporary
table in the database, and then that table [indiscernible] on the basic query.
[Indiscernible] are the parameters which went in, along with the regular
attributes which were in the query. And then that comes back and this part
actually steps over it.


>>: I guess what I was asking is you could have pushed the sum --



>> S. Sudarshan: Into the database, yes. Yes. If he
--



>>: [indiscernible].


>> S. Sudarshan: Exactly. So in this case, maybe we could [indiscernible]
that we can translate this. But in general, we're doing arbitrary stuff
here, all dependent and so on. So we actually can resolve [indiscernible] if
the -- this -- I'm not showing it here, but the way we do it, this would
execute exactly what this loop did, in the same order, over here. And so we can
[indiscernible] if you bring stuff over, for example, it still works.


And, in fact, coming to the other thing, if this thing over here was actually
iterating over a relation in the database -- this is actually another common
case. So you have a query which looks at a relation with another query inside.
Then we could short circuit this part, potentially, and realize that this is
already in the database as a relation and use that relation.


We could. I don't think our current implementation does it.

Yeah?


>>: [indiscernible].


>> S. Sudarshan: Yeah.


>>: What happens [indiscernible].


>> S. Sudarshan: Okay. So if you -- well, in the paper, we discuss how we
could take a procedure and then rewrite it to a batched version of the same
procedure, where it takes one parameter. We pass a set of parameters, and we
can actually rewrite the procedure itself. It's a new signature in theory.


>>: [indiscernible].


>> S. Sudarshan: Right.


>>: [indiscernible].


>> S. Sudarshan: Right. So if the -- so if itself [indiscernible] is then
[indiscernible] by [indiscernible], this is a possibility [indiscernible] could
do it. But currently we will not handle [indiscernible].


>>: So a lot of the latency issue is avoided by simply packaging this whole
thing up into a procedure, [indiscernible] down to the database and letting that
procedure perhaps execute unmodified?


>> S. Sudarshan: Yeah, it's a query. It's not a procedure. It's just a
query.




>>: Yeah. So what you're adding to that here is somehow driving more of the,
more of the execution into the declarative part of the query when you optimize,
is that?


>> S. Sudarshan: That too, because we are telling the database to do this
query on another batch of parameters. The optimizer can now do something
[indiscernible], which it couldn't have if you got into the -- so that's the
second one. The first one is the latency and the second is this aspect.


Okay. So I'm going to discuss some of these conditions, but so this is -- what
is this example here? This is a slightly more complex example, where what
we're doing here is, if you look here, in this case, we have a category coming
in. And now look at the last part here. It says category is equal to get
parent of category. So now this step here is after the query execution. So if
you split the loop at this point, we actually don't have the ability to get all
the categories. So what this example is showing us is, you can't just take the
point in the loop where the query is and split it, because there is a
dependency here which feeds back into it.


So this illustrates, well, two things. The first is that if there
are dependencies, which they call loop carried flow dependencies -- so this
thing goes into the next iteration of the loop back here -- then you cannot
split the loop. That's the first thing this illustrates. So that's one of the
required conditions for doing the loop split.


But the good news is that we can actually do something about it. There is no
reason why this had to be here. We could have moved it up here just before
this query. But then there's a slight problem, which is that the category
which was -- on which the query was to execute is now getting clobbered,
because [indiscernible] we can use temporary variables and work around this.
We'll see an example.


Okay. So there are several steps in this transformation, through which I'll
show you how to handle complex cases like this. The first step is, obviously,
to identify which queries we want to turn into batch execution.


And in our [indiscernible] implementation we just look for database
[indiscernible] and we could perhaps do it only if there are [indiscernible] of
a loop and so on. But anyway, the [indiscernible] are small enough that we can
get away with this for now. That's the first step. The intention is to split
the loop at this point. But like I said, there is a loop carried flow
dependency here, which feeds back into the loop condition and also into this.
So we can't directly split it.


So we do a
data flow analysis here to look for this condition, and so the third
step is to try to reorder statements to see if we can remove this loop carried
flow dependency. Now, there are certain cases where it cannot be done, in
which case we have to abandon it
and not split the loop.


But in this particular case, what has happened is we have stored the category
coming in into a temporary variable, and we have moved that statement -- set
category to get parent of category -- which was done here, we moved it up here,
because I think there is nothing in between which depends on it. Well, the only
thing which depended on it was this thing: the category was being passed as a
parameter to the query. But what we have done here is created a [indiscernible]
variable so we can use that over there.
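A minimal before-and-after sketch of this reordering, assuming a hypothetical
getParent lookup (the CategoryDag helper is an assumption), might look like
this:

    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class ReorderingSketch {
        // Before: updating `category` after the query creates a loop-carried
        // flow dependency back into the query parameter, so the loop cannot be split.
        static void before(PreparedStatement stmt, String category, CategoryDag dag)
                throws SQLException {
            while (category != null) {
                stmt.setString(1, category);
                ResultSet rs = stmt.executeQuery();
                // ... consume rs ...
                category = dag.getParent(category);   // loop-carried dependency
            }
        }

        // After: the parent computation is moved above the query, and a temporary
        // variable preserves the value the query should run on. No flow dependency
        // now crosses the split point, so the loop can be split for batching.
        static void after(PreparedStatement stmt, String category, CategoryDag dag)
                throws SQLException {
            while (category != null) {
                String current = category;            // temp saves the value to query on
                category = dag.getParent(category);   // moved up, before the query
                stmt.setString(1, current);
                ResultSet rs = stmt.executeQuery();
                // ... consume rs ...
            }
        }

        // Hypothetical helper representing the category hierarchy lookup.
        interface CategoryDag { String getParent(String category); }
    }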


Okay. So now, if you see, this resulting loop, there's actually no dependency
here. Yeah?


>>: [Inaudible].


>> S. Sudarshan: Yeah, so that is where a lot of conditions apply. So there are
many conditions. One is there are all these functions which could potentially
have side effects, and in this case, you know, if the side effect affected one
of these things, then we could have a problem.


So we can do interprocedural analysis to decide this. And if any of these ran
another query, which affected the result of this query, then again we have
trouble.


So to make sure there's no problem, we actually have to see what other queries
are shot to the database inside here. So we could do static analysis. In our
tool currently, you know, we are implementing that part. It's not fully there.
So like I say, it's faith-based optimization. We trust that this currently
doesn't cause problems. Yeah?


>>: So one thing is the extra conditions on the program. This thing is on the
data, for example. I wonder if there is [indiscernible] query is slightly more
complicated, often these transformations are not fully semantic specific.
They are syntactic; there are no issues with [indiscernible]. The
[indiscernible] is you wouldn't know on the database.


>> S. Sudarshan: I think in this case, that problem will not arise, because
the [indiscernible] cross join definition is, for each value generated by the
left side of the cross join, you execute the query on the right side, and the
right side query is the original query. So assuming the SQL optimizer does
it right, and I do believe it does it right, we should not have any problems
with values.


But you're right, if you try to do decorrelation analysis, we could run into
trouble. So this is the reason we currently work only on SQL Server; the other
guys don't fully implement this.


Okay. So I think Sybase does support some sort of lateral join, so we were
trying to implement it on Sybase, but it's not yet functional, as far as I
know.


Okay, so in this case, after this [indiscernible], the loop carried flow
dependency is gone, and there are [indiscernible] conditions to check whether
you can split a loop at that point. I won't get into all the details.


But the last step is how to split the loop. In the earlier program, I sugar
coated a lot of stuff to make it easier to understand what was going on. But
this is actually what the transformation does.


So the first loop did something. Then we executed the query. Now, the second
loop should do exactly the same set of things as the first loop did. Of
course, what happens inside the loop is different. But if there's a variable
which is defined in this loop and was used down here, then we better preserve
the value of that variable so that when the same iteration of the loop is done
over here, it sees the same value for that variable.


So once we split the loop, we have to make sure that variable states are saved
and then restored down here. Okay? So in order to do that, we create this
loop context table, and it has entries: one entry per iteration of the first
loop. And the second loop simply goes over the same entries in the same order
in which they were created. So it's really more of an [indiscernible] than a
[indiscernible].




So what we do here is create a context. And in this case, let us see. I think
there is no variable that needed to go into the context. So the context was
basically used because the original query execution here -- well, let me go
back to the first slide to show the original query.


Okay. So if you see here, now in our earlier program, I sugar coated it by
saying the query just returns one value. But in general, the query may return
multiple values, and you might actually be iterating over the values here.


In this case, you know, we have removed a little bit of the sugar coating and
we say results [indiscernible] next, because there's just one loop but it's
getting that. But if you had a loop, we have to make sure that in the second
part of the split loop, we will be able to execute the same thing over there
and it could get all the results of one particular [indiscernible] invocation.


Okay. So that's part of what a loop context can do here. So when we are doing
our batch, we are setting a context. So which iteration of the loop was this
for is provided by the context. And here, we're saying the results are equal
to statement [indiscernible] on the context. So we'll get exactly the same
results that the corresponding iteration over here would have got. And then
the rest is the same as before.
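A minimal sketch of such a loop context table, assuming the names
LoopContextTable and LoopContext rather than the actual API, could look like
this:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class LoopContextTable {
        // Per-iteration snapshot of the variables that cross the split boundary.
        public static class LoopContext {
            private final Map<String, Object> savedVars = new HashMap<>();
            public void setVar(String name, Object value) { savedVars.put(name, value); }
            public Object getVar(String name) { return savedVars.get(name); }
        }

        private final List<LoopContext> contexts = new ArrayList<>();

        // Called once per iteration of the first (parameter-collecting) loop.
        public LoopContext createContext() {
            LoopContext ctx = new LoopContext();
            contexts.add(ctx);
            return ctx;
        }

        // The second loop replays the contexts in exactly the order they were
        // created, so each iteration sees the variable values its original
        // iteration saw.
        public List<LoopContext> contextsInOrder() {
            return contexts;
        }
    }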


Okay. Any questions?


So the last part is how to do the batch rewriting. So this is the SQL Server
syntax. So over here is the original query. We have created a batch table,
and we have inserted all of the values into the batch table. We can use the
batch form of the database insert statement so that we don't do multiple round
trips. We do only a few round trips.


And then here, we select from batch table one. So those are the parameters and
all attributes of the query, from batch table 1 outer apply, and then over here
is the query. And finally, we order by loop key one so that the output of this
comes back in the correct order so we can just go through it in sequence. We
don't have to go back and forth in it.
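As a rough illustration, the rewritten set-oriented SQL in SQL Server syntax
might look like the following; the table and column names (#batchtable1,
loopkey1, param1) are assumptions, shown here as Java string constants.

    public class BatchRewriteSql {
        // 1. A batch table holds the parameter values plus a loop key that
        //    records the iteration each value came from.
        static final String CREATE_BATCH_TABLE =
            "CREATE TABLE #batchtable1 (loopkey1 INT, param1 VARCHAR(30))";

        // 2. Parameters are inserted with the driver's batched insert so only a
        //    few round trips are needed for the whole parameter set.
        static final String INSERT_PARAMS =
            "INSERT INTO #batchtable1 (loopkey1, param1) VALUES (?, ?)";

        // 3. The original parameterized query runs once per batch-table row via
        //    OUTER APPLY; ORDER BY loopkey1 returns results in submission order.
        static final String BATCHED_QUERY =
            "SELECT bt.loopkey1, bt.param1, q.cnt " +
            "FROM #batchtable1 bt " +
            "OUTER APPLY (SELECT COUNT(p_partkey) AS cnt " +
            "             FROM part WHERE p_category = bt.param1) q " +
            "ORDER BY bt.loopkey1";
    }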


>>: [indiscernible] doesn't matter, though, right? This final ordering.


>> S. Sudarshan: So this ordering is not that of the query. If the query had an
order by -- well, then what would happen? You'd have [indiscernible] order.
So we'd do an order by loop key, and then within the loop key, whatever order
was there for the original query.


So the point is that many queries are issued. We want the results to come back
in the same order in which the queries were issued, because there is a loop
there; maybe it prints things out, we don't know. So the results have to come
in the same order then.


>>: I was just thinking that there might be some loops, though, where that
doesn't matter. Where you wouldn't have to do that.


>> S. Sudarshan: Yes, so that could be an optimization where we turn it
off.
But unless we know for sure that the order doesn't matter, this is safe.


Okay. So that was an overview. Like I said, I didn't go into the details of
all the conditions, but we have those in the 2008 paper, and the more detailed
version is there in Ravi's Ph.D. thesis.


Okay. But there are some limitations. Well, a limitation is an opportunity
for a new paper. So anything which you didn't do in the first place, you get
one more paper. So that's what we did here. And then I'll tell you some of
the limitations of -- so this was a paper published in ICDE. Well, that also
has some limitations, which will be our next paper hopefully.


Okay. So the first limitation is we were doing batching, and there are
obviously many interfaces which don't give a batched -- an ability to do a batch
query. So we have to do asynchronous submission. That's the only way for
these things.


The second problem is, for certain programs, the query may actually vary across
iterations, and so then what query are you batching? Batching assumes that
across the loop, the query is fixed. Sometimes people add a few selection
conditions, depending on what parameters came in, to the queries. So we cannot
do batching as is.


And then there are some inter-statement data dependencies, by which I mean we
can't actually split the loop. So maybe we can't apply our transformations.


And finally, like I said, even though we may not be able to batch, the
multicore processing power on the client can be used to issue a number of
queries in parallel and also fetch them back in parallel, so at least the --
whatever work the client has to do can be parallelized, and also the server,
whatever it does, can be parallelized, even if it's not set oriented.


So basically, we exploit asynchronous query submission. So in order to do this
-- by the way, the original batching did a whole lot of low-level calls and
added a lot of [indiscernible] in there. So one of the problems was, when we
saw a rewritten program, we couldn't understand what it was doing.


So, you know, like I said, [indiscernible] to do the right thing, but do we
trust our programs that much? If we do something wrong and then you run it,
well, you won't be happy.


So what we ended up doing is building an API so that a transformed program uses
the API and is a lot easier to read. So the programs that I showed you
actually are based on that API. The original one was actually much harder to
read.


The other thing is that once you have this API, if you don't trust our
rewriting, you can still use the API and do whatever we would have done
automatically -- you can do it manually and get the same benefit.


And finally, there's an improved set of transformation rules. I won't be
getting into that, including reordering and so on.


So what we are doing in this thing is to have a whole bunch of taxis.


>>: So I have one question.


>> S. Sudarshan: Yeah.


>>: It seems it should be very hard to find, to argue that any [indiscernible]
that you have [indiscernible] extremely sudden [indiscernible] if you, for
example, use [indiscernible] or if you use [indiscernible]. Seems --



>> S. Sudarshan: Yes.


>>: How you would find all that, you know, that you could have
[indiscernible].




>> S. Sudarshan: So we are assuming that the procedures which you call are not
going to have that side effect. So if, yeah, we are reordering the way in which
you do things. So if you have a procedure which was in the first part of the
loop and a procedure in the second part, earlier they would run in lockstep:
procedure one, procedure two, one, two, one, two. So now it's one many times,
then two many times.


So absolutely, it will break. If they have side effects, the whole thing will
break.


>>: No, but my question was [indiscernible] won't have the side effects.


>> S. Sudarshan: So there are two parts, right. If you do a full fledged
interprocedural analysis, we could actually see what those procedures do and
make sure they don't have side effects. The current implementation doesn't do
that fully. The reason we don't do it is, at least with the tool we are using,
it's very slow.


The problem is it doesn't just look at our procedures. It starts and it goes
deep into all the system libraries which are there and starts analyzing all the
library, which is crazy.


>>: Right.


>> S. Sudarshan: So we need a better tool which will just look at our
procedures. And for the system things, you know, [indiscernible] contract
about what side effects it has or doesn't have. So really it should work at
that level and then it would be efficient. Because the time [indiscernible]
here is not much, but the problem is it goes into all the libraries.


So the current implementation doesn't actually do that. So we are assuming
whatever procedures there are, they are side-effect free. But the remaining
part, the dependencies within the loop, is what we are actually making sure is
okay.


Okay. So the motivation is obviously asynchronous submission can improve
performance and it is, in fact, widely used. Ajax is very widely used. It's
also true that it's very hard to program in Ajax. So it's okay for simple
stuff, but if you want to do more complex things, you need a bunch of very
smart programmers, and your average application programmer is not that smart.




So one of our goals, in addition to what we have been doing, is to take an
application which would run at a client, you know, tablets or whatever,
[indiscernible] or [indiscernible], so JavaScript or whatever other language the
application is in, and turn synchronous calls into asynchronous calls. We're
not there yet. We would like to do it.


Okay. So asynchronous, you all know what this is. I'm going to skip this.
This is for a different audience. Yeah?


>>: Has this been [indiscernible].


>> S. Sudarshan: Yes.


>>: Asynchronous?


>> S. Sudarshan: Yeah. So the programming -- actually, the web services
community has certainly studied this. Obviously, it is important for them.
But whatever work we have seen has focused on straight line code. So if you
have a sequence of calls, then they would do [indiscernible] asynchronous
submission ahead of time for that thing.


But if you had a loop, then whatever techniques we've found do not work. So
database people tend to do loops over some data and execute a number of queries,
and that is something which the web services people somehow have not paid
attention to. But for the straight line case, indeed, there's been work from
quite a while back -- 2003, 2004 even, there's been work.


Okay. So the ICDE paper had the following contributions. Like in the batch
case,
it automatically transforms the program. There is a statement reordering
algorithm which is applicable both to the batch and to the asynchronous thing.
In the earlier paper, we didn't have any guarantees. It was like the best
effort.


Later on, we developed an algorithm which could guarantee that whenever a
reordering was possible which would allow us to split the loop, it would, in
fact, do that. So there's a corresponding theorem in Ravi's thesis in detail,
and it is briefly mentioned in the paper. I won't get into it.


Then there's the API, as I said, and we also talk about some of the design
challenges involved in making this happen. Yeah?




>>: [indiscernible] contracting the efficiencies that the programmer may have?
For example, he may not have added a [indiscernible] limit in this query.
But you can look at the program and realize that he really only wants to
[indiscernible].


>> S. Sudarshan: That's a nice idea, but thanks for [indiscernible]. It would
be -- yeah.


>>: [Inaudible].


>> S. Sudarshan: So here's basically the same program as before. This time,
what we have done is we have a handle, which is for queries which have been
submitted. The handle is actually used to fetch the result from there, and
what we do here is, instead of adding the query to a batch, we do a submit query
over here, and the submit query immediately returns a handle, which we save in
this handle array. And then the second part of the loop simply goes over the
different handles. It does a fetch result on that handle, and then finishes up
the loop.
the loop.


Again, this is simplified to work for this program. The actual rewriting doesn't
look like this. We do have the loop context and all that in there. But
conceptually, this is what happens.


So conceptually, the API will execute a submit query, execute -- sorry, execute
is the blocking one, which is split into a submit and a fetch. Yeah?
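Conceptually, the asynchronous rewrite looks something like the sketch below;
the AsyncExecutor, QueryHandle, submitQuery and fetchResult names are
assumptions standing in for the talk's API.

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    public class AsyncLoop {
        // Hypothetical asynchronous-submission interface in the spirit of the talk.
        interface AsyncExecutor {
            QueryHandle submitQuery(String sql, Object... params);          // non-blocking
            ResultSet fetchResult(QueryHandle handle) throws SQLException;  // blocks if needed
        }
        interface QueryHandle { }

        static int totalCount(AsyncExecutor exec, List<String> categoryList) throws SQLException {
            String sql = "SELECT COUNT(p_partkey) FROM part WHERE p_category = ?";

            // First loop: submit every query; each call returns immediately with a handle.
            List<QueryHandle> handles = new ArrayList<>();
            for (String category : categoryList) {
                handles.add(exec.submitQuery(sql, category));
            }

            // Second loop: fetch the results in the same order the queries were submitted.
            int sum = 0;
            for (QueryHandle h : handles) {
                ResultSet rs = exec.fetchResult(h);
                rs.next();
                sum += rs.getInt(1);
            }
            return sum;
        }
    }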


>>: [Inaudible].


>> S. Sudarshan: Right. So that is a parameter which we can control as part
of a configuration. So the submit query actually doesn't [indiscernible] go
and send the query. It simply adds it to -- I think I have a picture here.
Yeah. So submit query is simply added to a submit queue and then there's a
bunch of worker processes over here. So we can control how many there are.


I believe the current versions of the [indiscernible] actually have an
asynchronous query submit. We're not using it as of now. Our tool uses JDBC, so
we don't have it. But if we did it in the .NET framework, we could perhaps have
avoided all this and used the asynchronous submission.




Okay. So each of these threads is synchronous. It blocks until it gets a
result back and then puts it into the result area, and then it's fetched over
here.
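A minimal sketch of this submit-queue-plus-workers scheme, using a standard
Java thread pool rather than the actual implementation, might look like this;
the query text is an assumption, and each call is given its own Connection
since JDBC connections should not be shared across threads.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class SubmitQueueSketch {
        private final ExecutorService workers;

        public SubmitQueueSketch(int numThreads) {
            // numThreads controls how many queries can be in flight at once.
            this.workers = Executors.newFixedThreadPool(numThreads);
        }

        // The submitted task runs synchronously on a worker thread: it blocks on
        // the JDBC call, and the returned Future plays the role of the result area.
        public Future<Integer> submitCount(Connection con, String category) {
            return workers.submit(() -> {
                try (PreparedStatement stmt = con.prepareStatement(
                         "SELECT COUNT(p_partkey) FROM part WHERE p_category = ?")) {
                    stmt.setString(1, category);
                    try (ResultSet rs = stmt.executeQuery()) {
                        rs.next();
                        return rs.getInt(1);
                    }
                }
            });
        }

        public void shutdown() {
            workers.shutdown();
        }
    }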


Okay. So what do we have here? This is the same thing as we saw before. The
challenge is the same as before: complex programs with arbitrary control
flow, data dependencies, and loop splitting requires variable values to be
stored. So these are all the same problems that we had before. I didn't
explicitly list all of them before.


But let me say a little bit about what we do with some of these. The data
dependencies, I told you -- there are conditions under which we can split it.
The second issue is what about control dependency. What if there is a query
which is conditional in there? How do we handle it?


So we use a fairly standard trick, which is [indiscernible] anything which is
inside an if into guarded statements. So we have a variable which stores
the result of the if predicate, and then each of the things within the then part
is guarded by that variable being true. And then the else part is guarded
by that thing being false.


So we have these guarded statements, and the control flow is basically
gone. So conceptually, we do everything -- of course, we could skip the else
part if you're doing the then part, but in this case, we pretend that all the
guards are actually executed. But when we finish our transformation, there is a
second stage where we take the [indiscernible] code and get it back into Java.
So at that point, we actually go back and create [indiscernible] back. So the
final program doesn't -- it actually hides all those details.
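A tiny sketch of the guarded-statement idea, with hypothetical Runnable
placeholders standing in for the statements inside the branches:

    public class GuardedStatements {
        // Original form: the query call is control-dependent on the if.
        static void original(boolean someCondition, Runnable runQuery, Runnable other) {
            if (someCondition) {
                runQuery.run();
            } else {
                other.run();
            }
        }

        // Guarded form: the predicate is stored in a variable and each statement
        // is guarded by it, so the control dependency becomes a data dependency
        // on `guard`, and statements can be analyzed and reordered uniformly.
        static void guarded(boolean someCondition, Runnable runQuery, Runnable other) {
            boolean guard = someCondition;   // evaluate the if predicate once
            if (guard)  runQuery.run();      // guarded "then" statement
            if (!guard) other.run();         // guarded "else" statement
        }
    }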


Okay. So we give here a few of the rules. Now, there were similar rules for
batching, but I will focus on the rules which we use for the asynchronous part.


The first one is the equivalence rule for loop splitting. The
second is to convert control dependencies to flow dependencies. This is the
one that I told you, where the if-then-else can be turned into guarded commands.
And then rules C1, C2, C3, which are reorderings of statements. Again, I won't
give all the details.


But these, some of these generalize the batching rules and some of these
simplify the batching rules.



16

So I'm going to skip the details. Yeah?


>>: So one thing that you've been showing in the [indiscernible] is scale,
right, because I think it's possible that you [inaudible].


>> S. Sudarshan: Yes.


>>: And how you control that, like why [inaudible].


>> S. Sudarshan: That's a good question. So we will use the variable and then
reuse the same variable in the next loop. We're actually storing a loop
context object which stores the old value of the variable and keeps it around
until the second loop. So certainly, there's an increase in the state.


But the thing is how much will this blow up, right? If you had a thousand
iterations of the loop, you have, you know, a thousand-fold blow up of the
state. Typically what we have seen in these programs, the state is just a few
bytes. If you have very complex state, then that could be trouble. And the
same if you had data structures which are being updated and so on, which are
also used in the second part.


So the thing is, if a variable is not used in the second part, we don't have to
save its state. So it's only for these things which cross the boundary. So if
you have complex data structures which cross the boundary, then we are in
trouble. So we won't actually split the loop in that case.


But as long as it's simple variables whose value we can save and restore, we do
it.


Now, my contention is that the number of iterations of the loop is not going
to be large. If it were, let's say, 10 million, your program would never have
executed. Try doing 10 million round trips to a database. Your program
wouldn't have executed. So it's not something you need to worry about.


>>: No, but it could be an issue in the [indiscernible].


>> S. Sudarshan: Yes. Yeah.


>>: [Inaudible].



17

>> S. Sudarshan: So I think in this case, since we have, you know, we can
control what is sent and when it comes back. So that can be under the control
of the API. The submit part simply spews out the whole thing. So if the
[indiscernible], we can actually stop sending the queries to -- we don't
implement it, but there's no reason why we can't do this.


That, you know, we stop sending queries at some point, wait for the thing
submitted earlier to be consumed and then send more things.


>>: Isn't that true for any of the loops -- you could simply
[indiscernible] better iterations of the loop and sort of package things up in
batches in order to avoid having to materialize?


>> S. Sudarshan: Yes, that is true. So we could -- the rewriting would be a
little more complex. So you'd have an outer loop and then an inner loop per
mini batch. So we could do it.
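A minimal sketch of that mini-batch variant, with an assumed chunk size and a
hypothetical processBatch callback; the outer loop bounds how much loop-context
state is materialized at any one time:

    import java.util.List;

    public class MiniBatchSketch {
        interface BatchProcessor { void processBatch(List<String> chunk); }

        static void run(List<String> categoryList, int chunkSize, BatchProcessor p) {
            for (int start = 0; start < categoryList.size(); start += chunkSize) {
                int end = Math.min(start + chunkSize, categoryList.size());
                // The inner "split loop" runs on just this chunk, so at most
                // chunkSize iterations of state are saved before results are consumed.
                p.processBatch(categoryList.subList(start, end));
            }
        }
    }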


>>: [indiscernible] variables remain the same [indiscernible] arbitrary
changes that are happening in the loop [indiscernible].


>> S. Sudarshan: It's possible; we have not excluded it. Yeah, if you realize
that a variable is simply a counter, then we don't actually have to save its
state. So those are optimizations that are possible but not currently
implemented.


Okay. So this is -- well, I actually pasted two decks together. So there's a
little bit of repetition between batch and this. So I think this particular
one is the loop carried dependencies. So I am going to skip this part of the
slide.


But the thing to notice is, when we did the earlier paper, the reordering was
not complete. It just, you know, said you can move this, and then if it
resolves the condition, then you can do the rewriting. But there's no specific
algorithm to say what should you do, how should you do this, in what order do
you do the reordering.


So one of the contributions in this paper is an algorithm that decides when you
can move something and, among the candidates, which one to move. And it
actually does this iteratively until either the condition for splitting is
satisfied or it cannot move anything [indiscernible].




So this is an example of the dependency graph. So this is a little bit of an
inside look at what happens. So these are the statements corresponding to this
thing over here.


Now, again, over here this is a Java program, so the statements here are Java
statements -- lines. But that's not [indiscernible]. We have something closer
to byte code and those are the statements here.


But sticking to the Java statements, we are treating S2 as these two together.
And then S3 and S4. So let's look at some of the dependencies. The black ones
here are the flow dependencies. That is, S2 defines a variable here, the
variable count, and S3 uses it. So that is a direct flow dependency.


Then there are other kinds of dependency. There are anti-dependencies. So
over here, there's an anti-dependency from S1 to S4, because S1 is reading
something and later S4 writes to it. So those are the anti-dependencies. Then
there are output dependences. In this case, this -- well, the dashed ones are
loop carried, which means across iterations of the loop. So over here, the
assignment to category is clobbered by the next iteration, which also assigns
to the same thing. So that is an output dependency, but it is loop carried
because it's in the next iteration, so we have a dotted red arrow here.


Similarly, if you see here, category is assigned here. And then in the next
iteration, the value of category read is whatever was assigned in the previous
iteration. So there's also a flow dependency which is now loop carried, from S4
to S4. So this is the kind of thing which we graph. I'll skip all the minor
details. But finally, you'll see that from S4 to S1, there is a loop-carried
flow dependency which goes from the second part of the loop back to the first
part, because this assigns it and that reads it. So those are the ones which
prevent splitting, and that's what we get rid of by reordering.


And we already have seen this particular example of creating a temp variable.
I'm going to skip that slide.


But let's see the same thing in terms of what happens to the graph. So what we
have done is added a new statement, S5, which is the temp thing over there, and
S2 has been rewritten to use the temp. So over here, there's a loop-carried
flow dependency from the second part to the first part. Over here, after doing
this reordering, you will notice that the part where we want to split is this.
S2 is over here. This is the execute query. We want to split the loop into a
part that is before it and a part that is after it. And here, there is no such
dependency going back. And that's why we are able to split the loop.


Okay. So there is the statement reordering algorithm. Again, it takes as
input a blocking query execution statement and the basic block, which is the
loop itself. And wherever possible, it reorders the basic block such that no
loop-carried flow dependencies cross the split boundary Sq. And, of course,
program equivalence is preserved. That's the formal definition, a statement of
what it does.


And again, there are a lot of details here. I will probably not get into all
of them. But I'll just give you an idea of what we are trying to do. So
basically, what we want to do is we want to find a statement which we can move
to some other place in order to get rid of that loop-carried flow dependency.
That's not good.


So the first step is to identify statements which we want to move. So in these
two cases, there's a V1 with a loop carried flow dependency to this. And in
this case, again, there's a V1 which is over here, which feeds back over here.


Now, the second part is where the loop carried flow dependency is from the
query itself to something earlier, or from the query to something later,
which, in turn, feeds back. So these are the various cases. So we move things
around differently in the different cases. We won't get into all the details.


So the thing is to decide what to move in each of these cases. And then we
have to see what other statements depend on the one which we want to move,
because if we move something, something else may get affected.


So we have to move it carefully. We can move a set of statements together in
some cases.


So in this case, we identify everything which is dependent and move all of
those past the target. And finally, the statement which -- well, the dependent
statements are moved past the target first, and then the statement itself is
moved past the target. Because if you don't do that, we are splitting the loop
there. So this is the last step. Once we move the statement past, we can now
split the loop.
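At a high level, the reordering loop might be sketched as follows; the
DependenceGraph and Statement abstractions and their methods are assumptions,
not the actual algorithm's data structures.

    import java.util.ArrayList;
    import java.util.List;

    public class ReorderAlgorithmSketch {
        interface Statement { }
        interface DependenceGraph {
            // Statements with loop-carried flow dependencies crossing the split point.
            List<Statement> offendingStatements(Statement query);
            // Statements that must move together with s to preserve semantics.
            List<Statement> dependentsOf(Statement s);
            boolean canMovePast(List<Statement> toMove, Statement target);
            void movePast(List<Statement> toMove, Statement target);
            void recompute();   // dependences change after every move
        }

        static boolean reorder(DependenceGraph g, Statement query) {
            while (true) {
                List<Statement> offenders = g.offendingStatements(query);
                if (offenders.isEmpty()) {
                    return true;            // the loop can now be split at the query
                }
                boolean moved = false;
                for (Statement s : offenders) {
                    List<Statement> group = new ArrayList<>(g.dependentsOf(s));
                    group.add(s);           // move dependents first, then s itself
                    if (g.canMovePast(group, query)) {
                        g.movePast(group, query);
                        g.recompute();
                        moved = true;
                        break;
                    }
                }
                if (!moved) {
                    return false;           // give up: cannot remove the dependency
                }
            }
        }
    }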




Okay. So a true dependence cycle in a data dependence graph is a directed cycle
made up of only the flow dependencies and the loop carried flow dependencies.
And the theorem is that if a query execution statement doesn't lie on a true
dependence cycle, then the reorder algorithm is successful in moving things
around.


So this was a guarantee which we didn't have in the earlier paper. In the ICDE
paper, we have the algorithm which guarantees this.


The good thing is that pretty much all of this is applicable to both batching
and asynchronous. And, in fact, our API [indiscernible], we have this one API,
and then we have a flag somewhere which says do it as a batch or do it
asynchronously. The API looks weird. We say add batch, when in the asynchronous
case, it actually goes and executes it. But the nice thing is that the
transformations are identical. The API is identical. It's just a flag whether
you want to do batching or this.


Of course, in certain cases you cannot do batching, in which case the flag is
set to asynchronous only.


Okay. So that is a quick view of one of the kinds of things we do to
rewrite the program. And this is the overall flow of what we do. We take the
source Java file and parse it. Well, we use a [indiscernible] framework which
does all of that. It converts to the Jimple representation. Data flow analysis
and the def-use representation, all of these, the dependency graph -- all of
this is done by SOOT up to here.


This part is what we do: apply the rules to move things around. The thing is,
once we move a statement, everything changes. The dependency graph itself
changes. So after we do any such move, we have modified the Jimple code, of
course, so we actually have to again do the data flow analysis so that the
dependency information is correct after the move, and then we can again apply
more transformation rules.


Once we decide we are done, we decompile and give the target Java out.


So this API, like I said, can be auto generated, or it can be manually used.
So there is a loop context structure in the API which makes it easy to remember
all those variables which were defined in the first part of the loop and used
in the next. And the same API for batching and asynchronous.




So this was demoed at ICDE also.


So that is a quick overview. Now let's move to the performance, whether all of
this is worth it.


So, the batching performance: there are a number of results in the earlier
paper. I won't use all of them, but one or two of those are presented here.


So what did we use for doing our analysis? There were two public domain
benchmarks. There were two real world applications, for one of which the
company was having actual performance problems with a real application. So
they had built it in a modular fashion, okay. So, modular fashion: everything
is an object, and to give a stock option to an employee you have a set of
procedures which deals with one employee.


And now stock options are generally given as a batch. You give it to a number
of employees. And it turned out in their application, they used a really
expensive computer with, you know, plenty of memory and everything. And in
spite of all that, they had to process the stock grant within a window, and
they were running out of that window.


So they came to us, and it worked out well because we had already been working
on this problem. So that is a good connect.


>>: Did you include yourself among those who got the stock?

[Laughter].


>> S. Sudarshan: No, unfortunately.


>>: [Inaudible].


>> S. Sudarshan: Actually, that company turned out to be trouble. Once
we gave them the idea, they said thank you and went away. So the idea was we
would work with them for a while and develop it, but I don't know what they
did. Whether they used our idea and said bye-bye or whether they decided not
to use our idea, I have no idea.


Okay. So this one, as I already told you, was something developed in this
area. So we used a dual-core machine. With dual-core, we are actually getting
a lot of benefit from having multiple threads. There were actually some
experiments on PostgreSQL; the asynchronous part, we can do on PostgreSQL.
The batching transformation is a little trickier.


So we look at the impact of various things such as iteration counts, the number
of threads, and the impact of warm versus cold cache, since IO is a big issue.


So one of the things we thought is, if we increase the number of threads, there
would be more IOs happening which would destroy the normal sequentiality of a
single execution. So we thought performance would actually become much worse.
But surprisingly, you know, it didn't become much worse with either of these
systems.


>>: [Inaudible].


>> S. Sudarshan: Yeah, for [indiscernible]. So we actually -- this is what
happened first. We said okay, let's have a query which does a lot more work, a
scan. [indiscernible] didn't matter. So I think the databases are fairly good
at controlling the load internally.


Okay. So here are these things. There are two things which are cold cache
numbers and then two which are the warm cache numbers. This is SQL Server. So
if you go here, the original program with four iterations, you can see that the
transformed program is actually running worse with cold cache.


And with warm cache, the difference is even more. You can't even see the
original program down here. So the bottom line is, when the number of
iterations is very low, the batched -- well, which one is this? This is the --
I think this is the batched one. No, sorry, this is the asynchronous one.
That is the threads. So it actually becomes worse, potentially. But if you
see, the time was actually very small anyway.


So a ten-fold increase doesn't mean very much when it's ten milliseconds. But
as the number of iterations increases, you can see that the transformed program
-- like here, this is cold and this is warm -- you can see that the transformed
program was like nine seconds, when the original was 50. And in this case, it
was 5.9 seconds, when the original was 46.4 seconds. So the improvement is
substantial when the number of iterations is more. And this was with ten
threads.




>>: So it doesn't matter for the smaller number of iterations,
because the run time is so small anyway. For throughput, it does matter when
you're ten times more resource intensive. So I wonder whether you can -- it
introduces a new estimation or prediction problem. Based on the program, not
on the database. Would it be on both? Could you try to predict whether
[indiscernible] on the right or whether you are in the other case on the left?


>> S. Sudarshan: Yeah. So Ravi is actually working on that. I have not been
involved in that part. He's been working with some of the programming language
people to try to do static estimation of the number of loop iterations that you
would have.


It is static, or maybe dynamic based on previous ones. So we need to do some
cost-based changes.


The other thing is, I'm not sure that this decrease in performance is because
the database is somehow much more inefficient. It may just be the
[indiscernible] of setting up an asynchronous call and fetching it. So it may
not have any impact at all on the database. It only impacts the --



>>: [indiscernible].


>> S. Sudarshan: Yes.


>>: [indiscernible].


>> S. Sudarshan: Absolutely. Okay. Now, this one is the impact of the number
of threads. With one thread -- well, we are at 46.4. And the time decreased
sharply. It starts levelling off after five. It improved up to somewhere around
here, 30 or 40, and then it started increasing again. So the experiments were
done with ten threads. Could have been slightly better maybe with 30 threads.
Or maybe some of that impact.


But for the four iterations, it wouldn't have mattered. Doesn't matter whether
it's 30 threads or four. It's all the same.


>>: [Inaudible].


>> S. Sudarshan: Hm?




>>: How many processors? How many processors?


>> S. Sudarshan: So the database server, I think, was a dual core. That is
here. 64-bit dual core is the database server machine. And the client --
okay, this doesn't say, but I think the client was just a single core.


>>: [indiscernible].


>> S. Sudarshan: Yeah, I think it had. That's -- yeah, I'm pretty sure it had
hyper threading. So it would probably be equivalent to four cores, at least.


Okay. That slide is done. Now, this is the web service case, where we coded
this manually, because our code does not actually recognize web service calls.
And again, here is something which took almost 180 seconds, where the database
was Freebase with a web service API. And up to about 95 threads, you can see
there's improvement. It starts levelling off after this.


So there's a lot of potential for this. Now, what about batching versus
asynchronous? If both are applicable, how do they compare? Sometimes only
asynchronous is applicable, so then this is not relevant. And if you see here,
the first is original, the second is asynchronous, and the third is batching.
And as you can see, batching, where it's applicable, pretty much outperforms
asynchronous. It's fairly clear: the number of messages you send over the
network is reduced, and the database can use a better plan, so you should use
batching if at all it is applicable. But if it's not, asynchronous still gives
a substantial improvement.


Okay. So that completes the talk. There are many directions for future
research. The one which we are currently focusing on is this. So whatever I
showed you was a query in a loop. Now, this works for certain applications,
but there's a whole class of applications where the query is deep inside a
procedure. So any application which uses, say, the Hibernate framework -- it
hides the SQL underneath. You just see an object model and you [indiscernible]
on the object. And deep inside, it's either doing a SQL query or it is looking
up something that is already cached. It's a [indiscernible] object or it's
looking it up.


In this case, what can we do? If you had a loop over multiple objects and
whatever you're doing actually required running a query to fetch an object,
well, we would like to prefetch those things.




So we can't do exactly the same kind of, you know, execution of the query
asynchronously like we did, but what we can do is, if you recognize that inside
a procedure -- so here is a loop outside. Here's a stack of calls, and deep
inside a SQL query [indiscernible] is being executed. Let's say it's likely
to be executed. And we know what the query is.


And if you can trace the way in which parameters are passed down, so that the
parameters to the query are actually available up here in the stack, we can
actually issue a prefetch all the way at the beginning. And then we don't
touch the second part. We are not splitting loops. We're not doing anything.
So this is what is nice about the new approach. Very non-intrusive. The
earlier one actually did a lot of program rewriting. Now, the new rewriting is
just to issue a prefetch call. So that's work in progress which is coming out
quite well so far.
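A minimal sketch of the prefetch idea for ORM-style code; the PrefetchManager,
submitPrefetch, EmployeeDao and Employee names are hypothetical, not the actual
system's API.

    import java.util.List;

    public class PrefetchSketch {
        interface PrefetchManager {
            // Issues the query asynchronously and caches the result, so the later
            // synchronous lookup deep inside the call stack becomes a cache hit.
            void submitPrefetch(String queryTag, Object param);
        }
        interface EmployeeDao {
            Employee getEmployee(int id);   // deep inside, runs SQL or hits the cache
        }
        interface Employee { }

        static void processAll(List<Integer> ids, PrefetchManager prefetcher, EmployeeDao dao) {
            for (int id : ids) {
                // Inserted prefetch call: the parameter is already known at this
                // point in the stack, so the query can run while the intervening
                // code executes. The rest of the loop body is left untouched.
                prefetcher.submitPrefetch("getEmployee", id);
                doUnrelatedSetupWork(id);          // existing code on the way down the stack
                Employee e = dao.getEmployee(id);  // deep inside: now likely a cache hit
                // ... process e ...
            }
        }

        static void doUnrelatedSetupWork(int id) { /* existing application logic */ }
    }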


The second one is which calls to transform. As Jared was saying, we need to
figure out how many loop iterations there are and then decide whether to split
or not.


Minimizing memory overheads, there's some [indiscernible] optimization,
certainly.


How many threads to use. So our experiments showed something, but this is
always the question. Maybe it depends on the load of the database server. If
it's already heavily loaded and you start throwing a lot more work on it, you
may be causing trouble. And then you're not actually using it immediately.


So can we control this in a slightly more sensible manner?


And the last part is actually quite interesting: transactions -- work with
David and Jared here. So this is a big issue. We swept several things under
the carpet. The first thing that we swept under the carpet is, even
for the simple read-only case, each of these threads, like, opens a fresh
connection to the database. Now how do you guarantee that all of these are
running under the same transaction?


So it turns out that in theory, you can use the [indiscernible] interface to
make all of these part of the same actual transaction. It also turns out that
many databases don't actually support this feature. So it's a bit of a
problem.




So if that is supported, we can use it.


>>: For some of these things you might, in fact, want to have separate
transactions, and not, in fact, have the database run long transactions but
rather run a bunch of short transactions instead.


>> S. Sudarshan: Yes. So if that is what you want, then we are already doing
it. We are ignoring the effect of taking one -- what used to be one
transaction, we could do one connection. But if you use auto commit and each
one was independent, then there's no issue at all.


>>: You need to preserve the semantics.


>>: Oh, sure, of course, [indiscernible] several transactions it's the same
effect.


>>: But most of the so-called default transactions are transaction per
statement, which [indiscernible] another transaction. That's involved unless
you do something explicit.


>> S. Sudarshan: So that case, you know, we do handle, as long as it's
read-only. If you do updates asynchronously and things are [indiscernible],
you're in deep trouble. So we obviously don't issue updates asynchronously.


>>: So it's safe to turn a bunch of disparate transactions into one big
transaction, but it could have some bad effects on [inaudible].


>> S. Sudarshan: Yeah. So if you're -- we can, since we have access to the
API, we can easily figure out whether your original program ran under a
single transaction. So in that case, we take all these connections and try to
shoehorn them into one transaction. But as I said, it appears not to work
quite right on the databases we've tried. So we've not been able to get it
running so far.


So with [indiscernible] that's [indiscernible]. With snapshot isolation, with
read-only, it seems like a natural thing to say that here are all these
connections, let them all use the same snapshot. Now, if the database decided
to support this, it would be completely [indiscernible]. The [indiscernible]
for the database to do it. But we need an API from the database to allow this.




Okay. So that's it. Any other questions?


>>: So one of the things that -- [indiscernible] background in compilers and
optimization, and I think an interesting question is to what extent the kind of
program transformations you're doing might be of use, whether or not you've got
a database program down in the bottom of the loop of some size. What sort of
conditions can you put on the things and what sort of generality can you get
to, in fact, get program transformations which might affect other scenarios as
well as the database scenario?


>> S. Sudarshan: That's a good question. It's, indeed, something we talked
about. So the first concern was maybe all of this had already been done in the
programming language community, so we could just use it and substitute, you
know, the add statement by a JDBC call. It turns out that these are very
high overhead; setting up the loop context and so on is very high overhead.
This is not something that any compiler writer would put in in
order to [indiscernible]. So most of the work we do makes sense only if
whatever you're doing is expensive. So that is something they have not done.
But the specific analysis for loop splitting and so on, that has some
similarities with, you know, the parallel compilers, where you want to take
something which has multiple iterations and then turn it into a parallel
execution. So some of the analysis is very similar to work that happens in
parallel compilers.


>>: Any other questions?


>> S. Sudarshan: Thank you.