Machine Learning
Applications in Grid Computing
George Cybenko, Guofei Jiang and Daniel Bilar
Thayer School of Engineering
Dartmouth College
22nd Sept., 1999, 37th Allerton Conference
Urbana-Champaign, Illinois
Acknowledgements:
This work was partially supported by AFOSR grant
F49620-97-1-0382, NSF grant CCR-9813744 and
DARPA contract F30602-98-2-0107.
Grid vision
Grid computing refers to computing in a distributed
networked environment in which computing and data
resources are located throughout the network.
Grid infrastructures provide the basic infrastructure for
computations that integrate geographically disparate
resources, creating a universal source of computing
power that supports dramatically new classes of
applications.
Several efforts are underway to build computational
grids, such as Globus, Infospheres and DARPA CoABS.
[Diagram: a Server advertises its service to a Matchmaker; a Client
sends a location request, the Matchmaker replies with candidate
services, and the Client then requests the service from the Server.]
Grid services
A fundamental capability required in grids is a
directory service or broker that dynamically matches
user requirements with available resources.
Prototype of grid services
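Such a broker can be sketched in a few lines. A minimal Python sketch, assuming a simple in-memory keyword catalog (class and method names are illustrative, not from any grid toolkit):

```python
# Hypothetical sketch of a keyword-based matchmaker: providers advertise
# services under keywords, clients look up candidates by keyword.
class Matchmaker:
    def __init__(self):
        self.catalog = {}  # keyword -> list of service endpoints

    def advertise(self, keyword, endpoint):
        # A service provider registers itself under a keyword.
        self.catalog.setdefault(keyword, []).append(endpoint)

    def locate(self, keyword):
        # Returns every candidate; the client must still validate them.
        return self.catalog.get(keyword, [])

mm = Matchmaker()
mm.advertise("fft", "serverA:fft3d")
mm.advertise("fft", "serverB:fft1d")
print(mm.locate("fft"))  # both candidates match the keyword "fft"
```

Note that `locate` returns every keyword match; the keyword alone cannot distinguish a 3-D FFT from a 1-D one, which is exactly the matching-conflict problem discussed next.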
Matching conflicts
Brokers and matchmakers use keywords and domain
ontologies to specify services.
Keywords and ontologies cannot be defined and
interpreted precisely enough to make brokering or
matchmaking between grid services robust in a truly
distributed, heterogeneous computing environment.
Matching conflicts exist between the client's requested
functionality and the service provider's actual
functionality.
An example
A client requires a three-dimensional FFT. A request
is made to a broker or matchmaker for an FFT service
based on keywords and possibly parameter lists.
The broker or matchmaker uses the keywords to
search its catalog of services and returns the
candidate remote services.
There are literally dozens of different algorithms for
FFT computations, with different assumptions,
dimensions, accuracy, input/output formats and so on.
The client must validate the actual functionality of
these remote services before committing to use any
of them.
Functional validation
Functional validation means that a client presents to a
prospective service provider a sequence of challenges.
The service provider replies to these challenges with
corresponding answers.
Only after the client is satisfied
that the service provider’s answers are consistent with
the client’s expectations is an actual commitment
made to using the service.
Three steps:
– Service identification and location.
– Service functional validation.
– Commitment to the service.
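The three steps can be sketched end-to-end in a few lines of Python, assuming a toy in-memory catalog (all names are illustrative):

```python
# Step 1: identification and location via a keyword catalog.
def locate(catalog, keyword):
    return catalog.get(keyword, [])

# Step 2: functional validation against the client's own answers.
def validate(service, reference, challenges, tol=1e-9):
    return all(abs(service(x) - reference(x)) <= tol for x in challenges)

catalog = {"square": [lambda x: x + x,     # wrong: doubles its input
                      lambda x: x * x]}    # right: squares its input
reference = lambda x: x * x

# Step 3: commit only to a candidate that passed validation.
for candidate in locate(catalog, "square"):
    if validate(candidate, reference, [0, 1, 2, 3]):
        committed = candidate
        break
print(committed(5))  # -> 25
```

The first candidate matches the keyword but fails the challenges (it agrees with the reference only at x = 0 and x = 2); commitment happens only after validation succeeds.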
Our approach
Challenge the service provider with some test cases
x_1, x_2, ..., x_k.
The remote service provider R offers the corresponding
answers f_R(x_1), f_R(x_2), ..., f_R(x_k).
The client C may or may not have independent access
to the answers f_C(x_1), f_C(x_2), ..., f_C(x_k).
Possible situations and machine learning models:
– C “knows” f_C(x) and R provides f_R(x).
  • PAC learning and Chernoff bounds theory
– C “knows” f_C(x) and R does not provide f_R(x).
  • Zero-knowledge proofs
– C does not “know” f_C(x) and R provides f_R(x).
  • Simulation-based learning and reinforcement learning
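For the first situation, where C knows f_C(x), validation reduces to random spot checks drawn from the input space. A minimal sketch (function names are illustrative):

```python
import random

def pac_validate(service, reference, domain, m, tol=1e-9):
    # Situation 1: client knows f_C(x); draw m random challenges and
    # accept only if the service answers every one correctly.
    for _ in range(m):
        x = random.choice(domain)
        if abs(service(x) - reference(x)) > tol:
            return False
    return True

# A correct provider passes; a provider that is off by one always fails.
print(pac_validate(lambda x: x + 1, lambda x: x + 1, list(range(100)), 50))
print(pac_validate(lambda x: x + 2, lambda x: x + 1, list(range(100)), 50))
```

How large m must be for a statistical guarantee is exactly the question the PAC and Chernoff-bound analysis below answers.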
Mathematical framework
The goal of PAC learning is to use as few examples as
possible, and as little computation as possible, to pick a
hypothesis concept which is a close approximation to the
target concept.
Define a concept to be a boolean mapping c: X → {0,1}, where X
is the input space. c(x)=1 indicates x is a positive
example, i.e. the service provider can offer the “correct”
service for challenge x.
Define an indicator function I(h(x) ≠ c(x)) that equals 1 when
h(x) ≠ c(x) and 0 otherwise.
Now define the error between the target concept c and
the hypothesis h as err(h) = Pr_{x~P}[h(x) ≠ c(x)].
Mathematical framework (cont'd)
The client can randomly pick m samples to PAC learn a
hypothesis h about whether the service provider can offer
the “correct” service.
Theorem 1 (Blumer et al.): Let H be any hypothesis space of
finite VC dimension d contained in 2^X, P be any
probability distribution on X, and the target concept c be any
Borel set contained in X. Then for any 0 < ε, δ < 1, given
m ≥ max( (4/ε) log2(2/δ), (8d/ε) log2(13/ε) )
independent random examples of c drawn according to P,
with probability at least 1 − δ, every hypothesis in H that is
consistent with all of these examples has error at most ε.
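The bound in Theorem 1 is easy to evaluate numerically. A small sketch, assuming the standard Blumer et al. constants (the function name is illustrative):

```python
import math

def blumer_sample_size(d, eps, delta):
    # m >= max((4/eps)*log2(2/delta), (8d/eps)*log2(13/eps)):
    # enough consistent examples to guarantee error <= eps
    # with probability >= 1 - delta, for VC dimension d.
    return math.ceil(max((4 / eps) * math.log2(2 / delta),
                         (8 * d / eps) * math.log2(13 / eps)))

print(blumer_sample_size(10, 0.1, 0.05))  # -> 5618
```

Note the bound is linear in d and roughly 1/ε log(1/ε) in the accuracy, so modest accuracy demands already require thousands of challenges — motivating the simplified Bernoulli model below.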
Simplified results
Assume that, for a given concept, all test cases have the
same probability that the service provider can offer the
“correct” service.
Theorem 2 (Chernoff bounds): Consider m independent
identically distributed samples x_1, ..., x_m from a Bernoulli
distribution with expectation p. Define the empirical estimate
of p based on these samples as p̂ = (1/m) Σ_i x_i.
Then for any 0 < ε, δ < 1, if the sample size
m ≥ ln(2/δ) / (2ε²), then
the probability Pr(|p̂ − p| ≥ ε) ≤ δ.
Corollary 2.1: For the functional validation problem described
above, let p be the probability that the service provider
answers a random challenge correctly. Given any 0 < ε, δ < 1,
if the sample size m ≥ ln(2/δ) / (2ε²), then
the probability Pr(|p̂ − p| ≥ ε) ≤ δ.
Simplified results (cont'd)
Given a target probability P, the client needs to know how
many consecutive positive samples are required so that
the next request to the service will be correct with
probability P.
So the probabilities ε, δ and P satisfy the following inequality:
(1 − δ)(1 − ε) ≥ P.
Formulate the sample size problem as the following
nonlinear optimization problem:
min m(ε, δ) = ln(2/δ) / (2ε²)
s.t. (1 − δ)(1 − ε) ≥ P and 0 < ε, δ < 1.
Simplified results (cont'd)
From the constraint inequality, δ ≤ 1 − P/(1 − ε).
Since m decreases in δ, take δ at this largest feasible value,
transforming the two-dimensional optimization problem
into a one-dimensional one:
min m(ε) = ln( 2 / (1 − P/(1 − ε)) ) / (2ε²)
s.t. 0 < ε < 1 − P,
which can be solved by elementary nonlinear
functional optimization methods.
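A grid-search sketch for the one-dimensional problem, assuming the sample-size formula m(ε, δ) = ln(2/δ)/(2ε²) and the constraint (1 − δ)(1 − ε) ≥ P (all names illustrative):

```python
import math

def total_samples(eps, P):
    # Take delta at its largest feasible value from (1-delta)(1-eps) >= P.
    delta = 1 - P / (1 - eps)
    return math.log(2 / delta) / (2 * eps ** 2)

def best_eps(P, steps=1000):
    # Elementary 1-D search: m(eps) blows up at both ends of (0, 1-P)
    # (1/eps^2 as eps -> 0, log term as eps -> 1-P), so scan the interior.
    best_m, best_e = float("inf"), None
    for i in range(1, steps):
        eps = (1 - P) * i / steps
        m = total_samples(eps, P)
        if m < best_m:
            best_m, best_e = m, eps
    return best_m, best_e

m, e = best_eps(0.9)
print(m, e)  # minimal sample count and the eps that achieves it
```

A coarse scan is enough here because the objective is smooth and unimodal on the feasible interval; any standard 1-D minimizer would also do.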
Mobile Functional Validation Agent
[Diagram: the user interacts with a user agent, which creates a
mobile functional validation agent (MA). The MA is sent to the
interface agent of computing server A on machine A; if A's service
is incorrect, it jumps to computing server B on machine B, then on
to machines C, D, E, ..., until a correct service is found.]
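The agent's itinerary can be sketched as a simple loop over candidate servers, with local function calls standing in for mobile-agent hops (all names illustrative):

```python
def find_correct_service(servers, challenges, reference, tol=1e-9):
    # Mobile-agent style search: "visit" each server's service in turn,
    # validate it against the reference answers, stop at the first match.
    for name, service in servers:
        if all(abs(service(x) - reference(x)) <= tol for x in challenges):
            return name
    return None  # no server passed validation

servers = [("A", lambda x: x * x + 1),   # incorrect service
           ("B", lambda x: x * x)]       # correct service
print(find_correct_service(servers, [0, 1, 2, 3], lambda x: x * x))  # -> B
```

In the real setting the loop body executes on each remote machine rather than locally, so the challenges travel with the agent instead of crossing the network once per test case.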
Future work and open questions
Integrate functional validation into grid computing
infrastructure as a standard grid service.
Extend to the other situations described (zero-knowledge
proofs, etc.).
Formulate functional validation problems in more
appropriate mathematical models.
Explore solutions for more difficult and complicated
functional validation situations.
Thanks!!