

Information Geometry and Machine Learning: An Invitation

Prof. Jun Zhang (University of Michigan, Ann Arbor)

Part A (Information Geometry): 3 lectures
Part B (Machine Learning Application): 2 lectures
Optional (Open Research Questions): 1 lecture


Information geometry is the differential geometric study of the manifold of probability density functions (or probability distributions on discrete support). From a geometric perspective, a parametric family of probability density functions on a sample space is modeled as a differentiable manifold, where points on the manifold represent the density functions themselves and coordinates represent the indexing parameters. Information Geometry is seen as an emerging tool for providing a unified perspective on many branches of information science, including coding, statistics, machine learning, inference and decision, etc.
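For orientation (an illustrative formula added here, not part of the original announcement): given a parametric family \( p(x;\theta) \) with coordinates \( \theta = (\theta^1, \dots, \theta^n) \), the Riemannian metric most commonly placed on this manifold is the Fisher-Rao metric,

\[
g_{ij}(\theta) = \mathbb{E}_{p(\cdot;\theta)}\!\left[ \frac{\partial \log p(x;\theta)}{\partial \theta^i} \, \frac{\partial \log p(x;\theta)}{\partial \theta^j} \right],
\]

which measures how distinguishable infinitesimally nearby distributions are.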



This lecture series will provide an introduction to the fundamental concepts in Information Geometry as well as a sample application to machine learning. Each lecture will be 2 hours.
Part A will introduce the foundations of information geometry, including topics such as: Kullback-Leibler divergence and Bregman divergence, the Fisher-Rao metric, conjugate (dual) connections, alpha-connections, statistical manifolds, curvature, dually flat manifolds, the exponential family, natural and expectation parameters, affine immersion, equiaffine geometry, centro-affine immersion, alpha-Hessian manifolds, symplectic, Kähler, and Einstein-Weyl structures of information systems, etc.
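To fix notation for two of these objects (formulas added for orientation; they are the standard definitions, not quoted from the announcement): the Kullback-Leibler and Bregman divergences are

\[
D_{\mathrm{KL}}(p\,\|\,q) = \int p(x) \log\frac{p(x)}{q(x)}\,dx,
\qquad
D_{\Phi}(\theta, \theta') = \Phi(\theta) - \Phi(\theta') - \langle \nabla\Phi(\theta'),\, \theta - \theta' \rangle,
\]

and for an exponential family \( p(x;\theta) = \exp(\langle \theta, t(x)\rangle - \psi(\theta)) \), with natural parameter \(\theta\) and expectation parameter \(\eta = \nabla\psi(\theta)\), the two notions coincide:

\[
D_{\mathrm{KL}}(p_{\theta}\,\|\,p_{\theta'}) = D_{\psi}(\theta', \theta).
\]

This Legendre duality between \(\theta\) and \(\eta\) is the prototype of the dually flat structure listed above.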

Part B will start with the regularized learning framework, with the introduction of reproducing kernel Hilbert spaces, semi-inner products, reproducing kernel Banach spaces, the representer theorem, feature maps, the kernel trick, support vector machines, l1-regularization and sparsity, etc. The application of information geometry to kernel methods will be discussed at the end of this mini-course, as will other open research questions.
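The following sketch makes the representer theorem and the kernel trick concrete for the simplest case, kernel ridge regression. It is added for illustration only; the Gaussian kernel, the regularization constant, and the NumPy-based implementation are all choices of this note, not course material.

import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Kernel trick: k(x, y) = exp(-gamma * ||x - y||^2) is an inner
    # product in an implicit feature space that is never built explicitly.
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def fit_kernel_ridge(X, y, lam=0.1, gamma=1.0):
    # Representer theorem: the minimizer of
    #   (1/n) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_K^2
    # over the whole RKHS has the finite form f(x) = sum_i alpha_i k(x_i, x),
    # with alpha solving the n-by-n linear system (K + lam*n*I) alpha = y.
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(X_train, alpha, X_new, gamma=1.0):
    # Evaluating f at new points again requires only kernel values.
    return rbf_kernel(X_new, X_train, gamma) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
alpha = fit_kernel_ridge(X, y)
print(predict(X, alpha, X)[:5])  # fitted values at the first five training points

The same reduction from an infinite-dimensional search to an n-dimensional linear-algebra problem is what makes support vector machines tractable; only the loss term and regularizer change.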


Students at advanced undergraduate and graduate levels are welcome. The instructor looks forward to recruiting motivated mathematics students to work in this exciting new area of applied mathematics.


PREREQUISITE:

A first course in differential geometry is expected for Part A. Real analysis or functional analysis is expected for Part B.


MATERIALS:

(A.1) S. Amari and H. Nagaoka (2000). Methods of Information Geometry. AMS Translations of Mathematical Monographs, vol. 191. Oxford University Press.

(A.2) U. Simon, A. Schwenk-Schellschmidt, and H. Viesel (1991). Introduction to the Affine Differential Geometry of Hypersurfaces. Science University of Tokyo Press.

(A.3) J. Zhang (2004). Divergence function, duality, and convex analysis. Neural Computation, vol. 16, 159-195.

(B.1) S. Amari and S. Wu (1999). Improving support vector machine classifiers by modifying kernel functions. Neural Networks, vol. 12, no. 6, 783-789.

(B.2) F. Cucker and S. Smale (2001). On the mathematical foundations of learning. Bulletin of the American Mathematical Society, vol. 39, no. 1, 1-49.

(B.3) T. Poggio and S. Smale (2003). The mathematics of learning: Dealing with data. Notices of the American Mathematical Society, vol. 50, no. 5, 537-544.