Information Geometry and Machine Learning: An Invitation
Prof. Jun Zhang (University of Michigan Ann Arbor)
Part A (Information Geometry): 3 lectures
Part B (Machine Learning Application): 2 lectures
Optional (Open Research Questions): 1 lecture
Information geometry is the differential geometric study of the manifold of probability density functions (or probability distributions on discrete support). From a geometric perspective, a parametric family of probability density functions on a sample space is modeled as a differentiable manifold, where points on the manifold represent the density functions themselves and coordinates represent the indexing parameters.
Information geometry is seen as an emerging tool for providing a unified perspective on many branches of information science, including coding, statistics, machine learning, inference, and decision.
This lecture series will provide an introduction to the fundamental concepts of information geometry as well as a sample application to machine learning.
Each lecture will be 2 hours.
Part A will introduce the foundations of information geometry, including topics such as: Kullback-Leibler divergence and Bregman divergence, Fisher-Rao metric, conjugate (dual) connections, alpha-connections, statistical manifold, curvature, dually flat manifold, exponential family, natural parameter/expectation parameter, affine immersion, equiaffine geometry, centro-affine immersion, alpha-Hessian manifold, symplectic, Kähler, and Einstein-Weyl structures of information systems, etc.
Part B will start with the regularized learning framework, introducing reproducing kernel Hilbert space, semi-inner product, reproducing kernel Banach space, representer theorem, feature map, kernel trick, support vector machine, l1-regularization and sparsity, etc.
The application of information geometry to kernel methods will be discussed at the end of this mini-course, as will other open research questions.
Students at advanced undergraduate and graduate levels are welcome. The instructor looks forward to recruiting motivated mathematics students to work in this exciting new area of applied mathematics.
PREREQUISITE:
A first course in differential geometry is expected for Part A. Real analysis or functional analysis is expected for Part B.
MATERIALS:
(A.1) S. Amari and H. Nagaoka (2000). Methods of Information Geometry. AMS monograph vol 191. Oxford University Press.
(A.2) U. Simon, A. Schwenk-Schellschmidt, and H. Viesel (1991). Introduction to the Affine Differential Geometry of Hypersurfaces. Science University of Tokyo Press.
(A.3) J. Zhang (2004). Divergence function, duality, and convex analysis. Neural Computation, vol 16, 159-195.
(B.1) S. Amari and S. Wu (1999). Improving support vector machine classifiers by modifying kernel functions. Neural Networks, vol 12 (no. 6), 783-789.
(B.2) F. Cucker and S. Smale (2001). On the mathematical foundations of learning. Bulletin of the American Mathematical Society, vol 39 (no. 1), 1-49.
(B.3) T. Poggio and S. Smale (2003). The mathematics of learning: Dealing with data. Notices of the American Mathematical Society, vol 50 (no. 5), 534-544.