# Data Structures and Algorithm Analysis

25 Nov 2013

CSCI 256

Data Structures and Algorithm Analysis

Lecture 9

Shortest Path Problem

Negative Cost Edges

Dijkstra's algorithm assumes positive cost edges.

For some applications, negative cost edges make sense.

The shortest path is not well defined if the graph has a negative cost cycle.

The Bellman-Ford algorithm finds shortest paths in a graph with negative cost edges (or reports the existence of a negative cost cycle).
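As a sketch of the idea just described (the function name and edge-list format are illustrative, not from the lecture), Bellman-Ford relaxes every edge n − 1 times and then uses one extra pass to detect a negative cost cycle:

```python
INF = float("inf")

def bellman_ford(n, edges, s):
    """Return distances from s, or None if a negative cost cycle is reachable.
    edges is a list of (u, v, cost) triples over vertices 0..n-1."""
    dist = [INF] * n
    dist[s] = 0
    # Relax every edge n-1 times; a shortest simple path has at most n-1 edges.
    for _ in range(n - 1):
        for u, v, cost in edges:
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
    # One more pass: any further improvement implies a negative cost cycle.
    for u, v, cost in edges:
        if dist[u] + cost < dist[v]:
            return None
    return dist
```

Note that, unlike Dijkstra's algorithm, no vertex is ever "finalized": a later relaxation through a negative edge may still improve an earlier distance.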

[Figure: example graph with vertices s, a, b, c, e, f, g and edge costs 4, 2, −3, 6, 4, −2, 3, 4, 6, 3, 7, −4]
Minimum Spanning Tree

Minimum spanning tree: Given a connected graph G = (V, E) with real-valued edge weights c_e, an MST is a subset of the edges T ⊆ E such that T is a spanning tree (a tree which spans G) whose sum of edge weights is minimized.

[Figure: a weighted graph G = (V, E) and an MST T with Σ_{e∈T} c_e = 50]

Applications

MST is a fundamental problem with diverse applications

Network design
    telephone, electrical, hydraulic, TV cable, computer, road

Approximation algorithms for NP-hard problems
    traveling salesperson problem, Steiner tree

Indirect applications
    max bottleneck paths
    LDPC codes for error correction
    image registration with Renyi entropy
    learning salient features for real-time face verification
    reducing data storage in sequencing amino acids in a protein
    modeling locality of particle interactions in turbulent fluid flows
    autoconfiguration protocol for Ethernet bridging to avoid cycles in a network

Cluster analysis

Greedy Algorithms

Kruskal's algorithm: Start with T = { }. Consider edges in ascending order of cost. Insert edge e in T unless doing so would create a cycle.

Reverse-Delete algorithm: Start with T = E. Consider edges in descending order of cost. Delete edge e from T unless doing so would disconnect T.

Prim's algorithm: Start with some root node s and greedily grow a tree T from s outward. At each step, add the cheapest edge e to T that has exactly one endpoint in T.

Remark: All three algorithms produce an MST.

Greedy Algorithm 1: Kruskal's Algorithm

Add the cheapest edge that joins disjoint components.

[Figure: weighted graph on vertices s, t, u, v, a, b, c, e, f, g with edge costs 1–22]

Construct the MST with Kruskal's algorithm. Label the edges in order of insertion.

Greedy Algorithm 2: Reverse-Delete Algorithm

Delete the most expensive edge that does not disconnect the graph.

[Figure: the same weighted graph on vertices s, t, u, v, a, b, c, e, f, g]

Construct the MST with the Reverse-Delete algorithm. Label the edges in order of removal.

Greedy Algorithm 3: Prim's Algorithm

Extend the tree by including the cheapest outgoing edge.

[Figure: the same weighted graph on vertices s, t, u, v, a, b, c, e, f, g]

Construct the MST with Prim's algorithm starting from vertex a. Label the edges in order of insertion.

Why do the greedy algorithms work?

All these algorithms work by repeatedly inserting or deleting edges from a partial solution.

Thus, to analyze these algorithms, it would be useful to have in hand some basic facts saying when it is "safe" to include an edge in the MST, and when it is "safe" to eliminate an edge on the grounds that it couldn't possibly be in the MST.

For simplicity, assume all edge costs are distinct. Thus, we can refer to "the MST".

When is it safe to include an edge in the MST?

Edge inclusion lemma (also called the "Cut property"): Let S be a subset of V, and suppose e = (u, v) is the minimum cost edge of E with u in S and v in V − S. Then e is in every MST T of G.

[Figure: a cut (S, V − S) with edge e crossing it]

Proof (we show the contrapositive):

Suppose T is a spanning tree that does not contain e. We need to show that T does not have the minimum possible cost.

We do this using an exchange argument: we will identify an edge e1 in T that is more expensive than e, with the property that exchanging e for e1 results in a spanning tree that is cheaper than T.

The crux is to find this e1.

Proof (we show the contrapositive), continued:

Edge e is incident to v (in S) and w (in V − S); T is a spanning tree, so there is a path P in T from v to w. Starting at v, follow the nodes of P in sequence until we reach the first node w' in V − S. Let v' be the node just before w' in P, and let e1 = (v', w').

Consider T' = T − {e1} + {e}.

We can show that:

T' is a spanning tree (show it is connected and acyclic).

T' has lower cost.

Proof (we show the contrapositive), continued:

It is easy to see that T' is connected.

The only cycle in T' + {e1} must be composed of e and the path P, so if we remove e1 we have an acyclic subgraph.

e is the minimum cost edge between S and V − S, so T' = T − {e1} + {e} is a spanning tree with lower cost than T (as we have exchanged the more expensive e1 for the cheaper e).

Hence, T is not a minimum spanning tree.

[Figure: e is the minimum cost edge between S and V − S; e1 is another edge of T crossing the cut]

Optimality Proofs

Prim's Algorithm computes an MST.

Kruskal's Algorithm computes an MST.

Idea of both proofs: Show that when an edge is added to the MST by Prim or Kruskal, the edge is the minimum cost edge between S and V − S for some set S of nodes (which grows with each added edge until it equals V).

Prim's Algorithm (grow a tree, T)

S = { s }; T = { }
while S != V
    choose the minimum cost edge e = (u, v), with u in S and v in V − S
    add e to T; add v to S
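The pseudocode above can be sketched in runnable form with a priority queue holding candidate edges out of S (a minimal sketch; the adjacency-list format and names are illustrative, not from the lecture):

```python
import heapq

def prim_mst(adj, s=0):
    """Prim's algorithm with a heap.
    adj[u] is a list of (cost, v) pairs; returns the list of MST edges (u, v, cost)."""
    n = len(adj)
    in_tree = [False] * n
    in_tree[s] = True
    tree_edges = []
    # Heap of (cost, u, v): candidate edges with endpoint u already in the tree.
    heap = [(c, s, v) for c, v in adj[s]]
    heapq.heapify(heap)
    while heap and len(tree_edges) < n - 1:
        c, u, v = heapq.heappop(heap)
        if in_tree[v]:
            continue  # both endpoints already in T; skip stale entries
        in_tree[v] = True
        tree_edges.append((u, v, c))
        for c2, w in adj[v]:
            if not in_tree[w]:
                heapq.heappush(heap, (c2, v, w))
    return tree_edges
```

Each popped edge is, among all edges with exactly one endpoint in S, one of minimum cost, which is exactly what the Cut Property requires.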

Prove Prim's algorithm computes an MST

(1) The algorithm only adds edges belonging to every MST.

On each iteration there is a set S (a subset of V) on which a partial spanning tree has been constructed, and a node v and edge e = (u, v), with u in S, have been chosen to minimize the cost c_e. By definition e is the cheapest edge with one end in S and the other in V − S, so by the Cut Property it is in every minimum spanning tree of G.

(2) The algorithm produces a spanning tree. Clear.

Kruskal's Algorithm (grow bigger connected sets, using the minimum cost edge available)

Let C = { C1 = {v1}, C2 = {v2}, . . ., Cn = {vn} }; T = { }
while |C| > 1
    Let e = (u, v), with u in Ci and v in Cj, be the minimum cost edge joining (the disjoint and disconnected) sets in C
    Replace Ci and Cj by their union C'i = Ci ∪ Cj; add e to T
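The component-merging step above is usually implemented with a union-find structure. A minimal sketch (the function names and edge format are illustrative, not from the lecture):

```python
def kruskal_mst(n, edges):
    """Kruskal's algorithm with union-find.
    edges is a list of (cost, u, v) triples over vertices 0..n-1;
    returns the MST edges in order of insertion."""
    parent = list(range(n))

    def find(x):
        # Find the component representative, with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):  # ascending order of cost
        ru, rv = find(u), find(v)
        if ru != rv:          # u and v lie in disjoint components
            parent[ru] = rv   # union the two components
            tree.append((cost, u, v))
    return tree
```

Sorting dominates the running time; the union-find operations are nearly constant amortized.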

Prove Kruskal's algorithm computes an MST

(1) An edge e is in the MST when it is added to T.

Since the sets we begin with are disjoint, and as we find edges between any two we redefine the sets so they remain disjoint from each other, this follows from the Cut Property.

(2) The process continues until there is only one connected set containing all the vertices, so the set spans G.

When can we guarantee an edge is not in the MST?

Cycle Property: The most expensive edge on a cycle is never in an MST.

Optimality of the Reverse-Delete algorithm follows from this.

[Figure: e is the most expensive edge on a cycle crossing between S and V − S; e1 is another edge on the cycle]

Proof of the Cycle Property (also uses an exchange argument!)

Proof: Suppose C is a cycle and e = (v, w) is its most expensive edge. We proceed by contradiction:

Assume e is in an MST T of G.

If we delete e, we partition the nodes of T into two sets, S and V − S, with v in S and w in V − S.

Since we began with a cycle, there must be another edge e' on C with one end in S and one end in V − S. e was the most expensive edge, so e' is cheaper. We exchange e for e', resulting in T'.

T' spans G and its cost is less than that of T.

This contradicts the fact that T was an MST of G.

Dealing with the assumption of no equal weight edges

Force the edge weights to be distinct:

Add small distinct quantities to the weights, or

Give a tie-breaking rule for equal weight edges

Clustering

Clustering: Given a set U of n objects labeled p1, …, pn (e.g., photos, documents, micro-organisms), classify them into coherent groups

Distance function: Numeric value specifying the "closeness" of two objects (e.g., the number of corresponding pixels whose intensities differ by some threshold)

Fundamental problem: Divide into clusters so that points in different clusters are far apart

Applications:
    Identify patterns in gene expression
    Document categorization for web search
    Similarity searching in medical image databases

Clustering of Maximum Spacing

Distance function: Assume it satisfies several natural properties:

d(pi, pj) = 0 iff pi = pj   (identity of indiscernibles)

d(pi, pj) ≥ 0   (nonnegativity)

d(pi, pj) = d(pj, pi)   (symmetry)

Spacing: Min distance between any pair of points in different clusters

Clustering of maximum spacing: Given integer k, find a k-clustering of maximum spacing

[Figure: a point set divided into k = 4 clusters, with the spacing indicated; also shown divided into 2 and 3 clusters]

Greedy Clustering Algorithm

Distance clustering algorithm:

Form a graph on the vertex set U as follows (the connected components are the clusters; without any edges you would have n clusters):

First draw an edge between the closest pair of points, then draw an edge between the next closest pair of points, and keep adding edges between pairs of points in order of increasing d(pi, pj). The connected components correspond to clusters; there is no need to add an edge between any pair of points already in the same cluster (thus avoiding cycles).

Repeat until there are exactly k clusters.

Key observation: This procedure is precisely Kruskal's algorithm (except we stop when there are k connected components).

Remark: Equivalent to finding an MST and deleting the k − 1 most expensive edges (if we take away k − 1 edges from a spanning tree, we are left with k connected components).

Distance Clustering Algorithm (like Kruskal's Algorithm)

Let C = { {v1}, {v2}, . . ., {vn} }; T = { }
while |C| > k
    Let e = (u, v), with u in Ci and v in Cj, be the minimum cost edge joining disjoint sets in C
    Replace Ci and Cj by C'i = Ci ∪ Cj

The result is a k-clustering.
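The pseudocode above is Kruskal's algorithm stopped early, so a sketch can reuse union-find, merging until k components remain (names and the distance-list format are illustrative, not from the lecture):

```python
def max_spacing_clusters(n, edges, k):
    """Single-linkage k-clustering: run Kruskal but stop at k components.
    edges is a list of (dist, i, j) over points 0..n-1;
    returns a cluster label for each point."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for d, i, j in sorted(edges):
        if components == k:
            break  # exactly k clusters remain; next merge would use the spacing edge
        ri, rj = find(i), find(j)
        if ri != rj:  # merge the two closest distinct clusters
            parent[ri] = rj
            components -= 1
    return [find(x) for x in range(n)]
```

The first distance skipped at the break is, by construction, the spacing of the resulting k-clustering.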

More Greedy Algorithms: Coin Changing

Goal: Given currency denominations 1, 5, 10, 25, 100, devise a method to pay an amount to the customer using the fewest number of coins.

Ex: 34¢

Cashier's algorithm: At each iteration, add the coin of the largest value that does not take us past the amount to be paid.

Ex: $2.89

Theorem: Greedy is optimal for U.S. coinage: 1, 5, 10, 25, 100.

Question: Is the greedy algorithm optimal for the US postal denominations 1, 10, 21, 34, 70, 100, 350, 1225, 1500?
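One way to investigate the question is to compare the cashier's algorithm against a dynamic-programming optimum (a sketch; the function names are mine). For the postal denominations greedy is not optimal: 140 = 70 + 70 uses 2 coins, but greedy pays 100 + 34 + 6×1 = 8 coins.

```python
def greedy_coins(amount, denoms):
    """Cashier's algorithm: repeatedly take the largest denomination
    that does not exceed the remaining amount."""
    count = 0
    for d in sorted(denoms, reverse=True):
        count += amount // d
        amount %= d
    return count

def optimal_coins(amount, denoms):
    """Dynamic programming: fewest coins for every value up to amount
    (assumes 1 is among the denominations, so every amount is payable)."""
    INF = float("inf")
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        best[a] = 1 + min(best[a - d] for d in denoms if d <= a)
    return best[amount]

US = [1, 5, 10, 25, 100]
POSTAL = [1, 10, 21, 34, 70, 100, 350, 1225, 1500]
```

For U.S. coinage the two agree on every amount, consistent with the theorem; for the postal denominations the DP finds strictly better answers at amounts like 140.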