Data Structures and Algorithm Analysis

dealerdeputy · Artificial Intelligence and Robotics

25 Nov 2013




CSCI 256


Data Structures and Algorithm Analysis


Lecture 9





Some slides by Kevin Wayne copyright 2005, Pearson Addison Wesley
all rights reserved, and some by Iker Gondra


Shortest Path Problem


Negative Cost Edges


Dijkstra’s algorithm assumes nonnegative cost edges


For some applications, negative cost edges make sense


Shortest path not well defined if a graph has a negative cost cycle


Bellman-Ford algorithm finds shortest paths in a graph with negative cost edges (or reports the existence of a negative cost cycle).
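The Bellman-Ford idea stated above can be sketched as follows. This is a minimal illustration, not code from the lecture: the function name `bellman_ford`, the node numbering 0..n-1, and the `(u, v, cost)` edge tuples are all assumptions made for the example.

```python
def bellman_ford(num_nodes, edges, source):
    """Return (dist, has_negative_cycle) for nodes numbered 0..num_nodes-1.

    edges is a list of directed (u, v, cost) tuples (a hypothetical format).
    """
    INF = float("inf")
    dist = [INF] * num_nodes
    dist[source] = 0
    # A shortest path uses at most n - 1 edges, so n - 1 rounds suffice.
    for _ in range(num_nodes - 1):
        changed = False
        for u, v, cost in edges:
            if dist[u] != INF and dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                changed = True
        if not changed:
            break
    # If some edge can still be relaxed, a negative cost cycle is reachable.
    has_negative_cycle = any(
        dist[u] != INF and dist[u] + cost < dist[v] for u, v, cost in edges
    )
    return dist, has_negative_cycle
```

When `has_negative_cycle` is true, the returned distances are not meaningful, which matches the remark that shortest paths are not well defined in that case.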



[Figure: example directed graph on nodes s, a, b, c, e, f, g with edge costs 4, 2, -3, 6, 4, -2, 3, 4, 6, 3, 7, -4]

Minimum Spanning Tree


Minimum spanning tree: Given a connected graph G = (V, E) with real-valued edge weights $c_e$, a MST is a subset of the edges $T \subseteq E$ such that T is a spanning tree (tree which spans G) whose sum of edge weights is minimized.



[Figure: a weighted graph G = (V, E) and a spanning tree T with $\sum_{e \in T} c_e = 50$]

Applications


MST is a fundamental problem with diverse applications


Network design


telephone, electrical, hydraulic, TV cable, computer, road


Approximation algorithms for NP-hard problems


traveling salesperson problem, Steiner tree


Indirect applications


max bottleneck paths


LDPC codes for error correction


image registration with Renyi entropy


learning salient features for real-time face verification


reducing data storage in sequencing amino acids in a protein


model locality of particle interactions in turbulent fluid flows


autoconfig protocol for Ethernet bridging to avoid cycles in a network


Cluster analysis

Greedy Algorithms


Kruskal's algorithm


Start with T = ∅. Consider edges in ascending order of cost. Insert edge e in T unless doing so would create a cycle


Reverse-Delete algorithm


Start with T = E. Consider edges in descending order of cost.
Delete edge e from T unless doing so would disconnect T


Prim's algorithm


Start with some root node s and greedily grow a tree T from s
outward. At each step, add the cheapest edge e to T that has
exactly one endpoint in T



Remark: All three algorithms produce a MST

Greedy Algorithm 1: Kruskal’s Algorithm


Add the cheapest edge that joins disjoint components

[Figure: weighted graph on vertices s, t, u, v, a, b, c, e, f, g with 19 edges of costs 1 through 17, 20, and 22]

Construct the MST with Kruskal’s algorithm. Label the edges in order of insertion

Greedy Algorithm 2: Reverse-Delete Algorithm


Delete the most expensive edge that does not disconnect the graph

[Figure: weighted graph on vertices s, t, u, v, a, b, c, e, f, g with 19 edges of costs 1 through 17, 20, and 22]

Construct the MST with the reverse-delete algorithm. Label the edges in order of removal
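The reverse-delete rule can be sketched directly from its description. The following is an illustrative, unoptimized version: the names `reverse_delete` and `is_connected`, the node numbering 0..n-1, and the undirected `(u, v, cost)` edge tuples are assumptions for the example, and connectivity is rechecked from scratch after each trial deletion.

```python
def is_connected(num_nodes, edges):
    """Depth-first search from node 0; assumes num_nodes >= 1."""
    adj = {i: [] for i in range(num_nodes)}
    for u, v, _ in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = {0}
    stack = [0]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return len(seen) == num_nodes


def reverse_delete(num_nodes, edges):
    """Return the edges that survive reverse-delete, cheapest first."""
    # Consider edges in descending order of cost.
    kept = sorted(edges, key=lambda e: e[2], reverse=True)
    i = 0
    while i < len(kept):
        trial = kept[:i] + kept[i + 1:]
        if is_connected(num_nodes, trial):
            kept = trial  # safe to delete; the next edge moves into slot i
        else:
            i += 1  # deleting this edge would disconnect the graph
    return sorted(kept, key=lambda e: e[2])
```

On a 4-cycle with costs 1, 2, 3, 4 plus a chord of cost 5, the sketch deletes the edges of cost 5 and 4 and keeps the other three.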

Greedy Algorithm 3: Prim’s Algorithm


Extend a tree by including the cheapest outgoing edge

[Figure: weighted graph on vertices s, t, u, v, a, b, c, e, f, g with 19 edges of costs 1 through 17, 20, and 22]

Construct the MST with Prim’s algorithm starting from vertex a. Label the edges in order of insertion

Why do the greedy algorithms work?


All these algorithms work by repeatedly inserting
or deleting edges from a partial solution


Thus, to analyze these algorithms, it would be useful to have in hand some basic facts saying when it is “safe” to include an edge in the MST or when it is “safe” to eliminate an edge on the grounds that it couldn’t possibly be in the MST



For simplicity, assume all edge costs are
distinct. Thus, we can refer to “the MST”

When is it safe to include an edge in the MST?


Edge inclusion lemma (also called the “Cut property”)

Let S be a nonempty proper subset of V, and suppose e = (u, v) is the minimum cost edge of E with u in S and v in V - S. Then e is in every MST T of G.

[Figure: the cut (S, V - S) with edge e crossing it]

Proof: (we show the contrapositive)


Suppose T is a spanning tree that does not contain e. We need to show that T does not have the minimum possible cost


We do this using an exchange argument


We will identify an edge $e_1$ in T that is more expensive than e and with the property that exchanging e for $e_1$ results in a spanning tree that is cheaper than T


The crux is to find this $e_1$

Proof: (we show the contrapositive)


Write e = (v, w) with v in S and w in V - S. T is a spanning tree, so there is a path P in T from v to w. Starting at v, follow the nodes of P in sequence until we reach the first node w’ in V - S. Let v’ be the node just before w’ in P and let $e_1$ be (v’, w’).


Consider: T’ = T - {$e_1$} + {e}


We can show that:


T’ is a spanning tree (show it is connected and acyclic)


T’ has lower cost


Proof (we show the contrapositive)


Easy to see that T’ is connected (any path in T that used $e_1$ can be rerouted through the rest of P and e)


The only cycle in T’ + {$e_1$} = T + {e} consists of e and the path P, so if we remove $e_1$ we have an acyclic subgraph


e is the minimum cost edge between S and V - S, and $e_1$ also has one end in S and the other in V - S, so $c_e < c_{e_1}$


T’ = T - {$e_1$} + {e} is a spanning tree with lower cost than T (as we have exchanged the more expensive $e_1$ for the cheaper e)


Hence, T is not a minimum spanning tree
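Written out as a cost comparison, the exchange step amounts to the following (the notation cost(T) for the sum of edge costs in T is introduced here for illustration):

```latex
\[
\operatorname{cost}(T') \;=\; \operatorname{cost}(T) - c_{e_1} + c_e \;<\; \operatorname{cost}(T)
\qquad \text{since } c_e < c_{e_1}.
\]
```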





[Figure: the cut (S, V - S) with edges e and $e_1$ crossing it; e is the minimum cost edge between S and V - S]

Optimality Proofs


Prim’s Algorithm computes a MST


Kruskal’s Algorithm computes a MST



Idea of both proofs: Show that when an edge is added to the MST by Prim or Kruskal, the edge is the minimum cost edge between S and V - S for some set S of nodes (which increases with each addition of edges until it equals V)

Prim’s Algorithm (grow a tree, T)

S = { s }; T = { }

while S != V

    choose the minimum cost edge e = (u, v), with u in S and v in V - S

    add e to T

    add v to S
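The loop above can be sketched in runnable form. This is one possible rendering, not the lecture's own code: the name `prim`, the node numbering 0..n-1, the undirected `(u, v, cost)` edge tuples, and the use of a binary heap to find the cheapest edge leaving S are all assumptions for the example.

```python
import heapq


def prim(num_nodes, edges, root=0):
    """Return the tree edges in order of insertion, growing from root."""
    adj = {i: [] for i in range(num_nodes)}
    for u, v, cost in edges:
        adj[u].append((cost, u, v))
        adj[v].append((cost, v, u))
    in_tree = {root}  # the set S
    tree = []         # the edge set T
    frontier = list(adj[root])  # candidate edges with one end in S
    heapq.heapify(frontier)
    while frontier and len(in_tree) < num_nodes:
        cost, u, v = heapq.heappop(frontier)
        if v in in_tree:
            continue  # both endpoints already in S, not a crossing edge
        in_tree.add(v)
        tree.append((u, v, cost))
        for item in adj[v]:
            if item[2] not in in_tree:
                heapq.heappush(frontier, item)
    return tree
```

Stale heap entries (edges whose far endpoint has joined S in the meantime) are skipped when popped rather than removed eagerly, which keeps the sketch short.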


Prove Prim’s algorithm computes an MST


(1) The algorithm only adds edges belonging to every MST.


On each iteration there is a set S, a subset of V, on which a partial spanning tree has been constructed, and a node v and edge e are added that attain $\min_{e = (u, v):\, u \in S,\, v \in V - S} c_e$. By definition e is the cheapest edge with one end in S and the other in V - S, so by the Cut Property it is in every minimum spanning tree of G.

(2) The algorithm produces a spanning tree


Clear: each added edge joins a new node to the tree, so T stays connected and acyclic, and the loop ends when S = V

Kruskal’s Algorithm (grow bigger connected sets, with the minimum cost edge available)

Let C = { C1 = {v1}, C2 = {v2}, . . ., Cn = {vn} }; T = { }

while |C| > 1

    Let e = (u, v) with u in Ci and v in Cj be the minimum cost edge joining (the disjoint and disconnected) sets in C

    Replace Ci and Cj by their union C’i

    Add e to T
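A runnable sketch of the same procedure follows. The collection C of disjoint sets is represented here with a simple union-find array, which is an implementation choice not stated on the slide; the name `kruskal`, the node numbering 0..n-1, and the `(u, v, cost)` edge tuples are likewise assumptions.

```python
def kruskal(num_nodes, edges):
    """Return the tree edges in order of insertion."""
    parent = list(range(num_nodes))  # each node starts in its own set

    def find(x):
        # Follow parent pointers to the set representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Scanning edges in ascending cost order always meets the cheapest
    # edge that joins two different sets first.
    for u, v, cost in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:  # endpoints in different sets, so no cycle is created
            parent[ru] = rv  # replace the two sets by their union
            tree.append((u, v, cost))
    return tree
```

Edges whose endpoints already share a set are skipped, which corresponds to never adding an edge that would close a cycle.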



Prove Kruskal’s algorithm computes a MST

(1) An edge e is in the MST when it is added to T.


The sets we begin with are disjoint, and each time we find an edge between two of them we merge those two sets, so the sets remain disjoint from each other. When e = (u, v) with u in Ci is added, it is the minimum cost edge between S = Ci and V - S, so the claim follows by the “Cut Property”


(2) The process continues until there is only one connected set containing all the vertices, so the set spans G


When can we guarantee an edge is not in the MST?


Cycle Property


The most expensive edge on a cycle is never in a
MST







Optimality of Reverse-Delete algorithm follows from this

[Figure: the cut (S, V - S) with edges e and $e_1$ crossing it; e is the most expensive edge on a cycle involving S and V - S]

Proof of the Cycle Property (also uses an exchange argument!)

Proof: Suppose C is a cycle and e = (v, w) is its most expensive edge. We proceed by contradiction:


Assume e is in a MST T of G.


If we delete e from T, we partition the nodes of T into two sets, S and V - S, with v in S and w in V - S.


Since we began with a cycle, there must be another edge e’ of C with one end in S and one end in V - S. e was the most expensive edge of C, so e’ is cheaper. We exchange e for e’ in T, resulting in T’.


T’ spans G and its cost is less than that of T.


This contradicts the fact that T was a MST of G


Dealing with the assumption of no equal weight edges


Force the edge weights to be distinct


Add small quantities to the weights


Give a tie breaking rule for equal weight edges
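One way a tie-breaking rule might look in practice: order edges by cost and break ties by position in the input list, so that every edge gets a distinct rank. The function name `order_with_tie_break` and the `(u, v, cost)` tuple format are hypothetical, chosen only for this sketch.

```python
def order_with_tie_break(edges):
    """Sort (u, v, cost) edges by cost, breaking ties by input position."""
    # The index i is unique, so the pair (cost, i) is a strict total order.
    indexed = [(cost, i, u, v) for i, (u, v, cost) in enumerate(edges)]
    indexed.sort()
    return [(u, v, cost) for cost, i, u, v in indexed]
```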

Clustering


Clustering: Given a set U of n objects labeled $p_1, \ldots, p_n$, classify into coherent groups (e.g., photos, documents, micro-organisms)


Distance function: Numeric value specifying "closeness" of two objects (e.g., number of corresponding pixels whose intensities differ by some threshold)


Fundamental problem: Divide into clusters so that points in different clusters are far apart


Identify patterns in gene expression


Document categorization for web search


Similarity searching in medical image databases

Clustering of Maximum Spacing


Distance function: Assume it satisfies several natural properties


$d(p_i, p_j) = 0$ iff $p_i = p_j$ (identity of indiscernibles)


$d(p_i, p_j) \ge 0$ (nonnegativity)


$d(p_i, p_j) = d(p_j, p_i)$ (symmetry)


Spacing: Min distance between any pair of points in different clusters


Clustering of maximum spacing: Given integer k, find a k-clustering of maximum spacing



[Figure: a clustering with k = 4 and its spacing marked; further panels: divide into 2 clusters, divide into 3 clusters, divide into 4 clusters]

Greedy Clustering Algorithm


Distance clustering algorithm


Form a graph on the vertex set U, where the connected components are the clusters (without any edges you would have n clusters)


First draw an edge between the closest pair of points, then draw an edge between the next closest pair of points, and keep adding edges between pairs of points in order of increasing $d(p_i, p_j)$. The connected components correspond to clusters; there is no need to add an edge between any pair of points in the same cluster (thus avoiding cycles)


Repeat until there are exactly k clusters


Key observation: This procedure is precisely Kruskal's algorithm (except we stop when there are k connected components)


Remark: Equivalent to finding a MST and deleting the k - 1 most expensive edges (if we take away k - 1 edges from a spanning tree we will then leave k connected components)


Distance Clustering Algorithm (like Kruskal’s Algorithm)

Let C = {{v1}, {v2}, . . ., {vn}}; T = { }

while |C| > k

    Let e = (u, v) with u in Ci and v in Cj be the minimum cost edge joining disjoint sets in C

    Replace Ci and Cj by C’i = Ci U Cj

The sets remaining in C form the k-clustering
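The distance clustering loop can be sketched as Kruskal's algorithm stopped early. In this illustration the function name `max_spacing_clusters`, the `dist` callback, and the union-find representation are assumptions; the sketch also assumes 1 <= k <= n and returns the spacing as the distance of the first cross-cluster pair that was not merged.

```python
from itertools import combinations


def max_spacing_clusters(points, k, dist):
    """Return (clusters, spacing) for a k-clustering built greedily."""
    n = len(points)
    parent = list(range(n))  # every point starts as its own cluster

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All pairs of points, closest pair first.
    pairs = sorted(
        combinations(range(n), 2),
        key=lambda p: dist(points[p[0]], points[p[1]]),
    )
    components = n
    spacing = None  # stays None when k == 1 (no two clusters to separate)
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # same cluster already, adding the edge gains nothing
        if components == k:
            # Closest pair lying in different clusters: this is the spacing.
            spacing = dist(points[i], points[j])
            break
        parent[ri] = rj
        components -= 1
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(points[i])
    return list(clusters.values()), spacing
```

For points 0, 1, 2, 10, 11, 20 on a line with k = 3, the sketch groups them as {0, 1, 2}, {10, 11}, {20} with spacing 8.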

More Greedy Algorithms: Coin Changing vs Stamp Buying


Goal: Given currency denominations: 1, 5, 10, 25, 100, devise a method to pay amount to customer using fewest number of coins


Ex: 34¢


Cashier's algorithm: At each iteration, add coin of the largest value that does not take us past the amount to be paid


Ex: $2.89


Theorem: Greedy is optimal for U.S. coinage: 1, 5, 10, 25, 100


Question: Is the greedy algorithm optimal for US postal denominations: 1, 10, 21, 34, 70, 100, 350, 1225, 1500?
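To explore the question, the cashier's algorithm can be compared against an exhaustive dynamic-programming count. The sketch below is illustrative only: the names `cashier` and `fewest_coins` are made up for the example, and both functions assume the denomination list contains 1 so every amount can be paid exactly.

```python
US_COINS = [1, 5, 10, 25, 100]
POSTAL = [1, 10, 21, 34, 70, 100, 350, 1225, 1500]


def cashier(amount, denominations):
    """Greedy: repeatedly take the largest coin that does not overshoot."""
    coins = []
    for d in sorted(denominations, reverse=True):
        while amount >= d:
            amount -= d
            coins.append(d)
    return coins


def fewest_coins(amount, denominations):
    """Dynamic program: true minimum number of coins for amount."""
    INF = float("inf")
    best = [0] + [INF] * amount  # best[a] = fewest coins that pay a
    for a in range(1, amount + 1):
        for d in denominations:
            if d <= a and best[a - d] + 1 < best[a]:
                best[a] = best[a - d] + 1
    return best[amount]
```

With these definitions, `cashier(34, US_COINS)` gives 25, 5, 1, 1, 1, 1 and `cashier(289, US_COINS)` uses 10 coins. For the postal denominations, an amount of 140 separates the two methods: greedy pays 100 + 34 + six 1s (8 stamps), while 70 + 70 needs only 2, so greedy is not optimal there.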