Data Mining and Its Applications to Image Processing


1

Data Mining and Its Applications to Image Processing
(資料挖掘技術及其在影像處理之應用)

Advisor: Chang, Chin-Chen (張真誠)
Student: Lin, Chih-Yang (林智揚)

Department of Computer Science and Information Engineering,
National Chung Cheng University

2

The Fields of Data Mining

- Mining Association Rules
- Sequential Mining
- Clustering (Declustering)
- Classification
- …

3

Outline

- Part I: Design and Analysis of Data Mining Algorithms
- Part II: Data Mining Applications to Image Processing

4

Part I: Design and Analysis of Data Mining Algorithms

1. Perfect Hashing Schemes for Mining Association Rules
   (or for Mining Traversal Patterns)

5

Mining Association Rules

- Support → obtain large itemsets
- Confidence → generate association rules
6

Apriori (min support = 2)

Database D:

  TID  Items
  100  A C D
  200  B C E
  300  A B C E
  400  B E

Scan D for C1: {A} 2, {B} 3, {C} 3, {D} 1, {E} 3
L1 (support >= 2): {A} 2, {B} 3, {C} 3, {E} 3

C2 (joined from L1): {A B}, {A C}, {A E}, {B C}, {B E}, {C E}
Scan D for C2: {A B} 1, {A C} 2, {A E} 1, {B C} 2, {B E} 3, {C E} 2
L2: {A C} 2, {B C} 2, {B E} 3, {C E} 2

C3: {B C E}
Scan D for C3: {B C E} 2
L3: {B C E} 2
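The pass-by-pass flow above can be sketched in a few lines of Python (a minimal illustration of candidate generation and subset pruning, not the implementation evaluated in the experiments later):

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return every large itemset (as a frozenset) with its support."""
    items = sorted({i for t in transactions for i in t})
    large = {}
    level = [frozenset([i]) for i in items]   # C1
    k = 1
    while level:
        # One database scan counts this level's candidates
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        Lk = {c: n for c, n in counts.items() if n >= min_sup}
        large.update(Lk)
        # Join L_k with itself, then prune by the Apriori property:
        # every k-subset of a (k+1)-candidate must itself be large
        level, seen = [], set()
        for a, b in combinations(list(Lk), 2):
            cand = a | b
            if len(cand) == k + 1 and cand not in seen:
                seen.add(cand)
                if all(frozenset(s) in Lk for s in combinations(cand, k)):
                    level.append(cand)
        k += 1
    return large

db = [set("ACD"), set("BCE"), set("ABCE"), set("BE")]
result = apriori(db, min_sup=2)
# Reproduces the example: {B C E} is the only large 3-itemset, support 2
```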

7

Apriori (cont.)

Disadvantages:
- Inefficient: the database is re-scanned at every pass
- Produces many useless candidates

8

DHP

- Prunes useless candidates in advance
- Reduces the database size at each iteration

9

Making a hash table (min support = 2)

Database D:

  TID  Items
  100  A C D
  200  B C E
  300  A B C E
  400  B E

C1 counts: {A} 2, {B} 3, {C} 3, {D} 1, {E} 3
L1: {A}, {B}, {C}, {E}

2-itemsets formed from each transaction (items restricted to L1):

  100  {A C}
  200  {B C}, {B E}, {C E}
  300  {A B}, {A C}, {A E}, {B C}, {B E}, {C E}
  400  {B E}

Hash function: H[{x y}] = ((order of x) * 10 + (order of y)) mod 7

Hash table H2 (each bucket counts the itemsets hashed to it; the bit
vector marks buckets whose count reaches the minimum support):

  Address  Itemsets hashed there   Count  Bit
  0        {C E}, {C E}            2      1
  1        {A E}                   1      0
  2        {B C}, {B C}            2      1
  3        (none)                  0      0
  4        {B E}, {B E}, {B E}     3      1
  5        {A B}                   1      0
  6        {A C}, {A C}            2      1
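A minimal sketch of how DHP fills the buckets and derives the bit vector for this example (`order` and `h` follow the hash function on the slide):

```python
from itertools import combinations

transactions = {100: "ACD", 200: "BCE", 300: "ABCE", 400: "BE"}
order = {c: i + 1 for i, c in enumerate("ABCDE")}   # A=1 ... E=5
large1 = {"A", "B", "C", "E"}                       # L1 for min support 2

def h(x, y):
    # H[{x y}] = ((order of x) * 10 + (order of y)) mod 7
    return (order[x] * 10 + order[y]) % 7

buckets = [0] * 7
for items in transactions.values():
    kept = sorted(i for i in items if i in large1)
    for x, y in combinations(kept, 2):
        buckets[h(x, y)] += 1

bit_vector = [1 if n >= 2 else 0 for n in buckets]
# Only a pair hashing to a bucket whose bit is 1 can be large, so
# candidates such as {A B} and {A E} are pruned before counting.
```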

10

Perfect Hashing Schemes (PHS)
for Mining Association Rules

11

Motivation

- Apriori and DHP produce C_i from L_(i-1), which may be the bottleneck
- Collisions occur in DHP
- Designing a perfect hashing function for every transaction database
  is a thorny problem

12

Definition

Definition. A Join operation joins two different (k-1)-itemsets
S1_(k-1) = p1 p2 … p(k-1) and S2_(k-1) = q1 q2 … q(k-1) to produce a
k-itemset, where p2 = q1, p3 = q2, …, p(k-2) = q(k-3), p(k-1) = q(k-2).

Example: ABC, BCD
- 3-itemsets of ABCD: ABC, ABD, ACD, BCD
- (ABC, BCD) is the only pair that satisfies the join definition
13

Algorithm

PHS (Perfect Hashing and Data Shrinking)

14

Example 1 (min support = 2)

  TID  Items
  100  A C D
  200  B C E
  300  B C D E
  400  B E

L1: {B} 3, {C} 3, {D} 2, {E} 3

2-itemsets per transaction (items restricted to L1):

  TID  Items
  100  (CD)
  200  (BC) (BE) (CE)
  300  (BC) (BD) (BE) (CD) (CE) (DE)
  400  (BE)

  Itemset  (BC)  (BD)  (BE)  (CD)  (CE)  (DE)
  Support   2     1     3     2     2     1

Data shrinking encodes each large 2-itemset as a new symbol:

  Original  (BC)  (BE)  (CD)  (CE)
  Encoding   A     B     C     D

Perfect hash for a 2-itemset {X, Y} over n items:

  hash(X, Y) = C(n, 2) - C(n - index(X) + 1, 2) + (index(Y) - index(X))
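The perfect hash above can be written directly with binomial coefficients; this sketch assumes 1-based item indices, under which the six 2-itemsets over n = 4 items map to the addresses 1..6 with no collision:

```python
from math import comb

def pair_hash(n, i, j):
    """Perfect hash for a 2-itemset whose items have 1-based indices
    i < j among n items: C(n,2) - C(n-i+1,2) + (j - i)."""
    return comb(n, 2) - comb(n - i + 1, 2) + (j - i)

# With n = 4 items (B, C, D, E), the pairs enumerate lexicographically
addresses = [pair_hash(4, i, j) for i in range(1, 4)
                                for j in range(i + 1, 5)]
# addresses == [1, 2, 3, 4, 5, 6] -- one bucket per pair, no collision
```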
15

Example 2 (min support = 2)

After encoding, the 2-itemsets per transaction (only pairs satisfying
the join definition) become:

  TID  Items
  100  Null
  200  (AD)
  300  (AC) (AD)
  400  Null

  Itemset  (AB)  (AC)  (AD)  (BC)  (BD)  (CD)
  Support   0     1     2     0     0     0

  Original  (AD)
  Encoding   A

Decode: AD → (BC)(CE) = BCE
16

Problem on Hash Table

- Consider a database containing p transactions, each comprised of
  unique items and of equal length N, with a minimum support of 1.
- At iteration k, the number of candidate k-itemsets is p * C(N, k).
- The number of buckets required in the next pass is C(pm, 2),
  where m = C(N, k).
- The actual number of candidates in the next pass is p * C(N, k+1).
- Loading density:
  p * C(N, k+1) / C(pm, 2) = 2(N - k) / ((pm - 1)(k + 1))
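The closed form of the loading density can be checked numerically; `p`, `N`, and `k` below are arbitrary illustrative values:

```python
from math import comb

def loading_density(p, N, k):
    """Ratio of actual candidates in pass k+1 to the C(pm, 2) buckets,
    where m = C(N, k); algebraically 2(N-k) / ((pm-1)(k+1))."""
    m = comb(N, k)
    buckets = comb(p * m, 2)
    candidates = p * comb(N, k + 1)
    return candidates / buckets

p, N, k = 10, 8, 2
m = comb(N, k)
closed_form = 2 * (N - k) / ((p * m - 1) * (k + 1))
# loading_density(p, N, k) equals the closed form, and the value is
# far below 1: the hash table is almost empty, motivating the
# two-level (partial hash) scheme on the next slide.
```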
17

How to Improve the Loading Density

Two-level perfect hash scheme (partial hash):

  Itemset  (AB)  (AC)  (AD)  (BC)  (BD)  (CD)
  Support   0     1     2     0     0     0

First-level hash table (keyed on the first item):

  A → second-level table
  B → Null
  C → Null
  D → Null

Second-level table for A:

  C → count 1   (for (AC))
  D → count 2   (for (AD))
18

Experiments

[Chart: T5I4D200K — execution time (sec) vs. minimum support
(1.5% down to 0.25%) for PHS, DHP, and Apriori]

[Chart: T20I4D100K — execution time (sec) vs. minimum support
(1.25% down to 0.25%) for PHS, DHP, and MPHP]
19

Experiments

[Chart: execution time (sec) vs. increasing number of transactions
(200K–1000K) for T5I4 and T10I6 under PHS and DHP]
20

Experiments

[Chart: T15I8D500K — execution time (sec) vs. support (1.5%–0.5%)
for direct hash and partial hash]

[Chart: T15I8D500K (sup = 0.5%) — memory usage (MB) vs. passes (2–10)
for direct hash and partial hash]
21

Part II: Data Mining Applications to Image Processing

1. A Prediction Scheme for Image Vector Quantization
   based on Mining Association Rules
2. Reversible Steganography for VQ-compressed Images
   Using Clustering and Relocation
3. A Reversible Steganographic Method Using SMVQ Approach
   based on Declustering
22

A Prediction Scheme for Image
Vector Quantization Based on
Mining Association Rules

23

Vector Quantization (VQ)

Image encoding and decoding techniques.

[Diagram: VQ encoding — each block of the original image is matched to
the closest codeword in the codebook, and its index i is written to
the VQ index table. VQ decoding — each index is looked up in the
codebook to rebuild the recovered image block by block.]
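The encode/decode loop above is easy to sketch; the toy codebook and blocks below are illustrative only:

```python
def euclidean2(a, b):
    """Squared Euclidean distance between two flattened blocks."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vq_encode(blocks, codebook):
    """Map each image block to the index of its closest codeword."""
    return [min(range(len(codebook)), key=lambda i: euclidean2(b, codebook[i]))
            for b in blocks]

def vq_decode(indices, codebook):
    """Recover an approximate image by table lookup."""
    return [codebook[i] for i in indices]

# Toy 2x2 blocks flattened to 4-tuples, and a 3-codeword codebook
codebook = [(0, 0, 0, 0), (128, 128, 128, 128), (255, 255, 255, 255)]
blocks = [(10, 5, 0, 3), (120, 130, 125, 131), (250, 255, 240, 249)]
index_table = vq_encode(blocks, codebook)   # [0, 1, 2]
recovered = vq_decode(index_table, codebook)
```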
24

SMVQ (cont.)

[Diagram: side-match prediction — the bottom row of the upper block U
(U13–U16) and the right column of the left block L (L4, L8, L12, L16)
border the current block X; they are matched against X's first row and
first column (X1–X4 and X1, X5, X9, X13) to build a state codebook
from the full codebook.]
25

Framework of the Proposed Method

[Diagram: flowchart of the proposed method; index values are quantized
as v/10]
26

Condition

Association rules are mined in the horizontal, vertical, and diagonal
directions.

If X → y', there is no such rule X' → y', where X' ⊂ X and y' = y.

27

The Prediction Strategy

[Diagram: for the current block G(x, y) = X in the index table Y, the
vertical neighbors V(x, y), horizontal neighbors H(x, y), and diagonal
neighbors D(x, y) form the antecedents used to match rules.]
28

Example

Index table Y (the block X to predict is marked "?"):

  3   1   2   3
  2   3   4   2
  6   4   5   3
  1  12  12   ?

Query against the rules DB — matched set of rules:

  Matched vertical rules:    (4, 2, 3, 3 → 5)     conf_v = 90%
                             (4, 2, 3 → 1)        conf_v = 85%
  Matched horizontal rules:  (12, 12, 1, 3 → 5)   conf_h = 90%
                             (12, 12 → 1)         conf_h = 95%
  Matched diagonal rules:    (6, 4, 2, 2, 3 → 5)  conf_d = 100%
                             (6, 4, 2, 2 → 8)     conf_d = 70%
                             (6, 4, 2 → 10)       conf_d = 75%

? may be 5, 1, 8, or 10. How to decide?

29

Example (cont.)

[The matched-rules table repeats from the previous slide.]

Each candidate consequence c is weighted by the antecedent length l
and confidence of every matched rule predicting c:

  w(c) = Σ ( l_v * conf_v + l_h * conf_h + l_d * conf_d )

  The weight of 5:  4*90% + 4*90% + 5*100% = 12.2
  The weight of 1:  3*85% + 2*95%          = 4.45
  The weight of 8:  4*70%                  = 2.8
  The weight of 10: 3*75%                  = 2.25

{5, 1} is called the consequence list, whose size is determined by the
user.
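The weighting and consequence-list selection above can be sketched as follows, using the matched rules of this example (rule tuples are `(antecedent, consequence, confidence)`, an assumed representation):

```python
def predict(matched_rules, top=2):
    """Weight each candidate index c by sum(len(antecedent) * conf)
    over all matched rules predicting c, then keep the `top` best
    candidates as the consequence list."""
    weights = {}
    for antecedent, consequence, conf in matched_rules:
        weights[consequence] = (weights.get(consequence, 0.0)
                                + len(antecedent) * conf)
    ranked = sorted(weights, key=weights.get, reverse=True)
    return weights, ranked[:top]

rules = [((4, 2, 3, 3), 5, 0.90), ((12, 12, 1, 3), 5, 0.90),
         ((6, 4, 2, 2, 3), 5, 1.00), ((4, 2, 3), 1, 0.85),
         ((12, 12), 1, 0.95), ((6, 4, 2, 2), 8, 0.70),
         ((6, 4, 2), 10, 0.75)]
weights, consequence_list = predict(rules)
# weights[5] is about 12.2 and weights[1] about 4.45, as on the slide;
# the consequence list is [5, 1]
```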
30

Experiments

[Images: the original image, the image reconstructed by full-search
VQ, and the image reconstructed by the proposed method]
31

Experiments (cont.)

The performance comparisons on various methods:

  Method          Measure         Lena   Pepper  F16
  Full-search VQ  PSNR (dB)       32.25  31.41   31.58
                  Bit-rate (bpp)  0.5    0.5     0.5
  SMVQ            PSNR (dB)       28.57  28.04   27.94
                  Bit-rate (bpp)  0.33   0.32    0.33
  Our scheme      PSNR (dB)       30.64  30.05   29.74
                  Bit-rate (bpp)  0.33   0.32    0.34

32

Experiments (cont.)

[Chart: PSNR vs. support (%) (log|k1| = 2 bits), illustrating the
overfitting problem at low support values]

33

Advantages

- Mining association rules can be applied successfully to image
  prediction
- Broader spatial correlation is considered than in SMVQ
- More efficient than SMVQ, since no Euclidean distances need to be
  calculated
34

Reversible Steganography for VQ-compressed Images
Using Clustering and Relocation

35

Flowchart of the Proposed Method

[Flowchart: each block X is tested; "Yes" leads to SMVQ coding with a
state codebook built from groups G0 and G1, "No" leads to Skip]
36

Construction of the Hit Map

[Figure: the codebook is sorted, and for each image block the index of
its closest codeword (7, 13, 4, 7, 3, 11, … in the example) marks the
corresponding entry of the hit map, which is initialized to all 0s]

37

Clustering Codebook

Assume that the size of the codebook is 15: cw0, cw1, …, cw14.

Clustering:

  C1: cw0, cw1, cw3, cw6, cw8, cw10
  C2: cw4, cw14
  C3: cw2, cw5, cw9
  C4: cw12
  C5: cw7, cw11, cw13
38

Relocation

Assume that the size of the state codebook is 4. The clusters are
relocated into one sequence:

  cw0, cw1, cw3, cw6, cw8, cw10, cw4, cw14, cw2, cw5, cw9, cw7, cw11,
  cw13, cw12

[Diagram: U and L mark positions within the relocated sequence]
39

Embedding

Secret bits: 1011

  Index table before embedding:   Index table after embedding:
    cw14  cw12  cw1                 cw4   cw12  cw0
    cw2   cw6   cw3                 cw2   cw6   cw5
    cw10  cw8   cw6                 cw3   cw8   cw1

Only the codewords in G0 can embed the secret bits.
The codewords in G1 should be replaced with the codewords in G2.
40

Extraction & Reversibility

  Steganographic index table:     Recovered index table:
    cw4   cw12  cw0                 cw14  cw12  cw1
    cw2   cw6   cw5                 cw2   cw6   cw3
    cw3   cw8   cw1                 cw10  cw8   cw6

Extracted secret bits: 1 0 1 1
41

Experiments

  Method            Measure         Lena   Pepper  Sailboat  Baboon
  Modified Tian's   PSNR (dB)       26.92  26.45   25.05     22.70
  method            Payload (bits)  2777   3375    3283      2339
  MFCVQ             PSNR (dB)       28.03  26.43   26.60     24.04
                    Payload (bits)  5892   5712    5176      1798
  Proposed method   PSNR (dB)       30.23  29.15   28.00     24.04
                    Payload (bits)  8707   8421    7601      3400

12 hit maps (600 bits), 250 clusters

42

Experiments

[Images: results of Tian's method and MFCVQ]

43

[Images: single hit map; multiple hit maps without clustering; using
clustering and multiple hit maps]

44

Experiments

[Chart: PSNR (dB) vs. size of the state codebook (4, 8, 12, 16, 20,
32), with payloads of 7413, 8707, 8706, 8586, 8396, and 7966 bits,
using Lena as the cover image]

45

A Reversible Steganographic Method
Using SMVQ Approach based on Declustering

46

Find the most dissimilar pairs (de-clustering):

  Group "1"   Group "0"
  CW1     ↔   CW8
  CW2     ↔   CW9
  CW3     ↔   CW10
  CW4     ↔   CW11
  CW5     ↔   CW12
  CW6     ↔   CW13
  CW7     ↔   CW14

[Figure: the codewords CW0–CW15 scattered in feature space, with each
dissimilar pair linked]
47

Embedding Using Side-Match

Assume X = CW1 (dissimilar pair: CW1 ↔ CW8).

  V0 = ((U13 + L4)/2, U14, U15, U16, L8, L12, L16)
  V1 = (X1, X2, X3, X4, X5, X9, X13) of CW1
  V8 = (X1, X2, X3, X4, X5, X9, X13) of CW8

  d1 = Euclidean_Distance(V0, V1)
  d8 = Euclidean_Distance(V0, V8)

If (d1 < d8), then block X is replaceable.
Otherwise, block X is non-replaceable.

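The replaceability test can be sketched as below (assuming 4x4 blocks flattened row-major into 16-tuples; `math.dist` computes the Euclidean distance):

```python
from math import dist  # Euclidean distance, Python 3.8+

def border_vector(upper, left):
    """V0 from the bordering pixels: the mean of U13 and L4, then
    U14-U16 and L8, L12, L16 (1-based positions in 4x4 blocks)."""
    return ((upper[12] + left[3]) / 2, upper[13], upper[14], upper[15],
            left[7], left[11], left[15])

def first_row_col(cw):
    """V_i = (X1, X2, X3, X4, X5, X9, X13) of a 4x4 codeword."""
    return (cw[0], cw[1], cw[2], cw[3], cw[4], cw[8], cw[12])

def replaceable(upper, left, cw_i, cw_j):
    """A block encoded by cw_i is replaceable iff cw_i side-matches
    the neighbours better than its dissimilar partner cw_j."""
    v0 = border_vector(upper, left)
    return dist(v0, first_row_col(cw_i)) < dist(v0, first_row_col(cw_j))

# Toy example: flat 4x4 blocks whose borders match cw1 but not cw8
U = L = (10,) * 16
cw1, cw8 = (10,) * 16, (200,) * 16
# replaceable(U, L, cw1, cw8) -> True
```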
48

A secret message: 1 0 1 0 1 0 0 1 0 1 1 1 1 0 0 0

Group "1": CW1, CW2, CW3, CW4, CW5, CW6, CW7, CW15
Group "0": CW8, CW9, CW10, CW11, CW12, CW13, CW14, CW0

[Figure: the index table with the secret bits to embed]

If (d6 < d13): the block indexed 6 is replaceable

Embedding result: 6

49

A secret message: 1 0 1 0 1 0 0 1 0 1 1 1 1 0 0 0

Group "1": CW1, CW2, CW3, CW4, CW5, CW6, CW7, CW15
Group "0": CW8, CW9, CW10, CW11, CW12, CW13, CW14, CW0

[Figure: the index table with the secret bits to embed]

If (d2 < d9): the block is replaceable

Embedding result: 6, 9

50

A secret message: 1 0 1 0 1 0 0 1 0 1 1 1 1 0 0 0

Group "1": CW1, CW2, CW3, CW4, CW5, CW6, CW7, CW15
Group "0": CW8, CW9, CW10, CW11, CW12, CW13, CW14, CW0

[Figure: the index table with the secret bits to embed]

If (d12 >= d5): the block indexed 12 is non-replaceable;
CW15 is prefixed to embed 1

Embedding result: 6, 9, 15||12

51

A secret message: 1 0 1 0 1 0 0 1 0 1 1 1 1 0 0 0

Group "1": CW1, CW2, CW3, CW4, CW5, CW6, CW7, CW15
Group "0": CW8, CW9, CW10, CW11, CW12, CW13, CW14, CW0

[Figure: the index table with the secret bits to embed]

If (d9 >= d2): the block indexed 9 is non-replaceable;
CW0 is prefixed to embed 0

Embedding result: 6, 9, 15||12, 0||9
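Putting the walkthrough above together, the embedding and extraction rules can be sketched as follows. This is an illustrative reading of the example, not the thesis implementation: `closer` stands in for the Euclidean side-match test, and the pair CW15 ↔ CW0 is taken as the reserved escape prefix for non-replaceable blocks:

```python
# Dissimilar pairs from the example: group-"1" member <-> group-"0" member
PAIRS = {1: 8, 2: 9, 3: 10, 4: 11, 5: 12, 6: 13, 7: 14, 15: 0}
PARTNER = {**PAIRS, **{v: k for k, v in PAIRS.items()}}
GROUP1 = set(PAIRS)         # indices whose codeword carries bit 1
ESCAPE = {15, 0}            # reserved escape pair (15 -> 1, 0 -> 0)

def embed(index, bit, is_replaceable):
    """Embed one secret bit into one block's index."""
    if is_replaceable:
        # choose the pair member whose group matches the secret bit
        target = index if (index in GROUP1) == (bit == 1) else PARTNER[index]
        return [target]
    # non-replaceable: a bit-carrying escape index, then the original
    return [15 if bit == 1 else 0, index]

def extract(stream, closer):
    """closer(i, j) returns whichever of a dissimilar pair lies nearer
    to the block's side-match vector (a stand-in for the distance test)."""
    bits, original = [], []
    it = iter(stream)
    for idx in it:
        if idx in ESCAPE:                  # escaped non-replaceable block
            bits.append(1 if idx == 15 else 0)
            original.append(next(it))      # original index follows untouched
        else:
            bits.append(1 if idx in GROUP1 else 0)
            original.append(closer(idx, PARTNER[idx]))
    return bits, original

# Reproduce the walkthrough: original indices 6, 2, 12, 9; bits 1 0 1 0
stream = (embed(6, 1, True) + embed(2, 0, True)
          + embed(12, 1, False) + embed(9, 0, False))
CLOSER = {frozenset({6, 13}): 6, frozenset({2, 9}): 2}  # toy oracle
bits, original = extract(stream, lambda i, j: CLOSER[frozenset({i, j})])
# stream == [6, 9, 15, 12, 0, 9]; bits == [1, 0, 1, 0];
# original == [6, 2, 12, 9]
```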

52

Extraction and Recovery

Steganographic index table: 6, 9, 15||12, 0||9

[Figure: the dissimilar-pair table]

If (d6 < d13): extract secret bit 1; recover index 6

53

Extraction and Recovery

Steganographic index table: 6, 9, 15||12, 0||9

If (d9 >= d2): extract secret bit 0; recover index 2

Extracted so far: 1, 0 — recovered: 6, 2

54

Extraction and Recovery

Steganographic index table: 6, 9, 15||12, 0||9

The escape prefix 15 carries secret bit 1; the original index 12
follows untouched.

Extracted so far: 1, 0, 1 — recovered: 6, 2, 12

55

Extraction and Recovery

Steganographic index table: 6, 9, 15||12, 0||9

The escape prefix 0 carries secret bit 0; the original index 9 follows
untouched.

Extracted: 1, 0, 1, 0 — recovered: 6, 2, 12, 9

56

Find Dissimilar Pairs

[Figure: the codewords CW1–CW14 projected onto the X–Y plane by PCA;
the most dissimilar pairs are matched across the projection]
57

Improve Embedding Capacity

Partition into more groups

58

Experiments

The number of non-replaceable blocks: 139
The number of original image blocks: 128*128 = 16384
Codebook size: 512
Codeword size: 16

59

Experiments

The number of non-replaceable blocks: 458
The number of original image blocks: 128*128 = 16384
Codebook size: 512
Codeword size: 16

60

Experiments

Embedding capacity (bits):

  Images  Tian's  MFCVQ  Chang et      Proposed    Proposed    Proposed
          method         al.'s method  (3 groups)  (9 groups)  (17 groups)
  Lena    2,777   5,892  10,111        16,129      45,075      55,186
  Baboon  2,339   1,798  4,588         16,129      36,609      39,014

Time comparison (image: Lena):

  Method                 Time (sec)
  Tian's method          0.55
  MFCVQ                  1.36
  Chang et al.'s method  14.59 / 29.80 / 58.8 / 161.2
                         (state codebook size 4 / 8 / 16 / 32)
  Proposed method        0.11 / 0.13 / 0.14 / 0.19
                         (number of groups 3 / 5 / 9 / 17)

61

Future Research Directions

- Extend the proposed reversible steganographic methods to other image
  formats
- Apply perfect hashing schemes to other applications

62

Thank you all