Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations


Shuhei Denzumi (1), Ryo Yoshinaka (2, 1), Shin-ichi Minato (1, 2), and Hiroki Arimura (1)

1) Hokkaido University

2) JST ERATO Minato Discrete Structure Manipulation System Project


Background

Research on string processing has become active.

Massive online data: the Internet and sensor networks.

String matching and string mining problems.


Data mining

Input data should be represented in a compact form

Computation directly on the compressed structure is needed

Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations, by Shuhei Denzumi, Ryo Yoshinaka, Shin-ichi Minato, and Hiroki Arimura, 2011-08-30 (TUE), Prague Stringology Conference 2011

[Figure: input data is compressed into a data structure that is manipulable and compact; an operation on that structure produces the result.]

Manipulable compact data structures

Represent data in compressed form

Support operations that manipulate the data while it remains compressed

Have attracted much attention in recent years

Binary Decision Diagram (BDD): used in LSI design

Deterministic Finite Automaton (DFA): used in natural language processing


[Figure: three inputs are compacted into the data structures D1, D2, and D3, which are then manipulated by operations.]

What is Sequence BDD?

Sequence Binary Decision Diagram (SeqBDD, SDD)

Loekito, Bailey, and Pei (2009)

Graph structure

Represents finite sets of finite-length strings

Basic properties of SDDs were not yet known:

Minimization

Size complexity

Operation time

Applications

Data mining

Graph mining

Human genome sequencing

Family of BDDs

Compact representations of discrete structures

Equipped with rich algebraic operations


SDD [Loekito et al. 2009]: sets of strings

{a, b, ab, bab, abbab}

{abc, acb, bac, bca}

ZDD [Minato 1993]: sets of combinations

{{a}, {b}, {a, b}}

{{a}, {b}, {c}, {a, b, c}}

BDD [Bryant 1986]: Boolean functions

Boolean functions built from terms such as xy, yz, zx, and xyz

Results

Relationship to Acyclic Deterministic Finite Automata (ADFA)

Translation from an SDD to an ADFA and vice versa

An SDD is never larger than an ADFA

An SDD can be |Σ| times smaller than an ADFA


Computational complexity of binary set operations

Generalized treatment of eight set operations

Tight analysis of the time complexity of the binary set operation algorithm


Experimental results

SDDs can be smaller than ADFAs

Binary operation time


Preliminary



Definition

Σ: alphabet (totally ordered by <)

Figure legend: internal node, 1-terminal node and 0-terminal node, 1-edge and 0-edge

SDD: directed acyclic graph

Internal node S: $\tau(S) = \langle S.\mathrm{lab}, S.1, S.0 \rangle$

S.lab: label

S.1: 1-child

S.0: 0-child


Ordering rule: $N.\mathrm{lab} < (N.0).\mathrm{lab}$ (when N.0 is an internal node)
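The definition above can be made concrete with a small sketch. This is only an illustration, not the authors' implementation: the class names `Terminal` and `Node`, the field names `one` and `zero` (standing for S.1 and S.0), and the helper `obeys_ordering` are assumptions made for this example, and labels are compared with Python's built-in string order.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Terminal:
    value: int  # 1 for the 1-terminal, 0 for the 0-terminal


@dataclass(frozen=True)
class Node:
    lab: str     # S.lab, the label
    one: "SDD"   # S.1, the 1-child
    zero: "SDD"  # S.0, the 0-child


SDD = Union[Terminal, Node]
ONE, ZERO = Terminal(1), Terminal(0)


def obeys_ordering(n: SDD) -> bool:
    """Check N.lab < (N.0).lab at every internal node reachable from n."""
    if isinstance(n, Terminal):
        return True
    if isinstance(n.zero, Node) and not (n.lab < n.zero.lab):
        return False
    return obeys_ordering(n.one) and obeys_ordering(n.zero)


# Labels increase along the 0-edges: a, then b.
good = Node("a", ONE, Node("b", ONE, ZERO))
# Labels decrease along the 0-edges: b, then a (violates the ordering rule).
bad = Node("b", ONE, Node("a", ONE, ZERO))

print(obeys_ordering(good), obeys_ordering(bad))
```

The ordering rule only constrains the chain of 0-edges; labels along 1-edges are unrestricted, which is why the check compares a node only with its 0-child.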


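[Figure: the terminal nodes 1 and 0, and an internal node S drawn with its label S.lab, its 1-child S.1, and its 0-child S.0; the example nodes carry labels such as a, b, c, and z.]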

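Semantics

L(N): the set of strings that N represents

$L(\mathbf{1}) = \{\varepsilon\}$ for the 1-terminal node

$L(\mathbf{0}) = \emptyset$ for the 0-terminal node

$L(N) = N.\mathrm{lab} \cdot L(N.1) \cup L(N.0)$ for an internal node N

A path from the root to the 1-terminal node represents a string.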
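The semantics above translate directly into a recursive function. The sketch below is an illustration under assumed conventions: an internal node is a Python tuple `(lab, one, zero)`, the terminals are the integers `1` and `0`, and the function name `language` is hypothetical. The example SDD mirrors the running example of the slides, whose root represents {aa, ab, bb}.

```python
def language(n):
    """Return L(n) as a set of Python strings."""
    if n == 1:
        return {""}    # L(1-terminal) = {ε}
    if n == 0:
        return set()   # L(0-terminal) = {}
    lab, one, zero = n
    # L(N) = N.lab · L(N.1) ∪ L(N.0)
    return {lab + w for w in language(one)} | language(zero)


b = ("b", 1, 0)        # represents {b}
ab = ("a", 1, b)       # represents {a, b}
bb = ("b", b, 0)       # represents {bb}
root = ("a", ab, bb)   # represents {aa, ab, bb}

print(sorted(language(root)))
```

Each string in the result corresponds to one path from the root to the 1-terminal: following a 1-edge emits the node's label, following a 0-edge emits nothing.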


[Figure: step-by-step example. The 1-terminal represents {ε} and the 0-terminal represents {}; the internal nodes labeled b, a, and b represent {b}, {a, b}, and {bb}; the root, labeled a, represents {aa, ab, bb}.]

Comparison to ADFA

1-terminal node: accept state

0-terminal node: reject state


[Figure: side-by-side SDD and ADFA diagrams for the example set {aa, ab, bb}.]


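Reduction process

Suppression: if N.1 is the 0-terminal node, remove N and redirect its incoming edges to N.0

In an ADFA, this corresponds to removing edges that point to a dead state

Merging: if $\tau(N) = \tau(N')$, then N and N′ are shared as one node ($N = N'$)

In an ADFA, this corresponds to sharing all equivalent states


Theorem

Under these two rules, the reduced SDD is unique and minimal

This is analogous to the unique canonical form of minimal ADFAs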
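A minimal sketch of the two reduction rules, applied bottom-up to an unreduced diagram. The representation is an assumption made for illustration: internal nodes are tuples `(lab, one, zero)`, terminals are the integers `1` and `0`, and `reduce_sdd` is a hypothetical helper name. Merging is realized with a dictionary that returns the first node registered for each triple.

```python
def reduce_sdd(n, table=None):
    """Return the reduced form of the SDD n (suppression + merging)."""
    if table is None:
        table = {}
    if not isinstance(n, tuple):
        return n                      # terminals are already reduced
    lab, one, zero = n
    one = reduce_sdd(one, table)
    zero = reduce_sdd(zero, table)
    if one == 0:
        return zero                   # suppression: N.1 is the 0-terminal
    key = (lab, one, zero)
    return table.setdefault(key, key)  # merging: one node per triple


# The node labeled c has the 0-terminal as its 1-child, so it is suppressed.
# Afterwards the two b-nodes have the same triple and are merged.
unreduced = ("a", ("b", 1, 0), ("b", ("c", 0, 1), 0))
reduced = reduce_sdd(unreduced)

print(reduced)
print(reduced[1] is reduced[2])
```

In this example the language is unchanged by the reduction: both diagrams represent {ab, b}, but the reduced one has two internal nodes instead of four.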


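[Figure: merging replaces two nodes N and N′ that have the same label x and the same children N.1 and N.0 by a single node; suppression replaces a node N labeled a whose 1-child is the 0-terminal by N.0, since $L(N) = a \cdot \emptyset \cup L(N.0) = L(N.0)$.]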

Characteristics

Almost isomorphic to acyclic deterministic finite automata

BDD/ZDD techniques are applicable


Binary form

Simple recursive algorithms

Easy to implement


Rich collection of operations


Use of hash tables

To share equivalent nodes

To share intermediate computations


[Figure: SDD shown in relation to BDD/ZDD and ADFA.]

Relationship to Acyclic Automata



Size

An SDD node corresponds to an ADFA edge

The description size is proportional to:

|N|: the number of internal nodes in an SDD N

|A|: the number of edges in an ADFA A


Theorem: Size comparison

For an equivalent SDD N and ADFA A:

From an ADFA A to an SDD N: $|N| \le |A|$

From an SDD N to an ADFA A: $|A| \le |\Sigma| \, |N|$

An SDD can be |Σ| times smaller than an ADFA
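The ADFA-to-SDD direction can be sketched as follows. This is an illustrative translation under stated assumptions, not necessarily the construction used by the authors: the ADFA is a dictionary mapping each state to `(accepting, {label: next_state})`, it has no dead states, SDD nodes are tuples `(lab, one, zero)` with integer terminals, and `adfa_to_sdd` is a hypothetical name. Each outgoing edge of a state becomes at most one SDD node, and the nodes of one state are chained through 0-edges in label order.

```python
def adfa_to_sdd(adfa, start):
    """Translate an ADFA into a reduced SDD; return (root, number of nodes)."""
    unique = {}   # (lab, one, zero) -> shared node
    memo = {}     # ADFA state -> SDD node

    def getnode(lab, one, zero):
        if one == 0:
            return zero                      # suppression rule
        key = (lab, one, zero)
        return unique.setdefault(key, key)   # merging rule

    def convert(q):
        if q not in memo:
            accepting, edges = adfa[q]
            node = 1 if accepting else 0     # end of the 0-chain
            # Build the chain from the largest label down to the smallest.
            for lab in sorted(edges, reverse=True):
                node = getnode(lab, convert(edges[lab]), node)
            memo[q] = node
        return memo[q]

    return convert(start), len(unique)


# ADFA for {ac, bc, c} with 4 edges.
adfa = {
    "q0": (False, {"a": "q1", "b": "q1", "c": "q2"}),
    "q1": (False, {"c": "q2"}),
    "q2": (True, {}),
}
num_edges = sum(len(edges) for _, edges in adfa.values())
root, num_nodes = adfa_to_sdd(adfa, "q0")

print(num_edges, num_nodes)
```

In this small example the c-edge leaving q0 and the c-edge leaving q1 map to the same SDD node, so the SDD has 3 internal nodes against 4 ADFA edges.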



0-child sharing


[Figure: 0-child sharing, illustrated with nodes labeled a, b, d, and e.]


Example


[Figure: an ADFA A and the equivalent SDD S for the language $\{a^n b^i c^j \mid n = 0, \dots, 4;\ i, j = 0, 1\}$, with |S| = 6 and |A| = 14.]
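The SDD size in this example can be reproduced with a short sketch. It assumes the language is read as $\{a^n b^i c^j \mid n = 0, \dots, 4;\ i, j = 0, 1\}$, represents nodes as tuples `(lab, one, zero)` with integer terminals, and uses a hypothetical helper `build` that constructs the reduced SDD directly from an explicit set of strings. It only checks the SDD side; the ADFA is not constructed here.

```python
def build(strings, unique):
    """Build a reduced SDD for a finite set of strings."""
    if not strings:
        return 0                      # empty set: 0-terminal
    if strings == {""}:
        return 1                      # {ε}: 1-terminal
    # Smallest first letter among the non-empty strings becomes the label.
    x = min(w[0] for w in strings if w)
    one = build({w[1:] for w in strings if w and w[0] == x}, unique)
    zero = build({w for w in strings if not w or w[0] != x}, unique)
    key = (x, one, zero)
    return unique.setdefault(key, key)   # share equivalent nodes


language = {"a" * n + "b" * i + "c" * j
            for n in range(5) for i in (0, 1) for j in (0, 1)}
unique = {}
root = build(language, unique)
sdd_size = len(unique)

print(len(language), sdd_size)
```

The 20 strings collapse into 6 internal nodes: four nodes labeled a, one labeled b, and one labeled c. The b-node is reused as the 0-child of every a-node, which is the 0-child sharing effect.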


Experiment

Input: Canterbury corpus

BibleAll: bible.txt; BibleBi: all bigrams from bible.txt; Ecoli: E.coli.txt

"Fac" means that all factors of the input data are stored


[Figure: size ratio (SDD size / DFA size, axis from 0.6 to 1.0) versus input size (0 to 2,000,000 bytes) for BibleAll, BibleBi, BibleAll (Fac), BibleBi (Fac), and Ecoli (Fac).]

Binary Set Operation Algorithm



Set operation

A binary set operation $\diamond \in \{\cup, \cap, \setminus, \dots\}$

Input: two SDDs P, Q

Output: an SDD R such that $L(R) = L(P) \diamond L(Q)$



[Figure: a binary set operation takes the SDDs P and Q and produces the SDD for $P \diamond Q$.]


Apply algorithm

Originally for BDD [Bryant 1986], applied to SDD

Based on the definition $L(N) = N.\mathrm{lab} \cdot L(N.1) \cup L(N.0)$

In the operation (when P.lab = Q.lab):

$L(P) \diamond L(Q) = P.\mathrm{lab} \cdot (L(P.1) \diamond L(Q.1)) \cup (L(P.0) \diamond L(Q.0))$
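The recursion above can be sketched as a small function. This is an illustration under assumptions, not the authors' code: nodes are tuples `(lab, one, zero)`, terminals are the integers `1` and `0`, the operation is passed as a Boolean function on "ε is in the set" flags (so it must map two false inputs to false), and the handling of the cases P.lab < Q.lab and P.lab > Q.lab is filled in here by treating the missing letter as an empty 1-side. Caching is left out to keep the recursion visible.

```python
def getnode(lab, one, zero):
    """Create a node, applying the suppression rule."""
    return zero if one == 0 else (lab, one, zero)


def apply(op, p, q):
    """Return an SDD for L(p) op L(q)."""
    p_int, q_int = isinstance(p, tuple), isinstance(q, tuple)
    if not p_int and not q_int:
        return 1 if op(p == 1, q == 1) else 0
    if p_int and (not q_int or p[0] < q[0]):
        # No string of Q starts with P.lab, so its 1-side is empty.
        return getnode(p[0], apply(op, p[1], 0), apply(op, p[2], q))
    if q_int and (not p_int or q[0] < p[0]):
        return getnode(q[0], apply(op, 0, q[1]), apply(op, p, q[2]))
    # P.lab = Q.lab: combine the 1-sides and the 0-sides separately.
    return getnode(p[0], apply(op, p[1], q[1]), apply(op, p[2], q[2]))


def language(n):
    if not isinstance(n, tuple):
        return {""} if n == 1 else set()
    return {n[0] + w for w in language(n[1])} | language(n[2])


def union(x, y): return x or y
def intersection(x, y): return x and y
def difference(x, y): return x and not y


B = ("b", 1, 0)
P = ("a", ("a", 1, B), ("b", B, 0))   # {aa, ab, bb}
Q = ("a", B, B)                       # {ab, b}

for op in (union, intersection, difference):
    print(op.__name__, sorted(language(apply(op, P, Q))))
```

Without a cache the same pair of sub-SDDs may be visited many times; the hash-table technique exists to avoid exactly that.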


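[Figure: a node P labeled a with children P1 and P0, and a node Q labeled a with children Q1 and Q0, are combined into a node labeled a whose 1-child is $P_1 \diamond Q_1$ and whose 0-child is $P_0 \diamond Q_0$.]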


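Hash table technique

Key-value hash tables

Uniquetable

Key: $\langle x, N_1, N_0 \rangle$ (letter x, SDD node $N_1$, SDD node $N_0$)

Value: the SDD node N with $\tau(N) = \langle x, N_1, N_0 \rangle$

Opcache

Key: $\langle \diamond, P, Q \rangle$ (operation id $\diamond$, SDD node P, SDD node Q)

Value: the SDD node R such that $R = P \diamond Q$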


[Figure: the Uniquetable maps the key triple $\langle x, N_1, N_0 \rangle$ to the node N; the Opcache maps the key triple $\langle \diamond, P, Q \rangle$ to the node $R = P \diamond Q$.]




Node create process

Every SDD node needed during a computation is created through this process


Once an internal node is registered in the Uniquetable, no equivalent node is ever created again.


Check the Uniquetable for the key $\langle x, N_1, N_0 \rangle$.

If it exists: return the registered node.

If it does not exist: create a new node, register it, and return it.
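The node create process can be sketched with an ordinary dictionary. The names `Node`, `uniquetable`, and `getnode` are assumptions for this illustration, and keying the table on the identity of the child nodes is one possible design, not necessarily the authors' choice. The suppression check at the top comes from the reduction rules and is included here on the assumption that it is applied at node creation.

```python
class Node:
    """An SDD node; terminals are two dedicated instances."""
    __slots__ = ("lab", "one", "zero")

    def __init__(self, lab, one, zero):
        self.lab, self.one, self.zero = lab, one, zero


ZERO = Node(None, None, None)   # 0-terminal
ONE = Node(None, None, None)    # 1-terminal

uniquetable = {}                # (x, id(N1), id(N0)) -> node N


def getnode(x, n1, n0):
    if n1 is ZERO:
        return n0                       # suppression rule (assumed here)
    key = (x, id(n1), id(n0))
    node = uniquetable.get(key)         # check the Uniquetable
    if node is None:                    # not registered: create and register
        node = Node(x, n1, n0)
        uniquetable[key] = node
    return node                         # registered: return the shared node


first = getnode("b", ONE, ZERO)
second = getnode("b", ONE, ZERO)        # same triple: no new node
print(first is second, len(uniquetable))
```

Because every node stays referenced by the table, the `id` values used in the keys remain valid for the lifetime of the table. Equality of two SDDs then reduces to an identity check on their root nodes.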


Time complexity

When $P \diamond Q$ is executed:

Every operation uses the Opcache

At most |P| × |Q| different instances of recursive calls are invoked

(Assuming that the access time to hash tables is constant)

Naïve method: prepare a table of size |P| × |Q|

This method: creates no useless or redundant nodes

Theorem

Worst case: O(|P| |Q|) time

There exist examples that need Ω(|P| |Q|) time

Matching lower and upper bounds are obtained




Check the Opcache for the key $\langle \diamond, P, Q \rangle$.

If it exists: $P \diamond Q$ has already been computed; return the cached result.

If it does not exist: continue the computation on the 0-side and the 1-side.
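The effect of the Opcache on the number of recursive calls can be observed with a small sketch. All names here (`uniquetable`, `opcache`, `getnode`, `split`, `apply`, `nodes`) are assumptions for illustration; nodes are tuples `(lab, one, zero)` with integer terminals, and the tables are keyed on the tuples themselves for brevity, whereas a practical implementation would more likely key on node identity. The bound checked at the end counts the two terminals on each side, which is this sketch's own accounting of the |P| × |Q| argument.

```python
import operator

uniquetable = {}   # (lab, one, zero) -> shared node
opcache = {}       # (operation id, P, Q) -> result node


def getnode(lab, one, zero):
    if one == 0:
        return zero
    key = (lab, one, zero)
    return uniquetable.setdefault(key, key)


def split(n, x):
    """Return (1-side, 0-side) of n with respect to the letter x."""
    if isinstance(n, tuple) and n[0] == x:
        return n[1], n[2]
    return 0, n    # no string of n starts with x


def apply(name, op, p, q):
    key = (name, p, q)
    if key in opcache:                 # already done: return it
        return opcache[key]
    if not isinstance(p, tuple) and not isinstance(q, tuple):
        r = 1 if op(p == 1, q == 1) else 0
    else:
        x = min(n[0] for n in (p, q) if isinstance(n, tuple))
        (p1, p0), (q1, q0) = split(p, x), split(q, x)
        r = getnode(x, apply(name, op, p1, q1), apply(name, op, p0, q0))
    opcache[key] = r
    return r


def nodes(n, seen=None):
    """Collect the distinct internal nodes reachable from n."""
    seen = set() if seen is None else seen
    if isinstance(n, tuple) and n not in seen:
        seen.add(n)
        nodes(n[1], seen)
        nodes(n[2], seen)
    return seen


B = ("b", 1, 0)
P = ("a", ("a", 1, B), ("b", B, 0))   # {aa, ab, bb}, 4 internal nodes
Q = ("a", B, B)                       # {ab, b}, 2 internal nodes

R = apply("union", operator.or_, P, Q)
bound = (len(nodes(P)) + 2) * (len(nodes(Q)) + 2)
print(len(opcache), bound)
```

Each Opcache entry corresponds to one distinct pair of sub-SDDs, so the number of entries, and with constant-time hashing the running time, is bounded by the product of the two sizes.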


Experiment

Operation time

Build two SDDs, each storing all factors of a random text of length n

Measure the time to compute each operation


[Figure: execution time (ms, axis from 0 to 1600) versus text length (letters, 0 to 100,000) for union, intersection, and difference.]

Conclusion

Relationship to Acyclic Automata

An SDD can be |Σ| times smaller than an ADFA

For real data, SDDs are 10 to 20% more compact than ADFAs


Computational complexity of binary set operations

Worst-case time complexity is quadratic

A tight time bound is established

In our experiments, the operation time is almost linear


Future work

Efficient implementation of various operations

A substring index built on SDDs

A construction algorithm for factor SDDs


Thank you!