Notes on Sequence Binary Decision Diagrams:
Relationship to Acyclic Automata and
Complexities of Binary Set Operations
Shuhei Denzumi (1), Ryo Yoshinaka (2, 1), Shin-ichi Minato (1, 2), and Hiroki Arimura (1)
1) Hokkaido University
2) JST ERATO Minato Discrete Structure Manipulation System Project
Background
Research on string processing has become active.
Massive online data: The internet and sensing networks.
String matching and string mining problems.
Data mining
Input data should be represented in compact form
Computation under compressed structure is needed
Notes on Sequence Binary Decision Diagrams: Relationship to Acyclic Automata and Complexities of Binary Set Operations
by Shuhei Denzumi, Ryo Yoshinaka, Shin-ichi Minato, and Hiroki Arimura, 2011-08-30 (TUE), Prague Stringology Conference 2011
[Figure: inputs are compressed into a manipulable and compact data structure; an operation on that structure yields the result.]
Compact data structure
Represent data in compressed form
Support operations that manipulate data in compressed form
Have received much attention in recent years
Binary Decision Diagram (BDD)
LSI area
Deterministic Finite Automata (DFA)
Natural Language Processing area
[Figure: several inputs are compacted into data structures D1, D2, D3, on which operations are performed.]
Sequence Binary Decision Diagram (SeqBDD, SDD).
Loekito, Bailey, and Pei (2009)
Graph structure
Represents finite sets of strings of finite length
SDD’s basic properties are unknown
Minimization
Size complexity
Operation time
Application
Data mining
Graph mining
Human genome sequencing
What is Sequence BDD?
[Figure: several texts stored in one Sequence Binary Decision Diagram.]
Family of BDDs
Compact representation for discrete structure
With rich algebraic operations
SDD [Loekito et al. 2009]
Sets of strings
{a, b, ab, bab, abbab}
{abc, acb, bac, bca}
ZDD [Minato 1993]
Sets of combinations
{{a}, {b}, {a, b}}
{{a}, {b}, {c}, {a, b, c}}
BDD [Bryant 1986]
Boolean functions
xy ∨ yz ∨ zx
¬xyz ∨ x¬yz ∨ xy¬z
Relationship to Acyclic Deterministic Finite Automata (ADFA)
Translation from an SDD to an ADFA and vice versa
An SDD is never larger than an ADFA
An SDD can be |Σ| times smaller than an ADFA
Computational complexity of binary set operations
Generalize eight set operations
Tight analysis of the time complexity of the binary set operation algorithm
Experimental results
SDDs can be smaller than ADFAs
Binary operation time
Result
Preliminary
Definition
Σ: alphabet (totally ordered by ≺)
Node types: internal nodes, the 1-terminal node, and the 0-terminal node
Edge types: 1-edges and 0-edges
SDD: directed acyclic graph
Internal node S: τ(S) = 〈S.lab, S.1, S.0〉
S.lab: label
S.1: 1-child
S.0: 0-child
Ordering rule
N.lab ≺ (N.0).lab
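The definition can be sketched in Python as follows. This is only an illustration: the names `Terminal`, `Node`, and `obeys_ordering` are hypothetical and do not come from the slides, and the order ≺ is assumed to be Python's built-in string comparison.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Terminal:
    value: int  # 1 for the 1-terminal, 0 for the 0-terminal


@dataclass(frozen=True)
class Node:
    lab: str     # S.lab, a letter of the alphabet
    one: "SDD"   # S.1, the 1-child
    zero: "SDD"  # S.0, the 0-child


SDD = Union[Terminal, Node]

ONE = Terminal(1)
ZERO = Terminal(0)


def obeys_ordering(n: SDD) -> bool:
    """Check N.lab < (N.0).lab at every internal node reachable from n."""
    if isinstance(n, Terminal):
        return True
    # The rule only constrains 0-children that are internal nodes.
    if isinstance(n.zero, Node) and not (n.lab < n.zero.lab):
        return False
    return obeys_ordering(n.one) and obeys_ordering(n.zero)


b = Node("b", ONE, ZERO)
good = Node("a", ONE, b)                    # a < b along the 0-edge
bad = Node("b", ONE, Node("a", ONE, ZERO))  # b before a violates the rule
```

Here `obeys_ordering(good)` holds and `obeys_ordering(bad)` does not.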
[Figure: an internal node S with label S.lab, 1-child S.1, and 0-child S.0; labels along a chain of 0-edges are ordered a ≺ b ≺ … ≺ z.]
Semantics
L(N): set of strings N represents
L(1-terminal) = {ε}
L(0-terminal) = {}
L(N) = N.lab・L(N.1) ∪ L(N.0)
A path from the root to the 1-terminal node represents a string.
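The recursive definition of L(N) translates almost literally into code. The sketch below assumes a hypothetical tuple encoding, not taken from the slides: an internal node is `(lab, one, zero)`, and the terminals are the integers `1` and `0`. The example SDD is a reconstruction chosen to match the sets {b}, {a, b}, {bb}, and {aa, ab, bb} that appear in the talk.

```python
def language(n):
    """Return L(n) as a Python set of strings."""
    if n == 1:
        return {""}      # L(1-terminal) = {ε}
    if n == 0:
        return set()     # L(0-terminal) = {}
    lab, one, zero = n
    # L(N) = N.lab · L(N.1) ∪ L(N.0)
    return {lab + w for w in language(one)} | language(zero)


b = ("b", 1, 0)        # represents {b}
ab = ("a", 1, b)       # represents {a, b}
bb = ("b", b, 0)       # represents {bb}
root = ("a", ab, bb)   # represents {aa, ab, bb}
```

Evaluating `language(root)` gives the set containing `aa`, `ab`, and `bb`.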
[Figure: an example SDD annotated bottom-up with the set each node represents: {ε} at the 1-terminal, {} at the 0-terminal, then {b}, {a, b}, {bb}, and {aa, ab, bb} at the root.]
1-terminal: accept state
0-terminal: reject state
Comparison to ADFA
[Figure: the example SDD for {aa, ab, bb} next to the equivalent ADFA.]
Reduction process
Suppression
N.1 ≠ 0-terminal node
In an ADFA: removing edges that point to the dead state
Merging
τ(N) = τ(N’) ⇒ N = N’
In an ADFA: sharing all equivalent states
Theorem
Under these rules, the SDD for a given set of strings is unique and minimal,
just as an ADFA has a unique canonical form
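[Figure: merging: two nodes N and N’ with the same label x, the same 1-child N.1, and the same 0-child N.0 are shared as one node. Suppression: a node labeled a whose 1-child is the 0-terminal is replaced by its 0-child N.0, since a・{} ∪ L(N.0) = L(N.0).]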
Almost isomorphic to Acyclic Deterministic Finite Automata
BDD/ZDD techniques are applicable
Binary form
Simple recursive algorithm
Easy to implement
Rich collections of operations
Use of hash tables
To share equivalent nodes
To share intermediate computations
Characteristic
[Figure: SDD shown in relation to BDD/ZDD and ADFA.]
Relationship to Acyclic Automata
Size
An SDD node corresponds to an ADFA edge
The description size is proportional to
|N|: the number of internal nodes in SDD N
|A|: the number of edges in ADFA A
Theorem: size comparison
For an equivalent SDD N and ADFA A:
From an ADFA A to an SDD N: the SDD is never larger than the ADFA
From an SDD N to an ADFA A
An SDD can be |Σ| times smaller than an ADFA
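One direction of the translation can be sketched as follows. The sketch is an illustration under stated assumptions, not the construction from the paper: the ADFA is given as a hypothetical dictionary `delta` from states to `{letter: next_state}` maps plus a set `accepting`, and SDD nodes use a tuple encoding `(lab, one, zero)` with terminals `1` and `0`. Each outgoing edge of a state becomes at most one SDD node on a chain of 0-edges, which is why the node count cannot exceed the edge count.

```python
def adfa_to_sdd(delta, accepting, state, memo=None):
    """Translate one ADFA state into an SDD in tuple form."""
    memo = {} if memo is None else memo
    if state not in memo:
        # The 0-chain ends in the 1-terminal iff the state accepts ε.
        node = 1 if state in accepting else 0
        # Add letters from largest to smallest so labels increase along 0-edges.
        for letter in sorted(delta.get(state, {}), reverse=True):
            child = adfa_to_sdd(delta, accepting, delta[state][letter], memo)
            if child != 0:  # suppression: skip edges leading to a dead state
                node = (letter, child, node)
        memo[state] = node
    return memo[state]


def internal_nodes(node, seen=None):
    """Collect distinct internal nodes; equal tuples count once (merging)."""
    seen = set() if seen is None else seen
    if isinstance(node, tuple) and node not in seen:
        seen.add(node)
        internal_nodes(node[1], seen)
        internal_nodes(node[2], seen)
    return seen


# Hypothetical ADFA for {aa, ab, bb} with 5 edges.
delta = {"q0": {"a": "q1", "b": "q2"}, "q1": {"a": "q3", "b": "q3"}, "q2": {"b": "q3"}}
accepting = {"q3"}
sdd = adfa_to_sdd(delta, accepting, "q0")
edges = sum(len(out) for out in delta.values())
```

For this input the ADFA has 5 edges and the resulting SDD has 4 internal nodes.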
0-child sharing
Example
[Figure: SDD S and ADFA A for the set below.]
{a^n b^i c^j | n = 0, …, 4; i, j = 0, 1}
SDD S: |S| = 6
ADFA A: |A| = 14
Experiment
Input: Canterbury corpus
BibleAll: bible.txt
BibleBi: all bigrams from bible.txt
Ecoli: E.coli.txt
(Fac) means that all factors of the input data are stored
[Figure: size ratio (SDD size / DFA size, axis from 0.6 to 1.0) versus input size in bytes (axis from 0 to 2,000,000) for BibleAll, BibleBi, BibleAll (Fac), BibleBi (Fac), and Ecoli (Fac).]
Binary Set Operation Algorithm
Set operation
A binary set operation ♢ ∈ {∪, ∩, \, …}
Input: two SDDs P, Q
Output: SDD R such that L(R) = L(P) ♢ L(Q)
Binary Set Operation
Apply algorithm
Originally for BDD [Bryant 1986], applied to SDD
Based on the definition L(N) = N.lab・L(N.1) ∪ L(N.0)
In the operation (when P.lab = Q.lab):
L(P) ♢ L(Q) = P.lab・(L(P.1) ♢ L(Q.1)) ∪ (L(P.0) ♢ L(Q.0))
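The slide gives the recurrence only for P.lab = Q.lab. The remaining cases can be derived from the same definition; the case split below is such a derivation, not a quotation from the slides. It assumes that ♢ acts on membership string by string with ∅ ♢ ∅ = ∅, and it uses the ordering rule, which implies that no string in L(Q) starts with a letter smaller than Q.lab. A terminal node is treated as if its label were larger than every letter.

```latex
L(P) \diamond L(Q) =
\begin{cases}
x \cdot \bigl(L(P.1) \diamond L(Q.1)\bigr) \cup \bigl(L(P.0) \diamond L(Q.0)\bigr)
  & \text{if } P.\mathrm{lab} = Q.\mathrm{lab} = x, \\[4pt]
x \cdot \bigl(L(P.1) \diamond \emptyset\bigr) \cup \bigl(L(P.0) \diamond L(Q)\bigr)
  & \text{if } P.\mathrm{lab} = x \prec Q.\mathrm{lab}, \\[4pt]
x \cdot \bigl(\emptyset \diamond L(Q.1)\bigr) \cup \bigl(L(P) \diamond L(Q.0)\bigr)
  & \text{if } Q.\mathrm{lab} = x \prec P.\mathrm{lab}.
\end{cases}
```

In the second case, for example, a string xw belongs to L(P) exactly when w belongs to L(P.1) and never belongs to L(Q), so its membership in the result is decided by L(P.1) ♢ ∅.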
[Figure: for P = 〈a, P1, P0〉 and Q = 〈a, Q1, Q0〉, the result P ♢ Q is the node 〈a, P1 ♢ Q1, P0 ♢ Q0〉.]
Hash table technique
Key-value hash tables
Uniquetable
Key: 〈letter x, SDD node N1, SDD node N0〉
Value: SDD node N with τ(N) = 〈x, N1, N0〉
Opcache
Key: 〈operation id ♢, SDD node P, SDD node Q〉
Value: SDD node R such that R = P ♢ Q
[Figure: Uniquetable maps the key (triple) 〈x, N1, N0〉 to the value (node) N; Opcache maps the key (triple) 〈♢, P, Q〉 to the value (node) R = P ♢ Q.]
Node create process
Any SDD node needed during computation is created via this process
Once an internal node is registered in Uniquetable, equivalent nodes will not be created anymore.
Check the Uniquetable for key 〈x, N1, N0〉.
Exists: return it.
Does not exist: create a new node and return it.
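The node creation process can be sketched with a plain dictionary as the Uniquetable. The names `Node`, `getnode`, and the terminal markers are hypothetical. Applying the suppression rule inside `getnode` is an assumption that combines the flow above with the reduction rules stated earlier in the talk.

```python
class Node:
    """Internal SDD node with tau(N) = <lab, one, zero>."""
    __slots__ = ("lab", "one", "zero")

    def __init__(self, lab, one, zero):
        self.lab, self.one, self.zero = lab, one, zero


ZERO, ONE = "0-terminal", "1-terminal"  # hypothetical terminal markers
uniquetable = {}                        # <x, N1, N0> -> node N


def getnode(x, n1, n0):
    """Return the unique node for the triple <x, n1, n0>."""
    if n1 is ZERO:
        return n0                       # suppression: x·{} ∪ L(n0) = L(n0)
    # Children are already unique objects, so their identities form the key.
    key = (x, id(n1), id(n0))
    node = uniquetable.get(key)
    if node is None:                    # not registered yet: create and register
        node = Node(x, n1, n0)
        uniquetable[key] = node
    return node


b1 = getnode("b", ONE, ZERO)
b2 = getnode("b", ONE, ZERO)            # same triple, so the same object as b1
s = getnode("a", ZERO, b1)              # suppressed, so this is b1 itself
```

Because every node passes through `getnode`, two nodes with the same triple are always the same object, which is what makes the identity-based key safe in this sketch.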
Time complexity
When P ♢ Q is executed
Every operation uses Opcache
At most |P| × |Q| different instances of recursive calls are invoked
(Assume that the access time to hash tables is constant)
Naïve method
Prepare a table of size |P| × |Q|
This method
No useless or redundant nodes
Theorem
Worst case O(|P| |Q|) time
Examples that need Ω(|P| |Q|) time exist
Both the lower and the upper bound are obtained
Check the Opcache for key 〈♢, P, Q〉.
Exists: P ♢ Q is already done, return it.
Does not exist: continue the computation on the 0-side and the 1-side.
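Putting the two hash tables together gives a compact version of the apply algorithm. This is a sketch under stated assumptions rather than the authors' implementation: nodes are integer ids into a hypothetical list `nodes`, a terminal is treated as having a label larger than every letter, and each operation is a Boolean function on membership that maps (False, False) to False.

```python
ZERO, ONE = 0, 1       # ids of the 0-terminal and the 1-terminal
nodes = [None, None]   # nodes[i] = (lab, one_id, zero_id) for i >= 2
uniquetable = {}       # (lab, one_id, zero_id) -> node id
opcache = {}           # (op name, P id, Q id) -> result id

OPS = {
    "union": lambda p, q: p or q,
    "intersection": lambda p, q: p and q,
    "difference": lambda p, q: p and not q,
}


def getnode(x, n1, n0):
    if n1 == ZERO:                      # suppression rule
        return n0
    key = (x, n1, n0)
    if key not in uniquetable:          # merging rule via the Uniquetable
        nodes.append(key)
        uniquetable[key] = len(nodes) - 1
    return uniquetable[key]


def apply(op, p, q):
    if p < 2 and q < 2:                 # both terminals: decide membership of ε
        return ONE if OPS[op](p == ONE, q == ONE) else ZERO
    key = (op, p, q)
    if key in opcache:                  # P ♢ Q is already done
        return opcache[key]
    plab = nodes[p][0] if p >= 2 else None
    qlab = nodes[q][0] if q >= 2 else None
    if qlab is None or (plab is not None and plab < qlab):
        x, p1, p0, q1, q0 = plab, nodes[p][1], nodes[p][2], ZERO, q
    elif plab is None or qlab < plab:
        x, p1, p0, q1, q0 = qlab, ZERO, p, nodes[q][1], nodes[q][2]
    else:                               # P.lab = Q.lab
        x, p1, p0, q1, q0 = plab, nodes[p][1], nodes[p][2], nodes[q][1], nodes[q][2]
    r = getnode(x, apply(op, p1, q1), apply(op, p0, q0))  # 1-side, then 0-side
    opcache[key] = r
    return r


def language(n):
    """Expand an SDD into an explicit set of strings (for checking only)."""
    if n < 2:
        return {""} if n == ONE else set()
    lab, one, zero = nodes[n]
    return {lab + w for w in language(one)} | language(zero)


b = getnode("b", ONE, ZERO)
P = getnode("a", getnode("a", ONE, b), getnode("b", b, ZERO))  # {aa, ab, bb}
Q = getnode("a", b, b)                                         # {ab, b}
```

With these inputs, `apply("intersection", P, Q)` represents {ab} and `apply("difference", P, Q)` represents {aa, bb}. Each distinct pair of nodes is expanded at most once per operation because of the Opcache, which is the source of the |P| × |Q| bound on recursive calls.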
Experiment
Operation time
Prepare two SDDs for all factors of random texts of length n
Time to compute operation
[Figure: execution time in ms (axis from 0 to 1600) versus length of text in letters (axis from 0 to 100000) for union, intersection, and difference.]
Conclusion
Relationship to Acyclic Automata
An SDD can be |Σ| times smaller than an ADFA
For real data, SDDs are 10 to 20% more compact than ADFAs
Computational complexity of binary set operations
Worst case time complexity is quadratic
Tight time bound is analyzed
In our experiment, operation time is almost linear
Future work
Efficient implementation of various operations
Propose substring index on SDD
Factor SDD construction algorithm
Thank you!