Nov 26, 2013

Design and Implementation of a High-Speed, Low-Power VLSI Chip for the DCT Transform


Project Advisor: Professor A. Doboli

Participants: Tak Yuen Lam, Kit Lam, Wei Kit Ng, Ying Lam


Background

The DCT has many applications:

- Filtering
- Teleconferencing
- High-definition television (HDTV)
- Speech coding and image coding
- Data compression, and more

All of these applications use the DCT algorithm for compression and/or filtering. The DCT:

- has energy-packing capabilities
- approaches the statistically optimal transform in de-correlating a signal

It has been implemented with discrete components in a chip.

Goal

Implement a VLSI chip with:

- high speed
- low power

that computes the 2-D Discrete Cosine Transform (DCT) of an 8 x 8 element matrix.


Power consumption during computation in the chip is reduced by:

- a specially designed multiplier that performs less computation
- less switching
- simplified equations

High-speed processing is obtained by:

- pipelining
- ignoring zeros in the multiplier



Basic Formula

Forward DCT:

$$F(u,v) = C(u)\,C(v) \left[ \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N} \right]$$

Inverse DCT:

$$f(x,y) = \left[ \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\,C(v)\,F(u,v) \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N} \right]$$

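The normalization factors are:

$$C(u) = \sqrt{\tfrac{1}{N}}, \quad C(v) = \sqrt{\tfrac{1}{N}} \qquad \text{for } u, v = 0$$

$$C(u) = \sqrt{\tfrac{2}{N}}, \quad C(v) = \sqrt{\tfrac{2}{N}} \qquad \text{for } u, v = 1 \text{ through } N-1$$

N = 4, 8, or 16
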
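As a rough illustration of the forward and inverse formulas, the following sketch evaluates both directly for N = 8 and checks that the inverse recovers the input. It assumes the square-root form of the normalization factors, C(0) = sqrt(1/N) and C(k) = sqrt(2/N) otherwise. The class and method names (`DctFormulas`, `forward`, `inverse`, `kernel`) are hypothetical, and the code works in floating point, so it says nothing about how the chip itself computes the transform.

```java
class DctFormulas {
    static final int N = 8; // block size; the slides allow 4, 8, or 16

    // Normalization factor C(k), assuming the square-root form.
    static double c(int k) {
        return k == 0 ? Math.sqrt(1.0 / N) : Math.sqrt(2.0 / N);
    }

    // Cosine kernel cos((2n+1) k pi / (2N)) shared by both transforms.
    static double kernel(int n, int k) {
        return Math.cos((2 * n + 1) * k * Math.PI / (2.0 * N));
    }

    // Forward DCT: F(u,v) = C(u) C(v) sum_x sum_y f(x,y) kernel(x,u) kernel(y,v).
    static double[][] forward(double[][] f) {
        double[][] F = new double[N][N];
        for (int u = 0; u < N; u++) {
            for (int v = 0; v < N; v++) {
                double sum = 0.0;
                for (int x = 0; x < N; x++)
                    for (int y = 0; y < N; y++)
                        sum += f[x][y] * kernel(x, u) * kernel(y, v);
                F[u][v] = c(u) * c(v) * sum;
            }
        }
        return F;
    }

    // Inverse DCT: f(x,y) = sum_u sum_v C(u) C(v) F(u,v) kernel(x,u) kernel(y,v).
    static double[][] inverse(double[][] F) {
        double[][] f = new double[N][N];
        for (int x = 0; x < N; x++) {
            for (int y = 0; y < N; y++) {
                double sum = 0.0;
                for (int u = 0; u < N; u++)
                    for (int v = 0; v < N; v++)
                        sum += c(u) * c(v) * F[u][v] * kernel(x, u) * kernel(y, v);
                f[x][y] = sum;
            }
        }
        return f;
    }

    public static void main(String[] args) {
        double[][] f = new double[N][N];
        for (int x = 0; x < N; x++)
            for (int y = 0; y < N; y++)
                f[x][y] = 16 * x + y; // arbitrary test pattern
        double[][] back = inverse(forward(f));
        double maxError = 0.0;
        for (int x = 0; x < N; x++)
            for (int y = 0; y < N; y++)
                maxError = Math.max(maxError, Math.abs(back[x][y] - f[x][y]));
        System.out.println("max round-trip error: " + maxError);
    }
}
```

With this normalization the two transforms are exact inverses of each other up to floating-point rounding, which is what the round-trip check in `main` measures.
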
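1-D DCT Matrix

$$[Y] = [C] * [X]$$

$$
\begin{bmatrix} Y(0)\\ Y(1)\\ Y(2)\\ Y(3)\\ Y(4)\\ Y(5)\\ Y(6)\\ Y(7) \end{bmatrix}
= \frac{1}{4}
\begin{bmatrix}
C_4 & C_4 & C_4 & C_4 & C_4 & C_4 & C_4 & C_4\\
C_1 & C_3 & C_5 & C_7 & -C_7 & -C_5 & -C_3 & -C_1\\
C_2 & C_6 & -C_6 & -C_2 & -C_2 & -C_6 & C_6 & C_2\\
C_3 & -C_7 & -C_1 & -C_5 & C_5 & C_1 & C_7 & -C_3\\
C_4 & -C_4 & -C_4 & C_4 & C_4 & -C_4 & -C_4 & C_4\\
C_5 & -C_1 & C_7 & C_3 & -C_3 & -C_7 & C_1 & -C_5\\
C_6 & -C_2 & C_2 & -C_6 & -C_6 & C_2 & -C_2 & C_6\\
C_7 & -C_5 & C_3 & -C_1 & C_1 & -C_3 & C_5 & -C_7
\end{bmatrix}
\begin{bmatrix} X(0)\\ X(1)\\ X(2)\\ X(3)\\ X(4)\\ X(5)\\ X(6)\\ X(7) \end{bmatrix}
$$

where $C_k = \cos(k\pi/16)$.
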
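The sign pattern of the 8 x 8 cosine matrix can be checked numerically. The sketch below assumes C_k = cos(k*pi/16), evaluates each entry from the DCT kernel cos((2x+1)u*pi/16), and prints it as a signed C_k label. The 1/4 factor in `dct1d` is taken from the slide as written. The names `Dct1dMatrix`, `entry`, `label`, and `dct1d` are hypothetical.

```java
class Dct1dMatrix {
    // C_k = cos(k*pi/16), the constant assumed for the matrix entries.
    static double c(int k) {
        return Math.cos(k * Math.PI / 16.0);
    }

    // Entry in row u, column x of the 8-point cosine matrix.
    // Row 0 is written as C4 in the slides; other rows follow the DCT kernel.
    static double entry(int u, int x) {
        return u == 0 ? c(4) : Math.cos((2 * x + 1) * u * Math.PI / 16.0);
    }

    // Names an entry as "+Ck" or "-Ck" by matching its magnitude against C1..C7.
    static String label(int u, int x) {
        double e = entry(u, x);
        for (int k = 1; k <= 7; k++) {
            if (Math.abs(Math.abs(e) - c(k)) < 1e-9) {
                return (e < 0 ? "-" : "+") + "C" + k;
            }
        }
        return "?";
    }

    // Y = (1/4) [C] X, using the scale factor shown on the slide.
    static double[] dct1d(double[] x) {
        double[] y = new double[8];
        for (int u = 0; u < 8; u++) {
            double sum = 0.0;
            for (int n = 0; n < 8; n++) sum += entry(u, n) * x[n];
            y[u] = 0.25 * sum;
        }
        return y;
    }

    public static void main(String[] args) {
        for (int u = 0; u < 8; u++) {
            StringBuilder row = new StringBuilder();
            for (int x = 0; x < 8; x++) row.append(label(u, x)).append(' ');
            System.out.println(row.toString().trim());
        }
    }
}
```
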
Simplification:

$$
\begin{bmatrix} Y(0)\\ Y(4)\\ Y(2)\\ Y(6)\\ Y(1)\\ Y(3)\\ Y(5)\\ Y(7) \end{bmatrix}
= \frac{1}{4}
\begin{bmatrix}
C_4 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & C_4 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & C_2 & C_6 & 0 & 0 & 0 & 0\\
0 & 0 & C_6 & -C_2 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & C_1 & C_3 & C_5 & C_7\\
0 & 0 & 0 & 0 & C_3 & -C_7 & -C_1 & -C_5\\
0 & 0 & 0 & 0 & C_5 & -C_1 & C_7 & C_3\\
0 & 0 & 0 & 0 & C_7 & -C_5 & C_3 & -C_1
\end{bmatrix}
*
\begin{bmatrix}
X(0)+X(1)+X(2)+X(3)+X(4)+X(5)+X(6)+X(7)\\
X(0)+X(7)+X(3)+X(4)-X(1)-X(2)-X(5)-X(6)\\
X(0)+X(7)-X(3)-X(4)\\
X(1)+X(6)-X(2)-X(5)\\
X(0)-X(7)\\
X(1)-X(6)\\
X(2)-X(5)\\
X(3)-X(4)
\end{bmatrix}
$$

The following equations are derived from the matrix above:

$$Y(0) = \tfrac{1}{4}\, C_4 \,\big(X(0)+X(1)+X(2)+X(3)+X(4)+X(5)+X(6)+X(7)\big)$$

$$Y(2) = \tfrac{1}{4} \big[ C_2 (X(0)+X(7)-X(3)-X(4)) + C_6 (X(1)+X(6)-X(2)-X(5)) \big]$$

$$Y(4) = \tfrac{1}{4}\, C_4 \,\big(X(0)+X(7)+X(3)+X(4)-X(1)-X(2)-X(5)-X(6)\big)$$

$$Y(6) = \tfrac{1}{4} \big[ C_6 (X(0)+X(7)-X(3)-X(4)) - C_2 (X(1)+X(6)-X(2)-X(5)) \big]$$

$$Y(1) = \tfrac{1}{4} \big[ C_1 (X(0)-X(7)) + C_3 (X(1)-X(6)) + C_5 (X(2)-X(5)) + C_7 (X(3)-X(4)) \big]$$

$$Y(3) = \tfrac{1}{4} \big[ C_3 (X(0)-X(7)) - C_7 (X(1)-X(6)) - C_1 (X(2)-X(5)) - C_5 (X(3)-X(4)) \big]$$

$$Y(5) = \tfrac{1}{4} \big[ C_5 (X(0)-X(7)) - C_1 (X(1)-X(6)) + C_7 (X(2)-X(5)) + C_3 (X(3)-X(4)) \big]$$

$$Y(7) = \tfrac{1}{4} \big[ C_7 (X(0)-X(7)) - C_5 (X(1)-X(6)) + C_3 (X(2)-X(5)) - C_1 (X(3)-X(4)) \big]$$

Simplified Equations

With the common factor 1/4 left out:

Y(0) = c4[j + k + l + m]
Y(2) = c2[j - m] + c6[k - l]
Y(4) = c4[j - k - l + m]
Y(6) = c6[j - m] - c2[k - l]
Y(1) = e + f + h + [c1 + c3 - c5 - c7]a
Y(3) = e + g + I + [c1 + c3 + c5 - c7]b
Y(5) = e + g + h + [c1 + c3 - c5 + c7]c
Y(7) = e + f + I + [-c1 + c3 + c5 - c7]d

where

a = x0 - x7;  b = x1 - x6;  c = x2 - x5;  d = x3 - x4
j = x0 + x7;  k = x1 + x6;  l = x2 + x5;  m = x3 + x4
e = c3[a + b + c + d]
f = [c7 - c3][a + d]
g = [-c1 - c3][b + c]
h = [c5 - c3][a + c]
I = [-c5 - c3][b + d]
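A quick numerical check of the simplified equations is sketched below. It evaluates them with the intermediate terms a through m and e through I, and compares the result with a direct product against the rows of the cosine matrix. It uses the groupings that agree with that direct product: (j - m) and (k - l) for Y(2) and Y(6), (j - k - l + m) for Y(4), and (a + d) inside f. The common factor 1/4 is left out on both sides, C_k = cos(k*pi/16) is assumed, and the names `FastDct1d`, `fast`, and `direct` are hypothetical.

```java
class FastDct1d {
    // C_k = cos(k*pi/16)
    static double c(int k) {
        return Math.cos(k * Math.PI / 16.0);
    }

    // Evaluates the simplified equations (common factor 1/4 not applied).
    static double[] fast(double[] x) {
        // Differences and sums of mirrored inputs; cc stands for the slide's "c".
        double a = x[0] - x[7], b = x[1] - x[6], cc = x[2] - x[5], d = x[3] - x[4];
        double j = x[0] + x[7], k = x[1] + x[6], l = x[2] + x[5], m = x[3] + x[4];
        // Shared products for the odd outputs; i stands for the slide's "I".
        double e = c(3) * (a + b + cc + d);
        double f = (c(7) - c(3)) * (a + d);
        double g = (-c(1) - c(3)) * (b + cc);
        double h = (c(5) - c(3)) * (a + cc);
        double i = (-c(5) - c(3)) * (b + d);
        double[] y = new double[8];
        y[0] = c(4) * (j + k + l + m);
        y[2] = c(2) * (j - m) + c(6) * (k - l);
        y[4] = c(4) * (j - k - l + m);
        y[6] = c(6) * (j - m) - c(2) * (k - l);
        y[1] = e + f + h + (c(1) + c(3) - c(5) - c(7)) * a;
        y[3] = e + g + i + (c(1) + c(3) + c(5) - c(7)) * b;
        y[5] = e + g + h + (c(1) + c(3) - c(5) + c(7)) * cc;
        y[7] = e + f + i + (-c(1) + c(3) + c(5) - c(7)) * d;
        return y;
    }

    // Reference: direct product with the cosine matrix (row 0 is all C4).
    static double[] direct(double[] x) {
        double[] y = new double[8];
        for (int u = 0; u < 8; u++) {
            for (int n = 0; n < 8; n++) {
                double coeff = (u == 0) ? c(4) : Math.cos((2 * n + 1) * u * Math.PI / 16.0);
                y[u] += coeff * x[n];
            }
        }
        return y;
    }

    public static void main(String[] args) {
        double[] x = {210, 190, 50, 235, 128, 90, 67, 54};
        double[] yFast = fast(x);
        double[] yRef = direct(x);
        for (int u = 0; u < 8; u++) {
            System.out.println("Y(" + u + "): fast=" + yFast[u] + " direct=" + yRef[u]);
        }
    }
}
```

In this form the four odd outputs share the five products e, f, g, h, and I, so each one needs only a single extra multiplication by a precomputed sum of cosine constants.
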

1-D DCT Flow Chart

[Figure: blocks labeled Pixel Memory, Cosine Matrix Memory, Multiplier, DCT Coefficients, Register Bank, and Shifter]

2-D DCT Flow Chart

[Figure: blocks labeled 1-D DCT, Transpose, 1-D DCT, and Control]
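The 2-D flow (a 1-D DCT, a transpose, then a second 1-D DCT) can be illustrated in software as follows. This is only a sketch under stated assumptions: each 1-D pass is scaled by 1/2 so that two passes together give the factor 0.25 of the direct 2-D formula, and a final transpose is added so that the result is indexed the same way as the direct formula. The names `RowColumnDct`, `dct1d`, `dctRows`, `transpose`, `dct2d`, and `direct` are hypothetical, and the Control block is not modeled.

```java
class RowColumnDct {
    // Scale factor for output index u: 1/sqrt(2) for u = 0, otherwise 1.
    static double scale(int u) {
        return u == 0 ? 1.0 / Math.sqrt(2.0) : 1.0;
    }

    // 8-point 1-D DCT. The 0.5 factor per pass is an assumption chosen so that
    // two passes give the 0.25 factor of the direct 2-D formula.
    static double[] dct1d(double[] x) {
        double[] y = new double[8];
        for (int u = 0; u < 8; u++) {
            double sum = 0.0;
            for (int n = 0; n < 8; n++) {
                sum += x[n] * Math.cos((2 * n + 1) * u * Math.PI / 16.0);
            }
            y[u] = 0.5 * scale(u) * sum;
        }
        return y;
    }

    static double[][] transpose(double[][] a) {
        double[][] t = new double[8][8];
        for (int r = 0; r < 8; r++)
            for (int c = 0; c < 8; c++)
                t[c][r] = a[r][c];
        return t;
    }

    // Applies the 1-D DCT to every row of the block.
    static double[][] dctRows(double[][] a) {
        double[][] out = new double[8][];
        for (int r = 0; r < 8; r++) out[r] = dct1d(a[r]);
        return out;
    }

    // 1-D DCT -> transpose -> 1-D DCT, then a final transpose so that the
    // result is indexed as F[u][v] like the direct formula.
    static double[][] dct2d(double[][] f) {
        return transpose(dctRows(transpose(dctRows(f))));
    }

    // Direct evaluation of one coefficient of the 2-D formula, for comparison.
    static double direct(double[][] f, int u, int v) {
        double sum = 0.0;
        for (int x = 0; x < 8; x++)
            for (int y = 0; y < 8; y++)
                sum += f[x][y]
                        * Math.cos((2 * x + 1) * u * Math.PI / 16.0)
                        * Math.cos((2 * y + 1) * v * Math.PI / 16.0);
        return 0.25 * scale(u) * scale(v) * sum;
    }

    public static void main(String[] args) {
        double[][] f = new double[8][8];
        for (int x = 0; x < 8; x++)
            for (int y = 0; y < 8; y++)
                f[x][y] = (x * 31 + y * 17) % 256; // arbitrary test pattern
        double[][] F = dct2d(f);
        System.out.println("row-column: " + F[2][3] + ", direct: " + direct(f, 2, 3));
    }
}
```

The row-column form needs 16 eight-point 1-D transforms per block instead of a four-deep loop over all 64 inputs for each of the 64 outputs.
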

1-D DCT Architecture (First Version)

[Figure: datapath with inputs X(0) through X(7) and outputs Y(0) through Y(7)]

1-D DCT Architecture (Final Version)

[Figure: datapath of adders (+), subtractors (-), and multipliers (x) organized in three states (State 1, State 2, State 3), with inputs X0 through X7 and outputs in the order Y0, Y4, Y2, Y6, Y1, Y7, Y5, Y3]

Transpose Architecture

[Figure: Transpose Component with inputs In 0 through In 7 and outputs OUT 0 through OUT 7]

Hardware: Full Adder

[Figure: full adder schematic, 20 transistors, with inputs A, B, and CI and outputs S and Co]

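For reference, the logic function of a one-bit full adder with inputs A, B, and CI and outputs S and Co can be written as below. This is a behavioral sketch only; it does not describe the 20-transistor circuit, and the names `FullAdder` and `add` are hypothetical.

```java
class FullAdder {
    // Returns {S, Co} for one-bit inputs a, b and carry-in ci (each 0 or 1).
    static int[] add(int a, int b, int ci) {
        int s = a ^ b ^ ci;                 // sum bit
        int co = (a & b) | (ci & (a ^ b));  // carry-out bit
        return new int[] { s, co };
    }

    public static void main(String[] args) {
        // Print the full truth table.
        System.out.println("A B CI | S Co");
        for (int a = 0; a <= 1; a++)
            for (int b = 0; b <= 1; b++)
                for (int ci = 0; ci <= 1; ci++) {
                    int[] r = add(a, b, ci);
                    System.out.println(a + " " + b + " " + ci + "  | " + r[0] + " " + r[1]);
                }
    }
}
```
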
Hardware: Multiplier

Example: 4x4 bit array multiplier

[Figure: array of AND gates (7408), full adders (FA), and half adders (HA), with inputs X0 through X3 and Y0 through Y3 and product bits Z0 through Z7]

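The behavior of a 4x4 array multiplier can be sketched at the bit level: each partial-product bit is the AND of one X bit and one Y bit, and the rows are accumulated with one-bit adders. The sketch below uses a full adder in every position, whereas the schematic uses half adders where one input is known to be zero, so it should be read as an assumption-laden model rather than the exact wiring. The names `ArrayMultiplier4x4`, `fullAdder`, and `multiply` are hypothetical.

```java
class ArrayMultiplier4x4 {
    // One-bit full adder: returns {sum, carryOut}.
    static int[] fullAdder(int a, int b, int ci) {
        return new int[] { a ^ b ^ ci, (a & b) | (ci & (a ^ b)) };
    }

    // Multiplies two 4-bit values (0..15) and returns the 8-bit product Z7..Z0.
    static int multiply(int x, int y) {
        int[] z = new int[8]; // running sum bits, z[0] is the least significant
        for (int i = 0; i < 4; i++) {          // one row per multiplier bit Y_i
            int carry = 0;
            for (int j = 0; j < 4; j++) {      // one AND gate per multiplicand bit X_j
                int pp = ((x >> j) & 1) & ((y >> i) & 1);
                int[] r = fullAdder(z[i + j], pp, carry);
                z[i + j] = r[0];
                carry = r[1];
            }
            z[i + 4] = carry; // the row's carry-out becomes the next higher sum bit
        }
        int product = 0;
        for (int bit = 0; bit < 8; bit++) product |= z[bit] << bit;
        return product;
    }

    public static void main(String[] args) {
        System.out.println("13 x 11 = " + multiply(13, 11));
    }
}
```
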
Simplified Multiplication

Example:

[Figure: partial-product diagram of an example multiplication, with the entries to ignore marked and the result labeled Output]

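One way to read "ignore" in this example, consistent with the goal of ignoring zeros in the multiplier, is that a partial-product row is skipped whenever the corresponding multiplier bit is zero. The sketch below models that reading with a shift-and-add loop and counts how many rows are actually added. It is an interpretation, not the chip's documented circuit, and the names `ZeroSkipMultiplier` and `multiply` are hypothetical.

```java
class ZeroSkipMultiplier {
    // Multiplies multiplicand by the low `bits` bits of multiplier.
    // Returns {product, number of partial-product rows actually added}.
    static long[] multiply(int multiplicand, int multiplier, int bits) {
        long product = 0;
        long rowsAdded = 0;
        for (int i = 0; i < bits; i++) {
            if (((multiplier >> i) & 1) == 0) {
                continue; // zero bit: this row contributes nothing and is skipped
            }
            product += ((long) multiplicand) << i;
            rowsAdded++;
        }
        return new long[] { product, rowsAdded };
    }

    public static void main(String[] args) {
        // 8-bit multiplier 10010001 has three 1 bits, so only three rows are added.
        long[] r = multiply(1234, 0b10010001, 8);
        System.out.println("product=" + r[0] + " rowsAdded=" + r[1]);
    }
}
```

Since the cosine constants are fixed, the positions of their zero bits are known in advance, which is what makes this kind of skipping attractive for a constant-coefficient multiplier.
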
Comparison

                         Conventional design   First version   Final version
Number of additions      56                    28              32
Number of multipliers    32                    16              14

Multiplier                 Speed (ns)   Power (mW)
Array (16x16)              92.6         43.5
Split Array (16x16)        62.9         38
Wallace (16x16)            54.5         32
Modified Booth (16x16)     45.4         41.3
Our design (12x8)          ~23          ~22.5

VLSI: Full Adder (from Library)

VLSI: Multiplier

Multiplier Simulation

VLSI: 1-bit Transpose

VLSI: 8x8 Transpose

Transpose Simulation

VLSI: 1-D DCT Part One

1-D DCT Part One Simulation

VLSI: 1-D DCT Part Two

1-D DCT Part Two Simulation

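Java Simulation

Java Code:

    // Computes the 2-D DCT of the 8 x 8 input block f and stores the rounded
    // coefficients in g. Both f and g are fields of the enclosing class,
    // which is not shown here.
    public void transform() {
        g = new int[8][8];
        for (int i = 0; i < 8; i++) {
            for (int j = 0; j < 8; j++) {
                double ge = 0.0;
                // Accumulate f(x,y) * cos((2x+1)*i*pi/16) * cos((2y+1)*j*pi/16).
                for (int x = 0; x < 8; x++) {
                    for (int y = 0; y < 8; y++) {
                        double cg1 = (2.0 * x + 1.0) * i * Math.PI / 16.0;
                        double cg2 = (2.0 * y + 1.0) * j * Math.PI / 16.0;
                        ge += f[x][y] * Math.cos(cg1) * Math.cos(cg2);
                    }
                }
                // Normalization: 1/sqrt(2) for index 0, otherwise 1, times 2/N = 0.25.
                double ci = (i == 0) ? 1.0 / Math.sqrt(2.0) : 1.0;
                double cj = (j == 0) ? 1.0 / Math.sqrt(2.0) : 1.0;
                ge *= ci * cj * 0.25;
                g[i][j] = (int) Math.round(ge);
            }
        }
    }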

Simulation Result

INPUT MATRIX:

210  190   50  235  128   90   67   54
 89   90   23   47   45   44   32   23
203  167   34   65   87  120   56   48
 90   49   78   12   52   39   45  178
  1  189   67  187   93   51   23   57
129    0   89   27   43  155   89   67
  7  125    3   98    2   67    5    1
190   78   56  234  175   89  123   48

Output matrix after DCT (magnitudes only; the minus signs of negative coefficients are not shown):

663  127    4   66   68    2   54  115
 31   67   69   16   16   44   14    2
 86   76   64   76   63   83   73   37
 17    3   50   35   41    7    7   45
143   11   57   75   97    9   65    5
 24    6   63    9    6   23   30   19
165   18   27  110   33   63   51   37
 41  101   83   13   19   49  120  110
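To regenerate the coefficients together with their signs, the transform can be run on the input matrix in a small harness such as the one below. The arithmetic is the same as in the Java code above, but the method is made static and takes the block as a parameter. That restructuring, the class name `DctSimulation`, and the assumption that the input values are listed row by row are all choices made for this sketch.

```java
class DctSimulation {
    // Input block, taken from the listed values row by row (an assumption
    // about the original layout).
    static final int[][] INPUT = {
        {210, 190,  50, 235, 128,  90,  67,  54},
        { 89,  90,  23,  47,  45,  44,  32,  23},
        {203, 167,  34,  65,  87, 120,  56,  48},
        { 90,  49,  78,  12,  52,  39,  45, 178},
        {  1, 189,  67, 187,  93,  51,  23,  57},
        {129,   0,  89,  27,  43, 155,  89,  67},
        {  7, 125,   3,  98,   2,  67,   5,   1},
        {190,  78,  56, 234, 175,  89, 123,  48}
    };

    // Direct 2-D DCT of an 8 x 8 block, rounded to integers.
    static int[][] transform(int[][] f) {
        int[][] g = new int[8][8];
        for (int i = 0; i < 8; i++) {
            for (int j = 0; j < 8; j++) {
                double ge = 0.0;
                for (int x = 0; x < 8; x++) {
                    for (int y = 0; y < 8; y++) {
                        double cg1 = (2.0 * x + 1.0) * i * Math.PI / 16.0;
                        double cg2 = (2.0 * y + 1.0) * j * Math.PI / 16.0;
                        ge += f[x][y] * Math.cos(cg1) * Math.cos(cg2);
                    }
                }
                double ci = (i == 0) ? 1.0 / Math.sqrt(2.0) : 1.0;
                double cj = (j == 0) ? 1.0 / Math.sqrt(2.0) : 1.0;
                ge *= ci * cj * 0.25;
                g[i][j] = (int) Math.round(ge);
            }
        }
        return g;
    }

    public static void main(String[] args) {
        int[][] g = transform(INPUT);
        for (int[] row : g) {
            StringBuilder line = new StringBuilder();
            for (int value : row) line.append(String.format("%5d", value));
            System.out.println(line);
        }
    }
}
```

Running `main` prints the 8 x 8 coefficient block with signs; the top-left entry is the DC coefficient, which is the sum of all 64 inputs divided by 8.
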