From Wireless Communication to DNA Sequencing - Electrical ...

Nov 23, 2013



Information Theory:
From Wireless Communication to DNA Sequencing

David Tse
Dept. of EECS, U.C. Berkeley

Gilbreth Lecture





Information in an Information Age


Some fundamental questions:

How to quantify information?

How fast can information be communicated?

How much information is needed for an inference task?



Information Theory






Shannon 48

Given statistical models for source and channel:

source entropy rate: $H$ bits/source sym

channel capacity: $C$ bits/sec

Theorem: max. rate of reliable communication = $C/H$ source sym/sec.

A unified way of looking at all communication problems.
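A small numerical sketch of the C/H theorem. The specific models are assumptions chosen for illustration and are not from the slides: a biased-coin (Bernoulli) source and a binary symmetric channel. The function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def max_source_rate(source_bias, crossover, channel_uses_per_sec):
    """Max. reliable rate C/H in source symbols per second.

    Assumed models (illustrative only):
      source  = i.i.d. Bernoulli(source_bias), so H = binary_entropy(source_bias)
      channel = binary symmetric channel with the given crossover probability,
                so C = channel_uses_per_sec * (1 - binary_entropy(crossover))
    A deterministic source (H = 0) is not handled here.
    """
    H = binary_entropy(source_bias)                               # bits/source sym
    C = channel_uses_per_sec * (1 - binary_entropy(crossover))    # bits/sec
    return C / H

# A compressible source over a noisy channel, 1000 channel uses per second.
print(max_source_rate(source_bias=0.11, crossover=0.11, channel_uses_per_sec=1000))
```

A lower-entropy source or a cleaner channel both raise the achievable symbol rate, which is the sense in which one number per source and one number per channel summarize the whole problem.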


Two stories



Wireless communication

High-throughput DNA sequencing (a gigantic jigsaw puzzle)


Wireless Communication



Explosive increase in penetration and data rate:

From ~0 mobile phones in the mid 90’s to ~6 billion now.

From low-rate voice to high-rate data.

Powering this increase is one of the biggest engineering feats in human history.

Advances in physical layer communication techniques play a key role.

Led to a 10 to 15-fold increase in spectral efficiency from 2G to 4G.



How do these advances come about?


Guglielmo Marconi, 1901:

Wireless communication has been around since the 1900’s.

Ingenious system design techniques... but somewhat ad hoc.

Claude Shannon, 1948:

Information theory says every channel has a capacity.

Provides a systematic view of the communication problem.

New points of view arise.

Engineering meets science.


Multipath Fading

Classical view: fading channels are unreliable; line-of-sight is best.

[Figure annotation: 16dB]

Traditional Approach to Wireless System Design

Compensates for deep fades via diversity techniques over time, frequency and space.

[Figure labels: fading channel; line-of-sight like channel]


Opportunistic Communication










Information theory says: to achieve capacity, transmit opportunistically. (Goldsmith & Varaiya 96)

Multipath fading provides high peaks to exploit.
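One way to make "transmit opportunistically" concrete is water-filling over time: spend more power when the channel is at a peak and none when it is in a deep fade. The sketch below is a minimal illustration under stated assumptions (the transmitter knows each slot's channel gain, noise power is normalized to 1, and all slots are equally likely); the function name and example gains are hypothetical, not taken from the slides.

```python
import math

def waterfill(gains, avg_power, iters=100):
    """Power per slot that maximizes the average of log2(1 + gain * power)
    subject to an average power budget.

    gains: channel power gains, one per slot (noise power assumed to be 1).
    The allocation is max(mu - 1/gain, 0), with the "water level" mu found
    by bisection so that the average power equals avg_power.
    """
    lo, hi = 0.0, avg_power + max(1.0 / g for g in gains)
    for _ in range(iters):
        mu = (lo + hi) / 2
        used = sum(max(mu - 1.0 / g, 0.0) for g in gains) / len(gains)
        if used > avg_power:
            hi = mu
        else:
            lo = mu
    return [max(mu - 1.0 / g, 0.0) for g in gains]

def average_rate(gains, powers):
    """Average spectral efficiency in bits/s/Hz for a given allocation."""
    return sum(math.log2(1 + g * p) for g, p in zip(gains, powers)) / len(gains)

gains = [0.1, 1.0, 4.0]                 # a deep fade, a typical slot, a peak
powers = waterfill(gains, avg_power=1.0)
print(powers)
print(average_rate(gains, powers), average_rate(gains, [1.0] * len(gains)))
```

In this example the deep-fade slot receives no power at all and the peak receives the most, so the average rate is higher than with constant power.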


Multiuser Opportunistic Communication

Optimal strategy transmits to the best user at each time.

With a large number of users, there is always a user at the peak.

(Knopp & Humblet 95; Tse 97)

[Figure: capacity (bits/s/Hz) versus number of users, for line-of-sight and fading channels]
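The multiuser effect can be sketched with a short Monte Carlo simulation. The fading model is an assumption for illustration (independent Rayleigh fading, so each user's power gain is exponential with mean 1, and a line-of-sight channel with constant gain 1); the function name and parameters are hypothetical.

```python
import math
import random

def best_user_capacity(num_users, snr, trials=20000, seed=0):
    """Average rate (bits/s/Hz) when each slot is given to the user whose
    channel is currently strongest.

    Assumes independent Rayleigh fading: each user's power gain is drawn
    from an exponential distribution with mean 1 in every slot.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best_gain = max(rng.expovariate(1.0) for _ in range(num_users))
        total += math.log2(1 + snr * best_gain)
    return total / trials

snr = 1.0                                   # 0 dB average SNR per user
line_of_sight = math.log2(1 + snr)          # constant gain of 1, any number of users
for k in (1, 2, 4, 8, 16):
    print(k, round(best_user_capacity(k, snr), 2), round(line_of_sight, 2))
```

With a single user the fading channel does worse than line-of-sight, but as users are added the strongest gain in each slot grows, and the fading curve overtakes the flat line-of-sight one.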

From Theory to Practice


An opportunistic scheduler was implemented for Qualcomm’s EVDO system. (Tse 99)

Opportunistic while being fair and sensitive to delay.

Now used in all 3G and 4G systems. (1.6 B devices)
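The slides do not spell out the scheduling rule. A common way to be opportunistic while staying fair is a proportional-fair rule: serve the user whose current rate is highest relative to its own recent average throughput. The sketch below assumes that rule, Rayleigh fading, and an averaging window `tc` that stands in for the delay time scale; all names and numbers are hypothetical and this is not presented as the deployed implementation.

```python
import math
import random

def proportional_fair(mean_snrs, slots=5000, tc=100.0, seed=0):
    """Simulate a proportional-fair scheduler (assumed rule, see lead-in).

    mean_snrs: average SNR of each user (a strong and a weak user may differ).
    tc: averaging window in slots; larger values tolerate more delay.
    Returns (share of slots given to each user, average throughput per user).
    """
    rng = random.Random(seed)
    n = len(mean_snrs)
    avg = [1e-6] * n            # small positive start avoids division by zero
    served = [0] * n
    for _ in range(slots):
        # Instantaneous supportable rate of each user under Rayleigh fading.
        rates = [math.log2(1 + s * rng.expovariate(1.0)) for s in mean_snrs]
        # Serve the user that is best relative to its own average.
        k = max(range(n), key=lambda i: rates[i] / avg[i])
        served[k] += 1
        for i in range(n):
            got = rates[i] if i == k else 0.0
            avg[i] = (1 - 1 / tc) * avg[i] + got / tc
    return [c / slots for c in served], avg

shares, throughput = proportional_fair([1.0, 10.0])
print(shares, throughput)
```

Each user is served near its own peaks, so the weak user still receives a substantial share of the slots even though the strong user's rate is usually higher in absolute terms.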

Lesson Learnt


Fading should be exploited rather than avoided.



Another example: MIMO (multiple antenna communication).

MIMO

(Foschini 98; Telatar 99)

[Figure: capacity (bits/s/Hz) versus number of antennas per device, for line-of-sight and fading channels]

Why?

Power versus Dimensions


Line-of-sight allows more power transfer via beamforming.

Multipath provides more signal dimensions for spatial multiplexing.

Information theory: more dimensions are better than more power.
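A back-of-the-envelope comparison of power against dimensions. Both channel models are idealizations assumed for illustration: a line-of-sight n-by-n channel of rank one, where beamforming gives a power gain of n^2, and a rich-scattering channel with the same total channel energy spread over n equally strong spatial streams. The function names are hypothetical.

```python
import math

def beamforming_capacity(n, snr):
    """Idealized line-of-sight n x n channel (rank one).

    Transmit and receive beamforming multiply the received SNR by n * n,
    but there is only one stream: log2(1 + n^2 * snr) bits/s/Hz.
    """
    return math.log2(1 + n * n * snr)

def multiplexing_capacity(n, snr):
    """Idealized rich-scattering n x n channel with n equal-strength streams.

    The same total channel energy is spread over n dimensions and the
    transmit power is split equally, so each stream sees SNR snr:
    n * log2(1 + snr) bits/s/Hz.
    """
    return n * math.log2(1 + snr)

for snr_db in (0, 10, 20):
    snr = 10 ** (snr_db / 10)
    print(snr_db, round(beamforming_capacity(4, snr), 2),
          round(multiplexing_capacity(4, snr), 2))
```

Power enters inside the logarithm while dimensions multiply it, so in this sketch the multiplexing channel pulls ahead as SNR grows; at very low SNR the power gain of beamforming is worth more.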




From Theory to Practice


MIMO theory established in late 90’s and early 00’s.

MIMO implemented in past few years in 802.11n and 4G cellular.

Part 2: DNA Sequencing

DNA sequencing

Process of obtaining the sequence of nucleotides.


A basic workhorse of modern biology and medicine.

…ACGTGACTGAGGACCGTG

CGACTGAGACTGACTGGGT

CTAGCTAGACTACGTTTTA

TATATATATACGTCGTCGT

ACTGATGACTAGATTACAG

ACTGATTTAGATACCTGAC

TGATTTTAAAAAAATATT…

Impetus: Human Genome Project

1990: Start

2001: Draft

2003: Finished

3 billion basepairs

Sequencing Gets Cheaper and Faster

Cost of one human genome

HGP: $3 billion

2004: $30,000,000

2008: $100,000

2010: $10,000

2011: $4,000

2012-13: $1,000

???: $300

Time to sequence one genome: from years/months to hours.

Massive parallelization.

But many genomes to sequence

100 million species (e.g. phylogeny)

7 billion individuals (SNP, personal genomics)

10^13 cells in a human (e.g. somatic mutations such as HIV, cancer)

Whole Genome Shotgun Sequencing

Reads are assembled to reconstruct the original DNA sequence.
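A toy version of the assembly step. It uses a simple greedy rule (repeatedly merge the two pieces with the longest suffix-prefix overlap), which is one of many assembly strategies and is swapped in here purely for illustration. The sketch assumes error-free reads and, as a simplification of random shotgun sampling, takes one read from every start position; the function names are hypothetical.

```python
import random

def overlap(a, b):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads):
    """Merge the best-overlapping pair of pieces until one piece remains.

    Assumes a non-empty list of error-free reads.
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best = (-1, 0, 1)                       # (overlap, left index, right index)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)]
        contigs.append(merged)
    return contigs[0]

def all_reads(dna, read_len):
    """One read from every start position (a simplification of shotgun sampling)."""
    return [dna[i:i + read_len] for i in range(len(dna) - read_len + 1)]

dna = "ACGTGACTGAGGACCGTG"            # a short stretch of an example sequence
reads = all_reads(dna, 8)
random.Random(0).shuffle(reads)       # the assembler does not know the read order
print(greedy_assemble(reads) == dna)
```

The greedy rule works here because the example sequence has no repeated stretch as long as the overlaps; on repetitive sequences, or with shorter reads, it can join the wrong pieces.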

A Gigantic Jigsaw Puzzle


Computation versus Information View



Many proposed assembly algorithms.

But what is the minimum number of reads required for reliable reconstruction?

How much intrinsic information does each read provide about the DNA sequence?


Communication and Sequencing: An Analogy

Communication:

max. communication rate = $C_{\text{channel}} / H_{\text{source}}$ source sym/sec.

Sequencing:

source sequence $S_1, S_2, \ldots, S_G$

reads $R_1, R_2, \ldots, R_N$

sequencing rate = $G/N$ DNA sym/read.

Question: what is the max. sequencing rate such that reliable reconstruction is possible?

(Motahari, Bresler & Tse 12)

Result: Sequencing Capacity


[Figure: sequencing capacity, with regimes $C = 0$ and $C = \bar{L}$]

$H_2(p)$ is the (Rényi) entropy rate of the DNA sequence.

The higher the entropy, the easier the problem!
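The boundary between the two regimes is not recoverable from the slide text. The sketch below assumes the threshold form of the result in the cited Motahari, Bresler & Tse paper for its simplest model (i.i.d. DNA sequence, error-free reads): with normalized read length $\bar{L} = L / \ln G$ and $H_2(p) = -\ln \sum_i p_i^2$, the capacity is $0$ when $\bar{L} < 2/H_2(p)$ and $\bar{L}$ when $\bar{L} > 2/H_2(p)$. The function names are hypothetical.

```python
import math

def renyi2_entropy(p):
    """Renyi entropy of order 2, in nats, of base frequencies p."""
    return -math.log(sum(x * x for x in p))

def sequencing_capacity(read_len, genome_len, p):
    """Sequencing capacity under the assumed threshold result (see lead-in).

    Assumes an i.i.d. DNA sequence with base frequencies p and error-free
    reads. Returns 0 below the threshold and the normalized read length
    above it; the value exactly at the threshold is reported as 0 here.
    """
    normalized_len = read_len / math.log(genome_len)
    threshold = 2.0 / renyi2_entropy(p)
    return normalized_len if normalized_len > threshold else 0.0

uniform = [0.25, 0.25, 0.25, 0.25]      # high entropy
skewed = [0.7, 0.1, 0.1, 0.1]           # low entropy
for p in (uniform, skewed):
    print(2.0 / renyi2_entropy(p),
          sequencing_capacity(100, 3e9, p),
          sequencing_capacity(20, 3e9, p))
```

A higher-entropy sequence has a lower threshold, so shorter reads are enough for reliable reconstruction, which matches the remark that higher entropy makes the problem easier.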
Complexity is in the eyes of the beholder


Low entropy

High entropy

Conclusion



Information theory has made a huge impact on wireless communication.

It provides new points of view.

Its success stems from focusing on something fundamental: information.

This philosophy is useful for other important engineering problems.