ConceptDoppler: A Weather Tracker for Internet Censorship

townripe · Data Management · 31 Jan 2013


ConceptDoppler: A Weather Tracker for Internet Censorship

Presenter :






Hanyang Univ. Computer Security Lab.

Paper Information

Title: ConceptDoppler: A Weather Tracker for Internet Censorship

Authors: Jedidiah R. Crandall, Daniel Zinn, Michael Byrd

Published: ACM 2007


Contents

1. INTRODUCTION

2. PROBING THE GFC

3. LSA-BASED PROBING

4. FUTURE WORK

5. CONCLUSIONS



1. Introduction(1/3)

Internet Censorship in China

Called the Great Firewall of China, or Golden Shield

- IP address blocking
- DNS redirection
- Legal restrictions
- etc.

Keyword filtering

- Blog servers, chat, HTTP traffic
- All probing can be performed from outside of China



1. Introduction(2/3)

This Research Has Two Parts

Where is the keyword filtering implemented?

- Internet measurement techniques to locate the filtering routers

What words are being censored?

- Efficient probing via document summary techniques




1. Introduction(3/3)

Keyword-based Censorship

The ability to filter keywords is an effective tool for governments that censor the Internet.

- Numerous techniques comprise censorship, including IP address blocking, DNS redirection, and a myriad of legal restrictions, but the ability to filter keywords in URL requests or HTML responses allows a high granularity of control that achieves the censor’s goal at low cost. (Manually filtering web content can also be precise but is prohibitively expensive.)

Censorship is an economic activity.

- The Internet has economic benefits, and methods of censorship that are more blunt than keyword filtering, such as blocking entire web sites or services, decrease those benefits.

ex) While the Chinese government has shut down e-mail service for entire ISPs, temporarily blocked Internet traffic from overseas universities, and could conceivably stop any flow of information, it has also been responsive to complaints about censorship from Chinese citizens.


2. Probing The GFC(1/5)

ConceptDoppler's Infrastructure

- They use the netfilter module Queue to capture all packets elicited by probes.
- They access these packets in Perl and Python scripts, using SWIG to wrap the system library libipq.
- They record all packets sent and received, in their entirety, in a PostgreSQL database.
- Their experiments require the construction of TCP/IP packets. For this they use Scapy, a Python library for packet manipulation.


2. Probing The GFC(2/5)

Slipping Filtered Keywords Through

The GFC Does Not Filter Peremptorily at All Times

Target: They launched probes against www.yahoo.cn for 72 hours.

Method

- They started by sending “FALUN” (a known filtered keyword) until they received RSTs from the GFC, at which point they switched to “TEST” (a word known not to be filtered) until they got a valid HTTP response to their GET request.
- After each test that provoked a RST, they waited 30 seconds before probing with “TEST”; after tests that did not trigger RSTs, they waited 5 seconds, then probed with “FALUN”.
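The alternation between the two keywords can be read as a small state machine. The following is a minimal sketch of that reading, not the authors' code: `send_probe` is a hypothetical callable that issues one HTTP GET containing the keyword and reports whether a RST came back, and `sleep` is injected so the loop can be exercised without real delays.

```python
from typing import Callable, List, Tuple

FILTERED = "FALUN"   # keyword the slide describes as known to be filtered
BENIGN = "TEST"      # keyword the slide describes as known not to be filtered
WAIT_AFTER_RST = 30  # seconds to wait after a probe that drew a RST
WAIT_AFTER_OK = 5    # seconds to wait after a probe that drew no RST


def run_probe_cycle(
    send_probe: Callable[[str], bool],
    sleep: Callable[[float], None],
    max_probes: int,
) -> List[Tuple[str, bool]]:
    """Alternate between the two keywords and log (keyword, got_rst) pairs.

    send_probe is a hypothetical stand-in for the real probing code: it sends
    one GET request containing the keyword and returns True if a RST arrived.
    """
    log: List[Tuple[str, bool]] = []
    keyword = FILTERED
    for _ in range(max_probes):
        got_rst = send_probe(keyword)
        log.append((keyword, got_rst))
        if got_rst:
            # A RST was seen: back off, then check with the benign word.
            sleep(WAIT_AFTER_RST)
            keyword = BENIGN
        else:
            # No RST: short pause, then try the filtered word again.
            sleep(WAIT_AFTER_OK)
            keyword = FILTERED
    return log
```

In a real run `sleep` would be `time.sleep`, and a "FALUN" entry logged with `got_rst == False` would count as a filtered keyword that slipped through.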


2. Probing The GFC(3/5)

Filtering Statistics From 00:00 to 24:00

- The x-axis is the time of day and the y-axis is measured in individual probes.
- What is most important to notice in the Figure is that there are diurnal patterns, with the GFC filtering becoming less effective, sometimes letting more than one fourth of offending packets through, possibly during busy Internet traffic periods.
- (A value of 0 on the x-axis of the Figure corresponds to midnight (00:00) Pacific Standard Time, which is 3 in the afternoon (15:00) in Beijing.)
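A per-hour summary like the one plotted in the Figure could be computed from a probe log as sketched below. This is an illustrative sketch only: the record layout `(hour_of_day, got_rst)` is an assumption, and no data from the paper is reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def slip_through_by_hour(
    probes: Iterable[Tuple[int, bool]],
) -> Dict[int, float]:
    """Return, per hour of day, the fraction of probes that drew no RST.

    Each record is (hour_of_day, got_rst) for one probe that carried a
    filtered keyword; the record layout is an assumption for this sketch.
    """
    sent = defaultdict(int)
    slipped = defaultdict(int)
    for hour, got_rst in probes:
        sent[hour] += 1
        if not got_rst:
            slipped[hour] += 1
    # Only hours with at least one probe appear, so no division by zero.
    return {hour: slipped[hour] / sent[hour] for hour in sorted(sent)}
```

An hour whose value exceeds 0.25 would correspond to the "more than one fourth of offending packets" case mentioned on the slide.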


2. Probing The GFC(4/5)

Discovering GFC Routers

<Figure: GFC router discovery using TTLs>

The goal of this experiment

- To identify the IP address of the first GFC router between their probing site s and t, a target web site within China, as shown in the Figure.

The general idea of the experiment

- To increase the TTL field of the packets they send out, starting from low values corresponding to routers outside of China.

To identify GFC routers

- Algorithm 1 randomly selects a target IP address from T, the list of targets compiled above.
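The TTL idea can be sketched as a simple search for the smallest TTL at which a filtered-keyword probe draws a RST. This is not Algorithm 1 from the paper, only a hedged illustration of the general idea: `probe_with_ttl` is a hypothetical callable that sends the probe toward the target with the given TTL and returns True if a RST is observed, and `max_ttl` is an assumed cutoff.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def find_first_filtering_hop(
    targets: Sequence[str],
    probe_with_ttl: Callable[[str, int], bool],
    max_ttl: int = 30,
    rng: Optional[random.Random] = None,
) -> Tuple[str, Optional[int]]:
    """Pick a random target and return (target, smallest TTL that drew a RST).

    Returns (target, None) if no TTL up to max_ttl triggered a RST.
    probe_with_ttl is a hypothetical stand-in for the real packet code.
    """
    rng = rng or random.Random()
    target = rng.choice(list(targets))
    # Start from low TTLs (routers near the prober) and move outward.
    for ttl in range(1, max_ttl + 1):
        if probe_with_ttl(target, ttl):
            return target, ttl
    return target, None
```

The hop count alone does not give the router's address. One plausible way to obtain it, assumed here rather than taken from the slide, is to read the source address of the ICMP time-exceeded message returned at that TTL, as traceroute does.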


2. Probing The GFC(5/5)

<ISP Distribution of First Hops>

<Filtering by hop within China>

- Filtering does not always, or even principally, occur at the first hop into China’s address space, with only 29.6% of filtering occurring at the first hop and 11.8% occurring beyond the third, with as many as 13 hops in one case.
- Routers within CHINANET-* perform 83.3% of all filtering.

GFC ≠ Firewall


3. LSA-Based Probing(1/4)

Discovering Blacklisted Keywords Using LSA

- To test for new filtered keywords efficiently, they must try only words that are related to concepts that they suspect the government might filter.
- Latent semantic analysis (LSA) is a way to summarize the semantics of a corpus of text conceptually.

Reason for Using LSA

- They encoded the terms with UTF-8 HTTP encoding and tested each against search.yahoo.cn.com, waiting 100 seconds after a RST and 5 seconds otherwise.
- A RST packet indicates that a word was filtered and is therefore on the blacklist. Then, by manual filtering, they removed 56 false positives from the final filtered keyword list.
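The encoding and waiting rules can be illustrated with the standard library. This is a sketch under stated assumptions: the query path `/search?p=` and the helper names are hypothetical, the sample term is just the Chinese word for "test", and nothing is sent over the network.

```python
from urllib.parse import quote

WAIT_AFTER_RST = 100  # seconds to wait after a RST, per the slide
WAIT_AFTER_OK = 5     # seconds to wait otherwise, per the slide


def build_probe_request(term: str, host: str, path_prefix: str = "/search?p=") -> bytes:
    """Build a GET request whose URL carries the term percent-encoded as UTF-8.

    path_prefix is a hypothetical query path, not taken from the paper.
    """
    encoded = quote(term, safe="", encoding="utf-8")
    request = (
        f"GET {path_prefix}{encoded} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return request.encode("ascii")


def wait_after(got_rst: bool) -> int:
    """Return how many seconds to pause before the next term is probed."""
    return WAIT_AFTER_RST if got_rst else WAIT_AFTER_OK
```

For example, `build_probe_request("测试", "search.yahoo.cn.com")` yields a request line containing `%E6%B5%8B%E8%AF%95`, the UTF-8 bytes of the term in percent-encoded form.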


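LSA Background(1/2)

What is LSA?

- Latent semantic analysis
- The word-document model describes the occurrences of terms in documents

LSA

- Word-document matrix X, with one row per term and one column per document:

\[
X =
\begin{array}{c|cccccc}
       & d_1 & d_2 & \cdots & d_j    & \cdots & d_N \\ \hline
w_1    &     &     &        &        &        &     \\
w_2    &     &     &        &        &        &     \\
\vdots &     &     &        &        &        &     \\
w_i    &     &     &        & w_{ij} &        &     \\
\vdots &     &     &        &        &        &     \\
w_M    &     &     &        &        &        &
\end{array}
\]

- w_{ij}: weight (importance) of the i-th term in the j-th document
- tf_{ij}: number of occurrences of the i-th term in the j-th document
- df_i: number of documents that contain the i-th term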



LSA Background(2/2)

- T_0: orthogonal, unit-length columns
- D_0: orthogonal, unit-length columns
- S_0: diagonal matrix
- t: number of terms (rows) of matrix X
- d: number of documents (columns) of matrix X
- m: rank of matrix X (≤ min(t, d))
- T: t × k
- S: k × k
- D’: k × d

<Example>
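The symbols listed on this slide match the standard singular value decomposition used in LSA. The relationship can be sketched as follows, assuming X is the t × d word-document matrix and k < m is the number of dimensions retained; the hat notation is introduced here for illustration and does not appear on the slide.

\[
X = T_0 S_0 D_0', \qquad
T_0 \in \mathbb{R}^{t \times m},\quad
S_0 \in \mathbb{R}^{m \times m},\quad
D_0' \in \mathbb{R}^{m \times d}
\]

\[
X \approx \hat{X} = T S D', \qquad
T \in \mathbb{R}^{t \times k},\quad
S \in \mathbb{R}^{k \times k},\quad
D' \in \mathbb{R}^{k \times d}
\]

Here S keeps only the k largest singular values of S_0, and T and D' keep the corresponding columns of T_0 and rows of D_0'.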




3. LSA-Based Probing(2/4)

LSA of Chinese Wikipedia

- Start with a large corpus (Chinese-language Wikipedia)
- n = 94,863 documents and m = 942,033 terms
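Once terms are represented as vectors in a reduced concept space, one common way to choose which words to probe is to rank them by cosine similarity to a seed concept. The sketch below assumes that approach; the function names and the tiny two-dimensional vectors in the usage note are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(
    seed: Sequence[float],
    term_vectors: Dict[str, Sequence[float]],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Return the top_n terms most similar to the seed concept vector."""
    scored = [(term, cosine(seed, vec)) for term, vec in term_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

With hypothetical vectors `{"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}` and seed `[1.0, 0.0]`, the two best candidates are "a" followed by "c"; only those would then be probed.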


3. LSA-Based Probing(4/4)

LSA Results

- In total, they discovered 122 unknown keywords.


4. Future Work



Discovering Unknown Keywords

1. Applying LSA to larger Chinese corpuses
2. Keeping the corpus up-to-date on current events
3. Technical implementation
4. Implementation possibilities
5. HTML responses
6. More complex rulesets
7. Imprecise filtering (ex: breasts, Cancer-breasts)

Internet Measurement

1. IP tunneling or traffic engineering
2. IXPs Technical implementation
3. Route dependency
4. HTML responses
5. Destination dependency


5. Conclusions


- GFC keyword filtering is more a panopticon than a firewall, motivating surveillance rather than evasion as a focus of technical research.

GFC ≠ Firewall, GFC ≈ Panopticon

- Probing the GFC is arduous, motivating efficient probing via LSA.


Thank you very much !!!