Humans and Bots in Internet Chat


Measurement and Classification of Humans and Bots in Internet Chat

By Steven Gianvecchio, Mengjun Xie, Zhenyu Wu, and Haining Wang

College of William and Mary

USENIX Security 2008
Measurement and Classification of Humans and Bots in Internet Chat


Outline


Background


Measurement


Classification System


Experimental Evaluation


Conclusion



Bots

Bots - programs that automate human tasks
  web bots - automate browsing the web
  chat bots - automate online chat
  can be harmful and/or helpful


Chat Bots vs. BotNets


BotNets
  networks of compromised machines
  some use chat systems (IRC) for C&C, others use P2P, HTTP, etc.
  abuse various systems

Chat Bots
  automated chat programs
  some are helpful, e.g., chat loggers
  can abuse chat systems and their users


The Chat Bot Problem


The Problem
  chat bots abuse chat services (e.g., AOL, Yahoo!, MSN)
  send spam
  spread malicious software
  mount phishing attacks

Our focus is on the Yahoo! chat system


A Typical Chat

Alice12 entered the room.
Alice12: Hi room.
Bob34: hi alice
Susie88: any guys want to let a cute girl move in with them! hehe
Alice12: What’s up?
Bob34: not much
Susie88: can you guys see me on my web-cam?? (its in my profile)


Yahoo! Chat


Yahoo! chat is a large commercial chat service
  over 3,000 chat rooms

AUTH, CHAT, IM, …


Yahoo! Chat


Yahoo! chat system


client connects to a server


servers relay messages to/from clients


Measurement


August-November 2007
  we collect data

August 2007
  Yahoo! adds CAPTCHA
  must pass to join a chat room
  protocol update, prevents some 3rd-party clients from accessing chat

October 2007
  bots are back
  some bots return before 3rd-party clients


Measurement


September and October 2007
  very few chat bots

August and November 2007
  many chat bots

1,440 hours of chat logs
147 chat logs
21 chat rooms


Measurement


To create our dataset, we read and label the chat users as
  human, bot, or ambiguous

In total, we recognized 14 different types of chat bots
  different triggering mechanisms
  different text generation techniques


Triggering Mechanisms


Timer-Based
  periodic timers, e.g., 40 seconds
  random timers, e.g., 45-125 seconds

Response-Based
  responds to other users

Sam77: Bob12, you’re just full of questions, aren’t you?
Sam77: Bob12, lots of evidence for evolution can be found here http://
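
The two timer-based triggers can be illustrated with a small simulation. This is a minimal sketch, not code from any observed bot: the function names are hypothetical, and the 40-second period and the 45-125 second range are just the example values given above.

```python
import random


def periodic_delays(n, period=40.0):
    """Inter-message delays (seconds) produced by a fixed-period timer."""
    return [period] * n


def random_delays(n, low=45.0, high=125.0, seed=None):
    """Inter-message delays (seconds) drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


if __name__ == "__main__":
    print(periodic_delays(3))
    print([round(d, 1) for d in random_delays(3, seed=1)])
```

A periodic timer yields a single repeated delay, while a random timer spreads delays over a bounded range; neither resembles the heavy-tailed delays of human users.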


Text Generation


Character Padding
  Fiona88: anyone boredjn wanna chat?uklcss

Synonym Phrases
  Marjorie99: Hi Babes! Marjorie Here! Inspect My Site
  Marjorie99: Mmmm Folks! Im Marjorie! View My Webpage

Odd Line or Word Spacing

Message Replay


Types of Chat Bots


Periodic Bots
  send messages based on periodic timers

Random Bots
  send messages based on random timers

Responder Bots
  respond to messages of other users

Replay Bots
  replay messages of other users



Humans
  inter-message delay - evidence of heavy tail
  message size - well fit by Exponential (λ=0.034)
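
For an exponential distribution with rate λ, the mean is 1/λ, so λ=0.034 corresponds to a mean message size of about 29.4 (in whatever unit the original plot used, which the extracted text does not state). The sketch below shows how such a rate could be estimated, assuming a maximum-likelihood fit. The slides do not say which fitting method was used, and the sample sizes in `sizes` are made up for illustration.

```python
import math


def fit_exponential_rate(samples):
    """Maximum-likelihood estimate of the rate: lambda = 1 / sample mean."""
    if not samples:
        raise ValueError("need at least one sample")
    mean = sum(samples) / len(samples)
    return 1.0 / mean


def exponential_cdf(x, rate):
    """P(X <= x) for an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x)


# Hypothetical message sizes, for illustration only (their mean is 29.4).
sizes = [12, 45, 30, 8, 52]
rate = fit_exponential_rate(sizes)

# Mean implied by the rate reported on the slide.
mean_for_slide_rate = 1.0 / 0.034
```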



Periodic Bots
  inter-message delay - several clusters with high probabilities
  message size - messages built from templates approximate a normal distribution



Random Bots
  inter-message delay - Equilikely distribution at 40, 64, and 88; Uniform distribution 45-125
  message size - messages selected from a small database



Responder Bots
  inter-message delay - human-like timing
  message size - multiple templates of different lengths



Replay Bots
  inter-message delay - cluster with high probabilities (replay bots are periodic)
  message size - human-like size, well fit by Exponential (λ=0.028)


Classification System


Entropy Classifier
  detects abnormal behavior
  based on message sizes and inter-message delays
  accurate but slow

Machine Learning Classifier
  detects “learned” patterns
  based on message content
  fast but must be trained


Entropy Classifier

Observation - chat bots are less complex than humans, and thus, lower in entropy
  exploits the low entropy of chat bots

Corrected Conditional Entropy Test (CCE)
  estimates higher-order entropy

Entropy Test (EN)
  estimates first-order entropy
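
The two tests can be sketched as follows. The slides give no formulas, so this uses one common formulation as an assumption: EN is the Shannon entropy of binned values, and CCE is the minimum over pattern length m of the conditional entropy plus a correction term, perc(m) times EN, where perc(m) is the fraction of length-m patterns that occur only once. The equal-width binning, the bin count, and the maximum pattern length are also illustrative choices, not values taken from the talk.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """First-order entropy (bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def to_bins(values, num_bins=5):
    """Map numeric values to equal-width bin indices (binning is an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / num_bins
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


def corrected_conditional_entropy(symbols, max_len=5):
    """Minimum over m of CE(m) + perc(m) * EN, for m = 1..max_len."""
    first_order = shannon_entropy(symbols)
    best = float("inf")
    for m in range(1, min(max_len, len(symbols)) + 1):
        # All overlapping length-m patterns and their length-(m-1) prefixes.
        windows = [tuple(symbols[i:i + m]) for i in range(len(symbols) - m + 1)]
        prefixes = [w[:-1] for w in windows]
        conditional = shannon_entropy(windows) - shannon_entropy(prefixes)
        singles = sum(1 for c in Counter(windows).values() if c == 1)
        perc = singles / len(windows)
        best = min(best, conditional + perc * first_order)
    return best


# A strictly alternating sequence has 1 bit of first-order entropy,
# but each symbol is fully determined by the one before it.
alternating = [0, 1] * 10
en_value = shannon_entropy(alternating)
cce_value = corrected_conditional_entropy(alternating)
```

The alternating sequence shows why a higher-order estimate is useful: the first-order test sees an even mix of two symbols, while the conditional estimate drops to zero because the pattern is perfectly predictable.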


Machine Learning Classifier


Observation - chat spam, like email spam, is a text classification problem
  exploits message content of chat bots

CRM114
  a powerful text classification system
  several built-in classifiers: HMM, KNN/Hyperspace, OSB, SVM, Winnow, etc.
  we use OSB
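
OSB stands for Orthogonal Sparse Bigrams. As a rough illustration of the idea, the sketch below pairs each token with each of the few tokens before it and records the gap between them. It is a simplified stand-in written for this summary, not CRM114's implementation: the whitespace tokenizer, the window size of 5, and the function name are assumptions.

```python
def osb_features(text, window=5):
    """Pair each token with each of the up to (window - 1) tokens before it.

    Each feature is (earlier_token, skipped_tokens, later_token).
    """
    tokens = text.split()
    features = []
    for i, token in enumerate(tokens):
        for distance in range(1, window):
            j = i - distance
            if j < 0:
                break
            features.append((tokens[j], distance - 1, token))
    return features


features = osb_features("view my web page")
```

For the four-token input above this yields six features, for example `("view", 0, "my")` and `("view", 2, "page")`. A classifier would then weigh such features by how often they appear in the bot and human corpora.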



Hybrid Classification System
  entropy classifier builds and maintains the bot corpus
  machine learning classifier uses the bot and human corpora

[Diagram labels: INPUT, ENTROPY CLASSIFIER, MACHINE LEARNING CLASSIFIER, BOT CORPUS, HUMAN CORPUS, CLASSIFY AS CHAT BOT, CLASSIFY AS HUMAN]
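
The division of labor described above can be sketched in a few lines. Everything here is a hypothetical stand-in: the class and method names are invented, the entropy test is passed in as a plain callable, the content model is a simple smoothed token-frequency comparison rather than CRM114's OSB classifier, and how the human corpus is populated is not stated on the slide, so `add_human_message` is an assumption.

```python
import math
from collections import Counter


class HybridClassifier:
    """Toy hybrid: an entropy test feeds the bot corpus, a content model uses it."""

    def __init__(self, entropy_is_bot, min_messages=100):
        self.entropy_is_bot = entropy_is_bot  # assumed callable: list of str -> bool
        self.min_messages = min_messages      # assumed default
        self.bot_corpus = Counter()
        self.human_corpus = Counter()

    @staticmethod
    def _tokens(message):
        return message.lower().split()

    def add_human_message(self, message):
        """Hypothetical helper: seed the human corpus with a known-human message."""
        self.human_corpus.update(self._tokens(message))

    def observe_user(self, messages):
        """Run the entropy test on one user's messages; bots extend the bot corpus."""
        if len(messages) < self.min_messages or not self.entropy_is_bot(messages):
            return False
        for message in messages:
            self.bot_corpus.update(self._tokens(message))
        return True

    def classify(self, message):
        """Stand-in content classifier using add-one smoothed token frequencies."""
        vocab = len(set(self.bot_corpus) | set(self.human_corpus)) or 1
        bot_total = sum(self.bot_corpus.values())
        human_total = sum(self.human_corpus.values())
        score = 0.0
        for token in self._tokens(message):
            p_bot = (self.bot_corpus[token] + 1) / (bot_total + vocab)
            p_human = (self.human_corpus[token] + 1) / (human_total + vocab)
            score += math.log(p_bot / p_human)
        return "bot" if score > 0 else "human"


# Toy entropy test (assumption): flag users whose messages are all identical.
hybrid = HybridClassifier(lambda msgs: len(set(msgs)) == 1)
hybrid.add_human_message("hi room")
hybrid.add_human_message("what is up")
hybrid.observe_user(["view my webpage"] * 100)
label = hybrid.classify("view my webpage now")
```

The point of the design is that the slow entropy test only has to catch a new bot once; after that, the content model can recognize the same text from far fewer messages.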


Experimental Evaluation


Types of Chat Bots
  Periodic Bots
  Random Bots
  Responder Bots
  Replay Bots

Classifiers
  entropy classifier - 100 messages
  machine learning classifier - 25 messages


Experimental Evaluation


Classification Tests
  Ent - entropy classifier
  SupML - fully-supervised ML classifier, trained on AUG BOTS
  SupMLre - fully-supervised ML classifier, retrained on NOV BOTS
  EntML - entropy-trained ML


           AUG BOTS                      NOV BOTS
test       periodic  random   respond    periodic  random   replay   human
           TP        TP       TP         TP        TP       TP       FP
EN(imd)    121/121   68/68    1/30       51/51     109/109  40/40    7/1713
CCE(imd)   121/121   49/68    4/30       51/51     109/109  40/40    11/1713
EN(ms)     92/121    7/68     8/30       46/51     34/109   0/40     7/1713
CCE(ms)    77/121    8/68     30/30      51/51     6/109    0/40     11/1713
OVERALL    121/121   68/68    30/30      51/51     109/109  40/40    17/1713

Entropy Classifier
  EN - entropy
  CCE - corrected conditional entropy
  (imd) - inter-message delay
  (ms) - message size



EN(imd) and CCE(imd)
  problems against responder bots
  detect most other chat bots



EN(ms) and CCE(ms)
  problems against random and replay bots
  detect most other chat bots



OVERALL
  detects all chat bots
  false positive rate is ~0.01
  100 messages


           AUG BOTS                      NOV BOTS
test       periodic  random   respond    periodic  random   replay   human
           TP        TP       TP         TP        TP       TP       FP
Ent        121/121   68/68    30/30      51/51     109/109  40/40    17/1713
SupML      121/121   68/68    30/30      14/51     104/109  1/40     0/1713
SupMLre    121/121   68/68    30/30      51/51     109/109  40/40    0/1713
EntML      121/121   68/68    30/30      51/51     109/109  40/40    1/1713

Entropy and Machine Learning Classifiers
  Ent - entropy classifier (from last slide)
  SupML - fully-supervised machine learning
  SupMLre - SupML retrained
  EntML - entropy-trained machine learning



Ent
  the OVERALL row of the entropy classifier results





SupML
  has problems against November bots
  needs to be retrained for new bots

SupMLre
  detects all bots





EntML
  false positive rate is ~0.0005 (Ent is ~0.01)
  25 messages
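
The rounded rates quoted above follow directly from the counts in the results table. The short calculation below is only a worked check using those counts; the variable names are illustrative.

```python
def rate(hits, total):
    """Fraction of users flagged as bots."""
    return hits / total


# Humans misclassified as bots (false positives), out of 1713 human users.
human_false_positives = {
    "Ent": (17, 1713),
    "SupML": (0, 1713),
    "SupMLre": (0, 1713),
    "EntML": (1, 1713),
}
fp_rates = {name: rate(*counts) for name, counts in human_false_positives.items()}

# SupML true positives on the November bots, before retraining.
supml_november = {"periodic": (14, 51), "random": (104, 109), "replay": (1, 40)}
supml_rates = {name: rate(*counts) for name, counts in supml_november.items()}

for name, value in fp_rates.items():
    print(f"{name}: {value:.5f}")
```

Ent works out to roughly 0.0099 and EntML to roughly 0.00058, consistent with the approximate figures on the slide. The same counts show why SupML needs retraining: it catches 14 of 51 periodic bots and 1 of 40 replay bots in the November data.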


Conclusion


Measurements
  overall, chat bots are less complex than humans
  some chat bots are more human-like

Classification System
  exploits benefits of both classifiers
  quickly classifies known chat bots
  accurately classifies unknown chat bots


Conclusion (cont.)


Future Work
  investigate more advanced chat bots
  explore applications of entropy to other forms of bots (e.g., web bots)
  explore other applications of entropy (e.g., detecting covert timing channels)


Questions?

Thank You!