The BitTorrent Protocol





Cody Sand (sandco@uwplatt.edu)
Department of Computer Science and Software Engineering
University of Wisconsin-Platteville



Abstract


Since its creation in 2001, Bram Cohen’s BitTorrent file-sharing protocol has spread like wildfire, becoming the most popular peer-to-peer (P2P) transfer protocol on the internet, with usage statistics exceeding one hundred million users worldwide. Because BitTorrent can serve relatively large and popular files at higher speeds than a traditional file transfer, it has become a standard way of distributing such files online. The protocol relies on strict messaging between peers to achieve this effect, and this paper describes the protocol in a relatively low-level manner.


Introduction


BitTorrent is a peer-to-peer (P2P) file-sharing protocol developed by computer programmer Bram Cohen in April 2001. BitTorrent was designed to “facilitate file transfers among multiple peers across unreliable networks.” While BitTorrent is not the newest peer-to-peer file-sharing protocol, it differs greatly from its alternatives. The idea behind BitTorrent is to take the bandwidth expense of serving a file away from a typical web server and to put the distribution load on a network of users, called a swarm. Unlike many other peer-to-peer technologies, BitTorrent uses centralized discovery of peers and decentralized distribution between those peers. BitTorrent not only encourages, but forces users to upload pieces of the file while downloading. The ‘tit-for-tat’ nature of the protocol is what has allowed it to become the world’s most widely used file-sharing protocol, with usage statistics showing that over 150 million people worldwide are using the protocol. BitTorrent allows for improved transfer speeds, which explains why companies are beginning to build it into their products. Companies like Blizzard Entertainment, creators of the wildly popular ‘World of Warcraft’, and Valve, creators of games such as ‘Half Life’ and ‘Counter Strike’, have made use of the BitTorrent protocol to allow for faster file transfers and decreased bandwidth usage, resulting in significant cost savings.


Bram Cohen released the first implementation of the protocol, written in the Python programming language, in July 2001. Since then, the number of active users has grown exponentially. Today, estimates show that BitTorrent may account for up to 33% of all internet traffic, and despite the multitude of copyright-infringing files available via BitTorrent and the legal battles faced by those serving those files, its growth shows no signs of slowing down. BitTorrent combines the best features of previous peer-to-peer protocols to create a very efficient technology that allows content distributors to save bandwidth and users to achieve the fastest download rates possible [3].



Protocol Overview


BitTorrent works by taking one or more files and dividing them into small pieces, which typically range from 64 kilobytes to 1 megabyte in size, depending on the size of the file. The user creating the BitTorrent file, called a torrent, creates a checksum for each piece using the SHA-1 hashing algorithm. This is often done automatically by a piece of software called a BitTorrent client. These checksums are inserted into a torrent file, which typically has a file extension of .torrent. When a user downloads a torrent from the internet, the completed pieces are checked against these checksums to ensure file integrity. The main idea of BitTorrent is that these pieces can be uploaded and downloaded between a large number of peers, which decreases bandwidth usage for the original content provider and increases download speeds for downloaders. This is a considerable advantage over simply hosting a file using the standard HyperText Transfer Protocol (HTTP), where all upload cost is placed on the hosting machine. The protocol currently runs over TCP, though founder Bram Cohen’s company, BitTorrent, Inc, is currently working on a UDP implementation. There are several components required for a working BitTorrent distribution system, which are explained in the following sections [2].


Tracker


The first, and most important, component of the BitTorrent protocol is the BitTorrent tracker. A tracker is a piece of software that typically runs on top of an HTTP web server, and its main purpose is to assist in the communication between peers using the protocol. All clients are required to communicate with the tracker in order to initiate any BitTorrent download. Clients that have already connected and are in the process of downloading a file periodically communicate with the tracker to provide statistics and to learn of additional peers that have connected. The tracker is responsible for maintaining a list of peers currently downloading or serving a file, so that when new users join the swarm of peers, they know which other peers are interested in sending or receiving pieces [2].



Users communicate with the tracker using standard HTTP GET requests. The tracker takes the HTTP GET request and uses its info_hash parameter to identify the torrent in the tracker’s indexed list of torrents. The HTTP GET request contains nine parameters, which the tracker uses to process the request. These parameters include the following:


info_hash
    A 20-byte SHA-1 hash used to identify the specific torrent in the index.

peer_id
    A 20-byte character string generated by the BitTorrent client to identify the user and the client software they’re running.

ip
    The IP address of the user.

port
    The port number the user’s BitTorrent client software is listening on.

uploaded
    The total amount of data the user has uploaded for a particular torrent.

downloaded
    The total amount of data the user has downloaded for a particular torrent.

left
    The total number of bytes remaining until the download is complete.

numwant
    An optional field specifying the number of peers the client wishes to attempt to connect with. If left unspecified, most BitTorrent client software will default to 50 peers.

event
    The current state of the torrent. A torrent may assume one of the following three states:

    Started
        The torrent is active and may use bandwidth to send/receive pieces.

    Stopped
        The torrent is inactive.

    Completed
        The torrent is completed and only outbound traffic is allowed.


When assembled, a typical HTTP GET tracker request appears as follows:

http://some.tracker.com:999/announce
?info_hash=12345678901234567890
&peer_id=UTABCXYZ000012121324
&ip=174.250.2.22
&port=6888
&downloaded=1234
&left=98765
&event=stopped
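As a rough illustration of how a client might assemble such a request, the sketch below builds the query string with Python's standard urllib.parse.urlencode, which percent-encodes binary values such as a real info_hash. The function name build_announce_url and its argument layout are assumptions made for this example, not part of the protocol.

```python
from urllib.parse import urlencode

def build_announce_url(announce, info_hash, peer_id, port,
                       uploaded, downloaded, left,
                       ip=None, numwant=None, event=None):
    """Assemble a tracker GET request URL from the parameters listed above."""
    params = {
        "info_hash": info_hash,   # raw 20 bytes; urlencode percent-encodes them
        "peer_id": peer_id,
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
    }
    # Optional parameters are only sent when the caller supplies them.
    for key, value in (("ip", ip), ("numwant", numwant), ("event", event)):
        if value is not None:
            params[key] = value
    return announce + "?" + urlencode(params)

# Values taken from the example request above (uploaded=0 is an assumption).
url = build_announce_url(
    "http://some.tracker.com:999/announce",
    info_hash=b"12345678901234567890",
    peer_id=b"UTABCXYZ000012121324",
    port=6888, uploaded=0, downloaded=1234, left=98765,
    ip="174.250.2.22", event="stopped",
)
```

In practice the info_hash is arbitrary binary data rather than printable digits, which is why passing it as bytes and letting urlencode escape it is the safer choice.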


The tracker replies with a standard HTTP response. If the HTTP GET request was a failure, the tracker responds with an error message. This error message contains both a three-digit error code (much like a 404 Not Found HTTP error) and a human-readable error message. If the HTTP GET request was a success, the tracker replies with a list of peers connected in the swarm for this torrent. This list is contained in a dictionary with the following two keys [2]:


interval
    The number of seconds the client should wait before making another request.

peers
    A list of dictionaries containing information on the connected peers. These fields include the peer_id, IP address, and the port number being listened on for each remote peer in the list.



The client software can then use the peer list to begin establishing connections to other peers so that the process of exchanging pieces between peers can begin [1].
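To make the shape of a successful response concrete, the following sketch pulls the interval and the peer addresses out of a response that has already been decoded into Python objects. The per-peer key names ("peer id", "ip", "port") follow the published specification and are an assumption here, as is the use of plain strings rather than raw bytes for keys and values.

```python
def peers_from_response(response: dict):
    """Return (interval, [(ip, port), ...]) from a decoded tracker response."""
    interval = int(response["interval"])
    peers = [(peer["ip"], int(peer["port"])) for peer in response["peers"]]
    return interval, peers

# Hypothetical decoded response containing a single peer.
example_response = {
    "interval": 1800,
    "peers": [
        {"peer id": "UTABCXYZ000012121324", "ip": "174.250.2.22", "port": 6888},
    ],
}
interval, peers = peers_from_response(example_response)
```

A client would then open a connection to each (ip, port) pair and wait roughly interval seconds before contacting the tracker again.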





.torrent File


When a user wishes to download a file using the BitTorrent protocol, they must first obtain a static metainfo file containing information about the file they wish to download. This file contains a BEncoded dictionary storing several BEncoded fields that the BitTorrent client software uses to identify the torrent and its properties.




BEncoding


BEncoding is a format for loosely structured data. It organizes data as simple plain text that is easy to parse. BEncoding supports four data types, as described below.


Byte String
    A sequence of bytes. These bytes are not necessarily, but often are, characters. Their format is <length>:<content>, where <length> is the number of bytes in the string, and <content> is the bytes themselves. For example, 9:telephone and 4:size are BEncoded byte strings.

Integer
    A simple integer. BEncoded integers use the format i<number>e, where i is the start symbol, <number> is the integer to be stored, and e is the terminator. For example, the numbers 42 and 365 would be stored in plain text as ‘i42e’ and ‘i365e’.

List
    A BEncoded list is a simple array containing other BEncoded values. A list is formatted as l<contents>e, where l is the start symbol, <contents> is the sequence of values contained within the list, and e is, again, the terminator. For example, a list containing the word ‘hello’, the number 66, and the word ‘goodbye’ would be BEncoded as l5:helloi66e7:goodbyee.

Dictionary
    A BEncoded dictionary is an associative array containing other BEncoded values. A dictionary is formatted as d<contents>e, where d is the start symbol, <contents> is a sequence of alternating keys and values, and e is the terminator. Keys are byte strings and must appear in sorted order. For example, a BEncoded dictionary pairing the words “test” and “reset” to the integers 2 and 3, respectively, would be d5:reseti3e4:testi2ee.
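The four data types above are simple enough that an encoder and decoder fit in a few dozen lines. The sketch below is a minimal illustration rather than a hardened implementation; the names bencode and bdecode are assumptions, the encoder writes dictionary keys in sorted order as the published specification requires, and the decoder is deliberately lenient about key order.

```python
def bencode(value) -> bytes:
    """Encode an int, str/bytes, list, or dict into BEncoded bytes."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:" % len(value) + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Normalise keys to bytes, then emit them in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot BEncode %r" % type(value))

def _decode(data: bytes, i: int):
    """Decode one value starting at index i; return (value, next_index)."""
    lead = data[i:i + 1]
    if lead == b"i":
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            result[key], i = _decode(data, i)
        return result, i + 1
    if lead.isdigit():
        colon = data.index(b":", i)
        start = colon + 1
        length = int(data[i:colon])
        return data[start:start + length], start + length
    raise ValueError("invalid BEncoded data at offset %d" % i)

def bdecode(data: bytes):
    """Decode a complete BEncoded value; byte strings come back as bytes."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after BEncoded value")
    return value
```

Byte strings are returned as bytes rather than text because, as noted above, their content is not necessarily made of characters.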


Fields


The .torrent file itself contains a number of BEncoded fields, which are used by the client software to connect to the trackers and verify the integrity of the pieces being downloaded. Required fields include the following dictionary keys:


announce
    The URL of the tracker.

info
    Another dictionary containing information about the pieces of the torrent. The content of this dictionary depends on the number of files contained within the torrent. For torrents containing only one file, the dictionary includes the following keys:

    length
        The size of the file in bytes.

    name
        A string containing the filename.

    piece length
        The number of bytes in each piece of the torrent.

    pieces
        A string value containing concatenated 20-byte SHA-1 hashes for each piece of the file. This is used for data integrity checks.

    For torrents containing more than one file, the info dictionary contains the following fields:

    files
        A list of dictionaries, one per file, each containing the following file information fields:

        length
            The size of the file.

        path
            The path to the file, including the filename.

    name
        The name of the top-most directory.

    piece length
        The number of bytes in each piece.

    pieces
        A string value containing concatenated 20-byte SHA-1 hashes for each piece of the file. This is used for data integrity checks.


The .torrent file may also contain any other fields stored as dictionary keys. If a particular BitTorrent client does not support the field being specified, it may simply ignore the field. The most common optional fields include the following [1]:


announce-list
    A list of alternative tracker URLs. Some torrents make use of this option to specify a number of backup trackers for use in the event the main tracker goes down.

comment
    Any additional comments the torrent author wishes to include.

created by
    A field to specify information about the torrent’s author.

creation date
    The date the .torrent file was created.
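Putting the required fields together, the sketch below assembles a single-file metainfo dictionary and derives the info_hash, which the handshake section of this paper describes as the SHA-1 hash of the info field. The key name "piece length" (with a space) follows the published specification; the file name, file contents, piece size, and the compact bencode helper are all assumptions made for illustration.

```python
import hashlib

def bencode(value) -> bytes:
    """Compact BEncoder for ints, strings, lists, and str-keyed dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:" % len(value) + value
    if isinstance(value, list):
        return b"l" + b"".join(map(bencode, value)) + b"e"
    if isinstance(value, dict):
        # Keys are written in sorted order (ASCII keys assumed here).
        return b"d" + b"".join(
            bencode(key) + bencode(value[key]) for key in sorted(value)
        ) + b"e"
    raise TypeError("cannot BEncode %r" % type(value))

# Hypothetical file contents and piece size.
file_data = b"hello world" * 1000          # 11,000 bytes
piece_length = 4096

info = {
    "length": len(file_data),
    "name": "example.txt",
    "piece length": piece_length,
    # Concatenated 20-byte SHA-1 digests, one per piece.
    "pieces": b"".join(
        hashlib.sha1(file_data[i:i + piece_length]).digest()
        for i in range(0, len(file_data), piece_length)
    ),
}
metainfo = {"announce": "http://some.tracker.com:999/announce", "info": info}

torrent_bytes = bencode(metainfo)                     # contents of the .torrent file
info_hash = hashlib.sha1(bencode(info)).digest()      # 20-byte torrent identifier
```

Because the info_hash is computed over the encoded info dictionary, any change to the file name, piece size, or piece hashes yields a different torrent identity.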



Peer-To-Peer Connection


A BitTorrent peer-to-peer connection is much like a typical TCP connection, but with the addition of two client states. Each peer-to-peer connection contains two bits of state information on either end: choked or not, and interested or not. Choking is a notification from one peer to a remote peer that no data will be sent to the remote peer until the choking peer sends an unchoke message. Interested is a state indicating that the peer is interested in receiving data. A peer that is uninterested is typically one that has either paused or completed the torrent, meaning that it does not want data transferred to it. Each client must maintain the state information for each connection it has with a remote peer. A block of data is downloaded by a client when that client is interested and the remote peer is not choking it.
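The per-connection state described above can be modelled as four flags, two for each end of the connection. The sketch below is one possible representation; the class and attribute names are assumptions for illustration, and the defaults reflect the protocol's starting state in which every connection begins choked and uninterested.

```python
from dataclasses import dataclass

@dataclass
class PeerConnectionState:
    """State a client keeps for one connection to a remote peer."""
    am_choking: bool = True        # we are choking the remote peer
    am_interested: bool = False    # we want data from the remote peer
    peer_choking: bool = True      # the remote peer is choking us
    peer_interested: bool = False  # the remote peer wants data from us

    def can_download(self) -> bool:
        """We may request blocks when we are interested and not choked."""
        return self.am_interested and not self.peer_choking

    def can_upload(self) -> bool:
        """We may send blocks when the peer is interested and we are not choking it."""
        return self.peer_interested and not self.am_choking

# A fresh connection permits no transfer in either direction.
state = PeerConnectionState()

# After we send "interested" and receive "unchoke", downloading is permitted.
state.am_interested = True
state.peer_choking = False
```

A client holds one such object per remote peer and updates it as choke, unchoke, interested, and uninterested messages are sent and received.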


Each peer-to-peer connection is established using a handshake, like any other TCP direct connection, and further communication is sent via peer wire messages, which are messages of variable length sent between peers. These peer connections are symmetrical, and data is allowed to flow in either direction.

Handshake


Each peer-to-peer connection established between two peers in a swarm must begin with a required handshake message. This handshake, in the BitTorrent Specification version 1.0, is a 68-byte message exchanged between peers. This handshake message contains the following fields:


pstrlength
    The length of pstr, as a single raw byte. In version 1.0 of the BitTorrent specification, this is always defined as 19.

pstr
    The string identifier for the protocol. In version 1.0 of the BitTorrent specification, this is always defined as “BitTorrent protocol”.

reserved
    An 8-byte reserved area. All standard BitTorrent implementations use eight all-zero bytes for this field. These bytes allow for bit-flags to change the behavior of the protocol.

info_hash
    A 20-byte SHA-1 hash of the info field in the .torrent file.

peer_id
    A 20-byte string used as a unique identifier for the client. This is the same peer_id used in tracker requests.


The initiator of the peer-to-peer connection is expected to transmit its handshake message immediately. In the event that a peer tries to connect with another peer and the info_hash specified in the handshake message does not correspond to a torrent that the remote peer is currently serving, the remote peer is required to immediately drop the connection, preventing any further data exchange between those two clients. However, it is important to remember that the peer probably received this information from the tracker, so this event is rare [1].
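The 68-byte layout follows directly from the field sizes (1 + 19 + 8 + 20 + 20). As a hedged sketch, the handshake can be packed and unpacked with Python's standard struct module; the function names and the sample info_hash and peer_id values below are assumptions for illustration.

```python
import struct

PSTR = b"BitTorrent protocol"   # 19 bytes in version 1.0

def build_handshake(info_hash: bytes, peer_id: bytes,
                    reserved: bytes = bytes(8)) -> bytes:
    """Pack pstrlength, pstr, reserved, info_hash and peer_id into 68 bytes."""
    if len(info_hash) != 20 or len(peer_id) != 20 or len(reserved) != 8:
        raise ValueError("info_hash and peer_id must be 20 bytes, reserved 8")
    return struct.pack("!B19s8s20s20s", len(PSTR), PSTR,
                       reserved, info_hash, peer_id)

def parse_handshake(message: bytes):
    """Return (pstr, reserved, info_hash, peer_id) from a handshake message."""
    pstr_length = message[0]
    pstr = message[1:1 + pstr_length]
    reserved, info_hash, peer_id = struct.unpack_from(
        "!8s20s20s", message, 1 + pstr_length)
    return pstr, reserved, info_hash, peer_id

# Hypothetical values for illustration.
handshake = build_handshake(b"\x11" * 20, b"UTABCXYZ000012121324")
```

On receipt, a peer would compare the parsed info_hash against the torrents it is serving and drop the connection on a mismatch, as described above.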



Peer Wire Messaging


The communication between two peers in a BitTorrent swarm is called ‘peer wire messaging’. These messages vary in length and offer a variety of functionality to allow for structured communication between two peers.

Each peer wire message has a format of <length><id><payload>. The list of currently supported messages is as follows:



keep-alive
    A message that merely maintains the TCP connection between the two peers. This message has a length of 0 bytes and has no id or payload.

choke
    Informs the remote peer that they are being choked and no data will be sent. This message has a length of 1 byte, an id of 0, and no payload.

unchoke
    Informs the remote peer that they are no longer being choked and may request data if interested. This message has a length of 1 byte, an id of 1, and no payload.

interested
    Informs the remote client that the local client is interested in receiving data. This message has a length of 1 byte, an id of 2, and no payload.

uninterested
    Informs the remote client that the local client is not interested in receiving data. This message has a length of 1 byte, an id of 3, and no payload.

have
    Informs a remote client that the local client has a particular piece of the torrent. This message has a length of 5 bytes, an id of 4, and the payload is the zero-based index of a piece that has just been successfully downloaded by the local client.

bitfield
    Sends a string of bits representing which pieces of the file the local client has successfully downloaded. Using this information, the remote client may request specific pieces of data that it knows the local client has. This message has a length of 1 + (length of the bitfield) bytes, an id of 5, and the payload is the bitfield itself.

request
    A formal request for a specific piece of the torrent. This message has a length of 13 bytes, an id of 6, and the payload is a set of three integers specifying the zero-based index of the piece, the zero-based byte offset within the piece, and the number of bytes being requested.

piece
    The actual transfer of a piece of the torrent. This message has a length of 9 + (size of the block being sent) bytes, an id of 7, and the payload is two integers specifying the index of the piece and the byte offset within it, followed by the block of data being sent.

cancel
    Cancels a piece request. This message has a length of 13 bytes, an id of 8, and the payload is identical to that of the request message: a set of three integers specifying the zero-based index of the piece, the zero-based byte offset within the piece, and the number of bytes being requested [1].
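The <length><id><payload> framing can be sketched with Python's struct module. The text above does not give the width of the length field or of the integers in the payloads; the code below assumes 4-byte big-endian integers and a 1-byte id, as in the published specification, and the function names are hypothetical.

```python
import struct

# Message ids from the list above.
CHOKE, UNCHOKE, INTERESTED, UNINTERESTED = 0, 1, 2, 3
HAVE, BITFIELD, REQUEST, PIECE, CANCEL = 4, 5, 6, 7, 8

def encode_message(message_id=None, payload: bytes = b"") -> bytes:
    """Frame a message; message_id=None produces a keep-alive (length 0)."""
    if message_id is None:
        return struct.pack("!I", 0)
    # The length counts the id byte plus the payload, not the prefix itself.
    return struct.pack("!IB", 1 + len(payload), message_id) + payload

def encode_have(piece_index: int) -> bytes:
    return encode_message(HAVE, struct.pack("!I", piece_index))

def encode_request(piece_index: int, offset: int, length: int) -> bytes:
    return encode_message(REQUEST, struct.pack("!III", piece_index, offset, length))

def decode_message(data: bytes):
    """Return (message_id, payload); message_id is None for a keep-alive."""
    (length,) = struct.unpack_from("!I", data, 0)
    if length == 0:
        return None, b""
    body = data[4:4 + length]
    return body[0], body[1:]

# Example: ask for the first 16 KB of piece 3 (sizes chosen for illustration).
request = encode_request(3, 0, 16384)
message_id, payload = decode_message(request)
```

The lengths produced here match the list above: 1 byte for choke, 5 bytes for have, and 13 bytes for request and cancel.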



Connection Process


The process of a typical BitTorrent download begins with a user who desires to download a particular file that someone has made available via the BitTorrent protocol. The user first obtains the .torrent file from a website serving torrent files. Using a BitTorrent client, the user opens the .torrent file. The client software then communicates with the tracker, which in return sends a list of the peers currently serving that file. Each peer periodically checks in with the tracker to receive updated lists of peers, so that newer peers may establish connections with older peers.


With the list of peers, the new peer can begin establishing connections to other peers. Before any messages are sent between peers, the state between peers is both choked and uninterested. Through the exchange of the peer wire messages, as soon as a connection between two clients is unchoked and interested, data transfer between the two clients is permitted.


Data transfer continues between all of the peers in the swarm until all of those peers have
obtained all of the files in the torrent.



Future Considerations


Bram Cohen’s company, BitTorrent, Inc, is now in charge of future implementations of the protocol. The company is currently working on other technologies based on the protocol, including a UDP implementation and a streaming video protocol. The company’s main focus is finding legitimate uses for the protocol.


The BitTorrent UDP implementation, called the Micro Transport Protocol, is a technology that implements the same peer wire messaging as the TCP implementation, but runs over UDP and so does not use TCP’s built-in congestion control, which some believe will result in higher transfer speeds.


BitTorrent, Inc is also working on a streaming video protocol that uses a piece selection algorithm to ensure that pieces are requested in order and transferred as fast as possible, allowing for smooth viewing of a video.


Since its creation in 2001, the BitTorrent protocol has been exponentially expanding its presence online and is showing no signs of stopping, even despite the wide array of illegal, copyright-infringing files being made available via BitTorrent and the resulting lawsuits [3].



References


[1] "BitTorrentSpecification - TheoryOrg." TheoryOrg. N.p., n.d. Web. 2 Nov. 2009. <http://wiki.theory.org/BitTorrentSpecification>.

[2] Cohen, Bram. "BitTorrent Protocol Specification." BitTorrent.org. N.p., n.d. Web. 2 Nov. 2009. <http://www.bittorrent.org/beps/bep_0003.html>.

[3] Cohen, Bram. "Incentives Build Robustness in BitTorrent." N.p. n.d.