BitTorrent is a peer-to-peer file sharing protocol used for distributing large amounts of data. BitTorrent is one of the most common protocols for transferring large files, and by some estimates it accounts for about 35% of all traffic on the entire Internet.
The protocol works initially when a file provider makes his file (or group of files) available to the network. This is called a seed and allows others, named peers, to connect and download the file. Each peer that downloads a part of the data makes it available to other peers to download. After the file is successfully downloaded by a peer, many continue to make the data available, becoming additional seeds. This distributed nature of BitTorrent leads to a viral spreading of a file throughout peers. As more seeds get added, the likelihood of a successful connection increases exponentially. Relative to standard Internet hosting, this provides a significant reduction in the original distributor's hardware and bandwidth resource costs. It also provides redundancy against system problems and reduces dependence on the original distributor.
Programmer Bram Cohen designed the protocol in April 2001 and released a first implementation on July 2, 2001. It is now maintained by Cohen's company BitTorrent, Inc. There are numerous BitTorrent clients available for a variety of computing platforms. According to isoHunt, the total amount of shared content is currently more than 1.1 petabytes.
A BitTorrent client is any program that implements the BitTorrent protocol. Each client is capable of preparing, requesting, and transmitting any type of computer file over a network, using the protocol. A peer is any computer running an instance of a client.
To share a file or group of files, a peer first creates a small file called a "torrent" (e.g. MyFile.torrent). This file contains metadata about the files to be shared and about the tracker, the computer that coordinates the file distribution. Peers that want to download the file must first obtain a torrent file for it, and connect to the specified tracker, which tells them from which other peers to download the pieces of the file.
Though both ultimately transfer files over a network, a BitTorrent download differs from a classic full-file HTTP request in several fundamental ways:
* BitTorrent makes many small data requests over different TCP sockets, while web browsers typically make a single HTTP GET request over a single TCP socket.
* BitTorrent downloads in a random or in a "rarest-first" approach that ensures high availability, while HTTP downloads in a sequential manner.
Taken together, these differences allow BitTorrent to achieve much lower cost to the content provider, much higher redundancy, and much greater resistance to abuse or to "flash crowds" than a regular HTTP server. However, this protection comes at a cost: downloads can take time to rise to full speed because it may take time for enough peer connections to be established, and it takes time for a node to receive sufficient data to become an effective uploader. As such, a typical BitTorrent download will gradually rise to very high speeds, and then slowly fall back down toward the end of the download. This contrasts with an HTTP server that, while more vulnerable to overload and abuse, rises to full speed very quickly and maintains this speed throughout.
In general, BitTorrent's non-contiguous download methods have prevented it from supporting "progressive downloads" or "streaming playback". But comments made by Bram Cohen in January 2007 suggest that streaming torrent downloads will soon be commonplace and ad supported streaming appears to be the result of those comments.
[edit] Creating and publishing torrents
The peer distributing a data file treats the file as a number of identically sized pieces, typically between 64 KB and 4 MB each. The peer creates a checksum for each piece, using the SHA1 hashing algorithm, and records it in the torrent file. Pieces with sizes greater than 512 KB will reduce the size of a torrent file for a very large payload, but is claimed to reduce the efficiency of the protocol. When another peer later receives a particular piece, the checksum of the piece is compared to the recorded checksum to test that the piece is error-free. Peers that provide a complete file are called seeders, and the peer providing the initial copy is called the initial seeder.
The exact information contained in the torrent file depends on the version of the BitTorrent protocol. By convention, the name of a torrent file has the suffix .torrent. Torrent files have an "announce" section, which specifies the URL of the tracker, and an "info" section, containing (suggested) names for the files, their lengths, the piece length used, and a SHA-1 hash code for each piece, all of which are used by clients to verify the integrity of the data they receive.
Torrent files are typically published on websites or elsewher