Monday, May 7, 2012

How bittorrent works

I guess one reason I wanted to write this blog was to give some folks the chance to step inside my universe for a minute.  This especially applies to anyone who doesn't have a deep and abiding interest in technology, isn't technically inclined, and doesn't love computers.  The non-geeks, if you will.  It also applies to geeks who have come up in this world and, I think, lack the perspective of a slightly older generation on the technology that is shaping our new world.

In other words, young geeks whose minds have been poisoned by growing to maturity listening to self-serving corporate media propaganda.  But I'm not a radical blogger, so I won't say that.

Today I want to talk about torrents.  To talk about them the way I want to, it is essential that you understand what they are.  Surprisingly, I have been unable to find a really tidy explanation on the internet (for example, the incomprehensible diagram on the Wikipedia page I linked to), so I'm going to do it myself.

Bittorrent is a method for sharing files over the internet.  It has some pretty amazing strengths and weaknesses.  First, let's understand what a "file" is.  A file is a string of bytes, or "letters".  The letters represent information, such as text, audio, movies, ebooks, etc.  As such, you can accurately think of a file as a sequence of "chunks" of data, each of which has its own place in the sequence:


Sorry for the rough quality--it's surprisingly hard to find good drawing tools.  After all, I'm not getting paid for this sh*t.
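If you'd rather see the "sequence of chunks" idea in code than in my drawing, here is a minimal sketch.  The function name `split_into_chunks` and the chunk size of 8 bytes are my own inventions for illustration; real torrents use much bigger pieces.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# A stand-in for a real file's contents.
file_bytes = b"the quick brown fox jumps over the lazy dog"
chunks = split_into_chunks(file_bytes, chunk_size=8)

# Each chunk is identified by its position (index) in the list.
for index, chunk in enumerate(chunks):
    print(index, chunk)

# Gluing the chunks back together in order gives you the original file.
print(b"".join(chunks) == file_bytes)
```

The important part is the index: a chunk is only useful if you know where in the file it belongs.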

For traditional file transfers, such as when you download something from the web (indeed, when you look at a web page), you have a "server" with multiple "clients", which all download the entire file directly from the "server":
Bytes streaming in order from a traditional server to its clients.

This has the advantage of quick response time, assuming the server has adequate bandwidth and internal resources to answer all requests.  It also means the server is vulnerable when it is overloaded with requests.  And it requires server providers to purchase enough hardware to handle lots of load, if that is what they want or need to do.

Unfortunately, this is necessary for things that require responsiveness, such as web sites, video game servers, media center pc's, file servers...anything you're likely to call a "server".

However, some things do not require instantaneous response.  They also, for example, do not require that you send and receive the bytes in the correct order.  The only thing that matters is that you have a way of accounting for all the bytes or letters, so that in the end, you have 100% of them and they are in the correct order at that time.
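Here is a rough sketch of that idea: pieces show up in whatever order they like, you file each one under its index, and you only put the file together once nothing is missing.  The names `reassemble` and `received` and the 5-byte piece size are made up for this example.

```python
import random


def reassemble(received: dict[int, bytes], total_pieces: int) -> bytes:
    """Join pieces by index once every one of them has arrived."""
    missing = [i for i in range(total_pieces) if i not in received]
    if missing:
        raise ValueError(f"still missing pieces: {missing}")
    return b"".join(received[i] for i in range(total_pieces))


original = b"pieces can arrive in any order at all"
pieces = [original[i:i + 5] for i in range(0, len(original), 5)]

# Simulate the pieces showing up in a random order.
arrival_order = list(range(len(pieces)))
random.shuffle(arrival_order)

received = {}
for index in arrival_order:
    received[index] = pieces[index]  # file each piece under its index

rebuilt = reassemble(received, len(pieces))
print(rebuilt == original)
```

Because every piece is stored under its index, the order of arrival stops mattering.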

With bittorrent, the server doesn't actually store the files themselves, but merely mathematical representations of each of their pieces, and who is in possession of which pieces.  Thus the bittorrent server is called a "tracker", because it keeps track of things.
Remember, the files themselves are never on the bittorrent tracker.  It only contains information about the files:


Here you can see what I mean.  The tracker just keeps a database of who has which pieces of the file in question.  So what happens is that you request from the tracker, "what ip address has the sections of the file that I need?"
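To make that concrete, here is a toy tracker that follows the simplified picture I've drawn: it holds a fingerprint (a SHA-1 hash) of each piece and a list of who has which piece, and nothing else.  Everything here (`ToyTracker`, `announce`, `who_has`, `piece_is_valid`, the addresses) is hypothetical and only for illustration.  As far as I understand it, in the real protocol the piece hashes travel in the .torrent file, the tracker mostly hands out a list of peers for a given torrent, and the peers tell each other which pieces they hold.

```python
import hashlib


class ToyTracker:
    """Simplified model: piece fingerprints plus who holds which piece."""

    def __init__(self, piece_hashes: list[str]):
        self.piece_hashes = piece_hashes
        # piece index -> set of "ip:port" strings for peers holding that piece
        self.holders: dict[int, set[str]] = {i: set() for i in range(len(piece_hashes))}

    def announce(self, peer_address: str, piece_indexes: list[int]) -> None:
        """A peer reports which pieces it can share."""
        for index in piece_indexes:
            self.holders[index].add(peer_address)

    def who_has(self, piece_index: int) -> set[str]:
        """Answer the question: what addresses have the piece I need?"""
        return set(self.holders[piece_index])


def piece_is_valid(tracker: ToyTracker, index: int, data: bytes) -> bool:
    """Check a downloaded piece against its stored fingerprint."""
    return hashlib.sha1(data).hexdigest() == tracker.piece_hashes[index]


pieces = [b"first piece", b"second piece", b"third piece"]

# Note the tracker is only ever given hashes, never the pieces themselves.
tracker = ToyTracker([hashlib.sha1(p).hexdigest() for p in pieces])
tracker.announce("10.0.0.5:6881", [0, 1])
tracker.announce("10.0.0.9:6881", [1, 2])

print(tracker.who_has(1))
print(piece_is_valid(tracker, 2, b"third piece"))
```

The fingerprints are what let you trust a piece you got from a stranger: if the hash doesn't match, you throw the piece away and ask someone else.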

And once you have that information, you can request that chunk of the file from me, and I will send it to you directly--completely bypassing the server's bandwidth altogether:

You can see a few core truths here.  First, all the file trading is done between the clients.  The server itself cannot infringe anybody's copyright--it is merely enabling others (probably) to do so.  However, in the U.S.A. we made "enabling" copyright infringement a crime, sorta.

The second is that these are private transactions between strangers.

It also uses bandwidth very efficiently.

But there are some other effects, which will make sense if you think about it.  First, if you want to download something, somebody else has got to be sharing it.  This means that the more people have it, the faster and easier it is to get.  Or, the more popular something is, the easier it is to download it.
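Some back-of-the-envelope arithmetic shows why.  This is a deliberately crude model with made-up numbers: I assume every sharer contributes the same upload speed to a common pool, and I ignore download caps and the fact that downloaders also upload.  Both function names are hypothetical.

```python
def server_rate_per_client(server_upload_mbps: float, clients: int) -> float:
    """Traditional model: one server's upload is split across every client."""
    return server_upload_mbps / clients


def swarm_rate_per_downloader(upload_per_sharer_mbps: float, sharers: int, downloaders: int) -> float:
    """Toy swarm model: everyone sharing adds their upload to a common pool."""
    return upload_per_sharer_mbps * sharers / downloaders


# With a fixed server, more clients means a thinner slice for each one.
for clients in (10, 100, 1000):
    rate = server_rate_per_client(100.0, clients)
    print(f"{clients} clients -> {rate:.1f} Mbit/s each from the server")

# With a swarm, more sharers means a bigger pool for the same downloaders.
for sharers in (10, 100, 1000):
    rate = swarm_rate_per_downloader(1.0, sharers, downloaders=50)
    print(f"{sharers} sharers -> {rate:.1f} Mbit/s per downloader")
```

In the server case popularity is a cost; in the swarm case popularity is the supply.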

So the latest episode of Game of Thrones is the easiest thing in the world to lay your hands on.

But an obscure reggae band from the '70s can be difficult to impossible to find (even leading to the horrible consequence of actually buying MP3s off Amazon!).

The interesting thing to me about this is that it turns classical economics on its head.  Usually, the more demand there is for something, the higher the price.  In the case of data via bittorrent, the opposite is true.

In other words, data is the opposite of real stuff in the economic sense, and therefore trying to treat it as stuff is misguided, at best.

Another issue is that if you want to shut bittorrent down, you have to go after the trackers.  But all the trackers do is talk to people.  True, all they talk about is where certain pieces of various files can be found.  But some folks don't like what it is they have to say.

Because basically all a tracker does is link to information.  And thus, when you start blocking access to a torrent site, what you are really blocking access to is a site that links to information you don't want people to have.



