
Talk:Bencode

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Korexio (talk | contribs) at 09:35, 11 June 2012 (Strings and UTF-8). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

How does bencoding actually work?

The examples of bencoding given are much, much less helpful than they should be as they are not accompanied by an explanation of how the process operates, which is far from self-evident from the examples. It's similar to having a page entitled "foocoding" and having an example saying "the foocoded form of bar is baz, and the foocoded form of quux is xyzzy". Not very helpful without an explanation of how the clear text and cyphertext relate to each other. Thisisnotme 12:37, 12 July 2006 (UTC)

I expanded the article by describing how it works (as well as adding some explanations, a few of its properties, and making a few corrections). I retained most of the examples. 130.89.167.52 21:25, 14 July 2006 (UTC)

Efficiency

"While not particularly efficient, bencoding is simple"

Not efficient compared to what? Certainly not XML, at least. I guess only byte-encoding the string lengths would gain you anything (notwithstanding some sort of compression)? — Christopher (talk) @ 23:13, 20 July 2006 (UTC)

You're right. Wording changed, comparing bencode to a pure binary encoding. --Kwi | Talk 23:26, 23 July 2006 (UTC)

It does have the added complexity of maintaining order, so whilst decoding isn't affected, encoding will be. XML, on the other hand, does not impose any ordering during encoding, but does require the production of extra bytes. However, as Christopher implies, XML is far less efficient to decode. --Angryjames (talk) 00:43, 8 October 2008 (UTC)
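As a rough illustration of the size question raised in this thread, the sketch below compares bencoded values with one hypothetical fixed-width binary layout. The layout (an 8-byte big-endian integer and a 4-byte length prefix for strings) is an assumption chosen only for comparison; it is not part of bencode or of any BitTorrent specification.

```python
import struct

# A small and a large integer, encoded both ways.
small = 1_000_000
large = 2 ** 62

small_bencode = b"i%de" % small          # ASCII decimal between 'i' and 'e'
large_bencode = b"i%de" % large
small_binary = struct.pack(">q", small)  # hypothetical fixed 8-byte layout
large_binary = struct.pack(">q", large)

# A short string, encoded both ways.
text = b"spam"
text_bencode = b"%d:%s" % (len(text), text)         # decimal length, ':', raw bytes
text_binary = struct.pack(">I", len(text)) + text   # hypothetical 4-byte length prefix

for label, a, b in [
    ("small int", small_bencode, small_binary),
    ("large int", large_bencode, large_binary),
    ("string", text_bencode, text_binary),
]:
    print(f"{label}: bencode {len(a)} bytes, fixed-width binary {len(b)} bytes")
```

Under these assumptions the outcome depends on the data: the decimal form grows with the magnitude of an integer, while for short strings the decimal length prefix can be smaller than a fixed-width one.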

Recursion

I've removed a fragment in the Encoding algorithm section stating that Bencoding "is defined recursively." In actuality there is nothing recursive about Bencode's definition. And while bencoded dictionaries allow composition (a dictionary may contain another dictionary), recursion is not possible (a dictionary cannot contain itself). — Foorider (talk) 01:00, 14 February 2008 (UTC)

I think you're splitting hairs. A bencode implementation is highly likely to implement encoding and decoding recursively. Schwatoo (talk) 01:41, 6 March 2008 (UTC)
That doesn't mean it's recursive. It's trivial to write a loop decoder using a data stack instead. 91.16.24.48 (talk) 17:52, 19 February 2011 (UTC)[reply]
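The loop-based approach mentioned in the last comment can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `bdecode_iterative` and the stack layout are assumptions, and the decoder is deliberately lenient (it does not reject leading zeros, `i-0e`, or unsorted dictionary keys).

```python
def bdecode_iterative(data: bytes):
    """Decode one bencoded value with an explicit stack instead of recursion."""
    # Each stack entry is [container, pending_key]; pending_key is only
    # used for dictionaries that have read a key but not yet its value.
    stack = []
    pos = 0
    n = len(data)
    while True:
        if pos >= n:
            raise ValueError("unexpected end of input")
        c = data[pos:pos + 1]
        if c == b"l":                      # open a list
            stack.append([[], None])
            pos += 1
            continue
        if c == b"d":                      # open a dictionary
            stack.append([{}, None])
            pos += 1
            continue
        if c == b"e":                      # close the innermost container
            if not stack:
                raise ValueError("unexpected 'e'")
            value, pending = stack.pop()
            if pending is not None:
                raise ValueError("dictionary key without a value")
            pos += 1
        elif c == b"i":                    # integer: i<decimal>e
            end = data.index(b"e", pos)
            value = int(data[pos + 1:end])
            pos = end + 1
        elif c.isdigit():                  # byte string: <length>:<bytes>
            colon = data.index(b":", pos)
            length = int(data[pos:colon])
            start = colon + 1
            value = data[start:start + length]
            if len(value) != length:
                raise ValueError("string runs past end of input")
            pos = start + length
        else:
            raise ValueError(f"unexpected byte at offset {pos}")

        # Attach the completed value to its parent, or return it if top-level.
        if not stack:
            if pos != n:
                raise ValueError("trailing data after value")
            return value
        top = stack[-1]
        if isinstance(top[0], list):
            top[0].append(value)
        elif top[1] is None:
            if not isinstance(value, bytes):
                raise ValueError("dictionary keys must be strings")
            top[1] = value
        else:
            top[0][top[1]] = value
            top[1] = None


print(bdecode_iterative(b"d3:bar4:spam3:fooi42ee"))
```

The nesting depth is tracked by the length of `stack`, so deeply nested input does not consume the interpreter's call stack. Whether that makes the format itself "recursive" is the point under dispute above; the sketch only shows that a decoder need not be.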

Dictionary Sorting

The page fails to mention that dictionary elements should be sorted by key. The trivia section mentions that "there is only a single valid bencoding"; without ordered dictionaries this wouldn't be possible. Schwatoo (talk) 01:37, 6 March 2008 (UTC)

Not just ordered dictionaries; surely lists must also be ordered. Since the bencoding must be unique (bijection), and comparable without decoding, does this not mean that the list elements must be ordered in some way? In the example l4:spami42e, is each element compared as an ASCII string, e.g. '4:spam' compared to 'i42e'? --Angryjames (talk) 00:36, 8 October 2008 (UTC)

Lists are not sorted, only dictionaries. That is because dictionary objects normally have no order (certainly not in Python, the programming language which was used for the first BitTorrent client and thus the first Bencoding), while list elements have a certain order. So while one dictionary could have an arbitrary number of encoded forms if it weren't sorted for encoding, one list always has only one encoded form. 85.178.146.223 (talk) 06:24, 20 January 2009 (UTC)

With "list elements have a certain order" you mean that programming languages usually treat two differently ordered lists as semantically unequal objects, but for a dictionary a programmer usually cannot get the position of an element in a dictionary. With "list elements have a certain order" you do not mean that there is a fixed ordering algorithm for list elements, right? Still, Angryjames's question remains: does bencode mandate such an algorithm to ensure bijection? --Tddt (talk) 00:13, 21 May 2012 (UTC)
Ah, wait, I already found an example of why such an algorithm cannot exist for lists. For instance the "path" element in the info part in multi file mode: "4:pathl 4:path 2:to 8:file.ext e" relies on an unsorted ordering. So it'd be up to the BEPs etc. of certain list keys to ensure bijection by mandating a certain sorting algorithm if appropriate, right? But again, looking at for instance BEP-19, this does not seem to be the case: it does not mandate a lexicographical ordering of the elements in the 'url-list', which results in multiple valid encodings for the same set of webseeds, without imposing a semantic difference upon differing orderings. Ergo, contrary to what the "Features" section claims, bencoding is not really bijective? --Tddt (talk) 00:59, 21 May 2012 (UTC)
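The distinction drawn in this thread can be sketched with a small encoder. This is an illustrative sketch only: the function name `bencode` and its type handling are assumptions, and it sorts dictionary keys as raw byte strings, which is how the rule is commonly implemented.

```python
def bencode(value) -> bytes:
    """Encode ints, byte strings, lists and dicts; dict keys are sorted."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; bencode has no boolean type.
        raise TypeError("booleans are not supported")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        # Lists keep the order they were given in.
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        if not all(isinstance(key, bytes) for key in value):
            raise TypeError("dictionary keys must be byte strings")
        parts = [b"d"]
        # Keys are emitted in sorted order, regardless of insertion order.
        for key in sorted(value):
            parts.append(bencode(key))
            parts.append(bencode(value[key]))
        parts.append(b"e")
        return b"".join(parts)
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Same dictionary, different insertion order: one encoding.
print(bencode({b"foo": 42, b"bar": b"spam"}))
print(bencode({b"bar": b"spam", b"foo": 42}))

# Same elements, different list order: two encodings.
print(bencode([b"spam", 42]))
print(bencode([42, b"spam"]))
```

In this sketch the two lists are different values with different encodings, so uniqueness holds at the level of bencode values. Whether an application treats two orderings of, say, a 'url-list' as the same thing is a question about that application's semantics, which is the point Tddt raises.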

Extending Bencoding

Bencoding as presented doesn't really handle some basic types. UTF-8 strings are mentioned in the article. But bencode doesn't provide encodings for some very basic types: floats, bools, sets, etc. There has been some work to extend bencoding to support more types. Should this be included or linked to in the Wikipedia article? See https://lists.ubuntu.com/archives/bazaar/2007q3/029292.html for more. Schwatoo (talk) 01:44, 6 March 2008 (UTC)

Minimum number of elements in lists and dictionaries?

What's the minimum number of elements for lists and dictionaries? Zero? One? Two? From "there is a bijection between values and their encodings" I'd guess two, but I'm not really sure. --Tddt (talk) 23:32, 20 May 2012 (UTC)

Strings and UTF-8

"The specification does not deal with encoding of characters outside the ASCII set; to mitigate this, some BitTorrent applications explicitly communicate the encoding (most commonly UTF-8) in various non-standard ways."

Hmm, sure? BEP-3 says: "All strings in a .torrent file that contains text must be UTF-8 encoded." --Tddt (talk) 23:38, 20 May 2012 (UTC)

Bencoding is not only used for the torrent file but may also be used elsewhere (e.g. in tracker responses). Even in the .torrent file there is a string which is not UTF-8 encoded: the pieces entry in the info dictionary. It contains the SHA1 hashes in binary form (each hash is represented by 20 bytes, not characters), per BEP-3. Korexio (talk) 09:35, 11 June 2012 (UTC)
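The byte-versus-character point above can be illustrated with a short sketch. The helper name `bencode_bytes` and the sample piece data are assumptions made for the example; only the `<length>:<bytes>` string form and the 20-byte SHA1 digest size come from the discussion.

```python
import hashlib


def bencode_bytes(raw: bytes) -> bytes:
    """Encode a byte string as <length>:<bytes>; the length counts bytes."""
    return str(len(raw)).encode("ascii") + b":" + raw


# Text: 4 characters, but 5 bytes once UTF-8 encoded.
name_text = "caf\u00e9"
name_bytes = name_text.encode("utf-8")
encoded_name = bencode_bytes(name_bytes)

# Binary: two concatenated SHA1 digests (made-up piece data), 20 bytes each.
pieces = hashlib.sha1(b"piece 0").digest() + hashlib.sha1(b"piece 1").digest()
encoded_pieces = bencode_bytes(pieces)

print(len(name_text), len(name_bytes), encoded_name)
print(len(pieces), encoded_pieces[:3])
```

At the bencode level both values are just byte strings; whether a given string is UTF-8 text or raw binary is decided by the key it is stored under, not by the encoding itself.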