Jump to content

Talk:Punycode

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by CTMacUser (talk | contribs) at 06:37, 14 March 2019 (The RFC Website Has Changed: new section). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
WikiProject iconComputing: Networking Start‑class
WikiProject iconThis article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
StartThis article has been rated as Start-class on Wikipedia's content assessment scale.
???This article has not yet received a rating on the project's importance scale.
Taskforce icon
This article is supported by Networking task force.
WikiProject iconInternet Start‑class
WikiProject iconThis article is within the scope of WikiProject Internet, a collaborative effort to improve the coverage of the Internet on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
StartThis article has been rated as Start-class on Wikipedia's content assessment scale.
???This article has not yet received a rating on the project's importance scale.

Untitled

First, the linked "Punycode Exploit" should link to the advisory text, not the simple demonstration page. Secondly, this is not an exploit of punycode so much as it is an exploit of the fact that not running domain names through nameprep is asking for spoofing problems.

I would change the paragraph from:

  Punycode is easily exploitable, and for an example see Punycode exploit

to:

  Note that browsers which fail to run a string
  through nameprep before using it as a DNS name are
  vulnerable to spoofing exploits, as in Punycode spoof.

I suspect is should read more like:

   The fact that Unicode and Punycode strings result in homographs (strings
   which are different but visually indistinguishable or almost so) is a matter of
   some concern.  Phishing (a class of social engineering exploits) will make use of
   homographs like "www.paypa1.com" (note: this containst the digit, 1 ("one") rather
   than the letter, l ("ell").

I'd like to add some links to the discussion of "petnames" (which I've read from the cap-talk mailing list that's maintained by Jonathan Shapiro, creator of EROS).

Unforunately I, User:JimD, lack the time at the moment to do this properly. So, I'm tacking this in here in the hopes that it will help me remember or that someone here will take the ball and run with it. JimD 22:43, 2005 Feb 22 (UTC)

With the last two words an external link to the exploit.

You are right that this isn't a Punycode exploit: it's an IDNA exploit, and it's mentioned in the security section of RFC 3490. But I think you must be wrong about Nameprep - as far as I know, these browsers are already using Nameprep, and Nameprep has no effect on the domain label in question (pаypal). --Zundark 18:59, 7 Feb 2005 (UTC)
D'oh! - You're right, my bad. Nameprep wouldn't help since Unicode NFKC does not, in fact, normalize Cyrillic "а" to Latin "a". So never mind the suggested rephrasing. Instead, someone should rephrase it to mention that this is an IDNA problem or, better yet, move this paragraph and link to the IDNA page.



The explanation for how Punycode is supposed to work is unintelligible to me, and I don't consider myself particularly dull about technical matters. The main problem is that the author doesn't seem to have a grasp of where to begin and ends up explaining the thing in bits and pieces. Can someone replace this with something more coherent? 82.92.119.11 8 July 2005 17:17 (UTC)

The article contains a lot of information that is actually about IDNA rather than Punycode, and I think this is what makes it so confusing. There's no reason for this IDNA information to be here, since it's already in the appropriate article. Much of this information has been removed before, but it just gets added back in a different form. So I don't really hold out much hope for this article, and don't currently consider it worth the effort of working on. Fortunately, the article does have a link to RFC 3492, which is what people need to read if they want to understand Punycode. --Zundark 8 July 2005 18:24 (UTC)

Patent issues

Is Punycode patented? --84.61.51.126 13:01, 6 July 2006 (UTC)[reply]

Nobody's asserted a patent against it as far as I know. Given the nature of the patent system, that's the most anyone can say for certain. --Alvestrand 15:07, 18 October 2006 (UTC)[reply]

How about showing some examples?

Examples

From the text its not really clear about how exactly you get back to the name and even if its possible. Or how the position of the character is encoded. Since currently it looks to me as if "bücher" "bächer" "bchüer" etc. all get the same bcher-kva code??

I've rewritten the first part of the encoding description based on reading the rfc, more work is still needed though. Plugwash 01:45, 27 February 2007 (UTC)[reply]

Shouldn't "bücüher" be punycoded as "bcher-kvaba", and "bücherü" as "bcher-kvaea"? Moreover, should "ýbücher" be punycoded as "bcher-kvafa" or as "bcher-fakva"? (are both possible?) —Preceding unsigned comment added by 193.145.147.38 (talk) 18:21, 27 February 2008 (UTC)[reply]

Someone more knowledgable than I should explain why the non-ascii character "ü" being insertes at various places WITHIN an ascii string magically transmutes to "ý" if it is inserted BEFORE the string. It all sound like an obfuscation of alchemy to me. Xojo (talk) 01:42, 3 September 2010 (UTC)[reply]
Oh, that's just because "ý" (U+00FD) is the next code point after "ü" (U+00FC). No alchemy there! — Sebastian 07:13, 20 December 2011 (UTC)[reply]

Encoding principles example

The heading "Encoding procedure" says:

"The 'bücher' example given above . . ."

There is no "bücher" example "above" in the current form of the article.

This is probably an editorial anomaly in the wake of rearrangement of the article.

Le crayon rouge ne dort jamais.

Doug Kerr 17:56, 11 March 2007 (UTC)[reply]

I've reworded it. --Zundark 10:42, 12 March 2007 (UTC)[reply]

I did not understand this article but had no problem to understand Punycode from reading the RFC. I will reorder and enhance this article by. Starting with a quick description of the reasons why this particular encoding scheme is used and following the explanation order in the RFC, which start out with the decoding, which is far easier to understand then the encoding. Roeschter 19:25, 25 March 2007 (UTC)[reply]

Bootstring

'Bootstring' redirects here, even though it doesn't strike me as synonymous to 'Punycode' (Punycode, going by the title of ref1, Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA), seems to be a kind of Bootstring).

Could someone with more knowledge on the matter either create a Bootstring article describing what that is or make it clear that Bootstring indeed is merely a synonym? I'm curious which/what it is.

-pinkgothic (talk) 12:42, 12 November 2010 (UTC)[reply]

The section "Encoding of non-ASCII character insertions as code numbers" needs a complete rewrite. Someone went out of their way to use computer science jargon. 188.103.59.142 (talk) 16:24, 9 September 2011 (UTC)[reply]

Ambiguity in separation of ASCII characters

Resolved

Section Separation of ASCII characters says "Since it is a basic character, the ASCII hyphen may still appear in the string before this additional character, but the addition does not cause ambiguity". But precisely because the hyphen may still appear, we have a problem: Suppose you wanted to convert the string "bcher-kva" to a label - wouldn't that be just the same, thus rendering the label "bcher-kva" ambiguous? — Sebastian 07:19, 20 December 2011 (UTC)[reply]

The description appears to be incorrect: a hyphen is always added (unless there are no basic characters at all) so "bcher-kva" would be encoded as "bcher-kva-". See example (S) in RFC 3492. (But in practice Punycode is hardly used outside of IDNA, and IDNA wouldn't apply Punycode to such a string anyway.) I've undone 195.242.190.233's changes of 17 November 2010, which introduced the error. --Zundark (talk) 09:58, 20 December 2011 (UTC)[reply]
Thanks! — Sebastian 10:28, 20 December 2011 (UTC)[reply]

The RFC Website Has Changed

The current link to the Punycode RFC didn't work; it was redirected to an IETF home page. <https://www.rfc-editor.org/rfc/rfc3492.txt> is the new link to the Punycode RFC.