Sunday, January 5, 2014

Stupid IDN Tricks: Unicode Combining Characters (or http://░͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇.ws)

Nov 3, 2014: The domains mentioned in this article are expiring and I'm not renewing them. All links have been redirected to the archive.org mirror of the original site.



Safari will display Unicode combining diacritical marks in the URL bar (try going to  http://░͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇.ws). It is possible to register domains with these marks. Some of these domains will look much like legitimate domains (e.g. apple.com vs. apple͢.com). This is probably not good.

Internationalized Domain Names (IDN)


DNS was designed with only 7-bit ASCII in mind. However, not everyone in the world speaks English, and people really want to type domains in their own language. So there is a terrible hack called IDNA that maps Unicode characters to 7-bit ASCII: each non-ASCII label is encoded with Punycode and given an "xn--" prefix.
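To make the mapping concrete, here is a minimal Python sketch. It assumes Python's built-in "idna" codec, which implements the original IDNA (2003) rules, and the name "bücher.example" is just an illustrative domain that does not appear anywhere else in this article.

```python
def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to the ASCII form that DNS actually sees."""
    # The built-in "idna" codec applies nameprep and Punycode to each label.
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert an ASCII (xn--) domain name back to its Unicode form."""
    return domain.encode("ascii").decode("idna")


# Illustrative name only (an assumption, not a domain from this article).
name = "b\u00fccher.example"

print(to_ascii(name))              # xn--bcher-kva.example
print(to_unicode(to_ascii(name)))  # round-trips back to the Unicode form
```

Only the label containing non-ASCII characters gets the "xn--" treatment; plain ASCII labels such as "example" pass through unchanged.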

Homograph Attacks


Hopefully everyone has heard of homograph attacks using internationalized domain names. If not, here is a recap (taken from the Chrome wiki):
... different characters from different languages can look very similar, and this can make phishing attacks possible. For example, the Latin "a" looks a lot like the Cyrillic "а", so someone could register http://ebаy.com (http://xn--eby-7cd.com/), which would easily be mistaken for http://ebay.com. This is called a homograph attack.
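The two names in that quote are easy to tell apart in code even though they look identical on screen. A small Python sketch, again assuming the built-in "idna" codec (the variable names are mine):

```python
import unicodedata

latin = "ebay"
spoof = "eb\u0430y"  # U+0430 is CYRILLIC SMALL LETTER A

# The strings look the same when rendered but are not equal.
print(latin == spoof)  # False

# Listing the code points shows where they differ.
for ch in spoof:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The ASCII form sent to DNS makes the difference obvious.
spoof_ascii = spoof.encode("idna").decode("ascii")
print(spoof_ascii)  # xn--eby-7cd
```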

Defenses Against Homograph Attacks


There are multilayered solutions to the homograph attack:
  • Browser character blacklists. These prevent the browser from displaying characters that look like '/', and so on.
  • IDN character display rules (see: Firefox, Chrome). These rules restrict non-ASCII domain names to only those languages specifically configured by the user, and prevent display of mixed-language domains. For instance, if you have a Chinese installation of Windows, then Chinese characters will be displayed for Chinese IDNs.
  • Registrar restrictions. Registrars will prevent you from registering a domain that combines more than one language. So you can't register a name that is half English and half Russian, for instance.
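A mixed-language check of the kind described above could look roughly like the following Python sketch. This is not what any particular browser or registrar actually runs. Python's standard library does not expose the Unicode Script property, so the first word of each character's Unicode name stands in for its script, which is only an approximation. The function names are hypothetical.

```python
import unicodedata


def rough_scripts(label: str) -> set:
    """Approximate the set of scripts used in a domain label."""
    scripts = set()
    for ch in label:
        # Digits and hyphens are shared by every script, so skip them.
        if ch.isdigit() or ch == "-":
            continue
        # Marks carry no script of their own (Unicode gives them the
        # Inherited script), so a check like this passes over them.
        if unicodedata.category(ch).startswith("M"):
            continue
        # Approximation: "LATIN SMALL LETTER A" -> "LATIN".
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True if the label appears to combine more than one script."""
    return len(rough_scripts(label)) > 1


print(is_mixed_script("ebay"))        # False: all Latin
print(is_mixed_script("eb\u0430y"))   # True: Latin plus Cyrillic
```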

Another Attempt


So how do we explain http://░͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇͇.ws?

Defeating Registrar Restrictions


Registrars prohibit combining languages in domain names. But there are characters that aren't in any language. The most interesting of these are Unicode Combining Diacritical Marks. These Unicode code points modify the glyph right before them instead of adding a new character. For example, the letter A when combined with U+0332 will become: A̲.
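You can poke at this behavior with Python's unicodedata module. A quick sketch (the variable names are mine):

```python
import unicodedata

base = "A"
mark = "\u0332"  # COMBINING LOW LINE
combined = base + mark

# Two code points, but most renderers draw a single underlined glyph.
print(combined)
print(len(combined))  # 2

# The mark is a nonspacing mark (category "Mn") with a nonzero combining
# class, which is how software knows it attaches to the previous character.
for ch in combined:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )
```

Nothing stops you from stacking the same mark many times after a single base character, which is what the domain in the title does.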

But will these characters display in browsers?

Chrome: No :(
Firefox: No :(
Safari: Yes :)

Impact


Someone could register apple͢.com and it would display in Safari as:

This is not good.
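One way to spot names like this is to strip the marks and compare what is left against a name you care about. A rough Python sketch, assuming the mark in the example above is U+0362 (COMBINING DOUBLE RIGHTWARDS ARROW BELOW). The helper names are hypothetical and this is not how any browser does it.

```python
import unicodedata


def strip_marks(label: str) -> str:
    """Remove every combining mark (Unicode categories Mn, Mc, Me)."""
    return "".join(
        ch for ch in label if not unicodedata.category(ch).startswith("M")
    )


def imitates(label: str, target: str) -> bool:
    """True if label differs from target only by added combining marks."""
    return label != target and strip_marks(label) == target


# Assumption: the example domain uses U+0362 after the final "e".
spoof = "apple\u0362"

print(strip_marks(spoof))        # apple
print(imitates(spoof, "apple"))  # True
print(imitates("apple", "apple"))  # False: the real name is not a spoof
```

This does not decompose precomposed letters such as "é", so it only catches marks that appear as separate code points.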