Internationalized domain name

Example of Greek IDN with domain name in non-Latin alphabet: ουτοπία.δπθ.gr (Punycode is xn--kxae4bafwg.xn--pxaix.gr)

An internationalized domain name (IDN) is an Internet domain name that contains at least one label displayed in software applications, in whole or in part, in a non-Latin script or alphabet [lower-alpha 1] or in Latin alphabet-based characters with diacritics or ligatures. [lower-alpha 2] These writing systems are encoded by computers in multibyte Unicode. Internationalized domain names are stored in the Domain Name System (DNS) as ASCII strings using Punycode transcription.

The DNS, which performs a lookup service to translate mostly user-friendly names into network addresses for locating Internet resources, is restricted in practice [lower-alpha 3] to the use of ASCII characters, a practical limitation that initially set the standard for acceptable domain names. The internationalization of domain names is a technical solution to translate names written in language-native scripts into an ASCII text representation that is compatible with the DNS. Internationalized domain names can only be used with applications that are specifically designed for such use; they require no changes in the infrastructure of the Internet.

IDN was originally proposed in December 1996 by Martin Dürst [1] [2] and implemented in 1998 by Tan Juay Kwang and Leong Kok Yong under the guidance of Tan Tin Wee. [citation needed] After much debate and many competing proposals, a system called Internationalizing Domain Names in Applications (IDNA) [3] was adopted as a standard, and has been implemented in several top-level domains.

In IDNA, the term internationalized domain name means specifically any domain name consisting only of labels to which the IDNA ToASCII algorithm (see below) can be successfully applied. In March 2008, the IETF formed a new IDN working group to update [4] the current IDNA protocol. In April 2008, UN-ESCWA together with the Public Interest Registry (PIR) and Afilias launched the Arabic Script in IDNs Working Group (ASIWG), which comprised experts in DNS, ccTLD operators, business, and academia, as well as members of regional and international organizations. Chaired by Afilias's Ram Mohan, ASIWG aims to develop a unified IDN table for the Arabic script, and is an example of community collaboration that helps local and regional experts engage in global policy development and technical standardization. [5]

In October 2009, the Internet Corporation for Assigned Names and Numbers (ICANN) approved the creation of internationalized country code top-level domains (IDN ccTLDs) in the Internet that use the IDNA standard for native language scripts. [6] [7] In May 2010, the first IDN ccTLDs were installed in the DNS root zone. [8]

Internationalizing Domain Names in Applications

Internationalizing Domain Names in Applications (IDNA) is a mechanism defined in 2003 for handling internationalized domain names containing non-ASCII characters.

Although the Domain Name System supports non-ASCII characters, applications such as e-mail and web browsers restrict the characters that can be used as domain names for purposes such as a hostname. Strictly speaking, the restrictions lie in the network protocols these applications use, not in the applications themselves or in the DNS. [citation needed] To retain backward compatibility with the installed base, the IETF IDNA Working Group decided that internationalized domain names should be converted to a suitable ASCII-based form that could be handled by web browsers and other user applications. [citation needed] IDNA specifies how this conversion between names written in non-ASCII characters and their ASCII-based representation is performed. [citation needed]

An IDNA-enabled application can convert between the internationalized and ASCII representations of a domain name. It uses the ASCII form for DNS lookups but can present the internationalized form to users who presumably prefer to read and write domain names in non-ASCII scripts such as Arabic or Hiragana. Applications that do not support IDNA will not be able to handle domain names with non-ASCII characters, but will still be able to access such domains if given the (usually rather cryptic) ASCII equivalent.

ICANN issued guidelines [10] for the use of IDNA in June 2003, and it was already possible to register .jp domains using this system in July 2003 and .info [9] domains in March 2004. Several other top-level domain registries started accepting registrations in 2004 and 2005. The guidelines were updated [11] in November 2005 to respond to phishing concerns. An ICANN working group focused on country-code domain names at the top level was formed in November 2007 [12] and promoted jointly by the country code supporting organization and the Governmental Advisory Committee. Additionally, ICANN supports the community-led Universal Acceptance Steering Group, which seeks to promote the usability of IDNs and other new gTLDs in all applications, devices, and systems. [13]

Mozilla 1.4, Netscape 7.1, and Opera 7.11 were among the first applications to support IDNA. A browser plugin is available for Internet Explorer 6 to provide IDN support. Internet Explorer 7.0 [14] and Windows Vista's URL APIs provide native support for IDN. [15]

ToASCII and ToUnicode

The conversions between ASCII and non-ASCII forms of a domain name are accomplished by a pair of algorithms called ToASCII and ToUnicode. These algorithms are not applied to the domain name as a whole, but rather to individual labels. For example, if the domain name is www.example.com, then the labels are www, example, and com. ToASCII or ToUnicode is applied to each of these three separately.
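As a rough illustration of this per-label processing, the following Python sketch splits a name on dots and converts each label on its own. It relies on the `ToASCII` helper in the standard library's `encodings.idna` module, which implements the 2003 version of IDNA. The function name `domain_to_ascii` is an assumption made for this example, and the sketch does not handle a trailing dot or the alternative dot characters that IDNA also accepts as label separators.

```python
from encodings.idna import ToASCII  # stdlib implementation of the RFC 3490 ToASCII operation


def domain_to_ascii(name: str) -> str:
    """Apply ToASCII to each dot-separated label separately (hypothetical helper)."""
    labels = name.split(".")
    # ToASCII returns bytes; decode so the result is an ordinary ASCII string.
    return ".".join(ToASCII(label).decode("ascii") for label in labels)


# The three labels www, example, and com are each processed on their own.
print(domain_to_ascii("www.example.com"))
# Only the non-ASCII label changes; the ASCII labels pass through untouched.
print(domain_to_ascii("www.b\u00fccher.example"))
```

Because each label is converted independently, a failure in one label (for example, an over-long label) does not depend on the content of the others.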

The details of these two algorithms are complex. They are specified in RFC 3490. Following is an overview of their workings.

ToASCII leaves ASCII labels unchanged. It fails if the label is unsuitable for the Domain Name System. For labels containing at least one non-ASCII character, ToASCII applies the Nameprep algorithm. This converts the label to lowercase and performs other normalization. ToASCII then translates the result to ASCII, using Punycode. [16] Finally, it prepends the four-character string "xn--". [17] This four-character string is called the ASCII Compatible Encoding (ACE) prefix. It is used to distinguish labels encoded in Punycode from ordinary ASCII labels. The ToASCII algorithm can fail in several ways. For example, the final string could exceed the 63-character limit of a DNS label. A label for which ToASCII fails cannot be used in an internationalized domain name.
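The steps described above can be sketched in Python as follows. This is a simplified illustration, not the full RFC 3490 algorithm: it uses the standard library's `nameprep` function and `punycode` codec, and it omits some checks (such as rejecting non-ASCII labels that already start with the ACE prefix). The names `to_ascii_sketch`, `ACE_PREFIX`, and `MAX_LABEL_LENGTH` are assumptions made for this example.

```python
from encodings.idna import nameprep  # stdlib Nameprep implementation

ACE_PREFIX = "xn--"
MAX_LABEL_LENGTH = 63  # DNS limit on the length of a single label


def to_ascii_sketch(label: str) -> str:
    """Simplified ToASCII: Nameprep, Punycode, ACE prefix, then a length check."""
    # ASCII labels are left unchanged; others are normalized (and lowercased) first.
    prepared = label if label.isascii() else nameprep(label)
    if prepared.isascii():
        result = prepared
    else:
        # Encode the remaining non-ASCII label with Punycode and add the ACE prefix.
        result = ACE_PREFIX + prepared.encode("punycode").decode("ascii")
    if not 0 < len(result) <= MAX_LABEL_LENGTH:
        raise ValueError("label is empty or longer than 63 characters")
    return result


print(to_ascii_sketch("B\u00fccher"))  # non-ASCII label: converted
print(to_ascii_sketch("example"))      # ASCII label: unchanged
```

In practice, an application would more likely call a maintained IDNA library than reimplement these steps; the sketch is only meant to make the order of operations concrete.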

The function ToUnicode reverses the action of ToASCII, stripping off the ACE prefix and applying the Punycode decode algorithm. It does not reverse the Nameprep processing, since that is merely a normalization and is by nature irreversible. Unlike ToASCII, ToUnicode always succeeds, because it simply returns the original string if decoding fails. In particular, this means that ToUnicode does not affect a string that does not begin with the ACE prefix.
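The "always succeeds" behaviour can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the real ToUnicode also re-applies ToASCII to verify the result, which is omitted here, and the name `to_unicode_sketch` is hypothetical. Only the standard library's `punycode` codec is used.

```python
ACE_PREFIX = "xn--"


def to_unicode_sketch(label: str) -> str:
    """Simplified ToUnicode: never raises, returns the input if decoding fails."""
    if not label.lower().startswith(ACE_PREFIX):
        return label  # no ACE prefix: the string is returned unaffected
    try:
        # Strip the ACE prefix and apply the Punycode decode algorithm.
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        return label  # decoding failed: fall back to the original string


print(to_unicode_sketch("xn--bcher-kva"))  # decodes to the Unicode label
print(to_unicode_sketch("example"))        # returned as-is
```

Note that the decoded label is the lowercase, normalized form: the capital letter of an original input such as Bücher is not recovered, because Nameprep is not reversed.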

Example of IDNA encoding

IDNA encoding may be illustrated using the example domain Bücher.example (German: Bücher, lit. 'books'). This domain name has two labels, Bücher and example. The second label is pure ASCII and is left unchanged. The first label is processed by Nameprep to give bücher, and then converted to Punycode to result in bcher-kva. It is then prefixed with xn-- to produce xn--bcher-kva. The resulting name suitable for use in DNS records and queries is therefore xn--bcher-kva.example.

Arabic Script IDN Working Group (ASIWG)

While the Arab region represents 5 percent of the world's population, it accounts for a mere 2.6 percent of global Internet usage. Moreover, only 11 percent of the population in the Arab world uses the Internet, compared to the global rate of 21.9 percent. However, Internet usage in the region grew by 1,426 percent between 2000 and 2008, a large increase compared to the average world growth rate of 305.5 percent over the same period. It is reasonable to infer, therefore, that the growth in usage could have been even greater had the DNS been available in Arabic characters. The introduction of IDNs offers many potential new opportunities and benefits for Arab Internet users by allowing them to establish domains in their native languages and alphabets, and to create a whole range of services and localized applications on top of those domains. [18]

Top-level domain implementation

In 2009, ICANN decided to implement a new class of top-level domains, assignable to countries and independent regions, similar to the rules for country code top-level domains. However, the domain names may be any desirable string of characters, symbols, or glyphs in the language-specific, non-Latin alphabet or script of the applicant's language, within certain guidelines to assure sufficient visual uniqueness.

The process of installing IDN country code domains began with a long period of testing in a set of subdomains of the test top-level domain (.test). Eleven domains used language-native scripts or alphabets, such as "δοκιμή", [19] meaning test in Greek.

These efforts culminated in the creation of the first internationalized country code top-level domains (IDN ccTLDs) for production use in 2010.

In the Domain Name System, these domains use an ASCII representation consisting of the prefix "xn--" followed by the Punycode translation of the Unicode representation of the language-specific alphabet or script glyphs. For example, the Cyrillic name of Russia's IDN ccTLD is "рф". In Punycode representation, this is "p1ai", and its DNS name is "xn--p1ai".
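The relationship between the three forms can be checked with Python's built-in `punycode` and `idna` codecs (the latter implements the 2003 version of IDNA). This is a minimal sketch; the helper names `ace_form` and `display_form` are assumptions made for this example.

```python
def ace_form(unicode_label: str) -> str:
    """Return the ASCII (DNS) form of a label, including the xn-- prefix."""
    return unicode_label.encode("idna").decode("ascii")


def display_form(ace_label: str) -> str:
    """Return the Unicode form of an ASCII-encoded label."""
    return ace_label.encode("ascii").decode("idna")


tld = "\u0440\u0444"  # the Cyrillic letters of Russia's IDN ccTLD

print(tld.encode("punycode").decode("ascii"))  # bare Punycode, without the prefix
print(ace_form(tld))                           # DNS name, with the xn-- prefix
print(display_form("xn--p1ai"))                # back to the Cyrillic form
```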

Non-IDNA or non-ICANN registries that support non-ASCII domain names

Other registries support non-ASCII domain names. The company ThaiURL.com in Thailand supports ".com" registrations via its own IDN encoding, ThaiURL. However, since most modern browsers only recognize IDNA/Punycode IDNs, ThaiURL-encoded domains must be typed in or linked to in their encoded form, and they will be displayed thus in the address bar. This limits their usefulness; however, they are still valid and universally accessible domains.

Several registries support emoji domains, in which emoji characters are encoded using Punycode.

ASCII spoofing concerns

The use of Unicode in domain names makes it potentially easier to spoof websites, as the visual representation of an IDN string in a web browser may make a spoof site appear indistinguishable from the legitimate site being spoofed, depending on the font used. For example, the Unicode character U+0430 (Cyrillic small letter a) can look identical to the Unicode character U+0061 (Latin small letter a), used in English. As a concrete example, using Cyrillic letters а, е, і, р (a; then "Ie"/"Ye" U+0435, looking essentially identical to Latin letter e; then U+0456, essentially identical to Latin letter i; and "Er" U+0440, essentially identical to Latin letter p), the URL wіkіреdіа.org is formed, which is virtually indistinguishable from the visual representation of the legitimate wikipedia.org (possibly depending on typefaces).
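Although the two strings look alike on screen, they differ at the code-point level, which software can detect. The Python sketch below is a rough heuristic rather than an established defence: it takes the first word of each character's Unicode name (such as LATIN or CYRILLIC) as a stand-in for its script, which is an assumption made for this example and not a real script property lookup. The function names are hypothetical.

```python
import unicodedata


def scripts_used(label: str) -> set:
    """Approximate the scripts in a label by the first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def looks_mixed(label: str) -> bool:
    """Flag labels whose letters appear to come from more than one script."""
    return len(scripts_used(label)) > 1


genuine = "wikipedia"
# Latin w, k, d combined with Cyrillic U+0456, U+0440, U+0435, U+0430.
spoof = "w\u0456k\u0456\u0440\u0435d\u0456\u0430"

for label in (genuine, spoof):
    # The ASCII form exposes the difference: the spoofed label needs an xn-- encoding.
    print(sorted(scripts_used(label)), looks_mixed(label), label.encode("idna"))
```

A check of this kind flags mixed-script labels only; a label written entirely in one script that happens to resemble a Latin word would not be caught by it.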

Top-level domains accepting IDN registration

Many top-level domains have started to accept internationalized domain name registrations at the second or lower levels. Afilias (.INFO) offered the first gTLD IDN second-level registrations in 2004 in the German language. [20]

DotAsia, the registry for the .asia TLD, conducted a 70-day sunrise period starting May 11, 2011 for second-level domain registrations in the Chinese, Japanese and Korean scripts. [21]

References

  1. Such as Arabic, Bengali, Chinese (Mandarin, simplified or traditional), Cyrillic (including Bulgarian, Russian, Serbian and Ukrainian), Devanagari, Greek, Hebrew, Hindi, Tamil or Thai.
  2. Such as French, German, Italian, Polish, Portuguese or Spanish.
  3. RFC 2181, Clarifications to the DNS Specification: section 11 explicitly allows any binary string. Non-ASCII encodings such as UTF-8 have indeed been (privately) used over DNS per RFC 6055. The system of internet domain name registration is, however, totally incapable of handling non-ASCII encodings, hence the restriction; see also RFC 5890 §§ 2.2, 2.3 on the format of names.
  1. Dürst, Martin J. (December 10, 1996). "Internet Draft: Internationalization of Domain Names". IETF Datatracker. The Internet Engineering Task Force (IETF), Internet Society (ISOC). Retrieved 2009-10-31.
  2. Dürst, Martin J. (December 20, 1996). "URLs and internationalization". World Wide Web Consortium. Retrieved 2009-10-30.
  3. Faltstrom, P.; Hoffman, P.; Costello, A. (March 2003). Internationalizing Domain Names in Applications (IDNA). doi:10.17487/RFC3490. RFC 3490.
  4. John Klensin (January 6, 2010). "Internationalized Domain Names in Applications (IDNA): Protocol (RFC 5891 Draft)". IETF Datatracker. Internet Engineering Task Force. Retrieved 2016-08-12.
  5. Economic and Social Commission for Western Asia (ESCWA), United Nations (15 June 2009). "Internet Governance: Challenges and Opportunities for the ESCWA Member Countries" (PDF). United Nations. Retrieved 7 Dec 2019.
  6. "ICANN Bringing the Languages of the World to the Global Internet" (Press release). Internet Corporation For Assigned Names and Numbers (ICANN). October 30, 2009. Retrieved 2009-10-30.
  7. "Internet addresses set for change". BBC News. October 30, 2009. Retrieved 2009-10-30.
  8. "First IDN ccTLDs now available" (Press release). Internet Corporation For Assigned Names and Numbers (ICANN). May 5, 2010. Retrieved 2010-05-06.
  9. Mohan, Ram, German IDN, German Language Table. Archived 2006-12-18 at the Wayback Machine. March 2003.
  10. Dam, Mohan, Karp, Kane & Hotta, IDN Guidelines 1.0, ICANN, June 2003
  11. Karp, Mohan, Dam, Kane, Hotta, El Bashir, IDN Guidelines 2.0, ICANN, November 2005
  12. Jesdanun, Anick (Associated Press) (2 November 2007). "Group on Non-English Domains Formed". Archived from the original on December 20, 2008. Retrieved 2 November 2007.
  13. "ICANN – Universal Acceptance". ICANN. February 25, 2012.
  14. "Internet Explorer for Developers". learn.microsoft.com. October 22, 2021.
  15. "Handling Internationalized Domain Names (IDNs) - Win32 apps". learn.microsoft.com. January 7, 2021.
  16. RFC 3492, Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA), A. Costello, The Internet Society (March 2003)
  17. Internet Assigned Numbers Authority (2003-02-14). "Completion of IANA Selection of IDNA Prefix". www.atm.tut.fi. Archived from the original on 2010-04-27. Retrieved 2017-09-22.
  18. "Internationalized Domain Names - ICANN". www.icann.org. Retrieved 2019-12-08.
  19. IANA Report on Delegation of Eleven Evaluative Internationalised Top-Level Domains.
  20. ".INFO German Character Table". www.iana.org. Retrieved 2017-04-11.
  21. Dot-Asia releases IDN dates, Managing Internet IP, April 14, 2011.
  22. Dürst, Martin J. (17 March 1998). "draft-duerst-dns-i18n-00 – Internationalization of Domain Names". Ietf Datatracker. Tools.ietf.org. Retrieved 2010-07-29.
  23. "minc.org". Archive.minc.org. 2019-12-12. Retrieved 2022-10-07.
  24. "the leading Telecom magazine, ICT magazine, Telecom magazine, ICT and Telecom". Connect-World. Archived from the original on 2008-07-23. Retrieved 2010-07-29.
  25. "Tan Tin Wee". Internet Hall of Fame.
  26. "APAN-KR" (PDF). IITA.
  27. "APNG". APNG. Retrieved 2010-07-29.
  28. "The community of Asia Pacific Internet Organization". Apstar.Org. Retrieved 2010-07-29.
  29. "Asia Pacific Networking Group Chairman's Commission on Internationalization of DNS". www.apng.org. Archived from the original on April 22, 2006.
  30. "iDOMAIN - TestBed of iDNS implementations in the Asia Pacific". www.minc.org. Archived from the original on 2003-08-23.
  31. "iDNS for IPv6". www.apng.org. Archived from the original on August 11, 2006.
  32. "Method and system for internationalizing domain names (US6182148)". Delphion.com. Archived from the original on 2010-07-15. Retrieved 2010-07-29.
  33. "draft-jseng-utf5-00 – UTF-5, a transformation format of Unicode and ISO 10646". Ietf Datatracker. Tools.ietf.org. 1999-07-27. Retrieved 2010-07-29.
  34. "draft-jseng-utf5-01 – UTF-5, a transformation format of Unicode and ISO 10646". Ietf Datatracker. Tools.ietf.org. 2000-01-28. Retrieved 2010-07-29.
  35. "iNAME project of APTLD and APNG". www.minc.org. Archived from the original on 2003-08-23. Retrieved 2023-08-20.
  36. "Internationalisation of the Domain Name System: The Next Big Step in a Multilingual Internet". NEWS. i-DNS.net. 24 July 2000. Retrieved 2016-08-13.
  37. "Proposal BoF - Meeting Summary". www.minc.org. Archived from the original on 2004-11-10. Retrieved 2023-08-20.
  38. "APRICOT 2000 in Seoul". Apricot.net. Retrieved 2010-07-29.
  39. "Multilingual Internet Names Consortium". MINC. Retrieved 2010-07-29.
  40. "History of MINC". www.minc.org. Archived from the original on 2004-01-26. Retrieved 2023-08-20.
  41. "Chinese Domain Name Consortium". CDNC. 2000-05-19. Retrieved 2010-07-29.
  42. "Chinese Domain Name Consortium". CDNC. Retrieved 2010-07-29.
  43. "urduworkshop.sdnpk.org". Archived from the original on 2016-01-06. Retrieved 2022-07-10.
  44. "Signposts in Cyberspace: The Domain Name System and Internet Navigation". Nap.edu. 2001-11-07. Archived from the original on 2008-07-06. Retrieved 2010-07-29.
  45. "ICANN | Archives | Committees | Internationalized Domain Names (IDN) Committee". archive.icann.org. Retrieved 2017-04-11.
  46. "Guidelines for the Implementation of Internationalized Domain Names – Version 1.0". ICANN.
  47. "ITU-T SG17 Meeting Documents". Itu.int. Retrieved 2010-07-29.
  48. "ITU-T Newslog – Multilingual Internet Work Progresses". Itu.int. 2006-05-04. Archived from the original on 2010-05-29. Retrieved 2010-07-29.
  49. "GNSO IDN WG". icann.org. 2007-03-22. Retrieved 2010-08-30.
  50. Mohan, Ram, GNSO IDN Working Group, Outcomes Report (PDF), ICANN
  51. "On Its Way: One of the Biggest Changes to the Internet | Internet users have key role in testing the operation of example.test in 11 languages". www.icann.org.
  52. "My Name, My Language, My Internet: IDN Test Goes Live | ICANN launches global test of Internationalized Domain Names". www.icann.org.
  53. "Successful Evaluations of .test IDN TLDs". www.icann.org.
  54. "IDN Workshop: IDNs in Indian Languages and Scripts | New Delhi 2008". archive.icann.org. Retrieved 2017-04-11.
  55. IDNAbis overview (2008)
  56. "ICANN | Archives | Internationalized Domain Names Meetings". archive.icann.org. Retrieved 2017-04-11.
  57. "ICANN - Paris/IDN CCTLD discussion - Wiki". isoc-ny.org.
  58. "ASIWIG Meeting | Paris 2008". archive.icann.org. Retrieved 2017-04-11.
  59. "ICANN Seeks Interest in IDN ccTLD Fast-Track Process". www.icann.org.
  60. Proposed Final Implementation Plan: IDN ccTLD Fast Track Process, 30 September 2009
  61. Regulator approves multi-lingual web addresses, Silicon Republic, 30.10.2009
  62. "First IDN ccTLDs Requests Successfully Pass String Evaluation". ICANN. 2010-01-21.
  63. "Board IDN Variants Working Group – ICANN". www.icann.org. Retrieved 2017-04-11.
  64. J. Klensin (February 2012). Overview and Framework for Internationalized Email. IETF. doi:10.17487/RFC6530. RFC 6530. Retrieved January 14, 2017.