UTF-16

UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode (in fact this number of code points is dictated by the design of UTF-16). The encoding is variable-length, as code points are encoded with one or two 16-bit code units. UTF-16 arose from an earlier obsolete fixed-width 16-bit encoding, now known as UCS-2 (for 2-byte Universal Character Set), once it became clear that more than 2^16 (65,536) code points were needed.[1]
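The one-or-two code unit scheme and the 1,112,064 figure can be illustrated with a minimal Python sketch. The function name `utf16_code_units` and the constant `VALID_CODE_POINTS` are hypothetical names chosen for this illustration. The sketch assumes the standard surrogate ranges (0xD800 to 0xDBFF for the first unit of a pair, 0xDC00 to 0xDFFF for the second), which are not spelled out in the text above.

```python
from typing import List


def utf16_code_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units (as integers) for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are reserved and cannot be encoded")
    if code_point < 0x10000:
        # Code points below 2^16 fit in a single 16-bit code unit.
        return [code_point]
    # Everything else is split into a 20-bit offset carried by two units.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [high, low]


# 0x110000 values in the code space, minus the 2,048 reserved surrogates.
VALID_CODE_POINTS = 0x110000 - (0xDFFF - 0xD800 + 1)

print([hex(u) for u in utf16_code_units(0x41)])     # ['0x41']
print([hex(u) for u in utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']
print(VALID_CODE_POINTS)                            # 1112064
```

Two 10-bit halves can address 2^20 code points beyond the first 2^16, which is why the code space ends at 0x10FFFF. Removing the 2,048 reserved surrogate values from that space gives the 1,112,064 figure.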

UTF-16
Figure: The first 2^16 Unicode code points. The white stripe near the bottom marks the surrogate halves used by UTF-16.
Language(s): International
Standard: Unicode Standard
Classification: Unicode Transformation Format, variable-width encoding
Extends: UCS-2
Transforms / Encodes: ISO/IEC 10646 (Unicode)

UTF-16 is used by systems such as the Microsoft Windows API, the Java programming language and JavaScript/ECMAScript. It is also sometimes used for plain text and word-processing data files on Microsoft Windows, but rarely for files on Unix-like systems. It is also used by SMS: the SMS standard specifies UCS-2, but almost all implementations actually use UTF-16 so that emoji work.[citation needed]
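The gap between UCS-2 and UTF-16 that matters for emoji can be sketched as follows. The helper names `utf16_length` and `fits_in_ucs2` are hypothetical, and the sketch assumes well-formed input with no lone surrogates. It counts 16-bit code units, which is the unit in which Java and JavaScript report string length.

```python
def utf16_length(text: str) -> int:
    """Number of 16-bit code units needed to store text in UTF-16."""
    # Each code unit is two bytes; "utf-16-le" emits no byte order mark.
    return len(text.encode("utf-16-le")) // 2


def fits_in_ucs2(text: str) -> bool:
    """True if every character fits in one 16-bit unit, as UCS-2 requires."""
    return all(ord(ch) <= 0xFFFF for ch in text)


message = "Hi \U0001F600"      # "Hi " followed by the emoji U+1F600

print(len(message))            # 4 code points
print(utf16_length(message))   # 5 code units: the emoji takes two
print(fits_in_ucs2(message))   # False: U+1F600 is above 0xFFFF
print(fits_in_ucs2("Hi"))      # True
```

A strictly fixed-width UCS-2 implementation has no way to represent U+1F600, whereas a UTF-16 implementation stores it as two code units.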

UTF-16 is the only web encoding incompatible with ASCII[2] and never gained popularity on the web, where it is declared by under 0.002% (a little over one thousandth of one percent) of web pages[3] (and many of these are actually UTF-8, because of "contradictory character encoding specifications" and/or "incorrect character encoding defined").[4][5] UTF-8, by comparison, accounts for 98% of all web pages.[6] The Web Hypertext Application Technology Working Group (WHATWG) considers UTF-8 "the mandatory encoding for all [text]" and states that, for security reasons, browser applications should not use UTF-16.[7]
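The ASCII incompatibility is easy to see at the byte level. The following is a small illustrative sketch using Python's built-in codecs. The variable names are arbitrary, and little-endian byte order is chosen only for the example.

```python
text = "Hi"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte order mark

print(ascii_bytes)   # b'Hi'
print(utf8_bytes)    # b'Hi'
print(utf16_bytes)   # b'H\x00i\x00'

# UTF-8 leaves pure ASCII text byte-for-byte unchanged; UTF-16 does not.
print(utf8_bytes == ascii_bytes)    # True
print(utf16_bytes == ascii_bytes)   # False
```

Every ASCII character occupies two bytes in UTF-16, one of which is zero, so software that expects ASCII-compatible bytes cannot read UTF-16 text correctly.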



This article uses material from the Wikipedia article UTF-16, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.