Mark Davis (Unicode)

Mark Davis
Born: Mark Edward Davis, 13 September 1952 (age 70)
Alma mater: Stanford University (PhD)
Known for: Unicode, Unicode Consortium
Scientific career
Fields: Internationalization and localization
Institutions: IBM, Apple, Google, Taligent, Unicode Consortium
Thesis: Formal problems for Utilitarianism (1979)
Website: www.macchiato.com

Mark Edward Davis (born September 13, 1952) is an American specialist in the internationalization and localization of software and the co-founder and president of the Unicode Consortium. [1] [2] [3] [4]

He is one of the key technical contributors to the Unicode specifications, as the primary author or co-author of the specifications for bidirectional text algorithms (used worldwide to display Arabic and Hebrew text), collation (used by sorting and search algorithms), Unicode normalization, Unicode scripts, text segmentation, identifiers, regular expressions, data compression, character encoding and security. [5] [6] [7]

Education

Davis was educated at Stanford University, where he was awarded a PhD in philosophy in 1979. [8]

Career and research

Davis has specialized in the internationalization and localization of software for many years. After his PhD, he worked in Zurich, Switzerland for several years, then returned to California to join Apple, where he co-authored the Macintosh KanjiTalk and Script Manager, and authored the Macintosh Arabic and Hebrew systems. He also worked on parts of the Mac OS, including contributions to the design of TrueType. Later, he was the manager and architect for the Taligent international frameworks and then the architect for a large part of the Java international libraries. [9] At IBM, he was the Chief Software Globalization Architect. He is the author of a number of patents, primarily in internationalization and localization. At various times he has also managed groups or departments covering text, internationalization, operating system services, porting and technical communications. [10]

Davis founded and was responsible for the overall architecture of International Components for Unicode (ICU: a major Unicode software internationalization library) and designed the core of the Java internationalization classes. He is also the vice-chair of the Unicode Common Locale Data Repository (CLDR) project, [11] and is a co-author of the Best Current Practice (BCP) 47 IETF language tag Requests for Comments (RFC 4646 and RFC 5646), used for identifying languages in XML and HTML documents.

Since the start of 2006, Davis has been working on software internationalization at Google, focusing on effective and secure use of Unicode (especially in the index and search pipeline), overall improvement and adoption of the software internationalization libraries (including ICU) and the introduction and maintenance of stable identifiers for languages, scripts, regions, time zones and currencies. [12]

Publications

The Unicode Standard, Version 5.0 [13]

Personal life

Davis is married to Anne Gundelfinger. [3] He has two daughters from a previous marriage.

Related Research Articles

Unicode: Character encoding standard

Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, which is maintained by the Unicode Consortium, defines as of the current version (15.0) 149,186 characters covering 161 modern and historic scripts, as well as symbols, emoji, and non-visual control and formatting codes.

Taligent was an American software company. Based on the Pink object-oriented operating system conceived by Apple in 1988, Taligent Inc. was incorporated as an Apple/IBM partnership in 1992, and was dissolved into IBM in 1998.

AIM alliance: Historic business alliance

The AIM alliance, also known as the PowerPC alliance, was formed on October 2, 1991, between Apple, IBM, and Motorola. Its goal was to create an industry-wide open-standard computing platform based on the POWER instruction set architecture. It was intended to solve legacy problems, future-proof the industry, and compete with Microsoft's monopoly and the Wintel duopoly. The alliance yielded the launch of Taligent, Kaleida Labs, the PowerPC CPU family, the Common Hardware Reference Platform (CHRP) hardware platform standard, and Apple's Power Macintosh computer line.

Internationalization and localization: Process of making software accessible to people in different areas of the world

In computing, internationalization and localization (American) or internationalisation and localisation, often abbreviated i18n and L10n, are means of adapting computer software to different languages, regional peculiarities and technical requirements of a target locale.

Unicode Consortium: Nonprofit organization that coordinates the development of the Unicode Standard

The Unicode Consortium is a 501(c)(3) non-profit organization incorporated and based in Mountain View, California. Its primary purpose is to maintain and publish the Unicode Standard which was developed with the intention of replacing existing character encoding schemes which are limited in size and scope, and are incompatible with multilingual environments. The consortium describes its overall purpose as:

...enabl[ing] people around the world to use computers in any language, by providing freely-available specifications and data to form the foundation for software internationalization in all major operating systems, search engines, applications, and the World Wide Web. An essential part of this purpose is to standardize, maintain, educate and engage academic and scientific communities, and the general public about, make publicly available, promote, and disseminate to the public a standard character encoding that provides for an allocation for more than a million characters.

Punycode is a representation of Unicode with the limited ASCII character subset used for Internet hostnames. Using Punycode, host names containing Unicode characters are transcoded to a subset of ASCII consisting of letters, digits, and hyphens, which is called the letter–digit–hyphen (LDH) subset. For example, München is encoded as Mnchen-3ya.
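The transcoding described above can be tried out with a minimal Python sketch. It assumes the `punycode` codec that ships with Python's standard library; the helper names `to_punycode` and `from_punycode` are hypothetical and exist only for this illustration.

```python
def to_punycode(label: str) -> str:
    """Encode a Unicode label into its Punycode (ASCII-only) form."""
    # Python's built-in "punycode" codec returns bytes; they are pure ASCII.
    return label.encode("punycode").decode("ascii")


def from_punycode(encoded: str) -> str:
    """Decode a Punycode string back into the original Unicode label."""
    return encoded.encode("ascii").decode("punycode")


encoded = to_punycode("München")
print(encoded)                 # Mnchen-3ya
print(from_punycode(encoded))  # München
```

The ASCII letters of the input are carried over unchanged, and the part after the final hyphen records which non-ASCII characters were removed and where they belong.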

Internationalized domain name: Type of Internet domain name

An internationalized domain name (IDN) is an Internet domain name that contains at least one label displayed in software applications, in whole or in part, in a non-Latin script or alphabet, such as Arabic, Bengali, Chinese, Cyrillic, Devanagari, Greek, Hebrew, Hindi, Tamil or Thai, or in Latin alphabet-based characters with diacritics or ligatures, such as those used in French, German, Italian, Polish, Portuguese or Spanish. These writing systems are encoded by computers in multibyte Unicode. Internationalized domain names are stored in the Domain Name System (DNS) as ASCII strings using Punycode transcription.

Emoji: Symbols often used as emotional cues in text

An emoji is a pictogram, logogram, ideogram or smiley embedded in text and used in electronic messages and web pages. The primary function of emoji is to fill in emotional cues otherwise missing from typed conversation. Examples of emoji are 😂, 😃, 🧘🏻‍♂️, 🌍, 🌦️, 🍞, 🚗, 📞, 🎉, ❤️, 🍆, 🍑 and 🏁. Emoji exist in various genres, including facial expressions, common objects, places and types of weather, and animals. They are much like emoticons, except emoji are pictures rather than typographic approximations; the term "emoji" in the strict sense refers to such pictures which can be represented as encoded characters, but it is sometimes applied to messaging stickers by extension. Originally meaning pictograph, the word emoji comes from Japanese e + moji; the resemblance to the English words emotion and emoticon is purely coincidental. The ISO 15924 script code for emoji is Zsye.

The Unicode collation algorithm (UCA) is an algorithm defined in Unicode Technical Report #10, which is a customizable method to produce binary keys from strings representing text in any writing system and language that can be represented with Unicode. These keys can then be efficiently byte-by-byte compared in order to collate or sort them according to the rules of the language, with options for ignoring case, accents, etc.
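The idea of turning strings into binary keys that are then compared byte by byte can be illustrated with a toy Python sketch. This is not the Unicode collation algorithm and it uses no UCA data: the hypothetical `toy_sort_key` function only strips accents and case, as a stand-in for the "ignore case and accents" options mentioned above.

```python
import unicodedata


def toy_sort_key(text: str) -> bytes:
    """Build a simplified binary sort key that ignores accents and case.

    Illustration only: a real UCA implementation derives multi-level
    weights from collation element tables rather than dropping marks.
    """
    # Decompose characters so accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case and encode, so keys can be compared as plain bytes.
    return base.casefold().encode("utf-8")


words = ["zebra", "Éclair", "apple", "eclair", "Zoo"]
sorted_words = sorted(words, key=toy_sort_key)
print(sorted_words)
```

Because the keys are ordinary byte strings, they can be computed once, stored, and compared cheaply, which is the practical benefit of key-based collation.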

ISO 15924, Codes for the representation of names of scripts, is an international standard defining codes for writing systems or scripts. Each script is given both a four-letter code and a numeric code.

International Components for Unicode (ICU) is an open-source project of mature C/C++ and Java libraries for Unicode support, software internationalization, and software globalization. ICU is widely portable to many operating systems and environments. It gives applications the same results on all platforms and between C, C++, and Java software. The ICU project is a technical committee of the Unicode Consortium and is sponsored, supported, and used by IBM and many other companies.

WorldScript is the multilingual text rendering engine for Apple Macintosh's classic Mac OS, before Mac OS X was introduced.

The Common Locale Data Repository Project, often abbreviated as CLDR, is a project of the Unicode Consortium to provide locale data in XML format for use in computer applications. CLDR contains locale-specific information that an operating system will typically provide to applications.

In computer programming, whitespace is any character or series of characters that represent horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visible mark, but typically does occupy an area on a page. For example, the common whitespace symbol U+0020 SPACE represents a blank space punctuation character in text, used as a word divider in Western scripts.

Globalize is a cross-platform JavaScript library for internationalization and localization that uses the Unicode Common Locale Data Repository (CLDR).

An IETF BCP 47 language tag is a standardized code or tag that is used to identify human languages on the Internet. The tag structure has been standardized by the Internet Engineering Task Force (IETF) in Best Current Practice (BCP) 47; the subtags are maintained by the IANA Language Subtag Registry.

The Unicode Standard assigns various properties to each Unicode character and code point.

The regional indicator symbols are a set of 26 alphabetic Unicode characters (A–Z) intended to be used to encode ISO 3166-1 alpha-2 two-letter country codes in a way that allows optional special treatment.
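As a rough illustration of the mapping described above, the following Python sketch converts a two-letter country code into a pair of regional indicator symbols. It relies on the 26 symbols occupying the contiguous code points U+1F1E6 to U+1F1FF; the function name `regional_indicators` is hypothetical, and whether the resulting pair is displayed as a flag depends on the rendering platform.

```python
# Code point of REGIONAL INDICATOR SYMBOL LETTER A; B to Z follow contiguously.
REGIONAL_INDICATOR_A = 0x1F1E6


def regional_indicators(code: str) -> str:
    """Map a two-letter country code (e.g. "US") to two regional indicators."""
    code = code.upper()
    if len(code) != 2 or not all("A" <= ch <= "Z" for ch in code):
        raise ValueError("expected a two-letter code made of A-Z")
    # Shift each letter by its offset from "A" into the regional indicator range.
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in code
    )


pair = regional_indicators("us")
print([f"U+{ord(ch):X}" for ch in pair])  # ['U+1F1FA', 'U+1F1F8']
```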

Tags is a Unicode block containing formatting tag characters. The block is designed to mirror ASCII. It was originally intended for language tags, but has now been repurposed as emoji modifiers, specifically for region flags.

Roozbeh Pournader: Free software activist

Roozbeh Pournader (Persian: روزبه پورنادر, Persian pronunciation: [ɾuːz'beh puːɾ'nɑːdeɾ]) is a free software activist and expert on Unicode text encoding, text rendering, and fonts, especially for bidirectional text. He is originally from Iran and now lives in the United States. After the establishment of the Persian Wikipedia, he became the first administrator of the project. He was a major assistant, participant and co-founder of the Persian Wikipedia.

References

  1. Luckerson, Victor (2016). "Meet the 63-Year-Old in Charge of Approving New Emojis". time.com. TIME.
  2. "Advisory Committee". unicode.org.
  3. Wong, Queenie (2016-02-12). "Q&A: Mark Davis, president of the Unicode Consortium, on the rise of emojis". mercurynews.com. The Mercury News. Retrieved 2018-04-05.
  4. Mark Davis on Twitter
  5. "Mark Davis - President, CLDR-TC Chair, & Emoji Subcommittee Chair at Unicode Consortium". THE ORG.
  6. "Board of Directors". unicode.org.
  7. DPA, German Press Agency (January 1, 2018). "Mark Davis: The lesser known master of emojis". Daily Sabah.
  8. Davis, Mark Edward (1979). Formal problems for Utilitarianism. stanford.edu (PhD thesis). Stanford University. OCLC 917950786. ProQuest 302982299.
  9. Davis, M. E.; Grimes, J. D.; Knoles, D. J. (1996). "Creating global software: Text handling and localization in Taligent's CommonPoint application system". IBM Systems Journal. 35 (2): 227–243. doi:10.1147/sj.352.0227. ISSN 0018-8670.
  10. Davis, Mark (2020). "Mark Davis Conference Biography". macchiato.com.
  11. "CLDR Process - CLDR - Unicode Common Locale Data Repository". cldr.unicode.org.
  12. Treanor, Sarah; Nunis, Vivienne (2021). "Face palm: When the emoji you want doesn't exist". bbc.co.uk. London: BBC News.
  13. The Unicode Consortium (November 2006), The Unicode Standard, Version 5.0, Addison-Wesley Professional, ISBN 0-321-48091-0