Korean language and computers

[Figure: South Korean standard Dubeolsik ('two-set type') layout]
[Figure: North Korean Dubeolsik layout]

The writing system of the Korean language is a syllabic alphabet of character parts (jamo) organized into character blocks (geulja) representing syllables. Unlike the letters of many Western languages, the character parts cannot simply be written one after another from left to right on a computer. Either every possible Korean syllable must be rendered as a syllable block by a font, or each character part must be encoded separately. Unicode has both options; the character parts ㅎ (h) and ㅏ (a), and the combined syllable 하 (ha), are all encoded.

Character encoding

In RFC 1557, a method known as ISO-2022-KR for seven-bit encoding of Korean characters in email was described. Where eight bits are allowed, EUC-KR encoding is preferred. These two encodings combine US-ASCII (ISO 646) with the Korean standard KS X 1001:1992 [1] (previously named KS C 5601:1987). Another character set, KPS 9566 (similar to KS X 1001), is used in North Korea.
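The relationship between the two encodings can be sketched in Rust. This is a minimal illustration, not a full codec: the `KsPos` type and the function names are hypothetical, the row and cell numbers are assumed to lie in 1 to 94, and the position of 가 (row 16, cell 1) is taken from the KS X 1001 table. EUC-KR sets the high bit of each byte, while ISO-2022-KR keeps the bytes within seven bits and brackets Korean text with the SO and SI control characters after a designator sequence at the start of the line.

```rust
/// Position of a character in the 94x94 KS X 1001 grid (1-based row and cell).
/// Hypothetical helper type for this sketch.
#[derive(Clone, Copy)]
struct KsPos {
    row: u8,
    cell: u8,
}

/// EUC-KR: each byte is the row or cell number plus 0xA0 (high bit set).
fn euc_kr_bytes(p: KsPos) -> [u8; 2] {
    [0xA0 + p.row, 0xA0 + p.cell]
}

/// ISO-2022-KR: each byte is the row or cell number plus 0x20 (seven-bit safe).
fn iso_2022_kr_bytes(p: KsPos) -> [u8; 2] {
    [0x20 + p.row, 0x20 + p.cell]
}

/// Encode one run of Korean characters as a seven-bit line:
/// designator ESC $ ) C, then SO, the two-byte codes, then SI.
fn iso_2022_kr_line(run: &[KsPos]) -> Vec<u8> {
    let mut out = vec![0x1B, b'$', b')', b'C'];
    out.push(0x0E); // SO: switch to KS X 1001
    for &p in run {
        out.extend_from_slice(&iso_2022_kr_bytes(p));
    }
    out.push(0x0F); // SI: switch back to ASCII
    out
}

fn main() {
    let ga = KsPos { row: 16, cell: 1 }; // 가
    println!("EUC-KR:      {:02X?}", euc_kr_bytes(ga));
    println!("ISO-2022-KR: {:02X?}", iso_2022_kr_line(&[ga]));
}
```

The two-byte codes differ only in the high bit, which is why the seven-bit form needs explicit shift characters to tell Korean text apart from ASCII.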

The international Unicode standard contains dedicated characters for the Korean language in the Hangul phonetic system. Unicode supports two methods. The method used by Microsoft Windows assigns each of the 11,172 possible syllable combinations its own code point and a preformed glyph in the font. The other method encodes the individual letters (jamo) and lets the software combine them correctly. The Windows method requires more font memory but allows better shapes (preferable for documents), since it is complicated to create stylistically correct combinations on the fly.
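As a small illustration of the two methods, the Rust sketch below writes the syllable 한 once as a single precomposed code point and once as a sequence of three jamo from the Hangul Jamo block. The constant and function names are hypothetical; the count of 11,172 follows from 19 initials, 21 medials and 28 final slots (27 finals plus "no final").

```rust
/// 한 as one precomposed code point (Hangul Syllables block).
const PRECOMPOSED: &str = "\u{D55C}";
/// 한 as initial + medial + final jamo (Hangul Jamo block).
const CONJOINING: &str = "\u{1112}\u{1161}\u{11AB}";

/// Number of modern syllable blocks: initials x medials x final slots.
fn modern_syllable_count() -> u32 {
    19 * 21 * 28
}

fn main() {
    println!("syllable blocks: {}", modern_syllable_count());
    for s in [PRECOMPOSED, CONJOINING] {
        // Both strings are meant to display as the same syllable,
        // but they differ in code points and in UTF-8 length.
        println!(
            "{} -> {} code point(s), {} UTF-8 byte(s)",
            s,
            s.chars().count(),
            s.len()
        );
    }
}
```

Because the two strings compare as different byte sequences, software that accepts both forms generally needs to normalize text before comparing or searching it.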

Another possibility is stacking a sequence of medials (jungseong) and a sequence of finals (jongseong), or a Middle Korean pitch mark if needed, on top of the sequence of initials (choseong). This works if the font has medial and final jamo with zero-width spacing that are inserted to the left of the cursor or caret, so that they appear in the right place below (or to the right of) the initial. If a syllable has a horizontal medial (ㅗ, ㅛ, ㅜ, ㅠ or ㅡ), the initial will probably appear further left than it does in a preformed syllable, because space must be reserved for a vertical medial. The result is aesthetically poor, yet it may be the only way to display Middle Korean hangul text without resorting to images, romanization, replacement of obsolete jamo or non-standard encodings. However, most current fonts do not support this.

The Unicode standard also has attempted to create a unified CJK character set which can represent Chinese (Hanzi) and the Japanese (Kanji) and Korean (Hanja) derivatives of this script through Han unification, which does not discriminate by language or region in rendering Chinese characters if the typographic traditions have not resulted in major differences in what a character looks like. Han unification has been criticized.

Text input

[Figure: South Korean Dubeolsik typing example]

On a Korean computer keyboard, text is typically entered by pressing a key for the appropriate jamo; the operating system creates each composite character on the fly. Depending on the input method editor and keyboard layout, double consonants can be entered by holding the Shift key. When all the jamo making up a syllabic block have been entered, the user may initiate a conversion to hanja (or other special characters) using a keyboard shortcut or interface button; South Korean keyboards have a key for this. Subsequent semi-automated hanja conversion is supported to varying degrees by word processors.

When using a keyboard made for another language, most operating systems still require the user to type with a native Korean keyboard layout, most commonly Dubeolsik. This differs from some other languages, such as Japanese, where text can be entered on non-native keyboards using romanization.

Operating systems such as Linux allow the setting engine/hangul/hangul-keyboard='ro', which results in a romaja keyboard; typing "seonggye" produces 성계. [2] In this configuration, ㄲ is obtained by typing "gg" rather than ⇧ Shift+G. This allows keying "jasanGun" to obtain 자산군, instead of keying "jasangun" (which would produce 자상운).

Korean typewriters

Before Korean division

Korean text input on computers descends from the Korean typewriters (타자기) that preceded them. It is unclear who made the first Korean typewriter; according to Jang Bong Seon, Horace Grant Underwood made one during the first decade of the 20th century. [3] Lee Won Ik, living in the United States, has been credited with developing the first Korean typewriter in 1914. [4] [5] In 1927, Song Ki Joo invented the first Dubeolsik typewriter in Chicago; however, it no longer exists. Song's 1934 typewriter is stored in the Hangul museum as the oldest existing Korean typewriter. [6] The invention of the typewriter led to the development of other typewriters by Kim Joon Sung in 1945 and Kong Byung Woo in 1950. [7]

After division

South Korea originally had a Nebeolsik standard, but Dubeolsik became standard in 1985. [8]

Hanja

Some Korean fonts do not include hanja, and word processors do not allow a user to specify which font to use as a fallback for any hanja in a text; each hanja sequence must be manually formatted for a desired font.

Pitch marks and vertical text

Vertical text is supported poorly (or not at all) by HTML and most word processors. This is not an issue for modern Korean, which is usually written horizontally; until the second half of the 20th century, however, Korean was often written vertically. Fifteenth-century texts written in hangul had pitch marks to the left of syllables which are included in Unicode, although current fonts do not support them.

Programs

Programs designed for Korean language-related use include:

Hangul in Unicode

[Figure: Hangul jamo characters in Unicode]
[Figure: Unicode hangul compatibility jamo block]

Hangul letters are detailed in several parts of Unicode:

Hangul syllables block

Pre-composed hangul syllables in the Unicode hangul syllables block are algorithmically defined with the following formula:

[(initial) × 588 + (medial) × 28 + (final)] + 44032

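To find the code point of "한" in Unicode, take the zero-based index of each jamo within its set: the initial ㅎ is 18, the medial ㅏ is 0, and the final ㄴ is 4.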

Substituting these values in the formula above yields [(18 × 588) + (0 × 28) + 4] + 44032 = 54620. The Unicode value of 한 is 54620 in decimal, &#54620; as a numeric character reference, and U+D55C in hexadecimal Unicode notation.

How to code this in Rust

With the below module, calling e.g. hangul::from_jamo('ㅎ','ㅏ',Some('ㄴ')) will return Some('한').

```rust
mod hangul {
    /// Initial consonants, in Unicode syllable order.
    const INITIAL_JAMO: [char; 19] = [
        'ㄱ', 'ㄲ', 'ㄴ', 'ㄷ', 'ㄸ', 'ㄹ', 'ㅁ', 'ㅂ', 'ㅃ', 'ㅅ',
        'ㅆ', 'ㅇ', 'ㅈ', 'ㅉ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ',
    ];
    /// Medial vowels, in Unicode syllable order.
    const VOWEL_JAMO: [char; 21] = [
        'ㅏ', 'ㅐ', 'ㅑ', 'ㅒ', 'ㅓ', 'ㅔ', 'ㅕ', 'ㅖ', 'ㅗ', 'ㅘ', 'ㅙ',
        'ㅚ', 'ㅛ', 'ㅜ', 'ㅝ', 'ㅞ', 'ㅟ', 'ㅠ', 'ㅡ', 'ㅢ', 'ㅣ',
    ];
    /// Final consonants; index 0 (`None`) means "no final".
    const FINAL_JAMO: [Option<char>; 28] = [
        None, Some('ㄱ'), Some('ㄲ'), Some('ㄳ'), Some('ㄴ'), Some('ㄵ'), Some('ㄶ'),
        Some('ㄷ'), Some('ㄹ'), Some('ㄺ'), Some('ㄻ'), Some('ㄼ'), Some('ㄽ'), Some('ㄾ'),
        Some('ㄿ'), Some('ㅀ'), Some('ㅁ'), Some('ㅂ'), Some('ㅄ'), Some('ㅅ'), Some('ㅆ'),
        Some('ㅇ'), Some('ㅈ'), Some('ㅊ'), Some('ㅋ'), Some('ㅌ'), Some('ㅍ'), Some('ㅎ'),
    ];
    /// Code point of the first syllable in the Hangul Syllables block.
    const GA_LOCATION: u32 = '가' as u32; // = 44_032

    /// Composes a syllable from its jamo; returns `None` if any jamo
    /// is not found in the corresponding table.
    pub fn from_jamo(initial: char, medial: char, last: Option<char>) -> Option<char> {
        let i = INITIAL_JAMO.iter().position(|&c| c == initial)? as u32;
        let m = VOWEL_JAMO.iter().position(|&c| c == medial)? as u32;
        let f = FINAL_JAMO.iter().position(|&c| c == last)? as u32;
        char::from_u32(GA_LOCATION + 588 * i + 28 * m + f)
    }
}
```

Hangul Compatibility Jamo block

The Unicode Hangul Compatibility Jamo block has been allocated for compatibility with the KS X 1001 character set. It is usually used to represent hangul without distinguishing initials and finals.

Hangul Jamo blocks

The Hangul Jamo, Hangul Jamo Extended-A and Hangul Jamo Extended-B blocks contain initial, medial and final jamo, including obsolete jamo.
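The link between the precomposed syllables and the initial, medial and final jamo of the Hangul Jamo block can be sketched by inverting the syllable formula. In this sketch the function name `decompose` and the constant names are hypothetical; the base values are the first initial (U+1100), the first medial (U+1161) and the code point just before the first final (U+11A7), so that a final index of 0 means "no final". Obsolete jamo and the extended blocks are not handled.

```rust
const S_BASE: u32 = 0xAC00; // 가, first precomposed syllable
const L_BASE: u32 = 0x1100; // first initial (choseong)
const V_BASE: u32 = 0x1161; // first medial (jungseong)
const T_BASE: u32 = 0x11A7; // one before the first final (jongseong)
const S_COUNT: u32 = 19 * 21 * 28;

/// Splits a precomposed modern syllable into conjoining jamo.
/// Returns `None` for characters outside the Hangul Syllables block.
fn decompose(syllable: char) -> Option<(char, char, Option<char>)> {
    let index = (syllable as u32).checked_sub(S_BASE)?;
    if index >= S_COUNT {
        return None;
    }
    let initial = char::from_u32(L_BASE + index / 588)?;
    let medial = char::from_u32(V_BASE + (index % 588) / 28)?;
    let final_index = index % 28;
    let last = if final_index == 0 {
        None
    } else {
        Some(char::from_u32(T_BASE + final_index)?)
    };
    Some((initial, medial, last))
}

fn main() {
    for s in ['한', '가', 'A'] {
        match decompose(s) {
            Some((l, v, t)) => println!(
                "{} -> U+{:04X} U+{:04X} {:?}",
                s,
                l as u32,
                v as u32,
                t.map(|c| format!("U+{:04X}", c as u32))
            ),
            None => println!("{} is not a precomposed hangul syllable", s),
        }
    }
}
```

Here the ㄴ in 한 comes out as the final jamo U+11AB, which is distinct from the initial ㄴ at U+1102; the Hangul Compatibility Jamo block has only one code point for the letter, U+3134.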

Hanyang Private Use Area code

The Hangul word processor shipped with fonts from Hanyang Information and Communication which map obsolete hangul characters to Unicode's Private Use Areas. Despite the use of PUAs instead of dedicated code points, Hanyang's mapping was the most popular way to represent obsolete hangul in South Korea in 2007. With its Hangul 2010, however, Hancom deprecated Hanyang PUA code and began representing obsolete hangul characters with Unicode hangul jamo.

References

  1. "KS X 1001:1992" (PDF).
  2. "Libhangul/Ibus-hangul". GitHub. May 29, 2021.
  3. 장, 봉선 (1989). 한글풀어쓰기교본. 한풀문화사 (Hanpul). p. 84.
  4. "이원익 타자기". scienceall.com. December 7, 2012.
  5. "정보화 시대 이전, 타자기가 있었다 <한글 타자기 전성시대>". Hangul museum.
  6. "[역사특집] 한국교회사에서 건진 근대문화유산들, 등록문화재로 새롭게 지정". Christian newspaper. February 27, 2020.
  7. "最古 한글타자기, 한글박물관서 본다". Yonhap News Agency. October 8, 2014.
  8. "한글 타자 자판표준화 등 한글 기계화(1969년)". theme.archives.go.kr.
  9. 김, 치관 (December 2, 2000). 문답으로 보는 북한 정보화의 현주소. Tongilnews.com (in Korean). Retrieved December 3, 2006.
  10. 김, 효석 (December 2, 2000). "<국회자료집> 북한 S/W 현황과 시연자료". Tongilnews.com (in Korean). Retrieved December 3, 2006.
  11. Yonhap (January 7, 1998). 북한의 컴퓨터산업 어디까지 왔나. Tongilnews.com (in Korean). Retrieved December 3, 2006.
  12. "북한용어사전: 평양정보센터(PIC)" (in Korean). Archived from the original on September 28, 2007. Retrieved December 3, 2006.