Data Coding Scheme

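Data Coding Scheme is a one-octet field in Short Messages (SM) and Cell Broadcast Messages (CB) that carries basic information about how the recipient handset should process the received message. The information includes the character set of the message, the message class, a request to delete the message automatically after reading, message waiting indication flags, and whether the message content is compressed.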

The field is described in 3GPP TS 23.040 and 3GPP TS 23.038 under the name TP-DCS.

Message character sets

A special 7-bit encoding called the GSM 7-bit default alphabet was designed for the Short Message Service in GSM. The alphabet contains the most often used symbols from most Western European languages (and some Greek uppercase letters). Some ASCII characters and the Euro sign did not fit into the GSM 7-bit default alphabet and must be encoded using two septets. These characters form the GSM 7-bit default alphabet extension table. Support of the GSM 7-bit alphabet is mandatory for GSM handsets and network elements. [1]
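The cost of the extension table can be illustrated with a minimal sketch. It assumes that the input contains only characters of the default alphabet or its extension table (nothing is validated), and the helper name count_septets is hypothetical. The extension characters listed are those of 3GPP TS 23.038; each is sent as an escape septet followed by a second septet.

```python
# Characters of the GSM 7-bit default alphabet extension table
# (form feed, ^ { } \ [ ~ ] | and the Euro sign).
GSM7_EXTENSION = set("^{}\\[~]|€\f")


def count_septets(text: str) -> int:
    """Count the septets needed for text.

    Assumption: every character belongs to the GSM 7-bit default
    alphabet or to its extension table; no validation is performed.
    """
    # Extension characters cost two septets, all others cost one.
    return sum(2 if ch in GSM7_EXTENSION else 1 for ch in text)


print(count_septets("Price: 5 EUR"))  # 12 characters, 12 septets
print(count_septets("Price: 5 €"))    # 10 characters, 11 septets
```

A message that looks shorter than 160 characters can therefore still exceed the limit of a single message if it contains brackets, braces or the Euro sign.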

Languages that use the Latin script but need characters not present in the GSM 7-bit default alphabet often replace the missing characters with their counterparts without diacritics. The result is not entirely satisfactory for the user, but it is often accepted. To include the missing characters, the 16-bit UTF-16 encoding (called UCS-2 in GSM) may be used, at the price of reducing the length of a non-segmented message from 160 to 70 characters.

Messages in Chinese, Korean or Japanese must be encoded using the UTF-16 character encoding. The same was also true for other languages using non-Latin scripts, such as Russian, Arabic, Hebrew and various Indian languages. 3GPP TS 23.038 version 8.0.0, published in 2008, introduced a new feature, the national language shift tables, which in version 11.0.0, published in 2012, cover Turkish, Spanish, Portuguese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu and Urdu. The mechanism replaces the GSM 7-bit default alphabet code table, the extension table, or both with national tables selected by special information elements in the User Data Header. A non-segmented message using national language shift tables may carry up to 155 (or 153) 7-bit characters.
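The character limits quoted above follow from simple arithmetic. The sketch below assumes a user data field of 140 octets per message, a figure taken from 3GPP TS 23.040 rather than from this article, and a hypothetical helper max_characters. For 7-bit text, a User Data Header is assumed to be padded up to a septet boundary.

```python
MAX_USER_DATA_OCTETS = 140  # assumption: user data size of a single short message


def max_characters(bits_per_char: int, udh_octets: int = 0) -> int:
    """Characters that fit into one non-segmented message.

    udh_octets is the total size of the User Data Header, including
    its own length octet (0 when no header is present).
    """
    if bits_per_char == 7:
        # The header occupies whole septets, rounded up.
        header_septets = (udh_octets * 8 + 6) // 7
        return MAX_USER_DATA_OCTETS * 8 // 7 - header_septets
    # 8-bit data and UCS-2: the header simply reduces the octets available.
    return (MAX_USER_DATA_OCTETS - udh_octets) * 8 // bits_per_char


print(max_characters(7))                # 160 (GSM 7-bit default alphabet)
print(max_characters(16))               # 70  (UCS-2)
print(max_characters(7, udh_octets=4))  # 155 (one national language shift element)
```

The 4-octet header in the last call is an assumption: one length octet plus a single information element made of an identifier, a length and one data octet.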

GSM recognizes only two encodings for text messages and one encoding for binary messages: the GSM 7-bit default alphabet and UCS-2 for text, and 8-bit data for binary content.
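Which encoding a given TP-DCS value selects can be read from a few bits of the octet. The following is a minimal sketch that mirrors the DCS tables later in this article; the function name character_set is hypothetical, and the returned labels are the ones used in those tables.

```python
def character_set(dcs: int) -> str:
    """Return the character set label for a TP-DCS octet (0-255)."""
    group = dcs >> 4  # upper four bits select the coding group
    if group <= 0x7:
        # 0x00-0x7F: bits 3 and 2 select the character set.
        return ("GSM 7 bit", "8 bit data", "UCS2", "(reserved)")[(dcs >> 2) & 0x03]
    if group == 0xD:
        return "GSM 7 bit"   # Message Waiting Info: Store Message
    if group == 0xE:
        return "UCS2"        # Message Waiting Info: Store Message
    if group == 0xF:
        # Data Coding/Message Class: bit 2 selects the character set.
        return "8 bit data" if dcs & 0x04 else "GSM 7 bit"
    # 0x80-0xCF: listed as "(not defined)" in the tables of this article.
    return "(not defined)"


for value in (0x00, 0x04, 0x08, 0xF5):
    print(f"{value:02X}: {character_set(value)}")
```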

Message classes

The TP-DCS octet has a complex syntax to allow carrying of other information; the most notable are message classes:

Value (bits 1 and 0)  Message class
00                    Class 0 - Flash messages
01                    Class 1 - ME-specific
10                    Class 2 - SIM / USIM specific
11                    Class 3 - TE-specific

Flash messages are received by a mobile phone even when its memory is full. They are not stored in the phone; they are only shown on the phone display.
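How the class is read out of the octet can be sketched as follows. The sketch follows the DCS tables later in this article: in the range 00 to 7F a class is present only when bit 4 is set, while in the range F0 to FF bits 1 and 0 always carry a class. The function name message_class is hypothetical.

```python
from typing import Optional

CLASS_NAMES = {
    0: "Class 0 (Flash message)",
    1: "Class 1 (ME-specific)",
    2: "Class 2 (SIM/USIM-specific)",
    3: "Class 3 (TE-specific)",
}


def message_class(dcs: int) -> Optional[int]:
    """Return the message class (0-3), or None when no class is signalled."""
    if (dcs & 0x80) == 0:
        # General Data Coding and Automatic Deletion groups:
        # bit 4 tells whether bits 1 and 0 carry a class.
        has_class = bool(dcs & 0x10)
    elif (dcs & 0xF0) == 0xF0:
        # Data Coding/Message Class group: a class is always present.
        has_class = True
    else:
        # Reserved and Message Waiting groups carry no class.
        has_class = False
    return (dcs & 0x03) if has_class else None


cls = message_class(0x10)
print(CLASS_NAMES[cls] if cls is not None else "Default")  # Class 0 (Flash message)
```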

Other features

Automatic deletion after reading

The handset should delete any message received with a TP-DCS value falling in the "Message Marked for Automatic Deletion Coding Group" after the user has read it.
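Membership in this coding group can be tested with a single mask. The sketch below relies on the range 40 to 7F (hexadecimal) given for the group in the DCS tables of this article; the function name is hypothetical.

```python
def is_marked_for_auto_deletion(dcs: int) -> bool:
    """True when bits 7 and 6 are 0 and 1, i.e. the value lies in 0x40-0x7F."""
    return (dcs & 0xC0) == 0x40


print(is_marked_for_auto_deletion(0x40))  # True
print(is_marked_for_auto_deletion(0x10))  # False
```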

Message waiting indication

The Message Waiting Indication groups of DCS values serve to set or reset flags indicating the presence of unread voicemail, fax, e-mail or other messages.
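A minimal sketch of decoding such a value, based on the Message Waiting Info tables later in this article: bits 1 and 0 select the kind of waiting message, bit 3 tells whether the indication is active, and the coding group tells whether the message itself is stored or discarded. The names WaitingIndication and decode_mwi are hypothetical, and the reserved bit 2 is ignored here.

```python
from typing import NamedTuple, Optional


class WaitingIndication(NamedTuple):
    kind: str            # "Voicemail", "Fax", "E-mail" or "Other"
    active: bool         # True sets the indication, False resets it
    store_message: bool  # False for the Discard Message group


_KINDS = ("Voicemail", "Fax", "E-mail", "Other")


def decode_mwi(dcs: int) -> Optional[WaitingIndication]:
    """Decode a Message Waiting Indication, or return None for other groups."""
    group = dcs & 0xF0
    if group not in (0xC0, 0xD0, 0xE0):
        return None
    return WaitingIndication(
        kind=_KINDS[dcs & 0x03],
        active=bool(dcs & 0x08),
        store_message=group != 0xC0,
    )


print(decode_mwi(0xC8))  # voicemail indication set, message discarded
print(decode_mwi(0xD1))  # fax indication reset, message stored
```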

Data compression

A special DCS value also allows message compression, but it is probably not used by any operator.
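In the DCS tables of this article the compression flag corresponds to bit 5 and appears only in the range 00 to 7F (hexadecimal). A minimal sketch under that assumption, with a hypothetical function name:

```python
def is_compressed(dcs: int) -> bool:
    """True when bit 5 is set in a coding group that defines the flag.

    Assumption: the flag is meaningful only when bit 7 is 0 (0x00-0x7F).
    """
    return (dcs & 0x80) == 0 and bool(dcs & 0x20)


print(is_compressed(0x20))  # True
print(is_compressed(0xE0))  # False: bit 5 is set, but this is a message waiting group
```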

DCS values

SMS data coding scheme

The values of TP-DCS are defined in GSM recommendation 03.38. [1]

Coding Group: General Data Coding

Hex  Dec  Character set  Message class                Compressed  Reserved because
00   0    GSM 7 bit      Default                      -
01   1    GSM 7 bit      Default                      -           Bits 1 and 0 have value 1 but no message class present
02   2    GSM 7 bit      Default                      -           Bits 1 and 0 have value 2 but no message class present
03   3    GSM 7 bit      Default                      -           Bits 1 and 0 have value 3 but no message class present
04   4    8 bit data     Default                      -
05   5    8 bit data     Default                      -           Bits 1 and 0 have value 1 but no message class present
06   6    8 bit data     Default                      -           Bits 1 and 0 have value 2 but no message class present
07   7    8 bit data     Default                      -           Bits 1 and 0 have value 3 but no message class present
08   8    UCS2           Default                      -
09   9    UCS2           Default                      -           Bits 1 and 0 have value 1 but no message class present
0A   10   UCS2           Default                      -           Bits 1 and 0 have value 2 but no message class present
0B   11   UCS2           Default                      -           Bits 1 and 0 have value 3 but no message class present
0C   12   (reserved)     Default                      -           Reserved character set
0D   13   (reserved)     Default                      -           Reserved character set; bits 1 and 0 have value 1 but no message class present
0E   14   (reserved)     Default                      -           Reserved character set; bits 1 and 0 have value 2 but no message class present
0F   15   (reserved)     Default                      -           Reserved character set; bits 1 and 0 have value 3 but no message class present
10   16   GSM 7 bit      Class 0 (Flash message)      -
11   17   GSM 7 bit      Class 1 (ME-specific)        -
12   18   GSM 7 bit      Class 2 (SIM/USIM-specific)  -
13   19   GSM 7 bit      Class 3 (TE-specific)        -
14   20   8 bit data     Class 0 (Flash message)      -
15   21   8 bit data     Class 1 (ME-specific)        -
16   22   8 bit data     Class 2 (SIM/USIM-specific)  -
17   23   8 bit data     Class 3 (TE-specific)        -
18   24   UCS2           Class 0 (Flash message)      -
19   25   UCS2           Class 1 (ME-specific)        -
1A   26   UCS2           Class 2 (SIM/USIM-specific)  -
1B   27   UCS2           Class 3 (TE-specific)        -
1C   28   (reserved)     Class 0 (Flash message)      -           Reserved character set
1D   29   (reserved)     Class 1 (ME-specific)        -           Reserved character set
1E   30   (reserved)     Class 2 (SIM/USIM-specific)  -           Reserved character set
1F   31   (reserved)     Class 3 (TE-specific)        -           Reserved character set
20   32   GSM 7 bit      Default                      +
21   33   GSM 7 bit      Default                      +           Bits 1 and 0 have value 1 but no message class present
22   34   GSM 7 bit      Default                      +           Bits 1 and 0 have value 2 but no message class present
23   35   GSM 7 bit      Default                      +           Bits 1 and 0 have value 3 but no message class present
24   36   8 bit data     Default                      +           Compression set but character set can't be compressed
25   37   8 bit data     Default                      +           Compression set but character set can't be compressed; bits 1 and 0 have value 1 but no message class present
26   38   8 bit data     Default                      +           Compression set but character set can't be compressed; bits 1 and 0 have value 2 but no message class present
27   39   8 bit data     Default                      +           Compression set but character set can't be compressed; bits 1 and 0 have value 3 but no message class present
28   40   UCS2           Default                      +           Compression set but character set can't be compressed
29   41   UCS2           Default                      +           Compression set but character set can't be compressed; bits 1 and 0 have value 1 but no message class present
2A   42   UCS2           Default                      +           Compression set but character set can't be compressed; bits 1 and 0 have value 2 but no message class present
2B   43   UCS2           Default                      +           Compression set but character set can't be compressed; bits 1 and 0 have value 3 but no message class present
2C   44   (reserved)     Default                      +           Reserved character set
2D   45   (reserved)     Default                      +           Reserved character set; bits 1 and 0 have value 1 but no message class present
2E   46   (reserved)     Default                      +           Reserved character set; bits 1 and 0 have value 2 but no message class present
2F   47   (reserved)     Default                      +           Reserved character set; bits 1 and 0 have value 3 but no message class present
30   48   GSM 7 bit      Class 0 (Flash message)      +
31   49   GSM 7 bit      Class 1 (ME-specific)        +
32   50   GSM 7 bit      Class 2 (SIM/USIM-specific)  +
33   51   GSM 7 bit      Class 3 (TE-specific)        +
34   52   8 bit data     Class 0 (Flash message)      +           Compression set but character set can't be compressed
35   53   8 bit data     Class 1 (ME-specific)        +           Compression set but character set can't be compressed
36   54   8 bit data     Class 2 (SIM/USIM-specific)  +           Compression set but character set can't be compressed
37   55   8 bit data     Class 3 (TE-specific)        +           Compression set but character set can't be compressed
38   56   UCS2           Class 0 (Flash message)      +           Compression set but character set can't be compressed
39   57   UCS2           Class 1 (ME-specific)        +           Compression set but character set can't be compressed
3A   58   UCS2           Class 2 (SIM/USIM-specific)  +           Compression set but character set can't be compressed
3B   59   UCS2           Class 3 (TE-specific)        +           Compression set but character set can't be compressed
3C   60   (reserved)     Class 0 (Flash message)      +           Reserved character set
3D   61   (reserved)     Class 1 (ME-specific)        +           Reserved character set
3E   62   (reserved)     Class 2 (SIM/USIM-specific)  +           Reserved character set
3F   63   (reserved)     Class 3 (TE-specific)        +           Reserved character set

Coding Group: Message Marked for Automatic Deletion

Hex  Dec  Character set  Message class                Compressed  Reserved because
40   64   GSM 7 bit      Default                      -
41   65   GSM 7 bit      Default                      -           Bits 1 and 0 have value 1 but no message class present
42   66   GSM 7 bit      Default                      -           Bits 1 and 0 have value 2 but no message class present
43   67   GSM 7 bit      Default                      -           Bits 1 and 0 have value 3 but no message class present
44   68   8 bit data     Default                      -
45   69   8 bit data     Default                      -           Bits 1 and 0 have value 1 but no message class present
46   70   8 bit data     Default                      -           Bits 1 and 0 have value 2 but no message class present
47   71   8 bit data     Default                      -           Bits 1 and 0 have value 3 but no message class present
48   72   UCS2           Default                      -
49   73   UCS2           Default                      -           Bits 1 and 0 have value 1 but no message class present
4A   74   UCS2           Default                      -           Bits 1 and 0 have value 2 but no message class present
4B   75   UCS2           Default                      -           Bits 1 and 0 have value 3 but no message class present
4C   76   (reserved)     Default                      -           Reserved character set
4D   77   (reserved)     Default                      -           Reserved character set; bits 1 and 0 have value 1 but no message class present
4E   78   (reserved)     Default                      -           Reserved character set; bits 1 and 0 have value 2 but no message class present
4F   79   (reserved)     Default                      -           Reserved character set; bits 1 and 0 have value 3 but no message class present
50   80   GSM 7 bit      Class 0 (Flash message)      -
51   81   GSM 7 bit      Class 1 (ME-specific)        -
52   82   GSM 7 bit      Class 2 (SIM/USIM-specific)  -
53   83   GSM 7 bit      Class 3 (TE-specific)        -
54   84   8 bit data     Class 0 (Flash message)      -
55   85   8 bit data     Class 1 (ME-specific)        -
56   86   8 bit data     Class 2 (SIM/USIM-specific)  -
57   87   8 bit data     Class 3 (TE-specific)        -
58   88   UCS2           Class 0 (Flash message)      -
59   89   UCS2           Class 1 (ME-specific)        -
5A   90   UCS2           Class 2 (SIM/USIM-specific)  -
5B   91   UCS2           Class 3 (TE-specific)        -
5C   92   (reserved)     Class 0 (Flash message)      -           Reserved character set
5D   93   (reserved)     Class 1 (ME-specific)        -           Reserved character set
5E   94   (reserved)     Class 2 (SIM/USIM-specific)  -           Reserved character set
5F   95   (reserved)     Class 3 (TE-specific)        -           Reserved character set
60   96   GSM 7 bit      Default                      +
61   97   GSM 7 bit      Default                      +           Bits 1 and 0 have value 1 but no message class present
62   98   GSM 7 bit      Default                      +           Bits 1 and 0 have value 2 but no message class present
63   99   GSM 7 bit      Default                      +           Bits 1 and 0 have value 3 but no message class present
64   100  8 bit data     Default                      +           Compression set but character set can't be compressed
65   101  8 bit data     Default                      +           Compression set but character set can't be compressed; bits 1 and 0 have value 1 but no message class present
66   102  8 bit data     Default                      +           Compression set but character set can't be compressed; bits 1 and 0 have value 2 but no message class present
67   103  8 bit data     Default                      +           Compression set but character set can't be compressed; bits 1 and 0 have value 3 but no message class present
68   104  UCS2           Default                      +           Compression set but character set can't be compressed
69   105  UCS2           Default                      +           Compression set but character set can't be compressed; bits 1 and 0 have value 1 but no message class present
6A   106  UCS2           Default                      +           Compression set but character set can't be compressed; bits 1 and 0 have value 2 but no message class present
6B   107  UCS2           Default                      +           Compression set but character set can't be compressed; bits 1 and 0 have value 3 but no message class present
6C   108  (reserved)     Default                      +           Reserved character set
6D   109  (reserved)     Default                      +           Reserved character set; bits 1 and 0 have value 1 but no message class present
6E   110  (reserved)     Default                      +           Reserved character set; bits 1 and 0 have value 2 but no message class present
6F   111  (reserved)     Default                      +           Reserved character set; bits 1 and 0 have value 3 but no message class present
70   112  GSM 7 bit      Class 0 (Flash message)      +
71   113  GSM 7 bit      Class 1 (ME-specific)        +
72   114  GSM 7 bit      Class 2 (SIM/USIM-specific)  +
73   115  GSM 7 bit      Class 3 (TE-specific)        +
74   116  8 bit data     Class 0 (Flash message)      +           Compression set but character set can't be compressed
75   117  8 bit data     Class 1 (ME-specific)        +           Compression set but character set can't be compressed
76   118  8 bit data     Class 2 (SIM/USIM-specific)  +           Compression set but character set can't be compressed
77   119  8 bit data     Class 3 (TE-specific)        +           Compression set but character set can't be compressed
78   120  UCS2           Class 0 (Flash message)      +           Compression set but character set can't be compressed
79   121  UCS2           Class 1 (ME-specific)        +           Compression set but character set can't be compressed
7A   122  UCS2           Class 2 (SIM/USIM-specific)  +           Compression set but character set can't be compressed
7B   123  UCS2           Class 3 (TE-specific)        +           Compression set but character set can't be compressed
7C   124  (reserved)     Class 0 (Flash message)      +           Reserved character set
7D   125  (reserved)     Class 1 (ME-specific)        +           Reserved character set
7E   126  (reserved)     Class 2 (SIM/USIM-specific)  +           Reserved character set
7F   127  (reserved)     Class 3 (TE-specific)        +           Reserved character set

Coding Group: Reserved

Hex  Dec  Character set  Message class                Compressed  Reserved because
80   128  (not defined)  Default                      -           Reserved coding group
... up to ...
BF   191  (not defined)  Default                      -           Reserved coding group

Coding Group: Message Waiting Info: Discard Message

Hex  Dec  Character set  Message waiting information  Compressed  Reserved because
C0   192  (not defined)  Voicemail Inactive           -
C1   193  (not defined)  Fax Inactive                 -
C2   194  (not defined)  E-mail Inactive              -
C3   195  (not defined)  Other Inactive               -
C4   196  (not defined)  Voicemail Inactive           -           Value of bit 2
C5   197  (not defined)  Fax Inactive                 -           Value of bit 2
C6   198  (not defined)  E-mail Inactive              -           Value of bit 2
C7   199  (not defined)  Other Inactive               -           Value of bit 2
C8   200  (not defined)  Voicemail Active             -
C9   201  (not defined)  Fax Active                   -
CA   202  (not defined)  E-mail Active                -
CB   203  (not defined)  Other Active                 -
CC   204  (not defined)  Voicemail Active             -           Value of bit 2
CD   205  (not defined)  Fax Active                   -           Value of bit 2
CE   206  (not defined)  E-mail Active                -           Value of bit 2
CF   207  (not defined)  Other Active                 -           Value of bit 2

Coding Group: Message Waiting Info: Store Message

Hex  Dec  Character set  Message waiting information  Compressed  Reserved because
D0   208  GSM 7 bit      Voicemail Inactive           -
D1   209  GSM 7 bit      Fax Inactive                 -
D2   210  GSM 7 bit      E-mail Inactive              -
D3   211  GSM 7 bit      Other Inactive               -
D4   212  GSM 7 bit      Voicemail Inactive           -           Value of bit 2
D5   213  GSM 7 bit      Fax Inactive                 -           Value of bit 2
D6   214  GSM 7 bit      E-mail Inactive              -           Value of bit 2
D7   215  GSM 7 bit      Other Inactive               -           Value of bit 2
D8   216  GSM 7 bit      Voicemail Active             -
D9   217  GSM 7 bit      Fax Active                   -
DA   218  GSM 7 bit      E-mail Active                -
DB   219  GSM 7 bit      Other Active                 -
DC   220  GSM 7 bit      Voicemail Active             -           Value of bit 2
DD   221  GSM 7 bit      Fax Active                   -           Value of bit 2
DE   222  GSM 7 bit      E-mail Active                -           Value of bit 2
DF   223  GSM 7 bit      Other Active                 -           Value of bit 2
E0   224  UCS2           Voicemail Inactive           -
E1   225  UCS2           Fax Inactive                 -
E2   226  UCS2           E-mail Inactive              -
E3   227  UCS2           Other Inactive               -
E4   228  UCS2           Voicemail Inactive           -           Value of bit 2
E5   229  UCS2           Fax Inactive                 -           Value of bit 2
E6   230  UCS2           E-mail Inactive              -           Value of bit 2
E7   231  UCS2           Other Inactive               -           Value of bit 2
E8   232  UCS2           Voicemail Active             -
E9   233  UCS2           Fax Active                   -
EA   234  UCS2           E-mail Active                -
EB   235  UCS2           Other Active                 -
EC   236  UCS2           Voicemail Active             -           Value of bit 2
ED   237  UCS2           Fax Active                   -           Value of bit 2
EE   238  UCS2           E-mail Active                -           Value of bit 2
EF   239  UCS2           Other Active                 -           Value of bit 2

Coding Group: Data Coding/Message Class

Hex  Dec  Character set  Message class                Compressed  Reserved because
F0   240  GSM 7 bit      Class 0 (Flash message)      -
F1   241  GSM 7 bit      Class 1 (ME-specific)        -
F2   242  GSM 7 bit      Class 2 (SIM/USIM-specific)  -
F3   243  GSM 7 bit      Class 3 (TE-specific)        -
F4   244  8 bit data     Class 0 (Flash message)      -
F5   245  8 bit data     Class 1 (ME-specific)        -
F6   246  8 bit data     Class 2 (SIM/USIM-specific)  -
F7   247  8 bit data     Class 3 (TE-specific)        -
F8   248  GSM 7 bit      Class 0 (Flash message)      -           Value of bit 3
F9   249  GSM 7 bit      Class 1 (ME-specific)        -           Value of bit 3
FA   250  GSM 7 bit      Class 2 (SIM/USIM-specific)  -           Value of bit 3
FB   251  GSM 7 bit      Class 3 (TE-specific)        -           Value of bit 3
FC   252  8 bit data     Class 0 (Flash message)      -           Value of bit 3
FD   253  8 bit data     Class 1 (ME-specific)        -           Value of bit 3
FE   254  8 bit data     Class 2 (SIM/USIM-specific)  -           Value of bit 3
FF   255  8 bit data     Class 3 (TE-specific)        -           Value of bit 3

The iDEN mobile standard uses the hexadecimal values F7 and F8 in a special way.

CBS data coding scheme

For the DCS values in Cell Broadcast Messages see GSM recommendation 03.38. [1]

References

  1. 3GPP TS 23.038, Alphabets and language-specific information.