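Data Coding Scheme is a one-octet field in Short Messages (SM) and Cell Broadcast Messages (CB) which carries basic information on how the recipient handset should process the received message. The information includes the character set (message coding) of the user data, the message class, a request to delete the message automatically after it has been read, flags indicating the presence of unread voicemail, fax, e-mail or other messages, an indication that the message content is compressed and, in Cell Broadcast Messages, the language of the message.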
The field is described in 3GPP 23.040 and 3GPP 23.038 under the name TP-DCS.
A special 7-bit encoding called the GSM 7-bit default alphabet was designed for the Short Message Service in GSM. The alphabet contains the most frequently used symbols of most Western European languages (and some Greek uppercase letters). Some ASCII characters and the euro sign did not fit into the GSM 7-bit default alphabet and must be encoded using two septets: an escape septet followed by a code from the GSM 7-bit default alphabet extension table. Support of the GSM 7-bit alphabet is mandatory for GSM handsets and network elements. [1]
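The following Python sketch illustrates the two-septet rule. It maps text to GSM 7-bit septet values before they are packed into octets. The table contents are transcribed from 3GPP TS 23.038 and should be checked against the specification before reuse. The names `GSM7_DEFAULT`, `GSM7_EXTENSION` and `to_septets` are hypothetical, and national language shift tables are ignored.

```python
# GSM 7-bit default alphabet indexed by septet value 0x00..0x7F.
# Position 0x1B is the escape to the extension table, not a printable
# character; "\x1b" is only a placeholder there.
GSM7_DEFAULT = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)
ESC = 0x1B

# Extension table characters; each is sent as ESC followed by this code.
GSM7_EXTENSION = {
    "\f": 0x0A, "^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F,
    "[": 0x3C, "~": 0x3D, "]": 0x3E, "|": 0x40, "€": 0x65,
}

# Reverse lookup for the default table, skipping the escape position.
_DEFAULT_LOOKUP = {ch: i for i, ch in enumerate(GSM7_DEFAULT) if i != ESC}


def to_septets(text: str) -> list[int]:
    """Map text to GSM 7-bit septet values (before packing into octets).

    Raises ValueError for characters found in neither table.
    """
    septets = []
    for ch in text:
        if ch in _DEFAULT_LOOKUP:
            septets.append(_DEFAULT_LOOKUP[ch])
        elif ch in GSM7_EXTENSION:
            septets.extend((ESC, GSM7_EXTENSION[ch]))  # costs two septets
        else:
            raise ValueError(f"{ch!r} is not in the GSM 7-bit alphabet")
    return septets


print(to_septets("Hi!"))          # [72, 105, 33]
print(len(to_septets("5€ [x]")))  # 9
```

A `ValueError` from this sketch is the point at which a sender would typically fall back to UCS-2.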
Languages which use the Latin script but need characters that are not present in the GSM 7-bit default alphabet often replace the missing characters with diacritics by the corresponding characters without diacritics. The result is a not entirely satisfactory user experience, but it is often accepted. To include these missing characters, the 16-bit UTF-16 encoding (called UCS-2 in GSM) may be used, at the price of reducing the length of a (non-segmented) message from 160 to 70 characters.
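Both limits follow from the 140 octets of user data in a single message. 160 septets of 7 bits and 70 UCS-2 code units of 16 bits each occupy 1120 bits. The sketch below packs septets least significant bit first, as the default alphabet is packed in the user data, and counts UTF-16 code units with Python's built-in codec. The function names are hypothetical, and `hellohello` is a commonly quoted packing example.

```python
USER_DATA_OCTETS = 140  # maximum user data length of a single short message


def pack_septets(septets: list[int]) -> bytes:
    """Pack 7-bit values into octets, least significant bit first."""
    out = bytearray()
    buffer = 0  # bits not yet emitted; the oldest bits sit in the low positions
    nbits = 0
    for septet in septets:
        buffer |= (septet & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(buffer & 0xFF)
            buffer >>= 8
            nbits -= 8
    if nbits:
        out.append(buffer)  # final partial octet; its unused high bits are zero
    return bytes(out)


def ucs2_units(text: str) -> int:
    """Number of 16-bit UTF-16 code units the text needs (2 octets each)."""
    return len(text.encode("utf-16-be")) // 2


# For unaccented Latin letters, the GSM septet value equals the ASCII code.
print(pack_septets([ord(c) for c in "hellohello"]).hex().upper())  # E8329BFD4697D9EC37
print(len(pack_septets([0x20] * 160)))  # 140: 160 septets fill the user data
print(USER_DATA_OCTETS * 8 // 7)        # 160 characters in GSM 7-bit
print(USER_DATA_OCTETS // 2)            # 70 characters in UCS-2
print(ucs2_units("Łódź"))               # 4
```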
Messages in Chinese, Korean or Japanese must be encoded using the UTF-16 character encoding. The same was also true for other languages using non-Latin scripts, such as Russian, Arabic, Hebrew and various Indian languages. Version 8.0.0 of 3GPP TS 23.038, published in 2008, introduced a new feature: national language shift tables. In version 11.0.0, published in 2012, the tables cover the Turkish, Spanish, Portuguese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu and Urdu languages. The mechanism replaces the GSM 7-bit default alphabet code table and/or the extension table with national tables, as indicated by special information elements in the User Data Header. A non-segmented message using national language shift tables may carry up to 155 7-bit characters when one shift table is signalled. When both a locking shift table and a single shift table are signalled, the limit is 152.
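The length available to a message using shift tables can be estimated from the size of the User Data Header. Each shift table is signalled by a three-octet information element, and fill bits align the text to a septet boundary. This Python sketch makes three assumptions: information element identifier 0x24 marks a single shift, 0x25 marks a locking shift, and language identifier 1 serves only as an example value. The helper names are hypothetical. Under these assumptions the arithmetic gives 155 septets with one shift table and 152 with both.

```python
USER_DATA_SEPTETS = 160  # 140 octets of user data hold 160 septets

# Information Element Identifiers for the shift tables (assumed, per 3GPP TS 23.040).
IEI_SINGLE_SHIFT = 0x24
IEI_LOCKING_SHIFT = 0x25


def shift_table_udh(single_shift=None, locking_shift=None) -> bytes:
    """Build a UDH carrying national language shift IEs.

    The arguments are national language identifiers (1 is used below only
    as an example value); None omits the corresponding IE.
    """
    ies = b""
    if single_shift is not None:
        ies += bytes([IEI_SINGLE_SHIFT, 1, single_shift])    # IEI, length, value
    if locking_shift is not None:
        ies += bytes([IEI_LOCKING_SHIFT, 1, locking_shift])
    return bytes([len(ies)]) + ies  # UDHL counts the octets that follow it


def text_septets_left(udh: bytes) -> int:
    """Septets left for text after the UDH and its fill bits.

    The text starts on a septet boundary, so the header length in bits is
    rounded up to a whole number of septets.
    """
    header_septets = (len(udh) * 8 + 6) // 7
    return USER_DATA_SEPTETS - header_septets


print(text_septets_left(shift_table_udh(single_shift=1)))                   # 155
print(text_septets_left(shift_table_udh(single_shift=1, locking_shift=1)))  # 152
```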
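GSM recognizes only two encodings for text messages and one encoding for binary messages. Text uses either the GSM 7-bit default alphabet (optionally with national language shift tables) or UCS-2, and binary content uses 8-bit data.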
The TP-DCS octet has a complex syntax that allows it to carry other information as well. The most notable is the message class, coded in bits 1 and 0:

Bits 1 and 0 | Message Class |
---|---|
0 0 | 0 - Flash messages |
0 1 | 1 - ME-specific |
1 0 | 2 - SIM / USIM specific |
1 1 | 3 - TE-specific |
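
The class bits share the octet with other fields. In the General Data Coding group (bits 7 and 6 equal to 00), the tables below show three fields: bit 5 marks compression, bit 4 indicates that bits 1 and 0 carry a message class, and bits 3 and 2 select the character set. A hypothetical Python helper composing such a value, assuming exactly that layout, might look like this:

```python
from enum import IntEnum
from typing import Optional


class Charset(IntEnum):
    """Character set codes carried in DCS bits 3 and 2."""
    GSM7 = 0b00
    DATA_8BIT = 0b01
    UCS2 = 0b10


def general_dcs(charset: Charset, msg_class: Optional[int] = None,
                compressed: bool = False) -> int:
    """Compose a DCS octet in the General Data Coding group (bits 7 and 6 = 00)."""
    dcs = int(charset) << 2
    if compressed:
        dcs |= 0x20  # bit 5: user data is compressed
    if msg_class is not None:
        if msg_class not in (0, 1, 2, 3):
            raise ValueError("message class must be 0, 1, 2 or 3")
        dcs |= 0x10 | msg_class  # bit 4 says bits 1 and 0 carry a class
    return dcs


print(hex(general_dcs(Charset.GSM7)))               # 0x0
print(hex(general_dcs(Charset.GSM7, msg_class=0)))  # 0x10 (flash message)
print(hex(general_dcs(Charset.UCS2)))               # 0x8
print(hex(general_dcs(Charset.DATA_8BIT, 1)))       # 0x15
```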

Flash messages are received by a mobile phone even when its memory is full. They are not stored in the phone; they are only shown on its display.
The handset should delete any message received with a TP-DCS value in the "Message Marked for Automatic Deletion" coding group after the user has read it.
The Message Waiting Indication groups of DCS values serve to set or reset flags indicating the presence of unread voicemail, fax, e-mail or other messages.
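In the Message Waiting Indication groups listed in the tables below (high nibble C, D or E), bit 3 switches the indication on or off and bits 1 and 0 select the message type, while bit 2 is reserved. The following Python sketch decodes such a value. The names are hypothetical, and the character set of the Discard Message group is left undefined because the table lists it that way.

```python
from typing import NamedTuple, Optional

KINDS = ("Voicemail", "Fax", "E-mail", "Other")  # indexed by bits 1 and 0
GROUP_CHARSETS = {0xC: None, 0xD: "GSM 7 bit", 0xE: "UCS2"}  # by high nibble


class WaitingIndication(NamedTuple):
    kind: str
    active: bool             # bit 3
    store: bool              # False for the Discard Message group (C0..CF)
    charset: Optional[str]
    reserved_bit_set: bool   # bit 2 is reserved in these groups


def decode_mwi(dcs: int) -> Optional[WaitingIndication]:
    """Decode a Message Waiting Indication DCS value; None for other groups."""
    group = dcs >> 4
    if group not in GROUP_CHARSETS:
        return None
    return WaitingIndication(
        kind=KINDS[dcs & 0x03],
        active=bool(dcs & 0x08),
        store=group != 0xC,
        charset=GROUP_CHARSETS[group],
        reserved_bit_set=bool(dcs & 0x04),
    )


print(decode_mwi(0xC8))
# WaitingIndication(kind='Voicemail', active=True, store=False, charset=None, reserved_bit_set=False)
```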
A bit in the DCS can also mark the message content as compressed, but compression is probably not used by any operator.
The values of TP-DCS are defined in GSM recommendation 03.38. [1]

Coding Group: General Data Coding

DCS hex | DCS dec | Character Set | Message Class | Compressed | Reserved Because |
---|---|---|---|---|---|
00 | 0 | GSM 7 bit | Default | - | |
01 | 1 | GSM 7 bit | Default | - | Bits 1 and 0 have value 1 but no message class present |
02 | 2 | GSM 7 bit | Default | - | Bits 1 and 0 have value 2 but no message class present |
03 | 3 | GSM 7 bit | Default | - | Bits 1 and 0 have value 3 but no message class present |
04 | 4 | 8 bit data | Default | - | |
05 | 5 | 8 bit data | Default | - | Bits 1 and 0 have value 1 but no message class present |
06 | 6 | 8 bit data | Default | - | Bits 1 and 0 have value 2 but no message class present |
07 | 7 | 8 bit data | Default | - | Bits 1 and 0 have value 3 but no message class present |
08 | 8 | UCS2 | Default | - | |
09 | 9 | UCS2 | Default | - | Bits 1 and 0 have value 1 but no message class present |
0A | 10 | UCS2 | Default | - | Bits 1 and 0 have value 2 but no message class present |
0B | 11 | UCS2 | Default | - | Bits 1 and 0 have value 3 but no message class present |
0C | 12 | (reserved) | Default | - | Reserved character set |
0D | 13 | (reserved) | Default | - | Reserved character set Bits 1 and 0 have value 1 but no message class present |
0E | 14 | (reserved) | Default | - | Reserved character set Bits 1 and 0 have value 2 but no message class present |
0F | 15 | (reserved) | Default | - | Reserved character set Bits 1 and 0 have value 3 but no message class present |
10 | 16 | GSM 7 bit | Class 0 (Flash message) | - | |
11 | 17 | GSM 7 bit | Class 1 (ME-specific) | - | |
12 | 18 | GSM 7 bit | Class 2 (SIM/USIM-specific) | - | |
13 | 19 | GSM 7 bit | Class 3 (TE-specific) | - | |
14 | 20 | 8 bit data | Class 0 (Flash message) | - | |
15 | 21 | 8 bit data | Class 1 (ME-specific) | - | |
16 | 22 | 8 bit data | Class 2 (SIM/USIM-specific) | - | |
17 | 23 | 8 bit data | Class 3 (TE-specific) | - | |
18 | 24 | UCS2 | Class 0 (Flash message) | - | |
19 | 25 | UCS2 | Class 1 (ME-specific) | - | |
1A | 26 | UCS2 | Class 2 (SIM/USIM-specific) | - | |
1B | 27 | UCS2 | Class 3 (TE-specific) | - | |
1C | 28 | (reserved) | Class 0 (Flash message) | - | Reserved character set |
1D | 29 | (reserved) | Class 1 (ME-specific) | - | Reserved character set |
1E | 30 | (reserved) | Class 2 (SIM/USIM-specific) | - | Reserved character set |
1F | 31 | (reserved) | Class 3 (TE-specific) | - | Reserved character set |
20 | 32 | GSM 7 bit | Default | + | |
21 | 33 | GSM 7 bit | Default | + | Bits 1 and 0 have value 1 but no message class present |
22 | 34 | GSM 7 bit | Default | + | Bits 1 and 0 have value 2 but no message class present |
23 | 35 | GSM 7 bit | Default | + | Bits 1 and 0 have value 3 but no message class present |
24 | 36 | 8 bit data | Default | + | Compression set but Character set can't be compressed |
25 | 37 | 8 bit data | Default | + | Compression set but Character set can't be compressed Bits 1 and 0 have value 1 but no message class present |
26 | 38 | 8 bit data | Default | + | Compression set but Character set can't be compressed Bits 1 and 0 have value 2 but no message class present |
27 | 39 | 8 bit data | Default | + | Compression set but Character set can't be compressed Bits 1 and 0 have value 3 but no message class present |
28 | 40 | UCS2 | Default | + | Compression set but Character set can't be compressed |
29 | 41 | UCS2 | Default | + | Compression set but Character set can't be compressed Bits 1 and 0 have value 1 but no message class present |
2A | 42 | UCS2 | Default | + | Compression set but Character set can't be compressed Bits 1 and 0 have value 2 but no message class present |
2B | 43 | UCS2 | Default | + | Compression set but Character set can't be compressed Bits 1 and 0 have value 3 but no message class present |
2C | 44 | (reserved) | Default | + | Reserved character set |
2D | 45 | (reserved) | Default | + | Reserved character set Bits 1 and 0 have value 1 but no message class present |
2E | 46 | (reserved) | Default | + | Reserved character set Bits 1 and 0 have value 2 but no message class present |
2F | 47 | (reserved) | Default | + | Reserved character set Bits 1 and 0 have value 3 but no message class present |
30 | 48 | GSM 7 bit | Class 0 (Flash message) | + | |
31 | 49 | GSM 7 bit | Class 1 (ME-specific) | + | |
32 | 50 | GSM 7 bit | Class 2 (SIM/USIM-specific) | + | |
33 | 51 | GSM 7 bit | Class 3 (TE-specific) | + | |
34 | 52 | 8 bit data | Class 0 (Flash message) | + | Compression set but Character set can't be compressed |
35 | 53 | 8 bit data | Class 1 (ME-specific) | + | Compression set but Character set can't be compressed |
36 | 54 | 8 bit data | Class 2 (SIM/USIM-specific) | + | Compression set but Character set can't be compressed |
37 | 55 | 8 bit data | Class 3 (TE-specific) | + | Compression set but Character set can't be compressed |
38 | 56 | UCS2 | Class 0 (Flash message) | + | Compression set but Character set can't be compressed |
39 | 57 | UCS2 | Class 1 (ME-specific) | + | Compression set but Character set can't be compressed |
3A | 58 | UCS2 | Class 2 (SIM/USIM-specific) | + | Compression set but Character set can't be compressed |
3B | 59 | UCS2 | Class 3 (TE-specific) | + | Compression set but Character set can't be compressed |
3C | 60 | (reserved) | Class 0 (Flash message) | + | Reserved character set |
3D | 61 | (reserved) | Class 1 (ME-specific) | + | Reserved character set |
3E | 62 | (reserved) | Class 2 (SIM/USIM-specific) | + | Reserved character set |
3F | 63 | (reserved) | Class 3 (TE-specific) | + | Reserved character set |

Coding Group: Message Marked for Automatic Deletion

DCS hex | DCS dec | Character Set | Message Class | Compressed | Reserved Because |
---|---|---|---|---|---|
40 | 64 | GSM 7 bit | Default | - | |
41 | 65 | GSM 7 bit | Default | - | Bits 1 and 0 have value 1 but no message class present |
42 | 66 | GSM 7 bit | Default | - | Bits 1 and 0 have value 2 but no message class present |
43 | 67 | GSM 7 bit | Default | - | Bits 1 and 0 have value 3 but no message class present |
44 | 68 | 8 bit data | Default | - | |
45 | 69 | 8 bit data | Default | - | Bits 1 and 0 have value 1 but no message class present |
46 | 70 | 8 bit data | Default | - | Bits 1 and 0 have value 2 but no message class present |
47 | 71 | 8 bit data | Default | - | Bits 1 and 0 have value 3 but no message class present |
48 | 72 | UCS2 | Default | - | |
49 | 73 | UCS2 | Default | - | Bits 1 and 0 have value 1 but no message class present |
4A | 74 | UCS2 | Default | - | Bits 1 and 0 have value 2 but no message class present |
4B | 75 | UCS2 | Default | - | Bits 1 and 0 have value 3 but no message class present |
4C | 76 | (reserved) | Default | - | Reserved character set |
4D | 77 | (reserved) | Default | - | Reserved character set Bits 1 and 0 have value 1 but no message class present |
4E | 78 | (reserved) | Default | - | Reserved character set Bits 1 and 0 have value 2 but no message class present |
4F | 79 | (reserved) | Default | - | Reserved character set Bits 1 and 0 have value 3 but no message class present |
50 | 80 | GSM 7 bit | Class 0 (Flash message) | - | |
51 | 81 | GSM 7 bit | Class 1 (ME-specific) | - | |
52 | 82 | GSM 7 bit | Class 2 (SIM/USIM-specific) | - | |
53 | 83 | GSM 7 bit | Class 3 (TE-specific) | - | |
54 | 84 | 8 bit data | Class 0 (Flash message) | - | |
55 | 85 | 8 bit data | Class 1 (ME-specific) | - | |
56 | 86 | 8 bit data | Class 2 (SIM/USIM-specific) | - | |
57 | 87 | 8 bit data | Class 3 (TE-specific) | - | |
58 | 88 | UCS2 | Class 0 (Flash message) | - | |
59 | 89 | UCS2 | Class 1 (ME-specific) | - | |
5A | 90 | UCS2 | Class 2 (SIM/USIM-specific) | - | |
5B | 91 | UCS2 | Class 3 (TE-specific) | - | |
5C | 92 | (reserved) | Class 0 (Flash message) | - | Reserved character set |
5D | 93 | (reserved) | Class 1 (ME-specific) | - | Reserved character set |
5E | 94 | (reserved) | Class 2 (SIM/USIM-specific) | - | Reserved character set |
5F | 95 | (reserved) | Class 3 (TE-specific) | - | Reserved character set |
60 | 96 | GSM 7 bit | Default | + | |
61 | 97 | GSM 7 bit | Default | + | Bits 1 and 0 have value 1 but no message class present |
62 | 98 | GSM 7 bit | Default | + | Bits 1 and 0 have value 2 but no message class present |
63 | 99 | GSM 7 bit | Default | + | Bits 1 and 0 have value 3 but no message class present |
64 | 100 | 8 bit data | Default | + | Compression set but Character set can't be compressed |
65 | 101 | 8 bit data | Default | + | Compression set but Character set can't be compressed Bits 1 and 0 have value 1 but no message class present |
66 | 102 | 8 bit data | Default | + | Compression set but Character set can't be compressed Bits 1 and 0 have value 2 but no message class present |
67 | 103 | 8 bit data | Default | + | Compression set but Character set can't be compressed Bits 1 and 0 have value 3 but no message class present |
68 | 104 | UCS2 | Default | + | Compression set but Character set can't be compressed |
69 | 105 | UCS2 | Default | + | Compression set but Character set can't be compressed Bits 1 and 0 have value 1 but no message class present |
6A | 106 | UCS2 | Default | + | Compression set but Character set can't be compressed Bits 1 and 0 have value 2 but no message class present |
6B | 107 | UCS2 | Default | + | Compression set but Character set can't be compressed Bits 1 and 0 have value 3 but no message class present |
6C | 108 | (reserved) | Default | + | Reserved character set |
6D | 109 | (reserved) | Default | + | Reserved character set Bits 1 and 0 have value 1 but no message class present |
6E | 110 | (reserved) | Default | + | Reserved character set Bits 1 and 0 have value 2 but no message class present |
6F | 111 | (reserved) | Default | + | Reserved character set Bits 1 and 0 have value 3 but no message class present |
70 | 112 | GSM 7 bit | Class 0 (Flash message) | + | |
71 | 113 | GSM 7 bit | Class 1 (ME-specific) | + | |
72 | 114 | GSM 7 bit | Class 2 (SIM/USIM-specific) | + | |
73 | 115 | GSM 7 bit | Class 3 (TE-specific) | + | |
74 | 116 | 8 bit data | Class 0 (Flash message) | + | Compression set but Character set can't be compressed |
75 | 117 | 8 bit data | Class 1 (ME-specific) | + | Compression set but Character set can't be compressed |
76 | 118 | 8 bit data | Class 2 (SIM/USIM-specific) | + | Compression set but Character set can't be compressed |
77 | 119 | 8 bit data | Class 3 (TE-specific) | + | Compression set but Character set can't be compressed |
78 | 120 | UCS2 | Class 0 (Flash message) | + | Compression set but Character set can't be compressed |
79 | 121 | UCS2 | Class 1 (ME-specific) | + | Compression set but Character set can't be compressed |
7A | 122 | UCS2 | Class 2 (SIM/USIM-specific) | + | Compression set but Character set can't be compressed |
7B | 123 | UCS2 | Class 3 (TE-specific) | + | Compression set but Character set can't be compressed |
7C | 124 | (reserved) | Class 0 (Flash message) | + | Reserved character set |
7D | 125 | (reserved) | Class 1 (ME-specific) | + | Reserved character set |
7E | 126 | (reserved) | Class 2 (SIM/USIM-specific) | + | Reserved character set |
7F | 127 | (reserved) | Class 3 (TE-specific) | + | Reserved character set |

Coding Group: Reserved

DCS hex | DCS dec | Character Set | Message Class | Compressed | Reserved Because |
---|---|---|---|---|---|
80 | 128 | (not defined) | Default | - | Reserved coding group |
81 to BE | 129 to 190 | (not defined) | Default | - | Reserved coding group |
BF | 191 | (not defined) | Default | - | Reserved coding group |

Coding Group: Message Waiting Info: Discard Message

DCS hex | DCS dec | Character Set | Message Waiting Information | Compressed | Reserved Because |
---|---|---|---|---|---|
C0 | 192 | (not defined) | Voicemail Inactive | - | |
C1 | 193 | (not defined) | Fax Inactive | - | |
C2 | 194 | (not defined) | E-mail Inactive | - | |
C3 | 195 | (not defined) | Other Inactive | - | |
C4 | 196 | (not defined) | Voicemail Inactive | - | Value of bit 2 |
C5 | 197 | (not defined) | Fax Inactive | - | Value of bit 2 |
C6 | 198 | (not defined) | E-mail Inactive | - | Value of bit 2 |
C7 | 199 | (not defined) | Other Inactive | - | Value of bit 2 |
C8 | 200 | (not defined) | Voicemail Active | - | |
C9 | 201 | (not defined) | Fax Active | - | |
CA | 202 | (not defined) | E-mail Active | - | |
CB | 203 | (not defined) | Other Active | - | |
CC | 204 | (not defined) | Voicemail Active | - | Value of bit 2 |
CD | 205 | (not defined) | Fax Active | - | Value of bit 2 |
CE | 206 | (not defined) | E-mail Active | - | Value of bit 2 |
CF | 207 | (not defined) | Other Active | - | Value of bit 2 |

Coding Group: Message Waiting Info: Store Message

DCS hex | DCS dec | Character Set | Message Waiting Information | Compressed | Reserved Because |
---|---|---|---|---|---|
D0 | 208 | GSM 7 bit | Voicemail Inactive | - | |
D1 | 209 | GSM 7 bit | Fax Inactive | - | |
D2 | 210 | GSM 7 bit | E-mail Inactive | - | |
D3 | 211 | GSM 7 bit | Other Inactive | - | |
D4 | 212 | GSM 7 bit | Voicemail Inactive | - | Value of bit 2 |
D5 | 213 | GSM 7 bit | Fax Inactive | - | Value of bit 2 |
D6 | 214 | GSM 7 bit | E-mail Inactive | - | Value of bit 2 |
D7 | 215 | GSM 7 bit | Other Inactive | - | Value of bit 2 |
D8 | 216 | GSM 7 bit | Voicemail Active | - | |
D9 | 217 | GSM 7 bit | Fax Active | - | |
DA | 218 | GSM 7 bit | E-mail Active | - | |
DB | 219 | GSM 7 bit | Other Active | - | |
DC | 220 | GSM 7 bit | Voicemail Active | - | Value of bit 2 |
DD | 221 | GSM 7 bit | Fax Active | - | Value of bit 2 |
DE | 222 | GSM 7 bit | E-mail Active | - | Value of bit 2 |
DF | 223 | GSM 7 bit | Other Active | - | Value of bit 2 |
E0 | 224 | UCS2 | Voicemail Inactive | - | |
E1 | 225 | UCS2 | Fax Inactive | - | |
E2 | 226 | UCS2 | E-mail Inactive | - | |
E3 | 227 | UCS2 | Other Inactive | - | |
E4 | 228 | UCS2 | Voicemail Inactive | - | Value of bit 2 |
E5 | 229 | UCS2 | Fax Inactive | - | Value of bit 2 |
E6 | 230 | UCS2 | E-mail Inactive | - | Value of bit 2 |
E7 | 231 | UCS2 | Other Inactive | - | Value of bit 2 |
E8 | 232 | UCS2 | Voicemail Active | - | |
E9 | 233 | UCS2 | Fax Active | - | |
EA | 234 | UCS2 | E-mail Active | - | |
EB | 235 | UCS2 | Other Active | - | |
EC | 236 | UCS2 | Voicemail Active | - | Value of bit 2 |
ED | 237 | UCS2 | Fax Active | - | Value of bit 2 |
EE | 238 | UCS2 | E-mail Active | - | Value of bit 2 |
EF | 239 | UCS2 | Other Active | - | Value of bit 2 |

Coding Group: Data Coding/Message Class

DCS hex | DCS dec | Character Set | Message Class | Compressed | Reserved Because |
---|---|---|---|---|---|
F0 | 240 | GSM 7 bit | Class 0 (Flash message) | - | |
F1 | 241 | GSM 7 bit | Class 1 (ME-specific) | - | |
F2 | 242 | GSM 7 bit | Class 2 (SIM/USIM-specific) | - | |
F3 | 243 | GSM 7 bit | Class 3 (TE-specific) | - | |
F4 | 244 | 8 bit data | Class 0 (Flash message) | - | |
F5 | 245 | 8 bit data | Class 1 (ME-specific) | - | |
F6 | 246 | 8 bit data | Class 2 (SIM/USIM-specific) | - | |
F7 | 247 | 8 bit data | Class 3 (TE-specific) | - | |
F8 | 248 | GSM 7 bit | Class 0 (Flash message) | - | Value of bit 3 |
F9 | 249 | GSM 7 bit | Class 1 (ME-specific) | - | Value of bit 3 |
FA | 250 | GSM 7 bit | Class 2 (SIM/USIM-specific) | - | Value of bit 3 |
FB | 251 | GSM 7 bit | Class 3 (TE-specific) | - | Value of bit 3 |
FC | 252 | 8 bit data | Class 0 (Flash message) | - | Value of bit 3 |
FD | 253 | 8 bit data | Class 1 (ME-specific) | - | Value of bit 3 |
FE | 254 | 8 bit data | Class 2 (SIM/USIM-specific) | - | Value of bit 3 |
FF | 255 | 8 bit data | Class 3 (TE-specific) | - | Value of bit 3 |
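
The tables above follow from a few bit fields, so they can be reproduced by a compact decoder. The Python sketch below rebuilds their columns for SMS. Its rules for reserved values are read off the "Reserved Because" column rather than taken from the specification text. The names are hypothetical, and the Cell Broadcast DCS layout is not covered.

```python
from typing import NamedTuple, Optional

CHARSETS = ("GSM 7 bit", "8 bit data", "UCS2", None)  # bits 3 and 2; None = reserved
MWI_GROUPS = {  # high nibble -> (coding group, character set)
    0xC: ("Message Waiting Info: Discard Message", None),
    0xD: ("Message Waiting Info: Store Message", "GSM 7 bit"),
    0xE: ("Message Waiting Info: Store Message", "UCS2"),
}


class Dcs(NamedTuple):
    group: str
    charset: Optional[str]
    message_class: Optional[int]
    compressed: bool
    reserved: bool  # True where the tables give a "Reserved Because" reason


def decode_dcs(dcs: int) -> Dcs:
    """Decode an SMS TP-DCS octet into the columns of the tables above."""
    if not 0 <= dcs <= 0xFF:
        raise ValueError("TP-DCS is a single octet")
    nibble = dcs >> 4
    if nibble <= 0x7:  # 00xx General Data Coding, 01xx Automatic Deletion
        group = ("General Data Coding" if nibble <= 0x3
                 else "Message Marked for Automatic Deletion")
        charset = CHARSETS[(dcs >> 2) & 0x03]
        compressed = bool(dcs & 0x20)  # bit 5
        has_class = bool(dcs & 0x10)   # bit 4
        reserved = (
            charset is None                              # reserved character set
            or (not has_class and (dcs & 0x03) != 0)     # class bits without bit 4
            or (compressed and charset != "GSM 7 bit")   # tables: not compressible
        )
        msg_class = dcs & 0x03 if has_class else None
        return Dcs(group, charset, msg_class, compressed, reserved)
    if nibble <= 0xB:  # 1000 to 1011: reserved coding groups
        return Dcs("Reserved", None, None, False, True)
    if nibble <= 0xE:  # Message Waiting Indication; bit 2 is reserved
        group, charset = MWI_GROUPS[nibble]
        return Dcs(group, charset, None, False, bool(dcs & 0x04))
    # 1111 Data Coding/Message Class: bit 2 selects 8-bit data, bit 3 reserved
    charset = "8 bit data" if dcs & 0x04 else "GSM 7 bit"
    return Dcs("Data Coding/Message Class", charset, dcs & 0x03, False,
               bool(dcs & 0x08))


for value in (0x00, 0x11, 0x24, 0x48, 0x90, 0xDA, 0xF6):
    print(f"{value:02X}", decode_dcs(value))
```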

The iDEN mobile standard uses the values F7 and F8 (hexadecimal) in a special way.
For the DCS values in Cell Broadcast Messages see GSM recommendation 03.38. [1]