The Portable Character Set is a set of 103 characters which, according to the POSIX standard, must be present in any character set. Compared to ASCII, the Portable Character Set lacks some control characters and does not prescribe any particular encoding for the characters' values. [1] [2] The Portable Character Set is a superset of the Basic Execution Character Set as defined by ANSI C. [3]
| name | glyph | C string | Unicode | Unicode name |
|---|---|---|---|---|
| NUL | | \0 | U+0000 | NULL (NUL) |
| alert | | \a | U+0007 | ALERT (BEL) |
| backspace | | \b | U+0008 | BACKSPACE (BS) |
| tab | | \t | U+0009 | CHARACTER TABULATION (HT) |
| newline | | \n | U+000A | LINE FEED (LF) |
| vertical-tab | | \v | U+000B | LINE TABULATION (VT) |
| form-feed | | \f | U+000C | FORM FEED (FF) |
| carriage-return | | \r | U+000D | CARRIAGE RETURN (CR) |
| space | | | U+0020 | SPACE |
| exclamation-mark | ! | ! | U+0021 | EXCLAMATION MARK |
| quotation-mark | " | \" | U+0022 | QUOTATION MARK |
| number-sign | # | # | U+0023 | NUMBER SIGN |
| dollar-sign | $ | $ | U+0024 | DOLLAR SIGN |
| percent-sign | % | % | U+0025 | PERCENT SIGN |
| ampersand | & | & | U+0026 | AMPERSAND |
| apostrophe | ' | \' | U+0027 | APOSTROPHE |
| left-parenthesis | ( | ( | U+0028 | LEFT PARENTHESIS |
| right-parenthesis | ) | ) | U+0029 | RIGHT PARENTHESIS |
| asterisk | * | * | U+002A | ASTERISK |
| plus-sign | + | + | U+002B | PLUS SIGN |
| comma | , | , | U+002C | COMMA |
| hyphen | - | - | U+002D | HYPHEN-MINUS |
| period | . | . | U+002E | FULL STOP |
| slash | / | / | U+002F | SOLIDUS |
| zero | 0 | 0 | U+0030 | DIGIT ZERO |
| one | 1 | 1 | U+0031 | DIGIT ONE |
| two | 2 | 2 | U+0032 | DIGIT TWO |
| three | 3 | 3 | U+0033 | DIGIT THREE |
| four | 4 | 4 | U+0034 | DIGIT FOUR |
| five | 5 | 5 | U+0035 | DIGIT FIVE |
| six | 6 | 6 | U+0036 | DIGIT SIX |
| seven | 7 | 7 | U+0037 | DIGIT SEVEN |
| eight | 8 | 8 | U+0038 | DIGIT EIGHT |
| nine | 9 | 9 | U+0039 | DIGIT NINE |
| colon | : | : | U+003A | COLON |
| semicolon | ; | ; | U+003B | SEMICOLON |
| less-than-sign | < | < | U+003C | LESS-THAN SIGN |
| equals-sign | = | = | U+003D | EQUALS SIGN |
| greater-than-sign | > | > | U+003E | GREATER-THAN SIGN |
| question-mark | ? | ? | U+003F | QUESTION MARK |
| commercial-at | @ | @ | U+0040 | COMMERCIAL AT |
| A | A | A | U+0041 | LATIN CAPITAL LETTER A |
| B | B | B | U+0042 | LATIN CAPITAL LETTER B |
| C | C | C | U+0043 | LATIN CAPITAL LETTER C |
| D | D | D | U+0044 | LATIN CAPITAL LETTER D |
| E | E | E | U+0045 | LATIN CAPITAL LETTER E |
| F | F | F | U+0046 | LATIN CAPITAL LETTER F |
| G | G | G | U+0047 | LATIN CAPITAL LETTER G |
| H | H | H | U+0048 | LATIN CAPITAL LETTER H |
| I | I | I | U+0049 | LATIN CAPITAL LETTER I |
| J | J | J | U+004A | LATIN CAPITAL LETTER J |
| K | K | K | U+004B | LATIN CAPITAL LETTER K |
| L | L | L | U+004C | LATIN CAPITAL LETTER L |
| M | M | M | U+004D | LATIN CAPITAL LETTER M |
| N | N | N | U+004E | LATIN CAPITAL LETTER N |
| O | O | O | U+004F | LATIN CAPITAL LETTER O |
| P | P | P | U+0050 | LATIN CAPITAL LETTER P |
| Q | Q | Q | U+0051 | LATIN CAPITAL LETTER Q |
| R | R | R | U+0052 | LATIN CAPITAL LETTER R |
| S | S | S | U+0053 | LATIN CAPITAL LETTER S |
| T | T | T | U+0054 | LATIN CAPITAL LETTER T |
| U | U | U | U+0055 | LATIN CAPITAL LETTER U |
| V | V | V | U+0056 | LATIN CAPITAL LETTER V |
| W | W | W | U+0057 | LATIN CAPITAL LETTER W |
| X | X | X | U+0058 | LATIN CAPITAL LETTER X |
| Y | Y | Y | U+0059 | LATIN CAPITAL LETTER Y |
| Z | Z | Z | U+005A | LATIN CAPITAL LETTER Z |
| left-square-bracket | [ | [ | U+005B | LEFT SQUARE BRACKET |
| backslash | \ | \\ | U+005C | REVERSE SOLIDUS |
| right-square-bracket | ] | ] | U+005D | RIGHT SQUARE BRACKET |
| circumflex | ^ | ^ | U+005E | CIRCUMFLEX ACCENT |
| underscore | _ | _ | U+005F | LOW LINE |
| grave-accent | ` | ` | U+0060 | GRAVE ACCENT |
| a | a | a | U+0061 | LATIN SMALL LETTER A |
| b | b | b | U+0062 | LATIN SMALL LETTER B |
| c | c | c | U+0063 | LATIN SMALL LETTER C |
| d | d | d | U+0064 | LATIN SMALL LETTER D |
| e | e | e | U+0065 | LATIN SMALL LETTER E |
| f | f | f | U+0066 | LATIN SMALL LETTER F |
| g | g | g | U+0067 | LATIN SMALL LETTER G |
| h | h | h | U+0068 | LATIN SMALL LETTER H |
| i | i | i | U+0069 | LATIN SMALL LETTER I |
| j | j | j | U+006A | LATIN SMALL LETTER J |
| k | k | k | U+006B | LATIN SMALL LETTER K |
| l | l | l | U+006C | LATIN SMALL LETTER L |
| m | m | m | U+006D | LATIN SMALL LETTER M |
| n | n | n | U+006E | LATIN SMALL LETTER N |
| o | o | o | U+006F | LATIN SMALL LETTER O |
| p | p | p | U+0070 | LATIN SMALL LETTER P |
| q | q | q | U+0071 | LATIN SMALL LETTER Q |
| r | r | r | U+0072 | LATIN SMALL LETTER R |
| s | s | s | U+0073 | LATIN SMALL LETTER S |
| t | t | t | U+0074 | LATIN SMALL LETTER T |
| u | u | u | U+0075 | LATIN SMALL LETTER U |
| v | v | v | U+0076 | LATIN SMALL LETTER V |
| w | w | w | U+0077 | LATIN SMALL LETTER W |
| x | x | x | U+0078 | LATIN SMALL LETTER X |
| y | y | y | U+0079 | LATIN SMALL LETTER Y |
| z | z | z | U+007A | LATIN SMALL LETTER Z |
| left-brace | { | { | U+007B | LEFT CURLY BRACKET |
| vertical-line | \| | \| | U+007C | VERTICAL LINE |
| right-brace | } | } | U+007D | RIGHT CURLY BRACKET |
| tilde | ~ | ~ | U+007E | TILDE |
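As an illustration, the table can be expressed as a set and used for membership tests. The sketch below is only an assumption-laden example: it identifies each character by the Unicode code point listed in the table, although POSIX itself does not fix an encoding, and the names `PORTABLE_CHARACTER_SET` and `is_portable` are hypothetical.

```python
# Code points taken from the "Unicode" column of the table above.
# Assumption: characters are identified by Unicode code point; POSIX
# itself does not prescribe any encoding for them.
PORTABLE_CODE_POINTS = (
    [0x00]                      # NUL
    + list(range(0x07, 0x0E))   # alert .. carriage-return
    + list(range(0x20, 0x7F))   # space .. tilde
)

PORTABLE_CHARACTER_SET = frozenset(chr(cp) for cp in PORTABLE_CODE_POINTS)


def is_portable(text: str) -> bool:
    """Return True if every character of text is in the Portable Character Set."""
    return all(ch in PORTABLE_CHARACTER_SET for ch in text)
```

With these definitions, `len(PORTABLE_CHARACTER_SET)` is 103, and a string containing an escape character (U+001B) or any non-ASCII letter is reported as not portable.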
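The following table groups the code points U+0000 to U+007F by character class and indicates which of them belong to the Portable Character Set. [4]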
| Unicode range | Character Class | POSIX.1-2017 Standard |
|---|---|---|
| U+0000 | Control | Portable |
| U+0001 to U+0006 | Control | Non-Portable |
| U+0007 to U+0008 | Control | Portable |
| U+0009 to U+000D | White-space | Portable |
| U+000E to U+001F | Control | Non-Portable |
| U+0020 | White-space | Portable |
| U+0021 to U+002F | Punctuation | Portable |
| U+0030 to U+0039 | Digit | Portable |
| U+003A to U+0040 | Punctuation | Portable |
| U+0041 to U+005A | Uppercase Letter | Portable |
| U+005B to U+0060 | Punctuation | Portable |
| U+0061 to U+007A | Lowercase Letter | Portable |
| U+007B to U+007E | Punctuation | Portable |
| U+007F | Control | Non-Portable |
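The grouping in this table can be sketched as a small lookup function. This is a hypothetical helper (`classify` is not part of any standard API), and it assumes the same Unicode code points as the table. It treats every code point below U+0020 that is not white-space as a control character.

```python
def classify(ch: str) -> tuple[str, bool]:
    """Return (character class, is_portable) for one character in U+0000..U+007F."""
    cp = ord(ch)
    if cp > 0x7F:
        raise ValueError("outside the range covered by the table")
    if cp == 0x20 or 0x09 <= cp <= 0x0D:
        return ("White-space", True)
    if cp < 0x20 or cp == 0x7F:
        # Only NUL, alert and backspace are portable control characters.
        return ("Control", cp in (0x00, 0x07, 0x08))
    if 0x30 <= cp <= 0x39:
        return ("Digit", True)
    if 0x41 <= cp <= 0x5A:
        return ("Uppercase Letter", True)
    if 0x61 <= cp <= 0x7A:
        return ("Lowercase Letter", True)
    # Everything left in U+0021..U+007E is punctuation.
    return ("Punctuation", True)
```

Counting the code points for which the second element is true gives 103, matching the size of the Portable Character Set.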
POSIX also standardizes a portable filename character set, a much smaller subset of 65 of the above characters: 26 uppercase letters, 26 lowercase letters, 10 decimal digits, and the three punctuation characters "period", "underscore", and "hyphen". [5]
To be usable across all POSIX locales, a pathname should consist only of characters from that portable filename character set, "slash" characters, and a single final "NUL" character. [6] Like the Portable Character Set that these characters are taken from, the encoding of the portable filename character set is not specified. [7]
To be usable across all POSIX locales, user names, group names, file names, and directory names should be composed only of characters from the portable filename character set. The "hyphen" should not be used as the first character of any of those names.
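The rules above can be checked mechanically. The following is a minimal sketch, not a reference implementation: the function names are hypothetical, rejecting empty names is an added assumption, and the leading-hyphen rule is applied to every pathname component. Because Python strings carry no terminating "NUL", that part of the pathname rule is not modelled.

```python
import string

# 26 uppercase + 26 lowercase + 10 digits + period, underscore, hyphen = 65
PORTABLE_FILENAME_CHARS = frozenset(
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "._-"
)


def is_portable_name(name: str) -> bool:
    """Check a user, group, file, or directory name against the rules above."""
    # Assumption: an empty name is rejected; a leading hyphen is disallowed.
    if not name or name.startswith("-"):
        return False
    return all(ch in PORTABLE_FILENAME_CHARS for ch in name)


def is_portable_pathname(path: str) -> bool:
    """Check that a pathname consists only of portable names and slashes."""
    if not path:
        return False
    # Empty components come from leading, trailing, or repeated slashes.
    components = [part for part in path.split("/") if part]
    return all(is_portable_name(part) for part in components)
```

For example, `is_portable_name("report_2024.txt")` is true, while `is_portable_name("-rf")` and `is_portable_name("my file.txt")` are both false, the latter because "space" is not in the portable filename character set.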
Many people recommend using only this portable filename character set for file names and directory names, even on systems that technically could use other letters and symbols. Utilities such as detox, convmv, and Glindra can be used to fix "bad" filenames. [8] [9] [10] [11]