The PGP Word List ("Pretty Good Privacy word list", also called a biometric word list for reasons explained below) is a list of words for conveying data bytes clearly and unambiguously over a voice channel. It is analogous in purpose to the NATO phonetic alphabet, except that a longer list of words is used, each word corresponding to one of the 256 distinct numeric byte values.
The PGP Word List was designed in 1995 by Patrick Juola, a computational linguist, and Philip Zimmermann, creator of PGP. [1] [2] The words were carefully chosen for their phonetic distinctiveness, using genetic algorithms to select lists of words that had optimum separations in phoneme space. The candidate word lists were randomly drawn from Grady Ward's Moby Pronunciator list as raw material for the search, successively refined by the genetic algorithms. The automated search converged to an optimized solution in about 40 hours on a DEC Alpha, a particularly fast machine in that era.
The Zimmermann–Juola list was originally designed for PGPfone, a secure VoIP application, to allow the two parties to verbally compare a short authentication string and so detect a man-in-the-middle (MiTM) attack. It was called a biometric word list because the authentication depended on the two human users recognizing each other's distinct voices as they read and compared the words over the voice channel, binding the identity of the speaker to the words, which helped protect against the MiTM attack. The list can be used in many other situations where a biometric binding of identity is not needed, so calling it a biometric word list may be imprecise. Later, it was used in PGP to compare and verify PGP public key fingerprints over a voice channel. This is known in PGP applications as the "biometric" representation. When it was applied to PGP, the list of words was further refined, with contributions by Jon Callas. More recently, it has been used in Zfone and the ZRTP protocol, the successor to PGPfone.
The list is actually composed of two lists, each containing 256 phonetically distinct words, in which each word represents a different byte value between 0 and 255. Two lists are used because reading aloud long random sequences of human words usually risks three kinds of errors: 1) transposition of two consecutive words, 2) duplicate words, or 3) omitted words. To detect all three kinds of errors, the two lists are used alternately for the even-offset bytes and the odd-offset bytes in the byte sequence. Each byte value is actually represented by two different words, depending on whether that byte appears at an even or an odd offset from the beginning of the byte sequence. The two lists are readily distinguished by the number of syllables; the even list has words of two syllables, the odd list has three. The two lists have a maximum word length of 9 and 11 letters, respectively. Using a two-list scheme was suggested by Zhahai Stewart.
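The alternation check described above can be sketched in a few lines of Python. This is a minimal illustration, not code from PGP or PGPfone: the function name `first_misplaced` is hypothetical, and the two sets hold only a handful of words taken from the table below rather than the full 256-word lists. The check only tests whether each word comes from the list expected at its position, which is enough to flag an adjacent transposition, a single duplicated word, or a single omitted word.

```python
# Excerpt only (assumption: a few entries copied from the two 256-word lists).
EVEN_WORDS = {"topmost", "Pluto", "treadmill", "miser"}      # two-syllable list
ODD_WORDS = {"Istanbul", "vagabond", "Pacific", "travesty"}  # three-syllable list


def first_misplaced(words):
    """Return the index of the first word not in the list expected at its
    position (even offsets use EVEN_WORDS, odd offsets use ODD_WORDS),
    or None if every word is in the expected list."""
    for offset, word in enumerate(words):
        expected = EVEN_WORDS if offset % 2 == 0 else ODD_WORDS
        if word not in expected:
            return offset
    return None


original = ["topmost", "Istanbul", "Pluto", "vagabond"]
transposed = ["Istanbul", "topmost", "Pluto", "vagabond"]             # first two swapped
duplicated = ["topmost", "topmost", "Istanbul", "Pluto", "vagabond"]  # "topmost" read twice
omitted = ["topmost", "Pluto", "vagabond"]                            # "Istanbul" skipped

for name, sequence in [("original", original), ("transposed", transposed),
                       ("duplicated", duplicated), ("omitted", omitted)]:
    print(name, first_misplaced(sequence))
```

In this sketch the unmodified sequence passes, while each of the three damaged sequences is flagged at the first position where a two-syllable word appears in a three-syllable slot or vice versa.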
Here are the two lists of words as presented in the PGPfone Owner's Manual. [3]
Hex | Even Word | Odd Word |
---|---|---|
00 | aardvark | adroitness |
01 | absurd | adviser |
02 | accrue | aftermath |
03 | acme | aggregate |
04 | adrift | alkali |
05 | adult | almighty |
06 | afflict | amulet |
07 | ahead | amusement |
08 | aimless | antenna |
09 | Algol | applicant |
0A | allow | Apollo |
0B | alone | armistice |
0C | ammo | article |
0D | ancient | asteroid |
0E | apple | Atlantic |
0F | artist | atmosphere |
10 | assume | autopsy |
11 | Athens | Babylon |
12 | atlas | backwater |
13 | Aztec | barbecue |
14 | baboon | belowground |
15 | backfield | bifocals |
16 | backward | bodyguard |
17 | banjo | bookseller |
18 | beaming | borderline |
19 | bedlamp | bottomless |
1A | beehive | Bradbury |
1B | beeswax | bravado |
1C | befriend | Brazilian |
1D | Belfast | breakaway |
1E | berserk | Burlington |
1F | billiard | businessman |
20 | bison | butterfat |
21 | blackjack | Camelot |
22 | blockade | candidate |
23 | blowtorch | cannonball |
24 | bluebird | Capricorn |
25 | bombast | caravan |
26 | bookshelf | caretaker |
27 | brackish | celebrate |
28 | breadline | cellulose |
29 | breakup | certify |
2A | brickyard | chambermaid |
2B | briefcase | Cherokee |
2C | Burbank | Chicago |
2D | button | clergyman |
2E | buzzard | coherence |
2F | cement | combustion |
30 | chairlift | commando |
31 | chatter | company |
32 | checkup | component |
33 | chisel | concurrent |
34 | choking | confidence |
35 | chopper | conformist |
36 | Christmas | congregate |
37 | clamshell | consensus |
38 | classic | consulting |
39 | classroom | corporate |
3A | cleanup | corrosion |
3B | clockwork | councilman |
3C | cobra | crossover |
3D | commence | crucifix |
3E | concert | cumbersome |
3F | cowbell | customer |
40 | crackdown | Dakota |
41 | cranky | decadence |
42 | crowfoot | December |
43 | crucial | decimal |
44 | crumpled | designing |
45 | crusade | detector |
46 | cubic | detergent |
47 | dashboard | determine |
48 | deadbolt | dictator |
49 | deckhand | dinosaur |
4A | dogsled | direction |
4B | dragnet | disable |
4C | drainage | disbelief |
4D | dreadful | disruptive |
4E | drifter | distortion |
4F | dropper | document |
50 | drumbeat | embezzle |
51 | drunken | enchanting |
52 | Dupont | enrollment |
53 | dwelling | enterprise |
54 | eating | equation |
55 | edict | equipment |
56 | egghead | escapade |
57 | eightball | Eskimo |
58 | endorse | everyday |
59 | endow | examine |
5A | enlist | existence |
5B | erase | exodus |
5C | escape | fascinate |
5D | exceed | filament |
5E | eyeglass | finicky |
5F | eyetooth | forever |
60 | facial | fortitude |
61 | fallout | frequency |
62 | flagpole | gadgetry |
63 | flatfoot | Galveston |
64 | flytrap | getaway |
65 | fracture | glossary |
66 | framework | gossamer |
67 | freedom | graduate |
68 | frighten | gravity |
69 | gazelle | guitarist |
6A | Geiger | hamburger |
6B | glitter | Hamilton |
6C | glucose | handiwork |
6D | goggles | hazardous |
6E | goldfish | headwaters |
6F | gremlin | hemisphere |
70 | guidance | hesitate |
71 | hamlet | hideaway |
72 | highchair | holiness |
73 | hockey | hurricane |
74 | indoors | hydraulic |
75 | indulge | impartial |
76 | inverse | impetus |
77 | involve | inception |
78 | island | indigo |
79 | jawbone | inertia |
7A | keyboard | infancy |
7B | kickoff | inferno |
7C | kiwi | informant |
7D | klaxon | insincere |
7E | locale | insurgent |
7F | lockup | integrate |
80 | merit | intention |
81 | minnow | inventive |
82 | miser | Istanbul |
83 | Mohawk | Jamaica |
84 | mural | Jupiter |
85 | music | leprosy |
86 | necklace | letterhead |
87 | Neptune | liberty |
88 | newborn | maritime |
89 | nightbird | matchmaker |
8A | Oakland | maverick |
8B | obtuse | Medusa |
8C | offload | megaton |
8D | optic | microscope |
8E | orca | microwave |
8F | payday | midsummer |
90 | peachy | millionaire |
91 | pheasant | miracle |
92 | physique | misnomer |
93 | playhouse | molasses |
94 | Pluto | molecule |
95 | preclude | Montana |
96 | prefer | monument |
97 | preshrunk | mosquito |
98 | printer | narrative |
99 | prowler | nebula |
9A | pupil | newsletter |
9B | puppy | Norwegian |
9C | python | October |
9D | quadrant | Ohio |
9E | quiver | onlooker |
9F | quota | opulent |
A0 | ragtime | Orlando |
A1 | ratchet | outfielder |
A2 | rebirth | Pacific |
A3 | reform | pandemic |
A4 | regain | Pandora |
A5 | reindeer | paperweight |
A6 | rematch | paragon |
A7 | repay | paragraph |
A8 | retouch | paramount |
A9 | revenge | passenger |
AA | reward | pedigree |
AB | rhythm | Pegasus |
AC | ribcage | penetrate |
AD | ringbolt | perceptive |
AE | robust | performance |
AF | rocker | pharmacy |
B0 | ruffled | phonetic |
B1 | sailboat | photograph |
B2 | sawdust | pioneer |
B3 | scallion | pocketful |
B4 | scenic | politeness |
B5 | scorecard | positive |
B6 | Scotland | potato |
B7 | seabird | processor |
B8 | select | provincial |
B9 | sentence | proximate |
BA | shadow | puberty |
BB | shamrock | publisher |
BC | showgirl | pyramid |
BD | skullcap | quantity |
BE | skydive | racketeer |
BF | slingshot | rebellion |
C0 | slowdown | recipe |
C1 | snapline | recover |
C2 | snapshot | repellent |
C3 | snowcap | replica |
C4 | snowslide | reproduce |
C5 | solo | resistor |
C6 | southward | responsive |
C7 | soybean | retraction |
C8 | spaniel | retrieval |
C9 | spearhead | retrospect |
CA | spellbind | revenue |
CB | spheroid | revival |
CC | spigot | revolver |
CD | spindle | sandalwood |
CE | spyglass | sardonic |
CF | stagehand | Saturday |
D0 | stagnate | savagery |
D1 | stairway | scavenger |
D2 | standard | sensation |
D3 | stapler | sociable |
D4 | steamship | souvenir |
D5 | sterling | specialist |
D6 | stockman | speculate |
D7 | stopwatch | stethoscope |
D8 | stormy | stupendous |
D9 | sugar | supportive |
DA | surmount | surrender |
DB | suspense | suspicious |
DC | sweatband | sympathy |
DD | swelter | tambourine |
DE | tactics | telephone |
DF | talon | therapist |
E0 | tapeworm | tobacco |
E1 | tempest | tolerance |
E2 | tiger | tomorrow |
E3 | tissue | torpedo |
E4 | tonic | tradition |
E5 | topmost | travesty |
E6 | tracker | trombonist |
E7 | transit | truncated |
E8 | trauma | typewriter |
E9 | treadmill | ultimate |
EA | Trojan | undaunted |
EB | trouble | underfoot |
EC | tumor | unicorn |
ED | tunnel | unify |
EE | tycoon | universe |
EF | uncut | unravel |
F0 | unearth | upcoming |
F1 | unwind | vacancy |
F2 | uproot | vagabond |
F3 | upset | vertigo |
F4 | upshot | Virginia |
F5 | vapor | visitor |
F6 | village | vocalist |
F7 | virus | voyager |
F8 | Vulcan | warranty |
F9 | waffle | Waterloo |
FA | wallet | whimsical |
FB | watchword | Wichita |
FC | wayside | Wilmington |
FD | willow | Wyoming |
FE | woodlark | yesteryear |
FF | Zulu | Yucatan |
Each byte in a bytestring is encoded as a single word. A sequence of bytes is rendered in network byte order, from left to right. The leftmost byte (byte 0) is considered "even" and is encoded using the PGP Even Word table. The next byte to the right (byte 1) is considered "odd" and is encoded using the PGP Odd Word table. This alternation repeats until all bytes are encoded. Thus, "E582" produces "topmost Istanbul", whereas "82E5" produces "miser travesty".
A PGP public key fingerprint that displayed in hexadecimal as
E582 94F2 E9A2 2748 6E8B
061B 31CC 528F D7FA 3F19
would display in PGP Words (the "biometric" fingerprint) as
topmost Istanbul Pluto vagabond treadmill Pacific brackish dictator goldfish Medusa
afflict bravado chatter revolver Dupont midsummer stopwatch whimsical cowbell bottomless
The order of bytes in a bytestring depends on endianness.
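The encoding rule and the fingerprint example above can be reproduced with a short Python sketch. The function name `encode` and the dictionary names are illustrative assumptions, and the two dictionaries are only excerpts of the table above, holding just the byte values that occur in the examples; a complete implementation would need all 256 entries in each table.

```python
# Excerpt of the word table: only the byte values used in the examples.
EVEN = {
    0xE5: "topmost", 0x82: "miser", 0x94: "Pluto", 0xE9: "treadmill",
    0x27: "brackish", 0x6E: "goldfish", 0x06: "afflict", 0x31: "chatter",
    0x52: "Dupont", 0xD7: "stopwatch", 0x3F: "cowbell",
}
ODD = {
    0x82: "Istanbul", 0xE5: "travesty", 0xF2: "vagabond", 0xA2: "Pacific",
    0x48: "dictator", 0x8B: "Medusa", 0x1B: "bravado", 0xCC: "revolver",
    0x8F: "midsummer", 0xFA: "whimsical", 0x19: "bottomless",
}


def encode(hex_string):
    """Encode a hexadecimal string as PGP words, left to right.

    Bytes at even offsets use the EVEN table and bytes at odd offsets use
    the ODD table. Raises KeyError for byte values missing from the excerpts.
    """
    data = bytes.fromhex(hex_string)  # whitespace between groups is ignored
    words = []
    for offset, value in enumerate(data):
        table = EVEN if offset % 2 == 0 else ODD
        words.append(table[value])
    return " ".join(words)


print(encode("E582"))  # topmost Istanbul
print(encode("82E5"))  # miser travesty
print(encode("E582 94F2 E9A2 2748 6E8B 061B 31CC 528F D7FA 3F19"))
```

The same byte value maps to different words depending on its offset, which is why swapping the two bytes of "E582" changes both words rather than just their order.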
There are several other word lists for conveying data in a clear unambiguous way via a voice channel: