Superimposed code

Edge-notched card with data for a bibliographic item. Edges have not yet been notched.

A superimposed code such as Zatocoding is a kind of hash code that was popular in marginal punched-card systems.

Marginal punched-card systems

Many names, some of them trademarked, have been used for marginal punched-card systems: edge-notched cards, slotted cards, E-Z Sort, Zatocards, McBee, McBee Keysort, Flexisort, Velom, Rocket, etc. The center of each card held the relevant information: typically the title and author of a book, research paper, or journal article on a nearby shelf, and a list of subjects and keywords. Some sets of cards contained all the information required by the user on the card itself, handwritten, typewritten, or on microfilm (aperture card). Every card in a stack had the same set of pre-punched holes. To search, the user aligned the holes of the cards (using a card holder or card tray) and inserted one or more knitting-needle-like rods all the way through the stack; the desired cards, which had been notched or cut open at those holes, fell out, while the irrelevant cards, left un-notched, remained on the needle(s). A user could repeat this selection many times to form a complex Boolean search query. A card relevant to two or more subjects had the slot for each of those subjects cut out, so it would drop out when either subject, or both, was selected. The "superimposed code" coding systems, such as Zatocoding, saved space by entering several or all subjects in the same field; such a code stores much more information in less space, but at the cost of occasional "false" selections. [1]
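The needle selection described above can be modelled in a few lines of Python. This is a minimal sketch, not a description of any particular commercial system: a card is represented as the set of its notched hole positions, and the function `needle_sort` and the hole assignments in the deck are hypothetical. A Boolean AND is obtained by re-sorting the cards that fell out; an OR is obtained by pooling the drops of separate sorts.

```python
# Each card is modelled as the set of hole positions that have been notched
# open. Needles inserted through a set of holes drop every card that is
# notched at all of those positions; a card left intact at any needle
# position stays on the needles.

def needle_sort(cards, needles):
    """One pass of the needles: return (dropped, retained) lists of cards."""
    dropped, retained = [], []
    for title, notches in cards:
        if set(needles) <= notches:   # every needle passes through a slot
            dropped.append((title, notches))
        else:
            retained.append((title, notches))
    return dropped, retained

# Hypothetical deck: hole 0 = chemistry, hole 1 = physics, hole 2 = history.
cards = [
    ("Book A", {0}),
    ("Book B", {0, 1}),
    ("Book C", {2}),
]

# AND: sort on chemistry, then re-sort only the cards that fell out on physics.
chem, _ = needle_sort(cards, [0])
chem_and_phys, _ = needle_sort(chem, [1])

# OR: sort on physics, then sort the remaining cards on history; pool the drops.
phys, rest = needle_sort(cards, [1])
hist, _ = needle_sort(rest, [2])
phys_or_hist = phys + hist

print([t for t, _ in chem_and_phys])       # ['Book B']
print(sorted(t for t, _ in phys_or_hist))  # ['Book B', 'Book C']
```

Inserting two needles at once, as in `needle_sort(cards, [0, 1])`, gives the same AND result in a single pass, because a card is held back by any needle that meets an intact hole.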

Once you have a collection of index cards, one per book, research paper, or journal article in a library, each listing the keywords (subjects) discussed in that item, the "obvious way" to code those subjects is to count the total number R of subjects used in the entire collection, make a row of R holes near the top of every card, and, for each subject actually discussed in a particular book, cut a slot from the hole corresponding to that subject in that book's card. [2] This also requires a separate list of every subject used in the collection, indicating which hole is punched for each subject. Unfortunately, there may be thousands of distinct subjects in the collection, and it is impractical to punch thousands of holes in every card. While it may not seem possible to use fewer than one hole per subject, superimposed code systems solve this problem by letting subjects share holes.

Superimposed codes

The Zatocoding system of information retrieval was developed by Calvin Mooers in 1947. [3]

Calvin Mooers invented Zatocoding, a mechanical information retrieval system based on superimposed codes, at M.I.T., and formed the Zator Company in 1947 to commercialize its applications. [4] The particular superimposed code used in that system is called Zatocoding, while the marginal-punched card information retrieval system as a whole is called "Zator". [5]

Setting up a superimposed code for a particular library goes something like this:

Later, when we need to find books on some particular subject, we look up that subject in our list of all R subjects, find its pattern of n slots, and put n needles through the whole stack in that pattern. All of the cards that have been cut with that pattern will fall out. A few other, undesired cards may also fall out: cards that have several subjects whose hole patterns overlap in such a way as to mimic the desired pattern. The probability F of some undesired card with v slots cut in it falling through when we select some pattern of n needles is approximately $F \approx (v/N)^n$. Most systems have N large enough and r small enough that v < N/2 (i.e., the card is less than half-punched), so the probability of an undesired card falling through is less than $(1/2)^n = 2^{-n}$. [2]
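The following Python sketch simulates a randomly assigned superimposed code and checks the false-drop estimate. It is an illustration under stated assumptions, not a reconstruction of any historical code book: N = 40 holes, n = 4 holes per subject, 1000 hypothetical subjects, and four subjects per card are all invented parameters, and `make_code_book`, `punch`, and `select` are hypothetical names. Treating the needle pattern as a random n-hole subset, the exact chance that it lies entirely inside a card's v slots is C(v, n)/C(N, n), which never exceeds the approximation (v/N)^n.

```python
import math
import random

def make_code_book(subjects, N, n, rng):
    """Assign each subject a random pattern of n distinct holes out of N."""
    return {s: frozenset(rng.sample(range(N), n)) for s in subjects}

def punch(card_subjects, code_book):
    """Superimpose (take the union of) the patterns of all subjects on a card."""
    slots = set()
    for s in card_subjects:
        slots |= code_book[s]
    return slots

def select(cards, pattern):
    """Titles of the cards that fall when needles fill every hole of the pattern."""
    return [title for title, slots in cards.items() if pattern <= slots]

N, n = 40, 4                        # assumed field width and pattern size
rng = random.Random(1947)           # fixed seed so the run is repeatable
subjects = [f"subject-{i}" for i in range(1000)]   # 1000 hypothetical subjects
book = make_code_book(subjects, N, n, rng)

# Hypothetical deck of 200 cards, each indexed under 4 random subjects.
cards = {f"card-{j}": punch(rng.sample(subjects, 4), book) for j in range(200)}
# Make sure at least one card really carries subject-0.
cards["card-0"] = punch(["subject-0", "subject-1", "subject-2", "subject-3"], book)

hits = select(cards, book["subject-0"])   # true matches plus any false drops

# False-drop estimate for one card with v slots cut.
v = len(cards["card-0"])
exact = math.comb(v, n) / math.comb(N, n)   # random n-hole pattern inside v slots
approx = (v / N) ** n
print(len(hits), v, round(exact, 4), round(approx, 4))
```

Because each card carries at most 4 × 4 = 16 slots here, v stays below N/2 = 20, so both estimates fall under 2^-4 = 0.0625; any card in `hits` that was not actually indexed under `subject-0` is a false drop.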

There are several different ways to choose which holes will be slotted for each subject.

(Several variations of Zatocoding were developed. Bourne describes a variant "for newer retrieval systems that require high performance of the superimposed coding system", [6] using an approach Mooers published in 1959. [7])

Zatocoding

Setting up a Zatocode for a particular list of R subjects goes something like this: [2]

Other superimposed codes

A Zatocode requires a code book that lists every subject and a randomly generated notch code associated with each one. Other "direct" superimposed codes use a fixed hash function to transform the letters in (one spelling of) a subject into a notch code. Such codes require a much shorter code book, one that describes how the letters of a word translate into the corresponding notch code, and in principle make it easy to add new subjects without changing the code book. [5]
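To illustrate the idea of a direct code, here is a Python sketch of a hypothetical letter-based rule. It is not Ohlman's scheme or any published code; the field width N = 40, the choice to code only the first four letters, and the function name `direct_code` are assumptions made for the example. Unlike a Zatocode code book, which must store a random code for every subject, the single rule below serves as the entire code book.

```python
import string

N = 40   # holes in the coding field (assumed)
n = 4    # number of leading letters that are coded (assumed)

def direct_code(subject):
    """Hypothetical letter-based notch code.

    The i-th letter of the subject (letters only, case-folded) selects hole
    (26 * i + letter_index) mod N. This one rule is the entire code book,
    so a new subject needs no new entry.
    """
    letters = [c for c in subject.lower() if c in string.ascii_lowercase][:n]
    return frozenset((26 * i + string.ascii_lowercase.index(c)) % N
                     for i, c in enumerate(letters))

print(sorted(direct_code("Optics")))      # [1, 6, 14, 31]
print(sorted(direct_code("holography")))  # [0, 7, 12, 23]
```

This toy rule also shows the cost of hashing spellings: "optics" and "optical" share their first four letters and so receive the same notch code, and a two-letter word such as "ox" notches only two holes.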

A Bloom filter can be considered a kind of superimposed code. [8]
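The correspondence can be made concrete with a minimal Bloom filter in Python. This is a sketch under the assumption that k salted SHA-256 digests stand in for k independent hash functions; the class and method names are hypothetical. The bit array plays the role of a card's coding field, `add` superimposes an item's pattern of up to k bits, and `might_contain` is the needle test, which can report false positives (false drops) but never false negatives.

```python
import hashlib

class BloomFilter:
    """An m-bit array with k hash functions, read like one notched card."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, item):
        # Assumption: k positions derived by salting SHA-256 with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p          # "notch" the card at position p

    def might_contain(self, item):
        # Every "needle" must pass through a set bit; false positives possible.
        return all((self.bits >> p) & 1 for p in self._positions(item))

bf = BloomFilter(m=64, k=3)
for subject in ["chemistry", "optics", "physics"]:
    bf.add(subject)
print(bf.might_contain("optics"))   # True
```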

References

  1. Robert V. Williams. "Punched Cards: A Brief Tutorial". Computing Now, 2002.
  2. W. Ross Ashby. W. Ross Ashby's Journal: Zato-coding. 22 September 1960. pp. 6208-6222.
  3. "About the Cover". College and Research Libraries News, April 2008.
  4. Eugene Garfield. "Continuing relevance of superimposed coding". Journal of Information Science 8 (1984), p. 181.
  5. Herbert Marvin Ohlman. "Subject-Word Letter Frequencies with Applications to Superimposed Coding". Proceedings of the International Conference on Scientific Information, 1959.
  6. Charles P. Bourne. Methods of Information Handling. John Wiley & Sons, Inc., 1963. p. 67.
  7. Calvin N. Mooers. The Application of Simple Pattern Inclusion Selection to Large-Scale Information Retrieval Systems. Zator Company, April 1959.
  8. James Blustein and Amal El-Maazawi. "Bloom Filters - A Tutorial, Analysis, and Survey". p. 11.