Henry Spencer

Henry Spencer (born 1955) is a Canadian computer programmer and space enthusiast. He wrote "regex", a widely used software library for regular expressions, and co-wrote C News, a Usenet server program. He also wrote The Ten Commandments for C Programmers. [1] He is coauthor, with David Lawrence, of the book Managing Usenet. [2] While working at the University of Toronto he ran the first active Usenet site outside the U.S., starting in 1981. His records from that period were eventually acquired by Google to provide an archive of Usenet in the 1980s.

The first international Usenet site was run in Ottawa, in 1981; however, it is generally not remembered, as it served merely as a read-only medium. Later in 1981, Spencer acquired a Usenet feed from Duke University, and brought "utzoo" online; the earliest public archives of Usenet date from May 1981 as a result.

The small size of Usenet in its early days, and Spencer's early involvement, made him a well-recognised participant; this is commemorated in Vernor Vinge's 1992 novel A Fire Upon the Deep. The novel features an interstellar communications medium remarkably similar to Usenet, down to the author including spurious message headers; one of the characters who appears solely through postings to this medium was modeled on Spencer (and, slightly obliquely, named for him).

He is also credited with the claim that "Those who do not understand Unix are condemned to reinvent it, poorly." [3]

Preserving Usenet

In mid-December 2001, Google unveiled its improved Usenet archives, which now go more than a decade deeper into the Internet's past than did the millions of posts that the company had originally acquired when it bought an existing archive called Deja News.

Between 1981 and 1991, while running the zoology department's computer system at the University of Toronto, Spencer copied more than 2 million Usenet messages onto magnetic tapes. The 141 tapes wound up at the University of Western Ontario, where Google's Michael Schmidt tracked them down and, with the help of David Wiseman and others, [4] got them transferred onto disks and into Google's archives. [5]

Free software contributions

Henry Spencer helped Geoff Collyer write C News in 1987.

At around the same time he wrote a non-proprietary replacement for regex(3), the Unix library for handling regular expressions, and made it freely available; his API followed that of Eighth Edition Research Unix. [6] Spencer's library has been used in many software packages, including Tcl, MySQL, [7] and PostgreSQL, [8] as well as being adapted for others, including early versions of Perl. Circa 1993, Spencer donated a second version of his RE library to 4.4BSD, following the POSIX standard for regular expressions.
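As a rough illustration of the POSIX regular expression interface mentioned above, the following minimal sketch uses the standard regcomp, regexec, and regfree calls from <regex.h>. It is not taken from Spencer's library; the helper name posix_match and the sample patterns are hypothetical, chosen only for this example.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches the POSIX extended
 * regular expression `pattern`, 0 if it does not match, and -1 if the
 * pattern fails to compile. */
int posix_match(const char *pattern, const char *text)
{
    regex_t re;

    /* REG_EXTENDED selects extended (ERE) syntax; REG_NOSUB tells the
     * library that submatch positions are not needed. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* regexec returns 0 on a match and REG_NOMATCH otherwise. */
    int rc = regexec(&re, text, 0, NULL, 0);

    /* Release the memory held by the compiled pattern. */
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

A call such as posix_match("^[a-z]+!henry$", "utzoo!henry") would be expected to report a match on a POSIX-conforming system, since the pattern anchors a run of lowercase letters followed by the literal text "!henry".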

Spencer was technical lead on the FreeS/WAN project, implementing an IPsec cryptographic protocol stack for Linux.

He also wrote 'aaa' (Amazing Awk Assembler), which is one of the longest and most complex programs ever written in the awk programming language.

He also developed a 4-point font used by entomologists in labeling pinned insect specimens. [9]

Space

Spencer is a founding member of the Canadian Space Society (CSS), and has served on its board of directors several times since 1984. He did mission analysis (planning of launch and orbits) for the CSS's Canadian Solar Sail project (now defunct), and was Software Architect for MOST, a Canadian science microsatellite launched by Eurockot in 2003 and dedicated to studying variable light from stars and extrasolar planets. The asteroid 117329 Spencer is named in his honour.

Spencer is a familiar and respected presence on several space forums on Usenet and the Internet. From 1983 to 2007 he posted over 34,000 messages to the sci.space.* newsgroups. His knowledge of space history and technology is such that the "I Corrected Henry Spencer" virtual T-shirt award was created as a reward for anyone who can catch him in an error of fact. [10]

References

  1. Spencer, Henry (October 14, 1987). "Ten Commandments For C Programmers". comp.lang.c.
  2. Lawrence, David; Spencer, Henry (January 1998). Managing Usenet. O'Reilly Media. ISBN 1-56592-198-4.
  3. Spencer, Henry (November 14, 1987). "space news from Sept 28 AW&ST". sci.space.shuttle.
  4. Wiseman, David G. (December 11, 2001). "Magi's NetNews Archive Involvement". Archived from the original on February 9, 2005.
  5. Mieszkowski, Katharine (January 8, 2002). "The Geeks Who Saved Usenet". Salon.com. Archived from the original on September 2, 2003.
  6. Spencer, Henry (January 19, 1986). "regexp(3)". Newsgroup: mod.sources. Usenet: 1316@panda.UUCP. Retrieved January 9, 2013.
  7. "Regular Expressions". MySQL 5.6 Reference Manual. Oracle. November 27, 2012.
  8. "Regular Expression Details". PostgreSQL 8.4+ Reference Manual. September 1, 2008.
  9. Darling, D. Christopher; Plowright, R.C. (May–June 1990). "HPLABEL: A Program and Microfont for the Generation of Date / Locality Labels Using a Laser Printer" (PDF). Entomological News. 101 (3). Archived from the original (PDF) on June 5, 2017.
  10. Yarvin, Norman. "About Yarchive.net". Archived from the original on August 1, 2003.