Software archaeology

Software archaeology or software archeology is the study of poorly documented or undocumented legacy software implementations, as part of software maintenance. [1] [2] Software archaeology, named by analogy with archaeology, [3] includes the reverse engineering of software modules, and the application of a variety of tools and processes for extracting and understanding program structure and recovering design information. [1] [4] Software archaeology may reveal dysfunctional team processes which have produced poorly designed or even unused software modules. [5] The term has been in use for decades, [6] and reflects a fairly natural metaphor: a programmer reading legacy code may feel that he or she is in the same situation as an archaeologist exploring the rubble of an ancient civilization. [7]

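A workshop on Software Archaeology at the 2001 OOPSLA (Object-Oriented Programming, Systems, Languages & Applications) conference identified a number of software archaeology techniques, some of which are specific to object-oriented programming. These include scripting languages for building reports, software visualization tools, and operating-system-level tracing with tools such as truss. [7]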

More generally, Andy Hunt and Dave Thomas note the importance of version control, dependency management, text indexing tools such as GLIMPSE and SWISH-E, and "[drawing] a map as you begin exploring." [7]
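
The idea behind text indexing can be illustrated with a small inverted index that maps each identifier to the files and lines where it occurs. This is only a minimal sketch in Python and makes no claim about how GLIMPSE or SWISH-E work internally; the function names build_index and lookup and the sample files are hypothetical.

```python
import re
from collections import defaultdict


def build_index(files):
    """Map each lower-cased identifier to the set of (file, line) places it occurs."""
    index = defaultdict(set)
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            # Treat anything shaped like a C-style identifier as an indexable word.
            for word in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", line):
                index[word.lower()].add((name, lineno))
    return index


def lookup(index, word):
    """Return the sorted (file, line) locations of a word, ignoring case."""
    return sorted(index.get(word.lower(), ()))


# Hypothetical legacy sources, inlined so the sketch is self-contained.
sample_files = {
    "a.c": "int total;\ntotal = total + 1;",
    "b.c": "print(total);",
}
index = build_index(sample_files)
for location in lookup(index, "total"):
    print(location)
```

Building the index once and querying it repeatedly is what makes this approach attractive on a large unfamiliar code base, where rescanning every file for each question would be slow.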

Like true archaeology, software archaeology involves investigative work to understand the thought processes of one's predecessors. [7] At the OOPSLA workshop, Ward Cunningham suggested a synoptic signature analysis technique which gave an overall "feel" for a program by showing only punctuation, such as semicolons and curly braces. [8] In the same vein, Cunningham has suggested viewing programs in 2-point font in order to understand the overall structure. [9] Another technique identified at the workshop was the use of aspect-oriented programming tools such as AspectJ to systematically introduce tracing code without directly editing the legacy program. [7]
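
The punctuation-only view described above can be approximated in a few lines of Python. The following is a minimal sketch, not Cunningham's own tool: it assumes that the signature consists of just the characters "{", "}" and ";", and the function names and sample sources are hypothetical.

```python
def signature(source: str, keep: str = "{};") -> str:
    """Return only the characters of `source` that appear in `keep`."""
    return "".join(ch for ch in source if ch in keep)


def signature_survey(files: dict[str, str], keep: str = "{};") -> dict[str, str]:
    """Map each file name to the punctuation signature of its contents."""
    return {name: signature(text, keep) for name, text in files.items()}


# Hypothetical sources, inlined so the sketch is self-contained.
sample = {
    "Small.java": "class Small { int x; int get() { return x; } }",
    "Empty.java": "interface Empty { }",
}
for name, sig in signature_survey(sample).items():
    print(f"{name}: {sig}")
```

Printed one file per line, long runs of semicolons suggest large flat methods, while deeply nested braces suggest complex control flow, which is the kind of overall "feel" the technique is meant to give.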

Network and temporal analysis techniques can reveal the patterns of collaborative activity by the developers of legacy software, which in turn may shed light on the strengths and weaknesses of the software artifacts produced. [10]
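
As a rough illustration of the network side of such an analysis, the sketch below builds a weighted graph in which two developers are linked once for every file both of them have changed. It is a simplified assumption-laden example, not the method of the cited study: the (author, file) input records, the function name collaboration_graph and the sample data are all hypothetical, and the temporal dimension is left out.

```python
from collections import defaultdict
from itertools import combinations


def collaboration_graph(changes):
    """Count, for each pair of authors, how many files both of them changed.

    `changes` is an iterable of (author, path) records, for example
    extracted from a version control log.
    """
    authors_by_file = defaultdict(set)
    for author, path in changes:
        authors_by_file[path].add(author)

    weights = defaultdict(int)
    for authors in authors_by_file.values():
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(authors), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical change records.
changes = [
    ("ana", "core.c"), ("bob", "core.c"),
    ("ana", "ui.c"), ("bob", "ui.c"), ("cy", "ui.c"),
    ("cy", "doc.txt"),
]
for pair, weight in sorted(collaboration_graph(changes).items()):
    print(pair, weight)
```

Files touched by only one developer contribute no edges, so heavily weighted pairs point to shared ownership, while isolated developers may indicate parts of the system that only one person understood.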

Michael Rozlog of Embarcadero Technologies has described software archaeology as a six-step process which enables programmers to answer questions such as "What have I just inherited?" and "Where are the scary sections of the code?" [11] These steps, similar to those identified by the OOPSLA workshop, include using visualization to obtain a visual representation of the program's design, using software metrics to look for design and style violations, using unit testing and profiling to look for bugs and performance bottlenecks, and assembling design information recovered by the process. [11] Software archaeology can also be a service provided to programmers by external consultants. [12]
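
The metrics step can be illustrated with a deliberately crude measure for finding the "scary sections": the deepest brace nesting in each file, with line count as a tie-breaker. This is a sketch under stated assumptions rather than part of Rozlog's process. The choice of metric, the function names and the sample files are hypothetical, and braces inside strings or comments are not treated specially.

```python
def max_brace_depth(source: str) -> int:
    """Return the deepest level of '{' ... '}' nesting found in `source`."""
    depth = 0
    deepest = 0
    for ch in source:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            # Clamp at zero so a stray closing brace cannot go negative.
            depth = max(depth - 1, 0)
    return deepest


def rank_by_risk(files: dict[str, str]) -> list[tuple[str, int, int]]:
    """Return (name, max depth, line count) rows, most deeply nested first."""
    rows = [
        (name, max_brace_depth(text), len(text.splitlines()))
        for name, text in files.items()
    ]
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


# Hypothetical legacy sources, inlined so the sketch is self-contained.
sample_files = {
    "flat.c": "int f() { return 1; }",
    "deep.c": "void g() {\n if (a) {\n  while (b) { h(); }\n }\n}",
}
for row in rank_by_risk(sample_files):
    print(row)
```

A real metrics tool would parse the language properly and report many more measures, but even a ranking this simple can suggest where to start reading.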

Mitch Rosenberg of, Inc. claims [citation needed] that the first law of software archaeology (which he calls code or data archaeology) is:

Everything that is there is there for a reason, and there are 3 possible reasons:

  1. It used to need to be there but no longer does
  2. It never needed to be there and the person that wrote the code had no clue
  3. It STILL needs to be there and YOU have no clue

The corollary to this "law" is that, until you know which was the reason, you should NOT modify the code (or data).

Software archaeology has continued to be a topic of discussion at more recent software engineering conferences. [13]

The profession of programmer–archaeologist features prominently in Vernor Vinge's A Deepness in the Sky. [14]


References

  1. Gregorio Robles, Jesus M. Gonzalez-Barahona, and Israel Herraiz, "An Empirical Approach to Software Archaeology," Poster Proceedings of the International Conference on Software Maintenance, 2005.
  2. Scott W. Ambler, "Agile Legacy System Analysis and Integration Modeling," accessed 20 August 2010: "Without accurate documentation, or access to knowledgeable people, your last resort may be to analyze the source code for the legacy system... This effort is often referred to as software archaeology."
  3. Bryon Moyer, "Software Archeology: Modernizing Old Systems," Embedded Technology Journal, March 4, 2009.
  4. Richard Hopkins and Kevin Jenkins, Eating the IT Elephant: Moving from greenfield development to brownfield, Addison-Wesley, 2008, ISBN 0-13-713012-0, p. 93.
  5. Diomidis Spinellis and Georgios Gousios, Beautiful Architecture, O'Reilly, 2009, ISBN 0-596-51798-X, p. 29.
  6. An early discussion is Judith E. Grass, "Object-Oriented Design Archaeology with CIA++," Computing Systems, Vol. 5, No. 1, Winter 1992.
  7. Andy Hunt and Dave Thomas, "Software Archaeology," IEEE Software, vol. 19, no. 2, pp. 20-22, Mar./Apr. 2002, doi:10.1109/52.991327.
  8. Ward Cunningham, "Signature Survey: A Method for Browsing Unfamiliar Code," Workshop Position Statement, Software Archeology: Understanding Large Systems, OOPSLA 2001.
  9. "Software Archeology" on John D. Cook's blog The Endeavour, November 10, 2009.
  10. Cleidson de Souza, Jon Froehlich, and Paul Dourish, "Seeking the Source: Software Source Code as a Social and Technical Artifact," Proceedings of the 2005 International ACM SIGGROUP Conference on Supporting Group Work, pp. 197-206.
  11. Michael Rozlog, "Software Archeology: What Is It and Why Should Java Developers Care?," January 28, 2008.
  12. Simon Sharwood, "Raiders of the Lost Code," ZDNet, November 3, 2004.
  13. For example, the 32nd ACM/IEEE International Conference on Software Engineering in Cape Town, South Africa in May 2010.