Software archaeology


Software archaeology or source code archeology is the study of poorly documented or undocumented legacy software implementations, as part of software maintenance. [1] [2] Software archaeology, named by analogy with archaeology, [3] includes the reverse engineering of software modules and the application of a variety of tools and processes for extracting and understanding program structure and recovering design information. [1] [4] Software archaeology may reveal dysfunctional team processes that have produced poorly designed or even unused software modules, and in some cases deliberately obfuscatory code may be found. [5] The term has been in use since at least the early 1990s. [6]


Software archaeology has continued to be a topic of discussion at more recent software engineering conferences. [7]

Techniques

A workshop on Software Archaeology at the 2001 OOPSLA (Object-Oriented Programming, Systems, Languages & Applications) conference identified the following software archaeology techniques, some of which are specific to object-oriented programming: [8]

More generally, Andy Hunt and Dave Thomas note the importance of version control, dependency management, text indexing tools such as GLIMPSE and SWISH-E, and "[drawing] a map as you begin exploring." [8]
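Part of Hunt and Thomas's advice to draw a map can be automated. The following is a minimal sketch, assuming the legacy code base is written in Python; it uses the standard-library `ast` module to record which top-level modules each file imports, giving a rough dependency map. The function names `imports_of` and `import_map` are hypothetical, and the sketch is not a substitute for the indexing tools named above.

```python
from __future__ import annotations

import ast
from pathlib import Path


def imports_of(source: str) -> set[str]:
    """Top-level names of the modules imported by one Python source file."""
    deps: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b as c` is recorded as the top-level package "a"
            deps.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # absolute `from a.b import c` is recorded as "a";
            # relative imports (`from . import x`) are skipped in this sketch
            deps.add(node.module.split(".")[0])
    return deps


def import_map(root: str) -> dict[str, set[str]]:
    """Map every .py file under `root` (as a relative path) to the modules it imports."""
    return {
        str(path.relative_to(root)): imports_of(path.read_text(encoding="utf-8"))
        for path in sorted(Path(root).rglob("*.py"))
    }
```

Inverting the resulting map, to find files that no other file imports, can point to entry points or possibly unused modules that are worth checking by hand.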

As in traditional archaeology, software archaeology involves investigative work to understand the thought processes of one's predecessors. [8] At the OOPSLA workshop, Ward Cunningham suggested a synoptic signature analysis technique that gives an overall "feel" for a program by showing only its punctuation, such as semicolons and curly braces. [9] In the same vein, Cunningham has suggested viewing programs in 2-point font in order to see their overall structure. [10] Another technique identified at the workshop was the use of aspect-oriented programming tools such as AspectJ to systematically introduce tracing code without directly editing the legacy program. [8]
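A signature survey in the spirit of Cunningham's technique can be sketched in a few lines. This version assumes C-like source files and keeps only the characters `{`, `}` and `;`; the character set Cunningham actually used may differ, and the names `signature` and `survey` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Characters preserved in a signature. This set is an assumption suited to
# C-like languages (block braces and statement terminators).
DEFAULT_KEEP = frozenset("{};")


def signature(source: str, keep: frozenset[str] = DEFAULT_KEEP) -> str:
    """Reduce source text to the sequence of characters found in `keep`."""
    return "".join(ch for ch in source if ch in keep)


def survey(root: str, pattern: str = "*.c") -> list[tuple[str, str]]:
    """One (file, signature) pair per matching file, for side-by-side browsing."""
    return [
        (str(path.relative_to(root)), signature(path.read_text(errors="replace")))
        for path in sorted(Path(root).rglob(pattern))
    ]
```

Printing each pair on its own line can fit a whole code base onto a screen or two: unusually long signatures may indicate very large files, and long runs of `{` may suggest deep nesting, though such impressions are only a starting point for closer reading.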

Network and temporal analysis techniques can reveal the patterns of collaborative activity by the developers of legacy software, which in turn may shed light on the strengths and weaknesses of the software artifacts produced. [11]
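One simple kind of network that can be built from version-control history links two developers whenever both have changed the same file. The sketch below assumes commit records have already been extracted into a hypothetical `(author, date, changed_files)` form, and the optional date window adds a crude temporal dimension. It illustrates the general idea only and is not the method used in the cited study; the function name `collaboration_network` is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date
from itertools import combinations


def collaboration_network(
    commits: list[tuple[str, date, list[str]]],
    start: date | None = None,
    end: date | None = None,
) -> dict[frozenset[str], int]:
    """Weighted developer pairs: weight = number of files both have changed.

    Only commits dated within [start, end] (when given) are considered.
    """
    authors_by_file: dict[str, set[str]] = defaultdict(set)
    for author, when, files in commits:
        if start is not None and when < start:
            continue
        if end is not None and when > end:
            continue
        for f in files:
            authors_by_file[f].add(author)

    edges: dict[frozenset[str], int] = defaultdict(int)
    for authors in authors_by_file.values():
        # every pair of developers who touched this file gets one shared-file credit
        for a, b in combinations(sorted(authors), 2):
            edges[frozenset((a, b))] += 1
    return dict(edges)
```

Comparing the networks computed for successive time windows may show how collaboration patterns changed over the life of the code base.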

Michael Rozlog of Embarcadero Technologies has described software archaeology as a six-step process which enables programmers to answer questions such as "What have I just inherited?" and "Where are the scary sections of the code?" [12] These steps, similar to those identified by the OOPSLA workshop, include using visualization to obtain a visual representation of the program's design, using software metrics to look for design and style violations, using unit testing and profiling to look for bugs and performance bottlenecks, and assembling design information recovered by the process. [12] Software archaeology can also be a service provided to programmers by external consultants. [13]
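Of the steps above, the use of software metrics is perhaps the easiest to illustrate. The sketch below computes a simplified approximation of McCabe's cyclomatic complexity for each Python function using the standard-library `ast` module. The counted node types, the default threshold of 10, and the names `function_complexity` and `flag_complex` are assumptions; dedicated metrics tools measure this more carefully.

```python
from __future__ import annotations

import ast

# Node types treated as adding one decision point. This is a simplified
# approximation of cyclomatic complexity, not a faithful implementation.
_DECISIONS = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp,
              ast.ExceptHandler, ast.comprehension)


def function_complexity(source: str) -> list[tuple[str, int, int]]:
    """(name, line, approximate complexity) for every function, in line order.

    Decisions inside nested functions are also counted toward the enclosing function.
    """
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1
            for inner in ast.walk(node):
                if isinstance(inner, _DECISIONS):
                    score += 1
                elif isinstance(inner, ast.BoolOp):
                    # `a and b and c` adds two extra paths
                    score += len(inner.values) - 1
            results.append((node.name, node.lineno, score))
    return sorted(results, key=lambda r: r[1])


def flag_complex(source: str, threshold: int = 10) -> list[str]:
    """Names of functions whose approximate complexity exceeds `threshold`."""
    return sorted(name for name, _line, score in function_complexity(source)
                  if score > threshold)
```

Running `flag_complex` over each file of an inherited code base gives a short list of candidate "scary sections" to read first, although a high score is a prompt for investigation rather than proof of poor design.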

The profession of "programmer–archaeologist" features prominently in Vernor Vinge's 1999 science fiction novel A Deepness in the Sky. [14]



References

  1. Robles, Gregorio; Gonzalez-Barahona, Jesus M.; Herraiz, Israel (2005). "An Empirical Approach to Software Archaeology" (PDF). Poster Proceedings of the International Conference on Software Maintenance.
  2. Ambler, Scott W. "Agile Legacy System Analysis and Integration Modeling". agilemodeling.com. Retrieved 2010-08-20. Without accurate documentation, or access to knowledgeable people, your last resort may be to analyze the source code for the legacy system... This effort is often referred to as software archaeology.
  3. Moyer, Bryon (4 March 2009). "Software Archeology: Modernizing Old Systems" (PDF). Embedded Technology Journal.
  4. Hopkins, Richard; Jenkins, Kevin (2008). "5. The Mythical Metaman". Eating the IT Elephant: Moving from greenfield development to brownfield. Addison-Wesley. p. 93. ISBN 978-0-13-713012-2.
  5. Spinellis, Diomidis; Gousios, Georgios (2009). "2. A Tale of Two Systems § Lack of Cohesion". Beautiful Architecture. O'Reilly. p. 29. ISBN 978-0-596-51798-4.
  6. An early discussion is Grass, Judith E. (Winter 1992). "Object-Oriented Design Archaeology with CIA++" (PDF). Computing Systems. 5 (1).
  7. For example, the "32nd ACM/IEEE International Conference on Software Engineering". May 2010.
  8. Hunt, Andy; Thomas, Dave (March–April 2002). "Software Archaeology" (PDF). IEEE Software. 19 (2): 20–22. doi:10.1109/52.991327.
  9. Cunningham, Ward (2001). "Signature Survey: A Method for Browsing Unfamiliar Code". Workshop Position Statement, Software Archeology: Understanding Large Systems, OOPSLA 2001.
  10. Cook, John D. (10 November 2009). "Software Archeology". The Endeavour.
  11. de Souza, Cleidson; Froehlich, Jon; Dourish, Paul (2005). "Seeking the Source: Software Source Code as a Social and Technical Artifact" (PDF). Proceedings of the 2005 International ACM SIGGROUP Conference on Supporting Group Work. pp. 197–206. doi:10.1145/1099203.1099239. ISBN 1595932232.
  12. Rozlog, Michael (28 January 2008). "Software Archeology: What Is It and Why Should Java Developers Care?". java.sys-con.com.
  13. Sharwood, Simon (3 November 2004). "Raiders of the Lost Code". ZDNet.
  14. Rees, Gareth (12 June 2013). "Software archaeology and technical debt".