Software archaeology

Software archaeology or source code archeology is the study of poorly documented or undocumented legacy software implementations, as part of software maintenance. [1] [2] Software archaeology, named by analogy with archaeology, [3] includes the reverse engineering of software modules, and the application of a variety of tools and processes for extracting and understanding program structure and recovering design information. [1] [4] Software archaeology may reveal dysfunctional team processes which have produced poorly designed or even unused software modules, and in some cases deliberately obfuscatory code may be found. [5] The term has been in use for decades. [6]

Software archaeology has continued to be a topic of discussion at software engineering conferences. [7]

Techniques

A workshop on Software Archaeology at the 2001 OOPSLA (Object-Oriented Programming, Systems, Languages & Applications) conference identified a number of software archaeology techniques, some of which are specific to object-oriented programming. [8]

More generally, Andy Hunt and Dave Thomas note the importance of version control, dependency management, text indexing tools such as GLIMPSE and SWISH-E, and "[drawing] a map as you begin exploring." [8]

Like true archaeology, software archaeology involves investigative work to understand the thought processes of one's predecessors. [8] At the OOPSLA workshop, Ward Cunningham suggested a synoptic signature analysis technique, which gives an overall "feel" for a program by showing only its punctuation, such as semicolons and curly braces. [9] In the same vein, Cunningham has suggested viewing programs in a 2-point font in order to take in their overall structure. [10] Another technique identified at the workshop was the use of aspect-oriented programming tools such as AspectJ to systematically introduce tracing code without directly editing the legacy program. [8]
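
As a rough illustration, this signature survey can be approximated in a few lines of Python; the retained character set below is an arbitrary choice suited to C-like languages, not Cunningham's exact method:

    import sys

    # Punctuation retained in the survey; braces and semicolons roughly
    # convey nesting depth and statement density in C-like languages.
    SIGNATURE_CHARS = set("{};")

    def signature(path):
        """Reduce a source file to its punctuation 'signature'."""
        with open(path, encoding="utf-8", errors="replace") as f:
            return "".join(ch for ch in f.read() if ch in SIGNATURE_CHARS)

    if __name__ == "__main__":
        # One line per file: the file name followed by its signature.
        for path in sys.argv[1:]:
            print(path, signature(path))

Scanning such output, long unbroken runs of braces hint at deeply nested code, and files with unusually long signatures stand out as candidates for closer study.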

Network and temporal analysis techniques can reveal the patterns of collaborative activity by the developers of legacy software, which in turn may shed light on the strengths and weaknesses of the software artifacts produced. [11]
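
Where the legacy system lives in a version control repository, such collaboration patterns can be sketched directly from the revision history. The following Python fragment is a simplification of the published techniques, assuming a Git repository and treating any two authors who modified the same file as collaborators:

    import subprocess
    from collections import defaultdict
    from itertools import combinations

    def authors_by_file(repo="."):
        """Map each file in the history to the set of authors who changed it."""
        log = subprocess.run(
            ["git", "log", "--name-only", "--pretty=format:@%an"],
            cwd=repo, capture_output=True, text=True, check=True,
        ).stdout
        files = defaultdict(set)
        author = None
        for line in log.splitlines():
            if line.startswith("@"):
                author = line[1:]           # commit header: author name
            elif line.strip() and author:
                files[line].add(author)     # file touched by this commit
        return files

    def collaboration_edges(files):
        """Count, for each pair of authors, how many files both modified."""
        edges = defaultdict(int)
        for authors in files.values():
            for pair in combinations(sorted(authors), 2):
                edges[pair] += 1
        return edges

    if __name__ == "__main__":
        edges = collaboration_edges(authors_by_file())
        for (a, b), shared in sorted(edges.items(), key=lambda e: -e[1])[:10]:
            print(a, "<->", b, ":", shared, "shared files")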

Michael Rozlog of Embarcadero Technologies has described software archaeology as a six-step process which enables programmers to answer questions such as "What have I just inherited?" and "Where are the scary sections of the code?" [12] These steps, similar to those identified by the OOPSLA workshop, include using visualization to obtain a visual representation of the program's design, using software metrics to look for design and style violations, using unit testing and profiling to look for bugs and performance bottlenecks, and assembling design information recovered by the process. [12] Software archaeology can also be a service provided to programmers by external consultants. [13]
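
The metrics step can be as simple as flagging statistical outliers. A minimal Python sketch of one such check, in which the 50-line threshold is an arbitrary assumption, marks overly long functions as candidate "scary sections":

    import ast
    import sys

    MAX_LINES = 50  # arbitrary threshold for "needs a closer look"

    def long_functions(path):
        """Yield (name, line, length) for suspiciously long functions."""
        with open(path, encoding="utf-8") as f:
            tree = ast.parse(f.read(), filename=path)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                length = node.end_lineno - node.lineno + 1
                if length > MAX_LINES:
                    yield node.name, node.lineno, length

    if __name__ == "__main__":
        for path in sys.argv[1:]:
            for name, line, length in long_functions(path):
                print(f"{path}:{line}: {name} is {length} lines long")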

The profession of programmer–archaeologist features prominently in Vernor Vinge's 1999 science fiction novel A Deepness in the Sky. [14]

Related Research Articles

Computer programming is the process of performing a particular computation, usually by designing and building an executable computer program. Programming involves tasks such as analysis, generating algorithms, profiling algorithms' accuracy and resource consumption, and the implementation of algorithms. The source code of a program is written in one or more languages that are intelligible to programmers, rather than machine code, which is directly executed by the central processing unit. The purpose of programming is to find a sequence of instructions that will automate the performance of a task on a computer, often for solving a given problem. Proficient programming thus usually requires expertise in several different subjects, including knowledge of the application domain, specialized algorithms, and formal logic.

Pair programming is a software development technique in which two programmers work together at one workstation. One, the driver, writes code while the other, the observer or navigator, reviews each line of code as it is typed in. The two programmers switch roles frequently.

In computer programming and software design, code refactoring is the process of restructuring existing computer code—changing the factoring—without changing its external behavior. Refactoring is intended to improve the design, structure, and/or implementation of the software, while preserving its functionality. Potential advantages of refactoring may include improved code readability and reduced complexity; these can improve the source code's maintainability and create a simpler, cleaner, or more expressive internal architecture or object model to improve extensibility. Another potential goal for refactoring is improved performance; software engineers face an ongoing challenge to write programs that perform faster or use less memory.
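
A small Python example of the idea, using invented invoice functions; the factoring changes while the observable behavior is preserved:

    TAX_RATE = 0.2

    # Before: the pricing logic is buried inside one monolithic function.
    def invoice_total(items):
        total = 0
        for price, qty in items:
            total += price * qty
        return round(total * 1.2, 2)  # 20% tax, duplicated as a literal

    # After: an extracted helper names the concept and removes the literal.
    def subtotal(items):
        return sum(price * qty for price, qty in items)

    def invoice_total_refactored(items):
        return round(subtotal(items) * (1 + TAX_RATE), 2)

    # External behavior is unchanged.
    items = [(10.0, 2), (5.0, 1)]
    assert invoice_total(items) == invoice_total_refactored(items) == 30.0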

In computer science, static program analysis is the analysis of computer programs performed without executing them, in contrast with dynamic program analysis, which is performed on programs during their execution.
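
For example, a few lines of Python can statically flag calls to eval(), a common security concern, by walking a program's syntax tree without ever executing it; the analyzed snippet is invented for illustration:

    import ast

    SOURCE = '''
    def load(path):
        return eval(open(path).read())  # risky: arbitrary code execution
    '''

    # Inspect the parsed syntax tree rather than running the program.
    tree = ast.parse(SOURCE)
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            print(f"line {node.lineno}: call to eval() found")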

In software engineering, a software design pattern is a general, reusable solution to a commonly occurring problem within a given context in software design. It is not a finished design that can be transformed directly into source or machine code. Rather, it is a description or template for how to solve a problem that can be used in many different situations. Design patterns are formalized best practices that the programmer can use to solve common problems when designing an application or system.
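
As an illustration, the Strategy pattern in Python; the shipping-cost functions are invented, but the reusable part is the shape of the solution, in which behavior is supplied as an interchangeable parameter:

    # Strategy pattern: the costing policy is a parameter, not hard-coded.
    def ship(order, cost_strategy):
        return cost_strategy(order)

    def flat_rate(order):
        return 5.0

    def by_weight(order):
        return 0.5 * order["weight_kg"]

    order = {"weight_kg": 12.0}
    print(ship(order, flat_rate))  # 5.0
    print(ship(order, by_weight))  # 6.0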

The outline of software engineering provides an overview of, and topical guide to, software engineering.

A programming tool or software development tool is a computer program that software developers use to create, debug, maintain, or otherwise support other programs and applications. The term usually refers to relatively simple programs that can be combined to accomplish a task, much as one might use multiple hand tools to fix a physical object. The most basic tools are a source code editor and a compiler or interpreter, which are used ubiquitously and continuously. Other tools are used more or less depending on the language, development methodology, and individual engineer, and are often used for a discrete task, like a debugger or profiler. Tools may be discrete programs, executed separately – often from the command line – or may be parts of a single large program, called an integrated development environment (IDE). In many cases, particularly for simpler use, simple ad hoc techniques are used instead of a tool, such as print debugging instead of using a debugger, manual timing instead of a profiler, or tracking bugs in a text file or spreadsheet instead of a bug tracking system.

In computer science, the term automatic programming identifies a type of computer programming in which some mechanism generates a computer program to allow human programmers to write the code at a higher abstraction level.
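
A toy Python sketch of the idea, in which the "program" is written as a list of named formulas and a generator mechanism produces the executable functions; the spec format is invented for illustration:

    # Each (name, expression) pair is the higher-level "program".
    SPEC = [
        ("fahrenheit", "c * 9 / 5 + 32"),
        ("kilometres", "miles * 1.609344"),
    ]

    def generate(spec):
        """Emit Python source implementing each named formula."""
        lines = []
        for name, expr in spec:
            # Free variables of the expression become the parameters.
            params = sorted(compile(expr, "<spec>", "eval").co_names)
            lines.append(f"def {name}({', '.join(params)}):")
            lines.append(f"    return {expr}")
        return "\n".join(lines)

    namespace = {}
    exec(generate(SPEC), namespace)        # materialize the generated code
    print(namespace["fahrenheit"](c=100))  # 212.0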

In software engineering, continuous integration (CI) is the practice of merging all developers' working copies to a shared mainline several times a day, with each merge triggering an automated build and test run. Grady Booch first proposed the term CI in his 1991 method, although he did not advocate integrating several times a day. Extreme programming (XP) adopted the concept of CI and did advocate integrating more than once per day – perhaps as many as tens of times per day.

In software engineering, profiling is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization, and more specifically, performance engineering.
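
In Python, for instance, the standard-library cProfile and pstats modules collect and report such measurements; the deliberately quadratic function below is invented to give the profiler something to find:

    import cProfile
    import pstats

    def slow_sum(n):
        total = 0
        for i in range(n):
            total += sum(range(i))  # deliberately wasteful inner loop
        return total

    profiler = cProfile.Profile()
    profiler.enable()
    slow_sum(2000)
    profiler.disable()

    # Report the functions with the greatest cumulative time.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)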

Object-oriented analysis and design (OOAD) is a technical approach for analyzing and designing an application, system, or business by applying object-oriented programming, as well as using visual modeling throughout the software development process to guide stakeholder communication and product quality.

Software visualization or software visualisation refers to the visualization of information about software systems – either the architecture of their source code or metrics of their runtime behavior – and their development process, by means of static, interactive or animated 2-D or 3-D visual representations of their structure, execution, behavior, and evolution.
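
A deliberately minimal, text-only Python sketch of the idea, encoding source-file sizes as bar lengths; real visualization tools draw far richer interactive graphics:

    import os

    def size_bars(root=".", ext=".py", width=60):
        """Print one bar per source file, scaled to the largest file."""
        sizes = []
        for dirpath, _, names in os.walk(root):
            for name in names:
                if name.endswith(ext):
                    path = os.path.join(dirpath, name)
                    sizes.append((os.path.getsize(path), path))
        largest = max((s for s, _ in sizes), default=1) or 1
        for size, path in sorted(sizes, reverse=True):
            bar = "#" * max(1, int(width * size / largest))
            print(f"{path:40} {bar}")

    if __name__ == "__main__":
        size_bars()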

Legacy modernization, also known as software modernization or platform modernization, refers to the conversion, rewriting or porting of a legacy system to modern computer programming languages, architectures, software libraries, protocols or hardware platforms. Legacy transformation aims to retain and extend the value of the legacy investment through migration to new platforms to benefit from the advantage of the new technologies.

In software engineering, team programming is a project management strategy for coordinating task distribution in computer software development projects, which involves the assignment of two or more computer programmers to work collaboratively on an individual sub-task within a larger programming project. In general, the manner in which this term is used today refers to methods currently in vogue within the software development industry where multiple individuals work simultaneously on the same activity; in these systems, programmers are often grouped in pairs at the same computer workstation, one observing the other working on the software and alternating roles at time intervals.

In computing, subject-oriented programming is an object-oriented software paradigm in which the state (fields) and behavior (methods) of objects are not seen as intrinsic to the objects themselves, but are provided by various subjective perceptions ("subjects") of the objects. The term and concepts were first published in September 1993 in a conference paper which was later recognized as being one of the three most influential papers to be presented at the conference between 1986 and 1996. As illustrated in that paper, an analogy is made with the contrast between the philosophical views of Plato and Kant with respect to the characteristics of "real" objects, but applied to software ones. For example, while we may all perceive a tree as having a measurable height, weight, leaf-mass, etc., from the point of view of a bird, a tree may also have measures of relative value for food or nesting purposes, or from the point of view of a tax-assessor, it may have a certain taxable value in a given year. Neither the bird's nor the tax-assessor's additional state information need be seen as intrinsic to the tree, but are added by the perceptions of the bird and tax-assessor, and from Kant's analysis, the same may be true even of characteristics we think of as intrinsic.
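
The tree example can be sketched in Python, with each subject keeping its own view of shared objects so that the extra state never becomes intrinsic to the Tree class; the class and attribute names are invented for illustration:

    class Tree:
        def __init__(self, height_m):
            self.height_m = height_m  # state the class itself carries

    class Subject:
        """A subjective perspective that attaches state to foreign objects."""
        def __init__(self):
            self._views = {}
        def set(self, obj, **attrs):
            self._views.setdefault(id(obj), {}).update(attrs)
        def view(self, obj):
            return self._views.get(id(obj), {})

    oak = Tree(height_m=20.0)
    bird, assessor = Subject(), Subject()
    bird.set(oak, nesting_value=0.9)        # the bird's perception
    assessor.set(oak, taxable_value=120.0)  # the tax-assessor's perception

    print(oak.height_m, bird.view(oak), assessor.view(oak))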

Reverse engineering is a process or method through which one attempts to understand through deductive reasoning how a previously made device, process, system, or piece of software accomplishes a task with very little insight into exactly how it does so. It is essentially the process of opening up or dissecting a system to see how it works, in order to duplicate or enhance it. Depending on the system under consideration and the technologies employed, the knowledge gained during reverse engineering can help with repurposing obsolete objects, doing security analysis, or learning how something works.
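
One low-level form of this in software is disassembly. In Python, for example, the standard dis module recovers the operations performed by compiled code even when no source is available; the compiled expression below stands in for an "unknown" artifact:

    import dis

    # Suppose only the compiled object survives, not its source code.
    mystery = compile("result = (x + y) * (x - y)", "<unknown>", "exec")

    # The bytecode listing reveals the loads, arithmetic, and store
    # from which the original computation can be reconstructed.
    dis.dis(mystery)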

BORO is an approach to developing ontological or semantic models for large complex operational applications that consists of a top ontology as well as a process for constructing the ontology. It was originally developed as a method for mining ontologies from multiple legacy systems – as the first stage in an architectural transformation or software modernization. It has also been used to enable semantic interoperability between legacy systems. It is the analysis method used in the development and maintenance of the U.S. Department of Defense Architecture Framework (DoDAF) Meta Model (DM2), where a data modeling working group of over 350 members was able to systematically resolve a broad spectrum of knowledge representation issues.

In computer programming and software development, debugging is the process of finding and resolving bugs within computer programs, software, or systems.

An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how to build or use such a connection or interface is called an API specification. A computer system that meets this standard is said to implement or expose an API. The term API may refer either to the specification or to the implementation.
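
A minimal Python sketch of the distinction between specification and implementation, using an invented key-value store API:

    from typing import Protocol

    class KeyValueStore(Protocol):
        """The API specification: what any conforming store must expose."""
        def get(self, key: str) -> str: ...
        def put(self, key: str, value: str) -> None: ...

    class MemoryStore:
        """One implementation that exposes (implements) the API."""
        def __init__(self):
            self._data = {}
        def get(self, key):
            return self._data[key]
        def put(self, key, value):
            self._data[key] = value

    def client(store: KeyValueStore):
        # The client programs against the specification, not the implementation.
        store.put("greeting", "hello")
        return store.get("greeting")

    print(client(MemoryStore()))  # hello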

Software diagnosis refers to concepts, techniques, and tools that allow for obtaining findings, conclusions, and evaluations about software systems and their implementation, composition, behaviour, and evolution. It serves as a means to monitor, steer, observe and optimize software development, software maintenance, and software re-engineering in the sense of a business intelligence approach specific to software systems. It is generally based on the automatic extraction, analysis, and visualization of corresponding information sources of the software system, although it can also be performed manually.

References

  1. Robles, Gregorio; Gonzalez-Barahona, Jesus M.; Herraiz, Israel (2005). "An Empirical Approach to Software Archaeology" (PDF). Poster Proceedings of the International Conference on Software Maintenance.
  2. Ambler, Scott W. "Agile Legacy System Analysis and Integration Modeling". agilemodeling.com. Retrieved 2010-08-20. Without accurate documentation, or access to knowledgeable people, your last resort may be to analyze the source code for the legacy system... This effort is often referred to as software archaeology.
  3. Moyer, Bryon (4 March 2009). "Software Archeology: Modernizing Old Systems" (PDF). Embedded Technology Journal.
  4. Hopkins, Richard; Jenkins, Kevin (2008). "5. The Mythical Metaman". Eating the IT Elephant: Moving from greenfield development to brownfield. Addison-Wesley. p. 93. ISBN 978-0-13-713012-2.
  5. Spinellis, Diomidis; Gousios, Georgios (2009). "2. A Tale of Two Systems § Lack of Cohesion". Beautiful Architecture. O'Reilly. p. 29. ISBN 978-0-596-51798-4.
  6. An early discussion is Grass, Judith E. (Winter 1992). "Object-Oriented Design Archaeology with CIA++" (PDF). Computing Systems. 5 (1).
  7. For example, the "32nd ACM/IEEE International Conference on Software Engineering". May 2010.
  8. Hunt, Andy; Thomas, Dave (March–April 2002). "Software Archaeology" (PDF). IEEE Software. 19 (2): 20–22. doi:10.1109/52.991327.
  9. Cunningham, Ward (2001). "Signature Survey: A Method for Browsing Unfamiliar Code". Workshop Position Statement, Software Archeology: Understanding Large Systems, OOPSLA 2001.
  10. Cook, John D. (10 November 2009). "Software Archeology". The Endeavour.
  11. de Souza, Cleidson; Froehlich, Jon; Dourish, Paul (2005). "Seeking the Source: Software Source Code as a Social and Technical Artifact" (PDF). Proceedings of the 2005 International ACM SIGGROUP Conference on Supporting Group Work. pp. 197–206. doi:10.1145/1099203.1099239. ISBN 1595932232.
  12. Rozlog, Michael (28 January 2008). "Software Archeology: What Is It and Why Should Java Developers Care?". java.sys-con.com.
  13. Sharwood, Simon (3 November 2004). "Raiders of the Lost Code". ZDNet.
  14. Rees, Gareth (12 June 2013). "Software archaeology and technical debt".