Software aging

Last updated February 13, 2024

In software engineering, software aging is the tendency for software to fail or cause a system failure after running continuously for a certain time, or because of ongoing changes in systems surrounding the software. Software aging has several causes, including the inability of old software to adapt to changing needs or changing technology platforms, and the tendency of software patches to introduce further errors. As the software gets older it becomes less well-suited to its purpose and will eventually stop functioning as it should. Rebooting or reinstalling the software can act as a short-term fix.^[1] Software rejuvenation is a proactive fault management method for dealing with software aging. It can be classified as an environment diversity technique and is usually implemented through software rejuvenation agents (SRA).

Proactive management of software aging

Software aging

Software failures are a more likely cause of unplanned system outages than hardware failures.^[5]^[6] This is because software exhibits an increasing failure rate over time due to data corruption, numerical error accumulation and unbounded resource consumption. In both widely used and specialized software, rebooting is a common way to clear a problem, because aging stems from the complexity of software, which is never free of errors. It is almost impossible to fully verify that a piece of software is bug-free. Even high-profile software such as Windows and macOS must receive continual updates to improve performance and fix bugs. Software development tends to be driven by the need to meet release deadlines rather than to ensure long-term reliability.^[7] Designing software that is immune to aging is difficult. Not all software ages at the same rate, because some users use the system more intensively than others.^[8]
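As a minimal illustration of numerical error accumulation, the following Python sketch keeps an uptime counter by repeatedly adding a 0.1 second tick. The tick length, the tick count and the function names are assumptions chosen for the example. Because 0.1 has no exact binary floating-point representation, the naive running total drifts away from the correctly rounded sum that `math.fsum` returns.

```python
import math

TICK = 0.1          # seconds per tick (assumed); not exactly representable in binary
TICKS = 1_000_000   # hypothetical number of ticks in a long-running process


def naive_uptime(ticks: int, tick: float = TICK) -> float:
    """Accumulate elapsed time the way a simple long-running loop might."""
    elapsed = 0.0
    for _ in range(ticks):
        elapsed += tick  # every addition may contribute a small rounding error
    return elapsed


def compensated_uptime(ticks: int, tick: float = TICK) -> float:
    """Sum the same ticks with math.fsum, which tracks the lost low-order bits."""
    return math.fsum(tick for _ in range(ticks))


naive = naive_uptime(TICKS)
reference = compensated_uptime(TICKS)
drift = naive - reference  # error that built up purely from running longer
print(f"naive={naive!r} reference={reference!r} drift={drift!r}")
```

The drift is tiny here, but it grows with the number of additions, which is why a restart that resets the accumulator also resets the error.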

Rejuvenation

Because aging inevitably leads to failures in software systems, software rejuvenation can be employed proactively to prevent crashes or degradation. This technique was identified as a cost-effective solution during research on fault-tolerant software at AT&T Bell Laboratories in the 1990s.^[9] Software rejuvenation works by removing accumulated error conditions and freeing up system resources, for example by flushing operating system kernel tables, running garbage collection, or reinitializing internal data structures. Perhaps the best-known rejuvenation method is to reboot the system.

Rejuvenation techniques range from simple to complex. The method most people are familiar with is the hardware or software reboot. A more technical example is the Apache web server, which implements one form of rejuvenation by killing and recreating processes after they have served a certain number of requests.^[10] Another technique is to restart virtual machines running in a cloud computing environment.^[11]
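The request-count approach can be sketched in Python as follows. This is a simplified simulation, not Apache's implementation: `Worker`, `RecyclingServer` and `max_requests_per_worker` are hypothetical names, and the "leak" is just a list that grows with every request.

```python
from dataclasses import dataclass, field
from itertools import count

_worker_ids = count(1)  # hands out 1, 2, 3, ... to successive workers


@dataclass
class Worker:
    """Stand-in for a server process that accumulates state with every request."""
    worker_id: int = field(default_factory=lambda: next(_worker_ids))
    served: int = 0
    leaked: list = field(default_factory=list)

    def handle(self, request: str) -> str:
        self.served += 1
        self.leaked.append(request)  # simulated aging: state that is never released
        return f"worker {self.worker_id} handled {request}"


class RecyclingServer:
    """Replaces its worker once it has served max_requests_per_worker requests."""

    def __init__(self, max_requests_per_worker: int):
        self.max_requests = max_requests_per_worker
        self.worker = Worker()
        self.recycles = 0

    def handle(self, request: str) -> str:
        if self.worker.served >= self.max_requests:
            # Rejuvenate: drop the aged worker (and its leaked state), start fresh.
            self.worker = Worker()
            self.recycles += 1
        return self.worker.handle(request)


server = RecyclingServer(max_requests_per_worker=3)
replies = [server.handle(f"req-{i}") for i in range(7)]
print(replies[-1], "| recycles:", server.recycles)
```

In Apache itself, the comparable setting is the `MaxConnectionsPerChild` directive (named `MaxRequestsPerChild` in older releases). In the sketch, leaked state can never outlive three requests, whatever the total traffic.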

The multinational telecommunications corporation AT&T has implemented software rejuvenation in the real-time system that collects billing data in the United States for most telephone exchanges.^[12]

Some systems which have employed software rejuvenation methods include:^[13]

Transaction processing systems
Web servers
Spacecraft systems

The IEEE International Symposium on Software Reliability Engineering (ISSRE) hosted the 5th annual International Workshop on Software Aging and Rejuvenation (WoSAR) in 2013. Topics included:

Design, implementation, and evaluation of rejuvenation mechanisms
Modeling, analysis, and implementation of rejuvenation scheduling
Software rejuvenation benchmarking

Memory leaks

Some programming languages, such as C and C++, let the programmer allocate heap memory explicitly, and may require the programmer to free that memory once it is no longer needed. Freeing the memory is necessary because some operating systems (OS) do not reclaim it automatically when a process finishes. If allocated memory is never freed, the program is likely to consume more and more memory over time, eventually causing the computer to run out of memory.^[14] In low-memory conditions, the computer usually runs more slowly because of intense swapping and thrashing. When this happens, applications become sluggish or even unresponsive. If the computer runs out of both memory and swap space, the OS might reboot automatically or, even worse, hang.^[15]

Programs written in programming languages that use a garbage collector (e.g. Java) are less prone to memory leaks, since memory that is no longer referenced is freed by the garbage collector. This does not mean, however, that it is impossible to write code that leaks memory in such languages.
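The following Python sketch illustrates one common way a leak can arise in a garbage-collected language: a cache that keeps a reference to every entry forever, so the collector can never reclaim them. The class names, the 1 KiB payload and the entry limit are assumptions made for the example. A bounded variant that evicts the least recently used entry is shown for contrast.

```python
from collections import OrderedDict


class LeakyCache:
    """Keeps every entry reachable forever, so the collector can never free them."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        if key not in self._store:
            self._store[key] = bytearray(1024)  # placeholder for a computed result
        return self._store[key]

    def __len__(self):
        return len(self._store)


class BoundedCache:
    """Evicts the least recently used entry once max_entries (>= 1) is exceeded."""

    def __init__(self, max_entries: int):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._store = OrderedDict()

    def get(self, key):
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        value = bytearray(1024)
        self._store[key] = value
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # drop the oldest entry
        return value

    def __len__(self):
        return len(self._store)


leaky, bounded = LeakyCache(), BoundedCache(max_entries=100)
for request_id in range(10_000):  # every request uses a key never seen before
    leaky.get(request_id)
    bounded.get(request_id)
print(len(leaky), len(bounded))
```

The leaky cache grows with the number of distinct keys, while the bounded cache stays at its configured limit.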

Sometimes critical components of the OS itself can be a source of memory leaks. In Microsoft Windows, for example, the memory use of a Windows Explorer plug-in might drain the available memory to the point of making the entire computer unusable. A reboot might be needed.^[16]

Implementation

Two methods for implementing rejuvenation are:

Time-based rejuvenation
Prediction-based rejuvenation
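The difference between the two methods can be sketched in Python. This is an illustrative sketch, not a method taken from the cited literature: all function names, the sample data and the thresholds are assumptions. The time-based policy restarts after a fixed interval. The prediction-based policy fits a straight line to recent memory samples by least squares and triggers once the projected time to exhaustion falls within a safety margin.

```python
from typing import Optional, Sequence


def time_based_due(uptime_s: float, interval_s: float) -> bool:
    """Time-based policy: rejuvenate once a fixed interval has elapsed."""
    return uptime_s >= interval_s


def predict_exhaustion_time(times: Sequence[float], used: Sequence[float],
                            capacity: float) -> Optional[float]:
    """Fit used = intercept + slope * t by least squares and return the time at
    which the fitted line reaches capacity, or None if usage is not growing."""
    n = len(times)
    if n < 2:
        return None
    mean_t = sum(times) / n
    mean_u = sum(used) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in zip(times, used)) / var_t
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    return (capacity - intercept) / slope


def prediction_based_due(times: Sequence[float], used: Sequence[float],
                         capacity: float, safety_margin_s: float) -> bool:
    """Prediction-based policy: rejuvenate when the projected exhaustion time
    is no more than safety_margin_s after the latest sample."""
    t_exhaust = predict_exhaustion_time(times, used, capacity)
    if t_exhaust is None:
        return False
    return t_exhaust - times[-1] <= safety_margin_s


# Hypothetical samples: memory use in MB, taken every 60 s, growing 5 MB per sample.
sample_times = [0, 60, 120, 180, 240]
sample_used = [100, 105, 110, 115, 120]
t_exhaust = predict_exhaustion_time(sample_times, sample_used, capacity=150)
print("projected exhaustion at t =", t_exhaust, "s")
```

A time-based policy needs no measurements but may restart too early or too late. A prediction-based policy adapts to the observed aging rate, at the cost of monitoring and a model that may not fit.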

Memory bloating

Garbage collection is a form of automatic memory management whereby the system automatically recovers unused memory. For example, the .NET Framework manages the allocation and release of memory for software running under it. But automatically tracking allocated objects takes time and is not perfect.

.NET-based web services manage several logical types of memory, such as the stack, the unmanaged heap and the managed heap (free space). As physical memory fills up, the OS writes rarely used parts of it to disk so that it can reallocate that memory to another application, a process known as paging or swapping. If the paged-out memory is needed again, it must be reloaded from disk. If several applications are all making large demands, the OS can spend much of its time merely moving data between main memory and disk, a process known as disk thrashing.^[17] Since the garbage collector has to examine all of the allocations to decide which are in use, it may exacerbate this thrashing. As a result, extensive swapping can stretch garbage collection cycles from milliseconds to tens of seconds, which causes usability problems.
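Stretched collection cycles of this kind can be detected by timing each garbage collection pass. The sketch below uses CPython's `gc.callbacks` hook rather than .NET, purely as an illustration. The `GcPauseMonitor` class and the threshold are assumptions made for the example.

```python
import gc
import time


class GcPauseMonitor:
    """Records how long each CPython garbage collection pass takes."""

    def __init__(self):
        self.pauses_s = []
        self._started_at = None

    def _callback(self, phase, info):
        # CPython calls this with phase "start" before and "stop" after a collection.
        if phase == "start":
            self._started_at = time.perf_counter()
        elif phase == "stop" and self._started_at is not None:
            self.pauses_s.append(time.perf_counter() - self._started_at)
            self._started_at = None

    def __enter__(self):
        gc.callbacks.append(self._callback)
        return self

    def __exit__(self, *exc):
        gc.callbacks.remove(self._callback)
        return False

    def slow_pauses(self, threshold_s: float):
        """Return the recorded pauses that lasted at least threshold_s seconds."""
        return [p for p in self.pauses_s if p >= threshold_s]


with GcPauseMonitor() as monitor:
    cycles = []
    for _ in range(1000):
        a, b = [], []
        a.append(b)
        b.append(a)  # a reference cycle that only the cycle collector can free
        cycles.append(a)
    del cycles
    gc.collect()  # force at least one collection while the monitor is active

print("collections observed:", len(monitor.pauses_s))
print("pauses of 0.1 s or more:", len(monitor.slow_pauses(0.1)))
```

A monitoring agent could treat a rising number of slow pauses as one sign that memory pressure is building and that rejuvenation is due.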

References

[1] Shereshevsky, M.; Crowell, J.; Cukic, B.; Gandikota, V.; Yan Liu (2003). "Software aging and multifractality of memory resources". 2003 International Conference on Dependable Systems and Networks, 2003. Proceedings. pp. 721–730. doi:10.1109/DSN.2003.1209987. ISBN 978-0-7695-1952-4. S2CID 18697750.
[2] Parnas, D.L. (1994). "Software aging". Proceedings of 16th International Conference on Software Engineering. pp. 279–287. doi:10.1109/ICSE.1994.296790. ISBN 978-0-8186-5855-6. S2CID 790287.
[3] "Software Aging | the morning paper". 2014-10-14. Retrieved 2024-02-12.
[4] Grottke, Michael; Matias, Rivalino; Trivedi, Kishor S. (2008). "The fundamentals of software aging". 2008 IEEE International Conference on Software Reliability Engineering Workshops (ISSRE WKSP). pp. 1–6. doi:10.1109/ISSREW.2008.5355512. ISBN 978-1-4244-3416-9. S2CID 11527276.
[5] "Oatd: -".
[6] Garg, S.; Van Moorsel, A.; Vaidyanathan, K.; Trivedi, K.S. (1998). "A methodology for detection and estimation of software aging". Proceedings Ninth International Symposium on Software Reliability Engineering (Cat. No.98TB100257). pp. 283–292. doi:10.1109/ISSRE.1998.730892. ISBN 978-0-8186-8991-8. S2CID 8696313.
[7] Castelli, V.; Harper, R.E.; Heidelberger, P.; Hunter, S.W.; Trivedi, K.S.; Vaidyanathan, K.; Zeggert, W.P. (2001-03-01). "Proactive management of software aging". IBM Journal of Research and Development. 45 (2): 311–332. CiteSeerX 10.1.1.28.7273. doi:10.1147/rd.452.0311. ISSN 0018-8646.
[8] Gross, K.C.; Bhardwaj, V.; Bickford, R. (2003). "Proactive detection of software aging mechanisms in performance critical computers". 27th Annual NASA Goddard/IEEE Software Engineering Workshop, 2002. Proceedings. pp. 17–23. doi:10.1109/SEW.2002.1199445. ISBN 978-0-7695-1855-8. S2CID 17167955.
[9] Cotroneo, D., Natella, R., Pietrantuono, R., and Russo, S. 2014. A survey of software aging and rejuvenation studies. ACM J. Emerg. Technol. Comput. Syst. 10, 1, Article 8 (January 2014), 34 pages.
[10] Trivedi, K. S. and Vaidyanathan, K. 2007. Software Aging and Rejuvenation. Wiley Encyclopedia of Computer Science and Engineering.
[11] Bruneo, Dario; Distefano, Salvatore; Longo, Francesco; Puliafito, Antonio; Scarpa, Marco (2013). "Workload-Based Software Rejuvenation in Cloud Systems". IEEE Transactions on Computers. 62 (6): 1072–1085. doi:10.1109/TC.2013.30. S2CID 23981532.
[12] Trivedi, Kishor S.; Vaidyanathan, Kalyanaraman (2004-01-01). Reis, Ricardo (ed.). Software Rejuvenation - Modeling and Analysis. IFIP International Federation for Information Processing. Springer US. pp. 151–182. doi:10.1007/1-4020-8159-6_6. ISBN 978-1-4020-8158-3.
[13] Lei Li; Vaidyanathan, K.; Trivedi, K.S. (2002). "An approach for estimation of software aging in a Web server". Proceedings International Symposium on Empirical Software Engineering. pp. 91–100. doi:10.1109/ISESE.2002.1166929. ISBN 978-0-7695-1796-4. S2CID 8170010.
[14] "Overview of Memory Leaks". msdn.microsoft.com. Retrieved 2015-11-04.
[15] Martin Brown; Ken Milberg (16 November 2010). "Optimizing AIX 7 memory performance Part 3, Tuning swap space settings". IBM.
[16] "Preventing Memory Leaks in Windows Applications (Windows)". msdn.microsoft.com. Retrieved 2015-11-04.
[17] S.R., Chaitra; Basu, Anirban (2012). "Software Rejuvenation in Web Services". International Journal of Computer Applications. 54 (8): 31–35. Bibcode:2012IJCA...54h..31S. doi:10.5120/8589-2340.
