Programming complexity (or software complexity) is a term that encompasses the properties of a piece of software that affect its internal interactions. Several commentators distinguish between the terms "complex" and "complicated". Complicated implies being difficult to understand, but ultimately knowable. Complex, by contrast, describes the interactions between many entities. As the number of entities increases, the number of interactions between them grows exponentially, eventually reaching a point where it is impossible to know and understand them all. Similarly, higher levels of complexity in software increase the risk of unintentionally interfering with those interactions, and thus the risk of introducing defects when changing the software. In more extreme cases, complexity can make modifying the software virtually impossible.
The idea of linking software complexity to software maintainability has been explored extensively by Professor Manny Lehman, who developed his laws of software evolution. He and his co-author Les Belady explored numerous software metrics that could be used to measure the state of software, eventually concluding that the only practical solution is to use deterministic complexity models.[1]
The complexity of an existing program determines the complexity of changing the program. Problem complexity can be divided into two categories:[2]
Accidental complexity: relates to difficulties a programmer faces due to the chosen software engineering tools. A better-fitting set of tools or a higher-level programming language may reduce it.
Essential complexity: is caused by the characteristics of the problem to be solved and cannot be reduced.
Several measures of software complexity have been proposed. Many of these, although yielding a good representation of complexity, do not lend themselves to easy measurement. Some of the more commonly used metrics are McCabe's cyclomatic complexity metric, Halstead's software science metrics, and Chidamber and Kemerer's metrics suite for object-oriented design.
Several other metrics can also be used to measure programming complexity.
Tesler's Law is an adage in human–computer interaction stating that every application has an inherent amount of complexity that cannot be removed or hidden.
Chidamber and Kemerer[4] proposed a set of programming complexity metrics that are widely used in measurement and in academic articles: weighted methods per class, coupling between object classes, response for a class, number of children, depth of inheritance tree, and lack of cohesion of methods.
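As an illustrative sketch (not the authors' original tooling), two of these metrics can be approximated for Python classes using the standard inspect module; weighted methods per class (WMC) is computed here with the common simplification of unit method weights:

```python
import inspect

def weighted_methods_per_class(cls):
    """WMC with unit weights: count the methods defined on the class itself."""
    return sum(1 for name, member in vars(cls).items()
               if inspect.isfunction(member))

def depth_of_inheritance_tree(cls):
    """DIT: length of the longest inheritance path from the class to the root."""
    if not cls.__bases__:
        return 0
    return 1 + max(depth_of_inheritance_tree(base) for base in cls.__bases__)

# Hypothetical classes, purely for illustration.
class Shape:
    def area(self): ...
    def perimeter(self): ...

class Circle(Shape):
    def area(self): ...
    def radius(self): ...

print(weighted_methods_per_class(Circle))  # 2
print(depth_of_inheritance_tree(Circle))   # 2 (Circle -> Shape -> object)
```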
Supervised learning (SL) is a paradigm in machine learning in which a model is trained on input objects paired with desired output values. The training data is processed to build a function that maps new inputs to expected output values. In the optimal scenario, the algorithm correctly determines the output values for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way. This statistical quality of an algorithm is measured through the so-called generalization error.
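A minimal sketch of this workflow, using scikit-learn and its bundled iris dataset as an assumed, illustrative choice of library and data; the error on held-out data serves as an estimate of the generalization error:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Input objects X and desired output values y form the labeled training data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit a model: the learned function mapping inputs to outputs.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Accuracy on unseen instances estimates how well the model generalizes.
print("held-out accuracy:", model.score(X_test, y_test))
```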
In software engineering and development, a software metric is a standard of measure of the degree to which a software system or process possesses some property. Even if a metric is not a measurement (metrics are functions, while measurements are the numbers obtained by applying them), the two terms are often used as synonyms. Since quantitative measurements are essential in all sciences, computer science practitioners and theoreticians continually strive to bring similar approaches to software development. The goal is to obtain objective, reproducible and quantifiable measurements, which may have numerous valuable applications in schedule and budget planning, cost estimation, quality assurance, testing, software debugging, software performance optimization, and optimal personnel task assignments.
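As a toy illustration of such a measurement, the sketch below computes one of the simplest metrics, source lines of code (SLOC), for a Python file; the filename example.py is a placeholder, and the sketch assumes Python's # comment convention:

```python
def source_lines_of_code(path):
    """Count non-blank, non-comment lines: a crude but reproducible metric."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count

print(source_lines_of_code("example.py"))
```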
The Liskov substitution principle (LSP) is a particular definition of a subtyping relation, called strong behavioral subtyping, that was initially introduced by Barbara Liskov in a 1987 conference keynote address titled Data abstraction and hierarchy. It is based on the concept of "substitutability" – a principle in object-oriented programming stating that an object (such as an instance of a class) may be replaced by a sub-object (an instance of a subtype) without breaking the program. It is a semantic rather than merely syntactic relation, because it is intended to guarantee semantic interoperability of types in a hierarchy, object types in particular. Barbara Liskov and Jeannette Wing described the principle succinctly in a 1994 paper as follows:
Subtype Requirement: Let φ(x) be a property provable about objects x of type T. Then φ(y) should be true for objects y of type S, where S is a subtype of T.
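A minimal Python sketch of substitutability: a caller written against the supertype keeps working, and the property it relies on keeps holding, when handed an instance of a behavioral subtype. The classic Rectangle/Square pair is used, kept immutable after construction so that the subtype does not weaken any of the supertype's guarantees:

```python
class Rectangle:
    def __init__(self, width, height):
        self.width, self.height = width, height

    def area(self):
        return self.width * self.height

class Square(Rectangle):
    """A behavioral subtype here: constructed with one side, and it never
    weakens Rectangle's observable guarantees for the operations it exposes."""
    def __init__(self, side):
        super().__init__(side, side)

def report(shape: Rectangle):
    # A property provable for Rectangle (area == width * height)
    # must also hold when a subtype instance is substituted.
    assert shape.area() == shape.width * shape.height

report(Rectangle(3, 4))
report(Square(5))  # substituting the subtype does not break the caller
```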
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a data set. MDS is used to translate distances between each pair of objects in a set into a configuration of points mapped into an abstract Cartesian space.
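A short sketch of this translation using scikit-learn's MDS implementation (an assumed, illustrative choice) with a precomputed dissimilarity matrix:

```python
import numpy as np
from sklearn.manifold import MDS

# Pairwise dissimilarities between four objects (symmetric, zero diagonal).
D = np.array([
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 0.0, 1.5, 2.5],
    [2.0, 1.5, 0.0, 1.0],
    [3.0, 2.5, 1.0, 0.0],
])

# Embed the objects as points in 2-D so that Euclidean distances between
# the points approximate the given dissimilarities.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
points = mds.fit_transform(D)
print(points)
```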
In data mining and statistics, hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two categories:
Agglomerative: a "bottom-up" approach in which each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
Divisive: a "top-down" approach in which all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
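A brief sketch of the agglomerative strategy using SciPy (an assumed choice of library): singleton clusters are repeatedly merged, and the resulting hierarchy is then cut into a flat clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Five observations in 2-D.
X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 10]])

# Agglomerative ("bottom-up"): start with singleton clusters and
# repeatedly merge the closest pair, recording the merge hierarchy.
Z = linkage(X, method="average")

# Cut the hierarchy to obtain three flat clusters.
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```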
In computer programming, cohesion refers to the degree to which the elements inside a module belong together. In one sense, it is a measure of the strength of relationship between the methods and data of a class and some unifying purpose or concept served by that class. In another sense, it is a measure of the strength of relationship between the class's methods and data themselves.
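A small, contrived Python contrast between the two ends of the scale (the class and method names are invented for illustration):

```python
# High cohesion: every method works with the same data toward one purpose.
class BankAccount:
    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

# Low cohesion: unrelated responsibilities grouped by convenience; the
# methods share no data and serve no single unifying concept.
class Utilities:
    def parse_csv_row(self, row):
        return row.split(",")

    def send_email(self, address, body):
        print(f"sending to {address}: {body}")

    def fahrenheit_to_celsius(self, f):
        return (f - 32) * 5 / 9
```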
In computing, an interface is a shared boundary across which two or more separate components of a computer system exchange information. The exchange can be between software, computer hardware, peripheral devices, humans, and combinations of these. Some computer hardware devices, such as a touchscreen, can both send and receive data through the interface, while others such as a mouse or microphone may only provide an interface to send data to a given system.
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups (clusters). It is a main task of exploratory data analysis, and a common technique for statistical data analysis, used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning.
Class-based programming, or more commonly class-orientation, is a style of object-oriented programming (OOP) in which inheritance occurs via defining classes of objects, instead of inheritance occurring via the objects alone.
Cyclomatic complexity is a software metric used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program's source code. It was developed by Thomas J. McCabe, Sr. in 1976.
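For a single structured function, the metric can equivalently be computed as one plus the number of decision points. The sketch below approximates it for Python source by counting branching constructs in the abstract syntax tree; this is a simplification of the full control-flow-graph formula M = E − N + 2P, not McCabe's original tooling:

```python
import ast

def cyclomatic_complexity(source):
    """Approximate McCabe complexity as 1 + number of decision points.

    Counts if statements, loops, exception handlers, and boolean
    operators; each adds one possible branch to the control flow.
    """
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.ExceptHandler)):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            complexity += len(node.values) - 1
    return complexity

code = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    return "positive"
"""
print(cyclomatic_complexity(code))  # 3: two branches plus the default path
```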
In the context of software engineering, software quality refers to two related but distinct notions:
In software engineering, coupling is the degree of interdependence between software modules: a measure of how closely connected two routines or modules are, and of the strength of the relationships between modules. Coupling is not binary but multi-dimensional.
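A small Python sketch contrasting tight and loose coupling (the class names and query interface are invented for illustration); injecting the dependency through a narrow interface reduces the interdependence between the modules:

```python
# Tightly coupled: the report constructs a concrete database client itself,
# so it cannot be reused or tested without that specific module.
class PostgresClient:
    def query(self, sql):
        return [("alice",), ("bob",)]  # stand-in for a real database call

class TightReport:
    def __init__(self):
        self.db = PostgresClient()  # hard-wired dependency

# Loosely coupled: the dependency is injected through a narrow interface,
# so any object with a compatible query() method can be substituted.
class LooseReport:
    def __init__(self, db):
        self.db = db

    def names(self):
        return [row[0] for row in self.db.query("SELECT name FROM users")]

print(LooseReport(PostgresClient()).names())
```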
This is an alphabetical list of articles pertaining specifically to software engineering.
Glossary of Unified Modeling Language (UML) terms provides a compilation of terminology used in all versions of UML, along with their definitions. Any notable distinctions that may exist between versions are noted with the individual entry it applies to.
IDEF4, or Integrated DEFinition for Object-Oriented Design, is an object-oriented design modeling language for the design of component-based client/server systems. It is designed to support a smooth transition from the application domain and requirements analysis models to the design and to actual source code generation. It specifies design objects in sufficient detail to enable source code generation.
The Davies–Bouldin index (DBI), introduced by David L. Davies and Donald W. Bouldin in 1979, is a metric for evaluating clustering algorithms. It is an internal evaluation scheme, in which the quality of the clustering is assessed using quantities and features inherent to the dataset. One drawback is that a good value reported by this index does not imply the best possible information retrieval.
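A short sketch using scikit-learn (an assumed choice of implementation), comparing k-means clusterings by their Davies–Bouldin scores; lower values indicate tighter, better-separated clusters:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

# Synthetic data with three natural groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Internal evaluation: the index uses only the data and the labels,
# with no external ground truth.
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, davies_bouldin_score(X, labels))
```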
Object-oriented programming (OOP) is a programming paradigm based on the concept of objects, which can contain data and code: data in the form of fields, and code in the form of procedures. In OOP, computer programs are designed by making them out of objects that interact with one another.
Weighted Micro Function Points (WMFP) is a modern software sizing algorithm, a successor to established scientific methods such as COCOMO, COSYSMO, the maintainability index, cyclomatic complexity, function points, and Halstead complexity. It aims to produce more accurate results than traditional software sizing methodologies while requiring less configuration and knowledge from the end user, since most of the estimation is based on automatic measurements of existing source code.
Software construction is a software engineering discipline. It is the detailed creation of working, meaningful software through a combination of coding, verification, unit testing, integration testing, and debugging. It is linked to all the other software engineering disciplines, most strongly to software design and software testing.