Halstead complexity measures

Last updated January 05, 2024

Halstead complexity measures are software metrics introduced by Maurice Howard Halstead in 1977^[1] as part of his treatise on establishing an empirical science of software development. Halstead made the observation that metrics of the software should reflect the implementation or expression of algorithms in different languages, but be independent of their execution on a specific platform. These metrics are therefore computed statically from the code.

Halstead's goal was to identify measurable properties of software, and the relations between them. This is similar to the identification of measurable properties of matter (like the volume, mass, and pressure of a gas) and the relationships between them (analogous to the gas equation). Thus his metrics are actually not just complexity metrics.

Calculation

For a given problem, let:

$\,\eta _{1}$ = the number of distinct operators
$\,\eta _{2}$ = the number of distinct operands
$\,N_{1}$ = the total number of operators
$\,N_{2}$ = the total number of operands

From these numbers, several measures can be calculated:

Program vocabulary: $\eta =\eta _{1}+\eta _{2}\,$
Program length: $N=N_{1}+N_{2}\,$
Calculated estimated program length: ${\hat {N}}=\eta _{1}\log _{2}\eta _{1}+\eta _{2}\log _{2}\eta _{2}$
Volume: $V=N\times \log _{2}\eta$
Difficulty : $D={\eta _{1} \over 2}\times {N_{2} \over \eta _{2}}$
Effort: $E=D\times V$

The difficulty measure is related to the difficulty of the program to write or understand, e.g. when doing code review.

The effort measure translates into actual coding time using the following relation,

Time required to program: $T={E \over 18}$ seconds

Halstead's delivered bugs (B) is an estimate for the number of errors in the implementation.

Number of delivered bugs : $B={E^{2 \over 3} \over 3000}$ or, more recently, $B={V \over 3000}$ is accepted.^[1]

Example

Consider the following C program:

main(){inta,b,c,avg;scanf("%d %d %d",&a,&b,&c);avg=(a+b+c)/3;printf("avg = %d",avg);}

The distinct operators ( $\,\eta _{1}$ ) are: main, (), {}, int, scanf, &, =, +, /, printf, ,, ;

The distinct operands ( $\,\eta _{2}$ ) are: a, b, c, avg, "%d %d %d", 3, "avg = %d"

$\eta _{1}=12$ , $\eta _{2}=7$ , $\eta =19$
$N_{1}=27$ , $N_{2}=15$ , $N=42$
Calculated Estimated Program Length: ${\hat {N}}=12\times log_{2}12+7\times log_{2}7=62.67$
Volume: $V=42\times log_{2}19=178.4$
Difficulty: $D={12 \over 2}\times {15 \over 7}=12.85$
Effort: $E=12.85\times 178.4=2292.44$
Time required to program: $T={2292.44 \over 18}=127.357$ seconds
Number of delivered bugs: $B={2292.44^{2 \over 3} \over 3000}=0.05$

Related Research Articles

Specific detectivity, or D*, for a photodetector is a figure of merit used to characterize performance, equal to the reciprocal of noise-equivalent power (NEP), normalized per square root of the sensor's area and frequency bandwidth.

In category theory, a branch of mathematics, a natural transformation provides a way of transforming one functor into another while respecting the internal structure of the categories involved. Hence, a natural transformation can be considered to be a "morphism of functors". Informally, the notion of a natural transformation states that a particular map between functors can be done consistently over an entire category.

<span class="mw-page-title-main">Johnson–Nyquist noise</span> Electronic noise due to thermal vibration within a conductor

Johnson–Nyquist noise is the electronic noise generated by the thermal agitation of the charge carriers inside an electrical conductor at equilibrium, which happens regardless of any applied voltage. Thermal noise is present in all electrical circuits, and in sensitive electronic equipment can drown out weak signals, and can be the limiting factor on sensitivity of electrical measuring instruments. Thermal noise increases with temperature. Some sensitive electronic equipment such as radio telescope receivers are cooled to cryogenic temperatures to reduce thermal noise in their circuits. The generic, statistical physical derivation of this noise is called the fluctuation-dissipation theorem, where generalized impedance or generalized susceptibility is used to characterize the medium.

In mathematical physics, Minkowski space combines inertial space and time manifolds with a non-inertial reference frame of space and time into a four-dimensional model relating a position to the field.

In mathematics, the Hodge star operator or Hodge star is a linear map defined on the exterior algebra of a finite-dimensional oriented vector space endowed with a nondegenerate symmetric bilinear form. Applying the operator to an element of the algebra produces the Hodge dual of the element. This map was introduced by W. V. D. Hodge.

In probability and statistics, an exponential family is a parametric set of probability distributions of a certain form, specified below. This special form is chosen for mathematical convenience, including the enabling of the user to calculate expectations, covariances using differentiation based on some useful algebraic properties, as well as for generality, as exponential families are in a sense very natural sets of distributions to consider. The term exponential class is sometimes used in place of "exponential family", or the older term Koopman–Darmois family. Sometimes loosely referred to as "the" exponential family, this class of distributions is distinct because they all possess a variety of desirable properties, most importantly the existence of a sufficient statistic.

In mathematics, the conformal group of an inner product space is the group of transformations from the space to itself that preserve angles. More formally, it is the group of transformations that preserve the conformal geometry of the space.

In mathematics, the Dedekind eta function, named after Richard Dedekind, is a modular form of weight 1/2 and is a function defined on the upper half-plane of complex numbers, where the imaginary part is positive. It also occurs in bosonic string theory.

In mathematics, in the area of analytic number theory, the Dirichlet eta function is defined by the following Dirichlet series, which converges for any complex number having real part > 0:

In mathematical statistics, the Fisher information is a way of measuring the amount of information that an observable random variable X carries about an unknown parameter θ of a distribution that models X. Formally, it is the variance of the score, or the expected value of the observed information.

Bosonic string theory is the original version of string theory, developed in the late 1960s and named after Satyendra Nath Bose. It is so called because it contains only bosons in the spectrum.

In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.

The maximal total range is the maximum distance an aircraft can fly between takeoff and landing. Powered aircraft range is limited by the aviation fuel energy storage capacity considering both weight and volume limits. Unpowered aircraft range depends on factors such as cross-country speed and environmental conditions. The range can be seen as the cross-country ground speed multiplied by the maximum time in the air. The fuel time limit for powered aircraft is fixed by the available fuel and rate of consumption.

In mathematics, the classical Kronecker limit formula describes the constant term at s = 1 of a real analytic Eisenstein series in terms of the Dedekind eta function. There are many generalizations of it to more complicated Eisenstein series. It is named for Leopold Kronecker.

The cover tree is a type of data structure in computer science that is specifically designed to facilitate the speed-up of a nearest neighbor search. It is a refinement of the Navigating Net data structure, and related to a variety of other data structures developed for indexing intrinsically low-dimensional data.

In mathematics and mathematical physics, raising and lowering indices are operations on tensors which change their type. Raising and lowering indices are a form of index manipulation in tensor expressions.

In graph theory, a random geometric graph (RGG) is the mathematically simplest spatial network, namely an undirected graph constructed by randomly placing N nodes in some metric space and connecting two nodes by a link if and only if their distance is in a given range, e.g. smaller than a certain neighborhood radius, r.

The iterative proportional fitting procedure is the operation of finding the fitted matrix $which is the closest to an initial matrix but with the row and column totals of a target matrix . The fitted matrix being of the form, where and are diagonal matrices such that has the margins of . Some algorithms can be chosen to perform biproportion. We have also the entropy maximization, information loss minimization or RAS which consists of factoring the matrix rows to match the specified row totals, then factoring its columns to match the specified column totals; each step usually disturbs the previous step's match, so these steps are repeated in cycles, re-adjusting the rows and columns in turn, until all specified marginal totals are satisfactorily approximated. However, all algorithms give the same solution. In three- or more-dimensional cases, adjustment steps are applied for the marginals of each dimension in turn, the steps likewise repeated in cycles.$

An ordinary fractal string $is a bounded, open subset of the real number line. Such a subset can be written as an at-most-countable union of connected open intervals with associated lengths written in non-increasing order; we also refer to as a fractal string. For example, is a fractal string corresponding to the Cantor set. A fractal string is the analogue of a one-dimensional "fractal drum," and typically the set has a boundary which corresponds to a fractal such as the Cantor set. The heuristic idea of a fractal string is to study a (one-dimensional) fractal using the "space around the fractal." It turns out that the sequence of lengths of the set itself is "intrinsic," in the sense that the fractal string itself contains information about the fractal to which it corresponds.$

A proper reference frame in the theory of relativity is a particular form of accelerated reference frame, that is, a reference frame in which an accelerated observer can be considered as being at rest. It can describe phenomena in curved spacetime, as well as in "flat" Minkowski spacetime in which the spacetime curvature caused by the energy–momentum tensor can be disregarded. Since this article considers only flat spacetime—and uses the definition that special relativity is the theory of flat spacetime while general relativity is a theory of gravitation in terms of curved spacetime—it is consequently concerned with accelerated frames in special relativity.

References

1 2 Halstead, Maurice H. (1977). Elements of Software Science. Amsterdam: Elsevier North-Holland, Inc. ISBN 0-444-00205-7.

External links

The Halstead metrics - Extensive discussion on the calculation and use of Halstead Metrics in an object-oriented environment (with specific reference to Java).
Calculation of Halstead metrics - Measurement of Halstead Metrics.
Explanation with a Sample Program - Example (on Page 6 of the PDF)
Script computing Halstead Metrics and using them for commented code detection
IBM
Calculator for computing Halstead metrics

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Elements_of_Software_Science-1] 1 2 Halstead, Maurice H. (1977). Elements of Software Science. Amsterdam: Elsevier North-Holland, Inc. ISBN 0-444-00205-7.

[1]