Monoculture (computer science)

In computer science, a monoculture is a community of computers that all run identical software. All the computer systems in the community thus have the same vulnerabilities, and, like agricultural monocultures, are subject to catastrophic failure in the event of a successful attack. [1]

Overview

With the global trend of increasing use of and reliance on computerized systems, some vendors supply solutions that are used throughout an industry (such as Microsoft Windows), and these form algorithmic monocultures. Monocultures form naturally because they exploit economies of scale: it is cheaper to manufacture and distribute a single solution. Furthermore, when a single solution is used by a large community, bugs are discovered relatively quickly.

Like agricultural monocultures, algorithmic monocultures are not diverse and are therefore susceptible to correlated failures: the simultaneous failure of many parts participating in the monoculture. In a complete non-monoculture, where the outcomes of all components are mutually independent and thus uncorrelated, the probability of a catastrophic event (the failure of all parts) is the product of the individual components' failure probabilities, which decreases exponentially with the number of components.

At the other extreme, a perfect monoculture is completely correlated and thus has a single point of failure. This means that the probability of a catastrophic event is constant: the failure probability of the single shared component.
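
This contrast can be illustrated with a short calculation. The sketch below is illustrative only (the function and its parameters are assumptions, not taken from the cited sources); it compares the probability that an entire community of n components fails when the components fail independently and when they all share a single fate.

    def catastrophic_failure_probability(p, n, monoculture):
        """Probability that all n components of a community fail.

        p -- failure probability of a single component
        monoculture -- True if the components are perfectly correlated
                       (one shared fate), False if they fail independently
        """
        if monoculture:
            return p       # single point of failure: constant in n
        return p ** n      # independent failures: shrinks exponentially in n

    # With p = 0.1, a 10-component monoculture still fails with probability
    # 0.1, while 10 independent components all fail with probability 1e-10.
    for n in (1, 5, 10):
        print(n, catastrophic_failure_probability(0.1, n, True),
                 catastrophic_failure_probability(0.1, n, False))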

Examples

Since a small number of operating systems run on almost every workstation, operating systems form monocultures. For example, Dan Geer has argued that Microsoft is a monoculture, since a majority of the workstations connected to the Internet run versions of the Microsoft Windows operating system, many of which are vulnerable to the same attacks.

Large monocultures can also arise from software libraries; for example, the Log4Shell exploit in the popular Log4j library was estimated to affect hundreds of millions of devices. [2]

Individual level concerns

The concept is significant when discussing computer security and viruses; the main threat is exposure to security vulnerabilities. Since monocultures are not diverse, any vulnerability that is found exists in every individual member of the monoculture, increasing the risk of exploitation. [3] An example is "Exploit Wednesday": after Windows security patches are released on Patch Tuesday, there is an increase in exploitation attempts against machines that have not yet been updated.

Clifford Stoll wrote in 1989 after dealing with the Morris worm: [4]

A computer virus is specialized: a virus that works on an IBM PC cannot do anything to a Macintosh or a Unix computer. Similarly, the Arpanet virus could only strike at systems running Berkeley Unix. Computers running other operating systems—like AT&T Unix, VMS, or DOS—were totally immune.

Diversity, then, works against viruses. If all the systems on the Arpanet ran Berkeley Unix, the virus would have disabled all fifty thousand of them. Instead, it infected only a couple thousand. Biological viruses are just as specialized: we can't catch the flu from dogs.

Bureaucrats and managers will forever urge us to standardize on a single type of system: "Let's only use Sun workstations" or "Only buy IBM systems." Yet somehow our communities of computers are a diverse population—with Data General machines sitting next to Digital Vaxes; IBMs connected to Sonys. Like our neighborhoods, electronic communities thrive through diversity.

Another main concern is the increased spread of algorithmic bias. In light of the increased use of machine learning, there is growing awareness of the biases introduced by algorithms. Monocultures exacerbate this problem because they make any such bias systemic, spreading unfair decisions across all members of the monoculture.

Social level concerns

Monocultures may lead to Braess-like paradoxes, in which introducing a "better option" (such as a more accurate algorithm) leads to suboptimal monocultural convergence: a monoculture whose correlated nature degrades the overall quality of the decisions made. Since monocultures form in areas of high-stakes decision making such as credit scoring and automated hiring, whether they produce optimal decisions is an important question.

This scenario can be studied through the lens of mechanism design: agents choose among a set of algorithms, some of which return correlated outputs, and the overall impact of the decision making is measured by social welfare.

Suboptimal monoculture convergence in automated hiring

This section demonstrates the concern of suboptimal monoculture convergence using automated hiring as a case study. Hiring is the process of ranking a group of candidates and hiring those ranked highest. In recent years, automated hiring (automatically ranking candidates based on their interaction with an AI-powered system) has become popular.

As shown by Kleinberg and Raghavan, [5] under some assumptions suboptimal automated-hiring monocultures form naturally: choosing the shared, correlated algorithm is a dominant strategy, so firms converge to a monoculture that leads to suboptimal social welfare.

Framework

This scenario considers two firms and a group of n candidates with hidden utilities x_1 > x_2 > ... > x_n. In the hiring process, each firm produces a noisy ranking of the candidates, and then the firms, in a random order, each hire the first still-available candidate in their ranking. Each firm can choose to use either independent human rankers or a common algorithmic ranking.

A ranking method is modeled as a noisy distribution Δ(θ) over permutations of the candidates, parametrized by an accuracy parameter θ.

In order for θ to act as an accuracy parameter, the family Δ(θ) should satisfy these conditions:

  1. Differentiability: The probability of each permutation is continuous and differentiable in θ.
  2. Asymptotic optimality: As θ → ∞, the probability that Δ(θ) produces the true ranking π* tends to 1.
  3. Monotonicity: The expected utility of the top-ranked candidate increases as θ increases, even if any subset of the candidates is removed.

These conditions state that a firm should always prefer higher values of θ, even if it is not first in the selection order.

Both the algorithmic and the human ranking methods are of the form Δ(θ) and differ only in their accuracy parameters: θ_A for the algorithm and θ_H for the human rankers. The algorithmic ranking's output is correlated: a single permutation is drawn from Δ(θ_A) and shared by every firm that uses it. In contrast, a human-ranked permutation is drawn from Δ(θ_H) independently for each firm.

For any pair of strategies of the first and second firm, social welfare is defined as the sum of the utilities of the hired candidates.
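
Below is a minimal simulation sketch of this framework, assuming a Gaussian random-utility noise model in which a ranking of accuracy θ sorts candidates by their true utility plus Gaussian noise of standard deviation 1/θ. The noise model and all names are assumptions chosen for illustration; this is one concrete instance of a family Δ(θ), not the paper's general model.

    import random

    def noisy_ranking(utilities, theta, rng):
        """One draw from a random-utility ranking: sort candidates by
        true utility plus Gaussian noise with standard deviation
        1/theta, so a larger theta gives a more accurate ranking."""
        scores = [u + rng.gauss(0.0, 1.0 / theta) for u in utilities]
        return sorted(range(len(utilities)), key=lambda i: -scores[i])

    def total_welfare(utilities, rankings, rng):
        """Firms pick in a random order; each hires the first
        still-available candidate in its own ranking. Returns the sum
        of the hired candidates' true utilities."""
        order = list(range(len(rankings)))
        rng.shuffle(order)
        hired, welfare = set(), 0.0
        for firm in order:
            choice = next(i for i in rankings[firm] if i not in hired)
            hired.add(choice)
            welfare += utilities[choice]
        return welfare

    def average_welfare(shared, theta_a, theta_h, n=10, trials=20000):
        """Average welfare when both firms share one algorithmic
        ranking (shared=True) or use independent human rankings."""
        rng = random.Random(0)
        utilities = [float(n - i) for i in range(n)]  # x_1 > ... > x_n
        total = 0.0
        for _ in range(trials):
            if shared:
                r = noisy_ranking(utilities, theta_a, rng)
                rankings = [r, r]  # one correlated draw used by both firms
            else:
                rankings = [noisy_ranking(utilities, theta_h, rng),
                            noisy_ranking(utilities, theta_h, rng)]
            total += total_welfare(utilities, rankings, rng)
        return total / trials

    # Shared algorithmic ranking of accuracy 2.0 versus independent
    # human rankings of accuracy 1.0.
    print(average_welfare(shared=True, theta_a=2.0, theta_h=1.0))
    print(average_welfare(shared=False, theta_a=2.0, theta_h=1.0))

In such a sketch, the phenomenon discussed below would appear as parameter settings in which each firm individually prefers the shared, more accurate ranking even though the independent, less accurate rankings give higher average welfare.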

Conditions for suboptimal convergence

The Braess-like paradox in this framework is suboptimal monoculture convergence. That is, using the algorithmic ranking is a dominant strategy, so the firms converge toward a monoculture, yet this yields suboptimal welfare (welfare in a world without the algorithmic ranking would be higher).

The main theorem proved by Kleinberg and Raghavan [5] for this model is that for any human accuracy θ_H and any noisy ranking family Δ(θ) that satisfies these conditions:

  1. Preference for the first position: For every accuracy θ, if both firms use rankings of accuracy θ, then the firm that picks first obtains at least as much expected utility as the firm that picks second.
  2. Preference for weaker competition: For all accuracies θ ≤ θ', a firm prefers that its competitor use the less accurate ranking θ rather than θ'.

there exists an algorithmic accuracy θ_A > θ_H such that both firms prefer using the shared algorithmic ranking, even though social welfare is higher when both use the human evaluators. In other words, regardless of the accuracy of the human rankers, there exists a more accurate algorithm whose introduction leads to suboptimal monoculture convergence.

The implication of this theorem is that, under these conditions, firms will choose the algorithmic ranking even though the correlated nature of algorithmic monocultures degrades total social welfare, and despite the algorithmic ranking being more accurate than the human rankers.

The first condition (preference for the first position) is equivalent to firms preferring independent rankings (in this setting, non-algorithmic ones): all else being equal, a firm should prefer an independent ranking method.

The intuition behind preference for weaker competition is that when a candidate is removed (hired by a different firm), the best remaining candidate is better in expectation when the removed candidate was chosen based on a less accurate ranking. Thus, a firm should always prefer that its competitors be less accurate.
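
This intuition can be checked numerically. The sketch below, again assuming the illustrative Gaussian random-utility noise model, estimates the expected utility of the best candidate left behind after a competitor of a given accuracy hires its own top pick.

    import random

    def best_remaining(utilities, competitor_theta, trials=50000):
        """Expected utility of the best candidate left after a
        competitor hires the top pick of its noisy ranking (Gaussian
        noise with standard deviation 1/theta)."""
        rng = random.Random(0)
        total = 0.0
        for _ in range(trials):
            scores = [u + rng.gauss(0.0, 1.0 / competitor_theta)
                      for u in utilities]
            taken = max(range(len(utilities)), key=lambda i: scores[i])
            total += max(u for i, u in enumerate(utilities) if i != taken)
        return total / trials

    utilities = [10.0, 9.0, 8.0, 7.0, 6.0]
    # A less accurate competitor leaves a better candidate behind in
    # expectation than a more accurate one.
    print(best_remaining(utilities, competitor_theta=0.5))
    print(best_remaining(utilities, competitor_theta=5.0))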

These conditions are met when Δ(θ) is a Mallows model distribution, and for some types of random utility models (with Gaussian or Laplace noise).
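
For concreteness, a Mallows model can be sampled with the standard repeated-insertion method, as in the sketch below. The dispersion parameter phi is a reparametrization of the accuracy θ (for example phi = e^(-θ)), so a smaller phi means a more accurate ranking; the function name is illustrative.

    import random

    def sample_mallows(n, phi, rng=random):
        """Sample a permutation of 0..n-1 from a Mallows model with
        dispersion phi in (0, 1] via repeated insertion: item i is
        inserted at position j with probability proportional to
        phi ** (i - j). As phi -> 0 the distribution concentrates on
        the reference ranking 0, 1, ..., n-1 (high accuracy); phi = 1
        gives the uniform distribution over permutations."""
        ranking = []
        for i in range(n):
            weights = [phi ** (i - j) for j in range(i + 1)]
            j = rng.choices(range(i + 1), weights=weights)[0]
            ranking.insert(j, i)
        return ranking

    # Small phi yields permutations close to 0,1,...,n-1; phi near 1
    # yields nearly uniform random permutations.
    print(sample_mallows(5, 0.2))
    print(sample_mallows(5, 0.9))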

References

  1. Goth, G. (2003). "Addressing the monoculture". IEEE Security & Privacy. 1 (6): 8–10. doi:10.1109/msecp.2003.1253561. ISSN 1540-7993. S2CID 16965084.
  2. Murphy, Hannah (2021-12-14). "Hackers launch more than 1.2m attacks through Log4J flaw". Financial Times. Retrieved 2021-12-17.
  3. Stamp, Mark (2004). "Risks of monoculture". Communications of the ACM. 47 (3): 120. doi:10.1145/971617.971650. ISSN 0001-0782. S2CID 16746625.
  4. Stoll, Clifford (1989). The Cuckoo's Egg. Doubleday. pp. 320–321. ISBN 978-0-307-81942-0.
  5. Kleinberg, Jon; Raghavan, Manish (2021). "Algorithmic monoculture and social welfare". Proceedings of the National Academy of Sciences. 118 (22). arXiv:2101.05853. doi:10.1073/pnas.2018340118. PMC 8179131. PMID 34035166.