Data envelopment analysis

Data envelopment analysis (DEA) is a nonparametric method in operations research and economics for the estimation of production frontiers. [1] DEA has been applied in a wide range of fields including international banking, economic sustainability, police department operations, and logistical applications. [2] [3] [4] Additionally, DEA has been used to assess the performance of natural language processing models, and it has found other applications within machine learning. [5] [6] [7]

Description

DEA is used to empirically measure productive efficiency of decision-making units (DMUs). Although DEA has a strong link to production theory in economics, the method is also used for benchmarking in operations management, whereby a set of measures is selected to benchmark the performance of manufacturing and service operations. [8] In benchmarking, the efficient DMUs, as defined by DEA, may not necessarily form a “production frontier”, but rather lead to a “best-practice frontier.” [1] [9] :243–285

In contrast to parametric methods, which require the ex-ante specification of a production or cost function, non-parametric approaches compare feasible input and output combinations based on the available data only. [10] DEA, one of the most commonly used non-parametric methods, owes its name to its enveloping property: the empirically observed, most efficient DMUs constitute the production frontier that envelops the data and against which all other DMUs are compared. DEA's popularity stems from its relative lack of assumptions, its ability to benchmark multi-dimensional inputs and outputs, and its computational ease, since despite its task of calculating efficiency ratios it can be expressed as a linear program. [11]

History

Building on the ideas of Farrell, [12] the 1978 work "Measuring the efficiency of decision-making units" by Charnes, Cooper & Rhodes [1] applied linear programming to estimate, for the first time, an empirical production-technology frontier. In Germany, the procedure had earlier been used to estimate the marginal productivity of R&D and other factors of production. Since then, a large number of books and journal articles have been written on DEA or on applying DEA to various sets of problems.

Starting with the CCR model, named after Charnes, Cooper, and Rhodes, [1] many extensions to DEA have been proposed in the literature. They range from adapting implicit model assumptions such as input and output orientation, distinguishing technical and allocative efficiency, [13] adding limited disposability [14] of inputs/outputs or varying returns-to-scale [15] to techniques that utilize DEA results and extend them for more sophisticated analyses, such as stochastic DEA [16] or cross-efficiency analysis. [17]

Techniques

In a one-input, one-output scenario, efficiency is simply the ratio of output to input, and comparing several entities/DMUs on this basis is trivial. However, when more inputs or outputs are added, the efficiency computation becomes more complex. Charnes, Cooper, and Rhodes (1978) [1] in their basic DEA model (the CCR model) define the objective function for the efficiency of the DMU under evaluation, indexed $o$, as:

$$\max_{u,v} \; \theta_o = \frac{\sum_{r=1}^{s} u_r y_{ro}}{\sum_{i=1}^{m} v_i x_{io}},$$

where the known outputs $y_{ro}$, $r = 1, \dots, s$, are multiplied by their respective weights $u_r$ and divided by the inputs $x_{io}$, $i = 1, \dots, m$, multiplied by their respective weights $v_i$.

The efficiency score $\theta_o$ is to be maximized under the constraint that, using those weights on each DMU $j = 1, \dots, n$, no efficiency score exceeds one:

$$\frac{\sum_{r=1}^{s} u_r y_{rj}}{\sum_{i=1}^{m} v_i x_{ij}} \le 1, \qquad j = 1, \dots, n,$$

and all inputs, outputs and weights have to be non-negative. To allow for linear optimization, one typically constrains either the weighted sum of outputs or the weighted sum of inputs of the evaluated DMU to equal a fixed value (typically 1; see the example below).
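
With the weighted input sum of the evaluated DMU fixed to one, the fractional program above can be rewritten as an equivalent linear program, usually called the input-oriented multiplier form; the notation is the same as above, and the example below carries out this step for a single unit:

$$
\begin{aligned}
\max_{u,v} \quad & \sum_{r=1}^{s} u_r y_{ro} \\
\text{s.t.} \quad & \sum_{i=1}^{m} v_i x_{io} = 1, \\
& \sum_{r=1}^{s} u_r y_{rj} - \sum_{i=1}^{m} v_i x_{ij} \le 0, \qquad j = 1, \dots, n, \\
& u_r \ge 0, \quad v_i \ge 0.
\end{aligned}
$$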

Because this optimization problem's dimensionality is equal to the sum of its inputs and outputs, selecting the smallest number of inputs/outputs that collectively and accurately capture the process one attempts to characterize is crucial. And because the production frontier is enveloped empirically, several guidelines exist on the minimum number of DMUs required for good discriminatory power of the analysis, given homogeneity of the sample. This minimum number of DMUs varies between twice the sum of inputs and outputs, $2(m+s)$, and twice the product of inputs and outputs, $2ms$.
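
For instance, for a model with $m = 3$ inputs and $s = 2$ outputs, these guidelines place the minimum sample size between

$$2(m+s) = 2(3+2) = 10 \qquad \text{and} \qquad 2ms = 2 \cdot 3 \cdot 2 = 12$$

DMUs.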

Some advantages of the DEA approach are:
- no explicit functional form for the production or cost function has to be specified ex ante;
- multiple inputs and outputs, possibly measured in different units, can be benchmarked simultaneously;
- each DMU is compared against observed best practice, and efficient peer units can be identified for every inefficient DMU;
- efficiency scores are obtained by solving linear programs, which keeps the computation tractable.

Some of the disadvantages of DEA are:
- being deterministic, it attributes any deviation from the frontier to inefficiency, so results are sensitive to measurement error, noise and outliers;
- results depend on the selection of inputs and outputs, and discriminatory power deteriorates when the number of DMUs is small relative to the number of factors;
- several DMUs are typically rated as fully efficient, and the optimal weights need not be unique, which complicates a complete ranking;
- efficiency scores are relative to the sample analysed and cannot be compared directly across different samples.

Example

Assume that we have data on the inputs $x_{ij}$ and outputs $y_{rj}$ of units $j = 1, \dots, n$.

To calculate the efficiency of unit 1, we define the objective function (OF) as

$$\max \; \frac{\sum_{r} u_r y_{r1}}{\sum_{i} v_i x_{i1}},$$

which is subject to (ST) the constraint that the efficiency of every unit cannot be larger than 1:

$$\frac{\sum_{r} u_r y_{rj}}{\sum_{i} v_i x_{ij}} \le 1, \qquad j = 1, \dots, n,$$

and non-negativity:

$$u_r \ge 0, \quad v_i \ge 0.$$

A ratio with decision variables in both the numerator and the denominator is nonlinear. Since we are using a linear programming technique, we need to linearize the formulation by constraining the denominator of the objective function to a constant (in this case 1) and then maximizing the numerator.

The new formulation would be:

$$\max \; \sum_{r} u_r y_{r1}$$

subject to

$$\sum_{i} v_i x_{i1} = 1,$$

$$\sum_{r} u_r y_{rj} - \sum_{i} v_i x_{ij} \le 0, \qquad j = 1, \dots, n,$$

$$u_r \ge 0, \quad v_i \ge 0.$$
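
To make the linearization concrete, the following minimal sketch solves this multiplier-form linear program with Python's scipy.optimize.linprog. The data set (two inputs, one output, three units) and the helper name ccr_efficiency are hypothetical and serve only to illustrate the formulation.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data (illustrative only): rows are units, columns are factors.
X = np.array([[4.0, 3.0],    # inputs of unit 1
              [7.0, 3.0],    # inputs of unit 2
              [8.0, 1.0]])   # inputs of unit 3
Y = np.array([[1.0],         # output of unit 1
              [1.0],         # output of unit 2
              [1.0]])        # output of unit 3


def ccr_efficiency(X, Y, o):
    """CCR efficiency of unit `o` via the linearized (multiplier) form.

    Decision variables are the output weights u (length s) and the
    input weights v (length m):
        maximize    u . Y[o]
        subject to  v . X[o] == 1
                    u . Y[j] - v . X[j] <= 0   for every unit j
                    u >= 0, v >= 0
    """
    n, m = X.shape
    s = Y.shape[1]

    # linprog minimizes, so minimize the negated objective -(u . Y[o]).
    c = np.concatenate([-Y[o], np.zeros(m)])

    # Ratio constraints for all units: u . Y[j] - v . X[j] <= 0.
    A_ub = np.hstack([Y, -X])
    b_ub = np.zeros(n)

    # Normalization of the denominator: v . X[o] == 1.
    A_eq = np.concatenate([np.zeros(s), X[o]]).reshape(1, -1)
    b_eq = np.array([1.0])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (s + m), method="highs")
    return -res.fun  # optimal value of u . Y[o], i.e. the efficiency score


for o in range(X.shape[0]):
    print(f"Unit {o + 1}: efficiency = {ccr_efficiency(X, Y, o):.3f}")
```

Units whose score equals 1 lie on the empirical frontier; the remaining units receive a score strictly below 1 relative to that frontier.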

Extensions

A desire to improve upon DEA by reducing its disadvantages or strengthening its advantages has been a major driver of much of the recent literature. Currently, the most widely used DEA-based method to obtain unique efficiency rankings is cross-efficiency analysis. Originally developed by Sexton et al. in 1986, [17] it has found widespread application since Doyle and Green's 1994 publication. [18] Cross-efficiency is based on the original DEA results, but implements a secondary objective where each DMU peer-appraises all other DMUs with its own factor weights. The average of these peer-appraisal scores is then used to calculate a DMU's cross-efficiency score. This approach avoids DEA's disadvantages of having multiple efficient DMUs and potentially non-unique weights. [19] Another approach to remedy some of DEA's drawbacks is Stochastic DEA, [16] which synthesizes DEA and Stochastic Frontier Analysis (SFA). [20]
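
As a sketch of the peer-appraisal step described above, the code below first solves the same multiplier-form linear program for each DMU to obtain one set of optimal weights, then evaluates every DMU with those weights and averages the resulting scores. The data is again hypothetical, the average includes each DMU's self-appraisal, and no secondary goal (such as Doyle and Green's aggressive or benevolent formulations) is implemented, so the result inherits the potential non-uniqueness of the underlying DEA weights.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data (illustrative only): rows are DMUs.
X = np.array([[4.0, 3.0], [7.0, 3.0], [8.0, 1.0]])   # inputs
Y = np.array([[1.0], [1.0], [1.0]])                   # outputs


def ccr_weights(X, Y, o):
    """One optimal set of CCR multiplier weights (u, v) for DMU `o`."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.concatenate([-Y[o], np.zeros(m)])                   # maximize u . Y[o]
    A_ub = np.hstack([Y, -X])                                  # u . Y[j] - v . X[j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.zeros(s), X[o]]).reshape(1, -1)  # v . X[o] == 1
    b_eq = np.array([1.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (s + m), method="highs")
    return res.x[:s], res.x[s:]


n = X.shape[0]
cross = np.zeros((n, n))
for k in range(n):                 # DMU k supplies the weights (the appraiser)
    u, v = ccr_weights(X, Y, k)
    for j in range(n):             # DMU j is appraised with k's weights
        cross[k, j] = (u @ Y[j]) / (v @ X[j])

# Cross-efficiency score of DMU j: mean of the appraisals in column j.
cross_scores = cross.mean(axis=0)
for j, score in enumerate(cross_scores):
    print(f"Unit {j + 1}: cross-efficiency = {score:.3f}")
```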

Footnotes

  1. Charnes et al. (1978)
  2. Charnes et al. (1995)
  3. Emrouznejad et al. (2016)
  4. Thanassoulis (1995)
  5. Koronakos and Sotiropoulos (2020)
  6. Zhou et al. (2022)
  7. Guerrero et al. (2022)
  8. Mahmoudi et al. (2021)
  9. Sickles et al. (2019)
  10. Cooper et al. (2007)
  11. Cooper et al. (2011)
  12. Farrell (1957)
  13. Fried et al. (2008)
  14. Cooper et al. (2000)
  15. Banker et al. (1984)
  16. Olesen (2016)
  17. Sexton (1986)
  18. Doyle (1994)
  19. Dyson (2001)
  20. Olesen et al. (2016)
