Data envelopment analysis

Data envelopment analysis (DEA) is a nonparametric method in operations research and economics for the estimation of production frontiers. [1] DEA has been applied in a wide range of fields including international banking, economic sustainability, police department operations, and logistical applications. [2] [3] [4] Additionally, DEA has been used to assess the performance of natural language processing models, and it has found other applications within machine learning. [5] [6] [7]

Description

DEA is used to empirically measure productive efficiency of decision-making units (DMUs). Although DEA has a strong link to production theory in economics, the method is also used for benchmarking in operations management, whereby a set of measures is selected to benchmark the performance of manufacturing and service operations. [8] In benchmarking, the efficient DMUs, as defined by DEA, may not necessarily form a “production frontier”, but rather lead to a “best-practice frontier.” [1] [9] :243–285

In contrast to parametric methods, which require the ex-ante specification of a production or cost function, non-parametric approaches compare feasible input and output combinations based on the available data only. [10] DEA, one of the most commonly used non-parametric methods, owes its name to its enveloping property: the empirically observed, most efficient DMUs constitute the production frontier that envelops the data and against which all other DMUs are compared. DEA's popularity stems from its relative lack of assumptions, its ability to benchmark multi-dimensional inputs and outputs, and its computational ease, since despite its task of calculating efficiency ratios it can be expressed as a linear program. [11]

History

Building on the ideas of Farrell, [12] the 1978 work "Measuring the efficiency of decision-making units" by Charnes, Cooper & Rhodes [1] applied linear programming to estimate, for the first time, an empirical production-technology frontier. In Germany, the procedure had earlier been used to estimate the marginal productivity of R&D and other factors of production. Since then, a large number of books and journal articles have been written on DEA or on applying DEA to various sets of problems.

Starting with the CCR model, named after Charnes, Cooper, and Rhodes, [1] many extensions to DEA have been proposed in the literature. They range from adapting implicit model assumptions such as input and output orientation, distinguishing technical and allocative efficiency, [13] adding limited disposability [14] of inputs/outputs or varying returns-to-scale [15] to techniques that utilize DEA results and extend them for more sophisticated analyses, such as stochastic DEA [16] or cross-efficiency analysis. [17]

Techniques

In a one-input, one-output scenario, efficiency is simply the ratio of output to input, and comparing several entities/DMUs on this basis is trivial. However, when more inputs or outputs are added, the efficiency computation becomes more complex. Charnes, Cooper, and Rhodes (1978) [1] in their basic DEA model (the CCR model) define the objective function for the efficiency of the DMU under evaluation, indexed $o$, as:

$$\max_{u,v} \; \theta_o = \frac{\sum_{r=1}^{s} u_r y_{ro}}{\sum_{i=1}^{m} v_i x_{io}},$$

where the known outputs $y_{ro}$, $r = 1, \dots, s$, are multiplied by their respective weights $u_r$ and divided by the inputs $x_{io}$, $i = 1, \dots, m$, multiplied by their respective weights $v_i$.

The efficiency score $\theta_o$ is to be maximized under the constraint that, using those weights on each DMU $j = 1, \dots, n$, no efficiency score exceeds one:

$$\frac{\sum_{r=1}^{s} u_r y_{rj}}{\sum_{i=1}^{m} v_i x_{ij}} \le 1, \qquad j = 1, \dots, n,$$

and all inputs, outputs and weights have to be non-negative. To allow for linear optimization, one typically constrains either the weighted sum of outputs or the weighted sum of inputs of the evaluated DMU to equal a fixed value (typically 1; see the example below).
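
With the weighted input sum of the evaluated DMU fixed to one, the fractional program above can be rewritten as an equivalent linear program, usually called the input-oriented multiplier form; the notation is the same as above, and the example below carries out this step for a single unit:

$$
\begin{aligned}
\max_{u,v} \quad & \sum_{r=1}^{s} u_r y_{ro} \\
\text{s.t.} \quad & \sum_{i=1}^{m} v_i x_{io} = 1, \\
& \sum_{r=1}^{s} u_r y_{rj} - \sum_{i=1}^{m} v_i x_{ij} \le 0, \qquad j = 1, \dots, n, \\
& u_r \ge 0, \quad v_i \ge 0.
\end{aligned}
$$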

Because this optimization problem's dimensionality is equal to the sum of its inputs and outputs, selecting the smallest number of inputs/outputs that collectively and accurately capture the process one attempts to characterize is crucial. And because the production frontier is enveloped empirically, several guidelines exist on the minimum number of DMUs required for good discriminatory power of the analysis, given homogeneity of the sample. This minimum number of DMUs varies between twice the sum of inputs and outputs, $2(m+s)$, and twice the product of inputs and outputs, $2ms$.
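
For instance, for a model with $m = 3$ inputs and $s = 2$ outputs, these guidelines place the minimum sample size between

$$2(m+s) = 2(3+2) = 10 \qquad \text{and} \qquad 2ms = 2 \cdot 3 \cdot 2 = 12$$

DMUs.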

Some advantages of the DEA approach are:
- no explicit functional form for the production or cost function has to be specified ex ante;
- multiple inputs and outputs, possibly measured in different units, can be benchmarked simultaneously;
- each DMU is compared against observed best practice, and efficient peer units can be identified for every inefficient DMU;
- efficiency scores are obtained by solving linear programs, which keeps the computation tractable.

Some of the disadvantages of DEA are:
- being deterministic, it attributes any deviation from the frontier to inefficiency, so results are sensitive to measurement error, noise and outliers;
- results depend on the selection of inputs and outputs, and discriminatory power deteriorates when the number of DMUs is small relative to the number of factors;
- several DMUs are typically rated as fully efficient, and the optimal weights need not be unique, which complicates a complete ranking;
- efficiency scores are relative to the sample analysed and cannot be compared directly across different samples.

Example

Assume that we have data on the inputs $x_{ij}$ and outputs $y_{rj}$ of units $j = 1, \dots, n$.

To calculate the efficiency of unit 1, we define the objective function (OF) as

$$\max \; \frac{\sum_{r} u_r y_{r1}}{\sum_{i} v_i x_{i1}},$$

which is subject to (ST) the constraint that the efficiency of every unit cannot be larger than 1:

$$\frac{\sum_{r} u_r y_{rj}}{\sum_{i} v_i x_{ij}} \le 1, \qquad j = 1, \dots, n,$$

and non-negativity:

$$u_r \ge 0, \quad v_i \ge 0.$$

A ratio with decision variables in both the numerator and the denominator is nonlinear. Since we are using a linear programming technique, we need to linearize the formulation by constraining the denominator of the objective function to a constant (in this case 1) and then maximizing the numerator.

The new formulation would be:

$$\max \; \sum_{r} u_r y_{r1}$$

subject to

$$\sum_{i} v_i x_{i1} = 1,$$

$$\sum_{r} u_r y_{rj} - \sum_{i} v_i x_{ij} \le 0, \qquad j = 1, \dots, n,$$

$$u_r \ge 0, \quad v_i \ge 0.$$
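
To make the linearization concrete, the following minimal sketch solves this multiplier-form linear program with Python's scipy.optimize.linprog. The data set (two inputs, one output, three units) and the helper name ccr_efficiency are hypothetical and serve only to illustrate the formulation.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data (illustrative only): rows are units, columns are factors.
X = np.array([[4.0, 3.0],    # inputs of unit 1
              [7.0, 3.0],    # inputs of unit 2
              [8.0, 1.0]])   # inputs of unit 3
Y = np.array([[1.0],         # output of unit 1
              [1.0],         # output of unit 2
              [1.0]])        # output of unit 3


def ccr_efficiency(X, Y, o):
    """CCR efficiency of unit `o` via the linearized (multiplier) form.

    Decision variables are the output weights u (length s) and the
    input weights v (length m):
        maximize    u . Y[o]
        subject to  v . X[o] == 1
                    u . Y[j] - v . X[j] <= 0   for every unit j
                    u >= 0, v >= 0
    """
    n, m = X.shape
    s = Y.shape[1]

    # linprog minimizes, so minimize the negated objective -(u . Y[o]).
    c = np.concatenate([-Y[o], np.zeros(m)])

    # Ratio constraints for all units: u . Y[j] - v . X[j] <= 0.
    A_ub = np.hstack([Y, -X])
    b_ub = np.zeros(n)

    # Normalization of the denominator: v . X[o] == 1.
    A_eq = np.concatenate([np.zeros(s), X[o]]).reshape(1, -1)
    b_eq = np.array([1.0])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (s + m), method="highs")
    return -res.fun  # optimal value of u . Y[o], i.e. the efficiency score


for o in range(X.shape[0]):
    print(f"Unit {o + 1}: efficiency = {ccr_efficiency(X, Y, o):.3f}")
```

Units whose score equals 1 lie on the empirical frontier; the remaining units receive a score strictly below 1 relative to that frontier.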

Extensions

A desire to improve upon DEA by reducing its disadvantages or strengthening its advantages has been a major driver of much of the recent literature. Currently, the most widely used DEA-based method to obtain unique efficiency rankings is cross-efficiency analysis. Originally developed by Sexton et al. in 1986, [17] it has found widespread application since Doyle and Green's 1994 publication. [18] Cross-efficiency is based on the original DEA results, but implements a secondary objective where each DMU peer-appraises all other DMUs with its own factor weights. The average of these peer-appraisal scores is then used to calculate a DMU's cross-efficiency score. This approach avoids DEA's disadvantages of having multiple efficient DMUs and potentially non-unique weights. [19] Another approach to remedy some of DEA's drawbacks is Stochastic DEA, [16] which synthesizes DEA and Stochastic Frontier Analysis (SFA). [20]
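
As a sketch of the peer-appraisal step described above, the code below first solves the same multiplier-form linear program for each DMU to obtain one set of optimal weights, then evaluates every DMU with those weights and averages the resulting scores. The data is again hypothetical, the average includes each DMU's self-appraisal, and no secondary goal (such as Doyle and Green's aggressive or benevolent formulations) is implemented, so the result inherits the potential non-uniqueness of the underlying DEA weights.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data (illustrative only): rows are DMUs.
X = np.array([[4.0, 3.0], [7.0, 3.0], [8.0, 1.0]])   # inputs
Y = np.array([[1.0], [1.0], [1.0]])                   # outputs


def ccr_weights(X, Y, o):
    """One optimal set of CCR multiplier weights (u, v) for DMU `o`."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.concatenate([-Y[o], np.zeros(m)])                   # maximize u . Y[o]
    A_ub = np.hstack([Y, -X])                                  # u . Y[j] - v . X[j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.zeros(s), X[o]]).reshape(1, -1)  # v . X[o] == 1
    b_eq = np.array([1.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (s + m), method="highs")
    return res.x[:s], res.x[s:]


n = X.shape[0]
cross = np.zeros((n, n))
for k in range(n):                 # DMU k supplies the weights (the appraiser)
    u, v = ccr_weights(X, Y, k)
    for j in range(n):             # DMU j is appraised with k's weights
        cross[k, j] = (u @ Y[j]) / (v @ X[j])

# Cross-efficiency score of DMU j: mean of the appraisals in column j.
cross_scores = cross.mean(axis=0)
for j, score in enumerate(cross_scores):
    print(f"Unit {j + 1}: cross-efficiency = {score:.3f}")
```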

Footnotes

  1. Charnes et al. (1978)
  2. Charnes et al. (1995)
  3. Emrouznejad et al. (2016)
  4. Thanassoulis (1995)
  5. Koronakos and Sotiropoulos (2020)
  6. Zhou et al. (2022)
  7. Guerrero et al. (2022)
  8. Mahmoudi et al. (2021)
  9. Sickles et al. (2019)
  10. Cooper et al. (2007)
  11. Cooper et al. (2011)
  12. Farrell (1957)
  13. Fried et al. (2008)
  14. Cooper et al. (2000)
  15. Banker et al. (1984)
  16. Olesen (2016)
  17. Sexton (1986)
  18. Doyle (1994)
  19. Dyson (2001)
  20. Olesen et al. (2016)
