Optimal facility location

The study of facility location problems (FLP), also known as location analysis, is a branch of operations research and computational geometry concerned with the optimal placement of facilities to minimize transportation costs, while accounting for factors such as safety (for example, avoiding placing hazardous materials near housing) and the locations of competitors' facilities. The techniques also apply to cluster analysis.

Minimum facility location

A simple facility location problem is the Weber problem, in which a single facility is to be placed, with the only optimization criterion being the minimization of the weighted sum of distances from a given set of point sites. More complex problems considered in this discipline include the placement of multiple facilities, constraints on the locations of facilities, and more complex optimization criteria.

In a basic formulation, the facility location problem consists of a set of potential facility sites L where a facility can be opened, and a set of demand points D that must be serviced. The goal is to pick a subset F of facilities to open, to minimize the sum of distances from each demand point to its nearest facility, plus the sum of opening costs of the facilities.
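This basic formulation can be made concrete with a brute-force sketch in Python (illustrative names; only practical for a handful of candidate sites, and not an algorithm from the cited literature):

```python
from itertools import combinations

def flp_cost(open_sites, dist, opening_cost):
    """Total cost of opening the sites in open_sites.
    dist[j][i] is the distance from demand point j to candidate site i."""
    service = sum(min(row[i] for i in open_sites) for row in dist)
    return service + sum(opening_cost[i] for i in open_sites)

def brute_force_flp(dist, opening_cost):
    """Enumerate every non-empty subset of sites; exponential in the
    number of sites, so only usable on tiny instances."""
    n = len(opening_cost)
    best_subset, best_cost = None, float("inf")
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            c = flp_cost(subset, dist, opening_cost)
            if c < best_cost:
                best_subset, best_cost = subset, c
    return best_subset, best_cost
```

For example, with distances dist = [[1, 5], [4, 1]] and opening costs [3, 4], the sketch opens only site 0 at total cost 8 (service cost 1 + 4 plus opening cost 3).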

The facility location problem on general graphs is NP-hard to solve optimally, by reduction from (for example) the set cover problem. A number of approximation algorithms have been developed for the facility location problem and many of its variants.

Without assumptions on the set of distances between clients and sites (in particular, without assuming that the distances satisfy the triangle inequality), the problem is known as non-metric facility location and can be approximated to within a factor O(log n). [1] This factor is tight, via an approximation-preserving reduction from the set cover problem.

If we assume that the distances between clients and sites are undirected and satisfy the triangle inequality, we speak of a metric facility location (MFL) problem. The MFL is still NP-hard, and hard to approximate within a factor better than 1.463. [2] The currently best-known approximation algorithm achieves an approximation ratio of 1.488. [3]

Minimax facility location

The minimax facility location problem seeks a location which minimizes the maximum distance to the sites, where the distance from one point to the sites is the distance from the point to its nearest site. A formal definition is as follows:

Given a point set P, find a point set S with |S| = k, so that max_{p ∈ P} ( min_{q ∈ S} d(p, q) ) is minimized.
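The objective in this definition can be written down directly; a minimal Python sketch (names are illustrative):

```python
def kcenter_cost(P, S, d):
    """Minimax objective: the largest distance from any point in P
    to its nearest center in S, under distance function d."""
    return max(min(d(p, q) for q in S) for p in P)
```

For instance, with 1-D points P = [0, 4, 10], centers S = [2, 9], and d(p, q) = |p − q|, the cost is 2 (attained at p = 0 and p = 4).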

In the case of the Euclidean metric for k = 1, it is known as the smallest enclosing sphere problem, or the 1-center problem. Its study can be traced back at least to 1860.

NP-hardness

Exact solution of the k-center problem has been proven to be NP-hard. [4] [5] [6] Approximating the problem is also NP-hard when the error is small. The error level of an approximation algorithm is measured by its approximation factor, defined as the ratio between the cost of the approximate solution and that of the optimum. Approximating the k-center problem is NP-hard when the approximation factor is less than 1.822 (dimension = 2) [7] or 2 (dimension > 2). [6]

Algorithms

Exact solver

There exist algorithms that produce exact solutions to this problem. One exact solver runs in time n^(O(√k)). [8] [9]

1 + ε approximation

A (1 + ε)-approximation is a solution whose approximation factor is no greater than 1 + ε. Computing such an approximation remains NP-hard for arbitrarily small ε. One approach based on the coreset concept has been proposed, with execution complexity of O(2^(O(k log k / ε²)) · dn). [10] As an alternative, another algorithm, also based on coresets, runs in O(2^(O(k / ε^(O(1)))) · dn). [11] Its authors note that the running time in practice is much smaller than this worst-case bound, making it possible to solve some instances when k is small (say, k < 5).

Farthest-point clustering

Due to the hardness of the problem, it is impractical to compute an exact solution or a very precise approximation on large instances. Instead, an approximation with factor 2 is widely used for large-k cases. The approximation is referred to as the farthest-point clustering (FPC) algorithm, or farthest-first traversal. [6] The algorithm is quite simple: pick an arbitrary point of the set as the first center; then repeatedly pick the point farthest from the current centers as the next center, until k centers are found.
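The steps above can be sketched in Python as follows (an illustration of farthest-first traversal, not the original authors' code):

```python
def farthest_point_clustering(points, k, d):
    """Farthest-first traversal: a 2-approximation for the k-center
    problem under distance function d."""
    centers = [points[0]]  # pick an arbitrary point as the first center
    # dist[j]: distance from points[j] to its nearest chosen center
    dist = [d(p, centers[0]) for p in points]
    while len(centers) < k:
        # the point farthest from all current centers becomes the next center
        i = max(range(len(points)), key=lambda j: dist[j])
        centers.append(points[i])
        dist = [min(dist[j], d(points[j], points[i]))
                for j in range(len(points))]
    return centers
```

On the 1-D instance [0, 1, 10, 11] with k = 2 and d(p, q) = |p − q|, the sketch returns centers [0, 11], giving a covering radius of 1.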

It is easy to see that this algorithm runs in O(nk) time, i.e., linear in n for fixed k. Since approximation with a factor less than 2 has been proven NP-hard, FPC was regarded as the best approximation available.

In terms of running time, the complexity was later improved to O(n log k) using a box decomposition technique. [7]

Maxmin facility location

The maxmin facility location or obnoxious facility location problem seeks a location which maximizes the minimum distance to the sites. In the case of the Euclidean metric, it is known as the largest empty sphere problem. The planar case (largest empty circle problem) may be solved in optimal time Θ(n log n). [12] [13]

Integer programming formulations

Facility location problems are often solved as integer programs. In this context, facility location problems are often posed as follows: suppose there are n facilities and m customers. We wish to choose (1) which of the n facilities to open, and (2) which (open) facilities to use to supply the m customers, in order to satisfy some fixed demand at minimum cost. We introduce the following notation: let f_i denote the (fixed) cost of opening facility i, for i = 1, ..., n. Let c_{ij} denote the cost to ship a product from facility i to customer j, for i = 1, ..., n and j = 1, ..., m. Let d_j denote the demand of customer j, for j = 1, ..., m. Further suppose that each facility has a maximum output: let u_i denote the maximum amount of product that can be produced by facility i, that is, the capacity of facility i. The remainder of this section follows [14].

Capacitated facility location

In our initial formulation, introduce a binary variable x_i for i = 1, ..., n, where x_i = 1 if facility i is open, and x_i = 0 otherwise. Further introduce the variable y_{ij} for i = 1, ..., n and j = 1, ..., m, which represents the fraction of the demand d_j filled by facility i. The so-called capacitated facility location problem is then given by

    minimize    Σ_{i=1..n} Σ_{j=1..m} d_j c_{ij} y_{ij} + Σ_{i=1..n} f_i x_i
    subject to  Σ_{i=1..n} y_{ij} = 1                 for all j = 1, ..., m
                Σ_{j=1..m} d_j y_{ij} ≤ u_i x_i       for all i = 1, ..., n
                y_{ij} ≥ 0                            for all i and j
                x_i ∈ {0, 1}                          for all i = 1, ..., n

Note that the second set of constraints ensures that if x_i = 0, that is, facility i isn't open, then y_{ij} = 0 for all j, that is, no customer's demand can be filled from facility i.
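As a sanity check on the formulation, the two constraint families can be verified mechanically for a candidate solution. A small Python sketch with illustrative names (x[i] for the opening decisions, y[i][j] for the demand fractions):

```python
def check_capacitated(x, y, demand, capacity, tol=1e-9):
    """Check feasibility of (x, y) for the capacitated formulation:
    every customer's demand is fully assigned, and no facility
    ships more than capacity[i] * x[i]."""
    n, m = len(x), len(demand)
    # first constraint family: sum_i y_ij = 1 for every customer j
    demand_met = all(
        abs(sum(y[i][j] for i in range(n)) - 1.0) <= tol for j in range(m)
    )
    # second constraint family: sum_j d_j y_ij <= u_i x_i for every facility i
    within_capacity = all(
        sum(demand[j] * y[i][j] for j in range(m)) <= capacity[i] * x[i] + tol
        for i in range(n)
    )
    return demand_met and within_capacity
```

For instance, opening only facility 0 and assigning both customers to it is feasible when its capacity covers their total demand, and becomes infeasible once the capacity drops below that total.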

Uncapacitated facility location

A common case of the capacitated facility location problem above is the case when u_i = +∞ for all i. In this case, it is always optimal to satisfy all of the demand of each customer from the nearest open facility. Because of this, we may replace the continuous variables y_{ij} from above with binary variables y_{ij}, where y_{ij} = 1 if customer j is supplied by facility i, and y_{ij} = 0 otherwise. The uncapacitated facility location problem is then given by

    minimize    Σ_{i=1..n} Σ_{j=1..m} d_j c_{ij} y_{ij} + Σ_{i=1..n} f_i x_i
    subject to  Σ_{i=1..n} y_{ij} = 1           for all j = 1, ..., m
                Σ_{j=1..m} y_{ij} ≤ M x_i       for all i = 1, ..., n
                y_{ij} ∈ {0, 1}                 for all i and j
                x_i ∈ {0, 1}                    for all i = 1, ..., n

where M is a constant chosen to be suitably large. The choice of M can affect computation results; the best choice in this instance is obvious: take M = m. Then, if x_i = 1, any choice of the y_{ij} will satisfy the second set of constraints.

Another formulation possibility for the uncapacitated facility location problem is to disaggregate the capacity constraints (the big-M constraints). That is, replace the constraints

    Σ_{j=1..m} y_{ij} ≤ M x_i    for all i = 1, ..., n

with the constraints

    y_{ij} ≤ x_i    for all i = 1, ..., n and j = 1, ..., m.

In practice, this new formulation performs significantly better, in the sense that it has a tighter linear programming relaxation than the first formulation. [14] Notice that summing the new constraints over j yields the original big-M constraints (with M = m). In the capacitated case, these formulations are not equivalent. More information about the uncapacitated facility location problem can be found in Chapter 3 of "Discrete location theory". [15]
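The tightening can be seen on a tiny numeric example: a fractional point of the LP relaxation that satisfies the aggregated big-M constraint (with M = m) but violates the disaggregated constraints. A Python sketch under these assumptions:

```python
# m customers; facility i is opened fractionally at x_i = 1/m in the relaxation
m = 3
x_i = 1.0 / m
y_i = [1.0, 0.0, 0.0]  # facility i fully serves customer 1, and no one else

# aggregated big-M constraint (M = m): sum_j y_ij <= m * x_i  -- satisfied
satisfies_big_m = sum(y_i) <= m * x_i + 1e-9
# disaggregated constraints: y_ij <= x_i for every j  -- violated (1.0 > 1/3)
satisfies_disaggregated = all(v <= x_i + 1e-9 for v in y_i)
```

Because such fractional points are cut off, the disaggregated relaxation's optimum is at least as large as the aggregated one's, which is what "tighter" means here.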

Applications

Healthcare

In healthcare, incorrect facility location decisions have a serious impact on the community beyond simple cost and service metrics; for instance, hard-to-access healthcare facilities are likely to be associated with increased morbidity and mortality. From this perspective, facility location modeling for healthcare is more critical than similar modeling for other areas. [16]

Solid waste management

Municipal solid waste management remains a challenge for developing countries because of increasing waste production and the high costs associated with waste management. Formulating and exactly solving a facility location problem is one way to optimize the location of landfills for waste disposal. [17]

Clustering

A particular subset of cluster analysis problems can be viewed as facility location problems. In a centroid-based clustering problem, the objective is to partition data points (elements of a common metric space) into equivalence classes—often called colors—such that points of the same color are close to one another (equivalently, such that points of different colors are far from one another). [18]

To see how one might view (read "transform" or "reduce") a centroid-based clustering problem as a (metric) facility location problem, view each data point in the former as a demand point in the latter. Suppose that the data to be clustered are elements of a metric space X (e.g., let X be p-dimensional Euclidean space for some fixed p). In the facility location problem that we are constructing, we permit facilities to be placed at any point within this metric space X; this defines the set of allowed facility locations L. We define the costs to be the pairwise distances between location–demand point pairs (e.g., see metric k-center). In a centroid-based clustering problem, one partitions the data into equivalence classes (i.e., colors), each of which has a centroid. Let us see how a solution to our constructed facility location problem also achieves such a partition. A feasible solution is a non-empty subset of locations; these locations comprise the set of centroids in our centroid-based clustering problem. Now, assign each demand point to the location that minimizes its servicing cost; that is, assign each data point to its nearest centroid (breaking ties arbitrarily). This achieves the partitioning, provided that the facility location problem's costs are defined to be the images of the centroid-based clustering problem's distance function.
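The nearest-centroid assignment step described above can be sketched in Python (illustrative names):

```python
def assign_to_centroids(points, centroids, d):
    """Partition the data points by nearest centroid under distance d;
    ties are broken toward the lower centroid index."""
    clusters = [[] for _ in centroids]
    for p in points:
        i = min(range(len(centroids)), key=lambda c: d(p, centroids[c]))
        clusters[i].append(p)
    return clusters
```

With 1-D data [0, 1, 9, 10] and centroids [0, 10] under d(p, q) = |p − q|, this yields the partition [[0, 1], [9, 10]].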

The popular algorithms textbook Algorithm Design [19] provides a related problem description and an approximation algorithm. The authors refer to the metric facility location problem (i.e., the centroid-based clustering problem or the metric k-center problem) as the center selection problem, thereby growing the list of synonyms.

Furthermore, note that in our above definition of the facility location problem the objective function is general. Specific choices of objective yield different variants of the facility location problem, and hence different variants of the centroid-based clustering problem. For example, one might choose to minimize the sum of distances from each location to each of its assigned demand points (à la the Weber problem), or one might elect to minimize the maximum of all such distances (à la the 1-center problem).


References

  1. Hochbaum, D. S. (1982). "Heuristics for the fixed cost median problem". Mathematical Programming . 22: 148–162. doi:10.1007/BF01581035. S2CID   3451944.
  2. Guha, S.; Khuller, S. (1999). "Greedy Strikes Back: Improved Facility Location Algorithms". Journal of Algorithms. 31: 228–248. CiteSeerX   10.1.1.47.2033 . doi:10.1006/jagm.1998.0993. S2CID   5363214.
  3. Li, S. (2011). "A 1.488 Approximation Algorithm for the Uncapacitated Facility Location Problem". Automata, Languages and Programming. LNCS. Vol. 6756. pp. 77–88. CiteSeerX   10.1.1.225.6387 . doi:10.1007/978-3-642-22012-8_5. ISBN   978-3-642-22011-1.
  4. Fowler, R. J.; Paterson, M. S.; Tanimoto, S. L. (1981), "Optimal packing and covering in the plane are NP-complete", Information Processing Letters, 12 (3): 133–137, doi:10.1016/0020-0190(81)90111-3 .
  5. Megiddo, Nimrod; Tamir, Arie (1982), "On the complexity of locating linear facilities in the plane" (PDF), Operations Research Letters, 1 (5): 194–197, doi:10.1016/0167-6377(82)90039-6 .
  6. Gonzalez, Teofilo (1985), "Clustering to minimize the maximum intercluster distance", Theoretical Computer Science, 38: 293–306, doi:10.1016/0304-3975(85)90224-5.
  7. Feder, Tomás; Greene, Daniel (1988), "Optimal algorithms for approximate clustering", Proceedings of the twentieth annual ACM symposium on Theory of computing - STOC '88, pp. 434–444, doi:10.1145/62212.62255, ISBN 0897912640, S2CID 658151.
  8. Hwang, R. Z.; Lee, R. C. T.; Chang, R. C. (1993), "The slab dividing approach to solve the Euclidean p-center problem", Algorithmica, 9 (1): 1–22, doi:10.1007/BF01185335, S2CID 5680676
  9. Hwang, R. Z.; Chang, R. C.; Lee, R. C. T. (1993), "The generalized searching over separators strategy to solve some NP-Hard problems in subexponential time", Algorithmica, 9 (4): 398–423, doi:10.1007/bf01228511, S2CID 2722869
  10. Bādoiu, Mihai; Har-Peled, Sariel; Indyk, Piotr (2002), "Approximate clustering via core-sets", Proceedings of the thiry-fourth annual ACM symposium on Theory of computing (PDF), pp. 250–257, doi:10.1145/509907.509947, ISBN   1581134959, S2CID   5409535
  11. Kumar, Pankaj; Kumar, Piyush (2010), "Almost optimal solutions to k-clustering problems" (PDF), International Journal of Computational Geometry & Applications, 20 (4): 431–447, doi:10.1142/S0218195910003372
  12. Franco P. Preparata and Michael Ian Shamos (1985). Computational Geometry – An Introduction . Springer-Verlag. ISBN   978-0-387-96131-6. 1st edition; 2nd printing, corrected and expanded, 1988; Russian translation, 1989., p. 256
  13. G. T. Toussaint, "Computing largest empty circles with location constraints," International Journal of Computer and Information Sciences, vol. 12, No. 5, October, 1983, pp. 347–358.
  14. Conforti, Michele; Cornuéjols, Gérard; Zambelli, Giacomo (2014). Integer Programming. Graduate Texts in Mathematics. Vol. 271. doi:10.1007/978-3-319-11008-0. ISBN 978-3-319-11007-3.
  15. Mirchandani, Pitu B.; Francis, R. L., eds. (1990). Discrete Location Theory. New York: Wiley. ISBN 9780471892335. OCLC 19810449.
  16. Ahmadi-Javid, A.; Seyedi, P.; Syam, S. (2017). "A Survey of Healthcare Facility Location". Computers & Operations Research. 79: 223–263. doi:10.1016/j.cor.2016.05.018.
  17. Franco, D. G. B.; Steiner, M. T. A.; Assef, F. M. (2020). "Optimization in waste landfilling partitioning in Paraná State, Brazil". Journal of Cleaner Production. 283: 125353. doi:10.1016/j.jclepro.2020.125353. S2CID   229429742.
  18. Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). The elements of statistical learning (Second ed.). Springer.
  19. Kleinberg, Jon; Tardos, Éva (2006). Algorithm Design. Pearson.