Automatic label placement

Last updated

Automatic label placement, sometimes called text placement or name placement, comprises the computer methods of placing labels automatically on a map or chart. This is related to the typographic design of such labels.

Contents

The typical features depicted on a geographic map are line features (e.g. roads), area features (countries, parcels, forests, lakes, etc.), and point features (villages, cities, etc.). In addition to depicting the map's features in a geographically accurate manner, it is of critical importance to place the names that identify these features, in a way that the reader knows instantly which name describes which feature.

Automatic text placement is one of the most difficult, complex, and time-consuming problems in mapmaking and GIS (Geographic Information System). Other kinds of computer-generated graphics – like charts, graphs etc. – require good placement of labels as well, not to mention engineering drawings, and professional programs which produce these drawings and charts, like spreadsheets (e.g. Microsoft Excel) or computational software programs (e.g. Mathematica).

Naively placed labels overlap excessively, resulting in a map that is difficult or even impossible to read. Therefore, a GIS must allow a few possible placements of each label, and often also an option of resizing, rotating, or even removing (suppressing) the label. Then, it selects a set of placements that results in the least overlap, and has other desirable properties. For all but the most trivial setups, the problem is NP-hard.

Rule-based algorithms

Rule-based algorithms try to emulate an experienced human cartographer. Over centuries, cartographers have developed the art of mapmaking and label placement. For example, an experienced cartographer repeats road names several times for long roads, instead of placing them once, or in the case of Ocean City depicted by a point very close to the shore, the cartographer would place the label "Ocean City" over the land to emphasize that it is a coastal town. [1]

Cartographers work based on accepted conventions and rules, such as those itemized by Swiss cartographer Eduard Imhof in 1962. [2] For example, New York City, Vienna, Berlin, Paris, or Tokyo must show up on country maps because they are high-priority labels. Once those are placed, the cartographer places the next most important class of labels, for example major roads, rivers, and other large cities. In every step they ensure that (1) the text is placed in a way that the reader easily associates it with the feature, and (2) the label does not overlap with those already placed on the map.

However, if a particular label placement problem can be formulated as a mathematical optimization problem, using mathematics to solve the problem is usually better than using a rule-based algorithm. [3]

Local optimization algorithms

The simplest greedy algorithm places consecutive labels on the map in positions that result in minimal overlap of labels. Its results are not perfect even for very simple problems, but it is extremely fast.

Slightly more complex algorithms rely on local optimization to reach a local optimum of a placement evaluation function – in each iteration placement of a single label is moved to another position, and if it improves the result, the move is preserved. It performs reasonably well for maps that are not too densely labelled. Slightly more complex variations try moving 2 or more labels at the same time. The algorithm ends after reaching some local optimum.

A simple algorithm – simulated annealing – yields good results with relatively good performance. It works like local optimization, but it may keep a change even if it worsens the result. The chance of keeping such a change is , where is the change in the evaluation function, and is the temperature. The temperature is gradually lowered according to the annealing schedule. When the temperature is high, simulated annealing performs almost random changes to the label placement, being able to escape a local optimum. Later, when hopefully a very good local optimum has been found, it behaves in a manner similar to local optimization. The main challenges in developing a simulated annealing solution are choosing a good evaluation function and a good annealing schedule. Generally too fast cooling will degrade the solution, and too slow cooling will degrade the performance, but the schedule is usually quite a complex algorithm, with more than just one parameter.

Another class of direct search algorithms are the various evolutionary algorithms, e.g. genetic algorithms.

Divide-and-conquer algorithms

One simple optimization that is important on real maps is dividing a set of labels into smaller sets that can be solved independently. Two labels are rivals if they can overlap in one of the possible placements. Transitive closure of this relation divides the set of labels into possibly much smaller sets. On uniformly and densely labelled maps, usually the single set will contain the majority of labels, and on maps for which the labelling is not uniform it may bring very big performance benefits. For example, when labelling a map of the world, America is labelled independently from Eurasia etc.

2-satisfiability algorithms

If a map labeling problem can be reduced to a situation in which each remaining label has only two potential positions in which it can be placed, then it may be solved efficiently by using an instance of 2-satisfiability to find a placement avoiding any conflicting pairs of placements; several exact and approximate label placement algorithms for more complex types of problems are based on this principle. [4]

Other algorithms

Automatic label placement algorithms can use any of the algorithms for finding the maximum disjoint set from the set of potential labels. Other algorithms can also be used, like various graph solutions, integer programming etc.

Integer Programming

Some versions of the map label placement problem can be formulated as multiple choices integer programming (MCIP) problems where the objective function is to minimize the sum of numerical penalties for moving individual labels away from their optimal placement to avoid overlaps. The problem constraints are that each label be placed in one of a finite number of allowed positions on the map. (Or deleted from the map to allow other labels to be placed.)

A close to optimal solution to this MCIP can usually be found in a practical amount of computer time using Lagrangian relaxation to solve the dual formulation of the optimization problem. [5]

The first commercial solution to the map label problem, formulated as a MCIP problem and solved by Lagrangian relaxation, was to place well and seismic shot point labels on petroleum industry base maps. [6]

Since that first solution was published there have many other mathematical optimization algorithms proposed and used to solve this MCIP for other cartographic applications.

Notes

  1. Slocum, Terry (2010). Thematic Cartography and Geovisualization. Upper Saddle River, NJ: Pearson. p. 576. ISBN   978-0-13-801006-5.
  2. Imhof, Eduard, “Die Anordnung der Namen in der Karte,” Annuaire International de Cartographie II, Orell-Füssli Verlag, Zürich, 93-129, 1962. English Translation: "Positioning Names on Maps," The American Cartographer, V.2 #2 (1975), pp.128-144
  3. Zoraster, Steven, "Expert Systems and the Map Label Placement Problem, Cartographia, https://utpjournals.press/doi/10.3138/P75V-T152-7U53-4170
  4. Doddi, Srinivas; Marathe, Madhav V.; Mirzaian, Andy; Moret, Bernard M. E.; Zhu, Binhai (1997), "Map labeling and its generalizations", Proc. 8th ACM-SIAM Symp. Discrete Algorithms (SODA), pp. 148–157, ISBN   9780898713909 ; Formann, M.; Wagner, F. (1991), "A packing problem with applications to lettering of maps", Proc. 7th ACM Symp. Computational Geometry, pp. 281–288; Poon, Chung Keung; Zhu, Binhai; Chin, Francis (1998), "A polynomial time solution for labeling a rectilinear map", Information Processing Letters, 65 (4): 201–207, doi:10.1016/S0020-0190(98)00002-7 ; Wagner, Frank; Wolff, Alexander (1997), "A practical map labeling algorithm", Computational Geometry: Theory and Applications , 7 (5–6): 387–404, doi:10.1016/S0925-7721(96)00007-7 .
  5. James Bean, "A Lagrangian Algorithm for Multiple Choice Integer Programming", https://www.jstor.org/stable/170661
  6. Zoraster, Steven; Bayer, Stephen. "Practical Experience With a Map Label Placement Program" (PDF). CaGIS. Cartography and Geographic Information Society.

Related Research Articles

<span class="mw-page-title-main">Genetic algorithm</span> Competitive algorithm for searching a problem space

In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on biologically inspired operators such as mutation, crossover and selection. Some examples of GA applications include optimizing decision trees for better performance, solving sudoku puzzles, hyperparameter optimization, causal inference, etc.

<span class="mw-page-title-main">Mathematical optimization</span> Study of mathematical algorithms for optimization problems

Mathematical optimization or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfields: discrete optimization and continuous optimization. Optimization problems arise in all quantitative disciplines from computer science and engineering to operations research and economics, and the development of solution methods has been of interest in mathematics for centuries.

<span class="mw-page-title-main">Greedy algorithm</span> Sequence of locally optimal choices

A greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. In many problems, a greedy strategy does not produce an optimal solution, but a greedy heuristic can yield locally optimal solutions that approximate a globally optimal solution in a reasonable amount of time.

<span class="mw-page-title-main">Simulated annealing</span> Probabilistic optimization technique and metaheuristic

Simulated annealing (SA) is a probabilistic technique for approximating the global optimum of a given function. Specifically, it is a metaheuristic to approximate global optimization in a large search space for an optimization problem. For large numbers of local optima, SA can find the global optima. It is often used when the search space is discrete. For problems where finding an approximate global optimum is more important than finding a precise local optimum in a fixed amount of time, simulated annealing may be preferable to exact algorithms such as gradient descent or branch and bound.

Tabu search (TS) is a metaheuristic search method employing local search methods used for mathematical optimization. It was created by Fred W. Glover in 1986 and formalized in 1989.

<span class="mw-page-title-main">Combinatorial optimization</span> Subfield of mathematical optimization

Combinatorial optimization is a subfield of mathematical optimization that consists of finding an optimal object from a finite set of objects, where the set of feasible solutions is discrete or can be reduced to a discrete set. Typical combinatorial optimization problems are the travelling salesman problem ("TSP"), the minimum spanning tree problem ("MST"), and the knapsack problem. In many such problems, such as the ones previously mentioned, exhaustive search is not tractable, and so specialized algorithms that quickly rule out large parts of the search space or approximation algorithms must be resorted to instead.

In computer science, 2-satisfiability, 2-SAT or just 2SAT is a computational problem of assigning values to variables, each of which has two possible values, in order to satisfy a system of constraints on pairs of variables. It is a special case of the general Boolean satisfiability problem, which can involve constraints on more than two variables, and of constraint satisfaction problems, which can allow more than two choices for the value of each variable. But in contrast to those more general problems, which are NP-complete, 2-satisfiability can be solved in polynomial time.

<span class="mw-page-title-main">Image segmentation</span> Partitioning a digital image into segments

In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments, also known as image regions or image objects. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.

In computer science and operations research, approximation algorithms are efficient algorithms that find approximate solutions to optimization problems with provable guarantees on the distance of the returned solution to the optimal one. Approximation algorithms naturally arise in the field of theoretical computer science as a consequence of the widely believed P ≠ NP conjecture. Under this conjecture, a wide class of optimization problems cannot be solved exactly in polynomial time. The field of approximation algorithms, therefore, tries to understand how closely it is possible to approximate optimal solutions to such problems in polynomial time. In an overwhelming majority of the cases, the guarantee of such algorithms is a multiplicative one expressed as an approximation ratio or approximation factor i.e., the optimal solution is always guaranteed to be within a (predetermined) multiplicative factor of the returned solution. However, there are also many approximation algorithms that provide an additive guarantee on the quality of the returned solution. A notable example of an approximation algorithm that provides both is the classic approximation algorithm of Lenstra, Shmoys and Tardos for scheduling on unrelated parallel machines.

In computer science and mathematical optimization, a metaheuristic is a higher-level procedure or heuristic designed to find, generate, tune, or select a heuristic that may provide a sufficiently good solution to an optimization problem or a machine learning problem, especially with incomplete or imperfect information or limited computation capacity. Metaheuristics sample a subset of solutions which is otherwise too large to be completely enumerated or otherwise explored. Metaheuristics may make relatively few assumptions about the optimization problem being solved and so may be usable for a variety of problems.

In the field of mathematical optimization, Lagrangian relaxation is a relaxation method which approximates a difficult problem of constrained optimization by a simpler problem. A solution to the relaxed problem is an approximate solution to the original problem, and provides useful information.

Multi-objective optimization or Pareto optimization is an area of multiple-criteria decision making that is concerned with mathematical optimization problems involving more than one objective function to be optimized simultaneously. Multi-objective is a type of vector optimization that has been applied in many fields of science, including engineering, economics and logistics where optimal decisions need to be taken in the presence of trade-offs between two or more conflicting objectives. Minimizing cost while maximizing comfort while buying a car, and maximizing performance whilst minimizing fuel consumption and emission of pollutants of a vehicle are examples of multi-objective optimization problems involving two and three objectives, respectively. In practical problems, there can be more than three objectives.

As applied in the field of computer vision, graph cut optimization can be employed to efficiently solve a wide variety of low-level computer vision problems, such as image smoothing, the stereo correspondence problem, image segmentation, object co-segmentation, and many other computer vision problems that can be formulated in terms of energy minimization. Many of these energy minimization problems can be approximated by solving a maximum flow problem in a graph. Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the maximum a posteriori estimate of a solution. Although many computer vision algorithms involve cutting a graph, the term "graph cuts" is applied specifically to those models which employ a max-flow/min-cut optimization.

In mathematical optimization and computer science, heuristic is a technique designed for problem solving more quickly when classic methods are too slow for finding an exact or approximate solution, or when classic methods fail to find any exact solution in a search space. This is achieved by trading optimality, completeness, accuracy, or precision for speed. In a way, it can be considered a shortcut.

Guided local search is a metaheuristic search method. A meta-heuristic method is a method that sits on top of a local search algorithm to change its behavior.

Lateral computing is a lateral thinking approach to solving computing problems. Lateral thinking has been made popular by Edward de Bono. This thinking technique is applied to generate creative ideas and solve problems. Similarly, by applying lateral-computing techniques to a problem, it can become much easier to arrive at a computationally inexpensive, easy to implement, efficient, innovative or unconventional solution.

Graduated optimization is a global optimization technique that attempts to solve a difficult optimization problem by initially solving a greatly simplified problem, and progressively transforming that problem until it is equivalent to the difficult optimization problem.

<span class="mw-page-title-main">Typography (cartography)</span> Text used to label maps

Typography, as an aspect of cartographic design, is the craft of designing and placing text on a map in support of the map symbols, together representing geographic features and their properties. It is also often called map labeling or lettering, but typography is more in line with the general usage of typography. Throughout the history of maps to the present, their labeling has been dependent on the general techniques and technologies of typography.

References