Automatic label placement

Automatic label placement (sometimes called text placement or name placement) refers to the computer methods of placing labels automatically on a map or chart. This is related to the typographic design of such labels.

The typical features depicted on a geographic map are line features (e.g. roads), area features (countries, parcels, forests, lakes, etc.), and point features (villages, cities, etc.). In addition to depicting the map's features in a geographically accurate manner, it is of critical importance to place the names that identify these features, in a way that the reader knows instantly which name describes which feature.

Automatic text placement is one of the most difficult, complex, and time-consuming problems in mapmaking and GIS (Geographic Information System). Other kinds of computer-generated graphics - like charts, graphs etc. - require good placement of labels as well, not to mention engineering drawings, and professional programs which produce these drawings and charts, like spreadsheets (e.g. Microsoft Excel) or computational software programs (e.g. Mathematica).

Naively placed labels overlap excessively, resulting in a map that is difficult or even impossible to read. Therefore, a GIS must allow a few possible placements of each label, and often also an option of resizing, rotating, or even removing (suppressing) the label. Then, it selects a set of placements that results in the least overlap, and has other desirable properties. For all but the most trivial setups, the problem is NP-Hard.

Rule-based algorithms

Rule-based algorithms try to emulate an experienced human cartographer. Over centuries, cartographers have developed the art of mapmaking and label placement. For example, an experienced cartographer repeats road names several times for long roads, instead of placing them once, or in the case of Ocean City depicted by a point very close to the shore, the cartographer would place the label "Ocean City" over the water to emphasize that it is a coastal town.^[1]

Cartographers work based on accepted conventions and rules and they place labels in order of importance. For example, New York City, Vienna, Berlin, Paris, or Tokyo must show up on country maps because they are high-priority labels. Once those are placed, the cartographer places the next most important class of labels, for example major roads, rivers, and other large cities. In every step they ensure that (1) the text is placed in a way that the reader easily associates it with the feature, and (2) the label does not overlap with those already placed on the map.

Local optimization algorithms

The simplest greedy algorithm places consecutive labels on the map in positions that result in minimal overlap of labels. Its results are not satisfactory even for very simple problems, but it is extremely fast.

Slightly more complex algorithms rely on local optimization to reach a local optimum of a placement evaluation function - in each iteration placement of a single label is moved to another position, and if it improves the result, the move is preserved. It performs reasonably well for maps that are not too densely labelled. Slightly more complex variations try moving 2 or more labels at the same time. The algorithm ends after reaching some local optimum.

A simple algorithm - simulated annealing - yields good results with relatively good performance. It works like local optimization, but it may keep a change even if it worsens the result. The chance of keeping such a change is $\exp {\frac {-\Delta E}{T}}$ , where $\Delta E$ is the change in the evaluation function, and $T$ is the temperature. The temperature is gradually lowered according to the annealing schedule. When the temperature is high, simulated annealing performs almost random changes to the label placement, being able to escape a local optimum. Later, when hopefully a very good local optimum has been found, it behaves in a manner similar to local optimization. The main challenges in developing a simulated annealing solution are choosing a good evaluation function and a good annealing schedule. Generally too fast cooling will degrade the solution, and too slow cooling will degrade the performance, but the schedule is usually quite a complex algorithm, with more than just one parameter.

Another class of direct search algorithms are the various evolutionary algorithms, e.g. genetic algorithms.

Divide-and-conquer algorithms

One simple optimization that is important on real maps is dividing a set of labels into smaller sets that can be solved independently. Two labels are rivals if they can overlap in one of the possible placements. Transitive closure of this relation divides the set of labels into possibly much smaller sets. On uniformly and densely labelled maps, usually the single set will contain the majority of labels, and on maps for which the labelling is not uniform it may bring very big performance benefits. For example when labelling a map of the world, America is labelled independently from Eurasia etc.

2-satisfiability algorithms

If a map labeling problem can be reduced to a situation in which each remaining label has only two potential positions in which it can be placed, then it may be solved efficiently by using an instance of 2-satisfiability to find a placement avoiding any conflicting pairs of placements; several exact and approximate label placement algorithms for more complex types of problems are based on this principle.^[2]

Intersection graph algorithms

We can create an intersection graph, in which the nodes are all possible labels, and there is an edge between two nodes iff the corresponding labels overlap. An independent set in this graph is a set of non-overlapping labels. The label placement problem is thus to find a maximum independent set - a largest set of non-overlapping labels.

In general, the maximum independent set problem is NP complete, and the best known exact algorithms are exponential (see independent set). However, in geometric intersection graphs of fat objects with bounded thickness, there are sub-exponential algorithms based on geometric separators, with a run time of 2^O(sqrt(n)).^[3]

In general, the maximum independent set problem is hard to approximate. However, in some geometric intersection graphs, efficient approximation algorithms are known.

If the labels are all squares or circles of identical size, there is a polynomial-time approximation scheme based on a simple shifted-grid strategy. It finds a solution within (1-e) of the maximum in time n^O(1/e^2) time and linear space.^[4] The strategy generalizes to any collection of fat objects of roughly the same size (i.e., when the maximum-to-minimum size ratio is bounded by a constant).

If the labels have the same size in all dimensions except one (e.g. rectangles with identical height but varying lengths), it is possible to find a similar approximation by applying dynamic programming along one of the dimensions. This also reduces the time to n^O(1/e).^[5]

shifted-quatree algorithm

^[6] If the labels are fat objects of arbitrary sizes, it is possible to find an approximation of factor (1-e) in time n^O(1/e) using shifted quadtrees.

The key concept of the algorithm is alignment to the quadtree grid. An object of size r is called k-aligned (where k>=1 is a constant) if it is inside a quadtree cell of size at most kr (R <= kr).

By definition, a k-aligned object that intersects the boundary of a quatree cell of size R must have a size of at least R/k (r > R/k). The boundary of a cell of size R can be covered by 4k squares of size R/k; hence the number of fat objects intersecting the boundary of that cell is at most 4kc, where c is a constant measuring the fatness of the labels.

Therefore, if all labels are k-aligned, it is possible to find the exact maximum disjoint set in time n^O(k) using a divide-and-conquer algorithm. Start with a quadtree cell that contains all objects. Then recursively divide it to smaller quadtree cells, find the maximum in each smaller cell, and combine the results to get the maximum in the larger cell. Since the number of disjoint fat labels intersecting the boundary of every quadtree cell is bounded by 4kc, we can simply "guess" which objects intersect the boundary in the optimal solution, and then apply divide-and-conquer to the objects inside.

If almost all labels are k-aligned, we can just discard the labels that are not k-aligned, and find a maximum disjoint set of the remaining labels in time n^O(k). This results in a (1-e) approximation, where e is the fraction of labels that are not k-aligned.

If most labels are not k-aligned, we can try to make them k-aligned by shifting the grid in multiples of (1/k,1/k). First, scale the labels such that they are all contained in the unit square. Then, consider k shifts of the grid: (0,0), (1/k,1/k), (2/k,2/k), ..., ((k-1)/k,(k-1)/k). I.e., for each j in {0,...,k-1}, consider a shift of the grid in (j/k,j/k). It is possible to prove that every label will be 2k-aligned for at least k-2 values of j. Now, for every j, discard the labels that are not k-aligned in the (j/k,j/k) shift, and find a maximum disjoint set of the remaining labels. Call that set A(j). Call the real maximum disjoint set is A*. Then:

$\sum _{j=0..k-1}{|A(j)|}\geq (k-2)|A*|$

Therefore, the largest A(j) has a size of at least: (1-2/k)|A*|. The return value of the algorithm is the largest A(j); the approximation factor is (1-2/k), and the run time is n^O(k). We can make the approximation factor as small as we want, i.e., we have a PTAS.

geometric separator algorithm

A different algorithm achieves a similar approximation ratio based on a geometric separator theorem.^[6]

Other algorithms

Other algorithms are also used, like various graph solutions, integer programming etc.

Notes

^ Slocum, Terry (2010). Thematic Cartography and Geovisualization. Upper Saddle River, NJ: Pearson. p. 576. ISBN 0-13-801006-4.
^ Doddi, Srinivas; Marathe, Madhav V.; Mirzaian, Andy; Moret, Bernard M. E.; Zhu, Binhai (1997), "Map labeling and its generalizations", Proc. 8th ACM-SIAM Symp. Discrete Algorithms (SODA), pp. 148–157; Formann, M.; Wagner, F. (1991), "A packing problem with applications to lettering of maps", Proc. 7th ACM Symp. Computational Geometry, pp. 281–288; Poon, Chung Keung; Zhu, Binhai; Chin, Francis (1998), "A polynomial time solution for labeling a rectilinear map", Information Processing Letters, 65 (4): 201–207, doi:10.1016/S0020-0190(98)00002-7; Wagner, Frank; Wolff, Alexander (1997), "A practical map labeling algorithm", Computational Geometry: Theory and Applications, 7 (5–6): 387–404, doi:10.1016/S0925-7721(96)00007-7.
^ Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1016/0020-0190(87)90206-7, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi=10.1016/0020-0190(87)90206-7 instead., Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1109/sfcs.1998.743449, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi=10.1109/sfcs.1998.743449 instead.
^ Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1145/2455.214106, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi=10.1145/2455.214106 instead.
^ Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1016/s0925-7721(98)00028-5, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi=10.1016/s0925-7721(98)00028-5 instead.
^ ^a ^b Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1016/s0196-6774(02)00294-8, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi=10.1016/s0196-6774(02)00294-8 instead.

References

Imhof, E., “Die Anordnung der Namen in der Karte,” Annuaire International de Cartographie II, Orell-Füssli Verlag, Zürich, 93–129, 1962.
Freeman, H., Map data processing and the annotation problem, Proc. 3rd Scandinavian Conf. on Image Analysis, Chartwell-Bratt Ltd. Copenhagen, 1983.
Ahn, J. and Freeman, H., “A program for automatic name placement,” Proc. AUTO-CARTO 6, Ottawa, 1983. 444–455.
Freeman, H., “Computer Name Placement,” ch. 29, in Geographical Information Systems, 1, D.J. Maguire, M.F. Goodchild, and D.W. Rhind, John Wiley, New York, 1991, 449-460.
Podolskaya N. N. Automatic Label De-Confliction Algorithms for Interactive Graphics Applications. Information technologies (ISSN 1684-6400), 9, 2007, p. 45–50. In Russian: Подольская Н.Н. Алгоритмы автоматического отброса формуляров для интерак тивных графических приложений. Информационные технологии, 9, 2007, с. 45-50.

External links

[1] Slocum, Terry (2010). Thematic Cartography and Geovisualization. Upper Saddle River, NJ: Pearson. p. 576. ISBN 0-13-801006-4.

[2] Doddi, Srinivas; Marathe, Madhav V.; Mirzaian, Andy; Moret, Bernard M. E.; Zhu, Binhai (1997), "Map labeling and its generalizations", Proc. 8th ACM-SIAM Symp. Discrete Algorithms (SODA), pp. 148–157; Formann, M.; Wagner, F. (1991), "A packing problem with applications to lettering of maps", Proc. 7th ACM Symp. Computational Geometry, pp. 281–288; Poon, Chung Keung; Zhu, Binhai; Chin, Francis (1998), "A polynomial time solution for labeling a rectilinear map", Information Processing Letters, 65 (4): 201–207, doi:10.1016/S0020-0190(98)00002-7; Wagner, Frank; Wolff, Alexander (1997), "A practical map labeling algorithm", Computational Geometry: Theory and Applications, 7 (5–6): 387–404, doi:10.1016/S0925-7721(96)00007-7.

[3] Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1016/0020-0190(87)90206-7, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi=10.1016/0020-0190(87)90206-7 instead., Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1109/sfcs.1998.743449, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi=10.1109/sfcs.1998.743449 instead.

[4] Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1145/2455.214106, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi=10.1145/2455.214106 instead.

[5] Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1016/s0925-7721(98)00028-5, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi=10.1016/s0925-7721(98)00028-5 instead.

[Chan2003-6] Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1016/s0196-6774(02)00294-8, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi=10.1016/s0196-6774(02)00294-8 instead.

[1]

[2]

[3]

[4]

[5]

[6]