Jenks natural breaks optimization

The Jenks optimization method, also called the Jenks natural breaks classification method, is a data clustering method designed to determine the best arrangement of values into different classes. This is done by seeking to minimize each class's average deviation from the class mean, while maximizing each class's deviation from the means of the other classes. In other words, the method seeks to reduce the variance within classes and maximize the variance between classes. [1] [2]

The Jenks optimization method is directly related to Otsu's method and Fisher's discriminant analysis.

History

George Frederick Jenks

George Frederick Jenks was a 20th-century American cartographer. After receiving his Ph.D. in agricultural geography from Syracuse University in 1947, Jenks began his career under the tutelage of Richard Harrison, cartographer for Time and Fortune magazines. [3] He joined the faculty of the University of Kansas (KU) in 1949 and began to build its cartography program. During his 37-year tenure at KU, Jenks developed the program into one of three renowned for graduate education in the field, the others being those at the University of Wisconsin and the University of Washington. Much of his time was spent developing and promoting improved cartographic training techniques and programs. He also spent significant time investigating three-dimensional maps, eye-movement research, thematic map communication, and geostatistics. [2] [3] [4]

Background and development

Jenks was a cartographer by profession. His work with statistics grew out of a desire to make choropleth maps more visually accurate for the viewer. In his 1967 paper, "The Data Model Concept in Statistical Mapping", he argued that by visualizing data in a three-dimensional model, cartographers could devise a “systematic and rational method for preparing choroplethic maps”. [1] Jenks used the analogy of a “blanket of error” to describe the need to use elements other than the mean to generalize data. The three-dimensional models were created to help Jenks visualize the difference between data classes. His aim was to generalize the data using as few planes as possible while maintaining a constant “blanket of error”.

Description of method

The method requires an iterative process. That is, calculations must be repeated using different breaks in the dataset to determine which set of breaks has the smallest within-class variance. The process starts by dividing the ordered data into classes in some way, which may be arbitrary. Two steps are then repeated:

  1. Calculate the sum of squared deviations from the class means (SDCM).
  2. Choose a new way of dividing the data into classes, perhaps by moving one or more data points from one class to a different one.

New class deviations are then calculated, and the process is repeated until the sum of the within-class squared deviations (SDCM) reaches a minimum. [1] [5]
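The two repeated steps can be sketched in Python as a simple local search. This is a minimal illustration, not Jenks's original procedure: the function names `sdcm` and `iterative_breaks`, the equal-count starting division, and the rule of shifting one class boundary by a single data point at a time are all assumptions made for the example.

```python
def sdcm(sorted_values, breaks):
    """Sum of squared deviations from the class means (SDCM).

    `breaks` holds the start index of every class after the first.
    """
    bounds = [0, *breaks, len(sorted_values)]
    total = 0.0
    for start, end in zip(bounds, bounds[1:]):
        cls = sorted_values[start:end]
        mean = sum(cls) / len(cls)
        total += sum((v - mean) ** 2 for v in cls)
    return total


def iterative_breaks(values, n_classes):
    """Hypothetical helper: move class boundaries until SDCM stops falling."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Arbitrary starting division (an assumption): classes of similar size.
    breaks = [i * n // n_classes for i in range(1, n_classes)]
    best = sdcm(data, breaks)  # step 1: SDCM of the current division

    improved = True
    while improved:
        improved = False
        for i in range(len(breaks)):
            for step in (-1, 1):
                # Step 2: move one data point across boundary i.
                candidate = breaks.copy()
                candidate[i] += step
                lower = candidate[i - 1] if i > 0 else 0
                upper = candidate[i + 1] if i + 1 < len(candidate) else n
                if not lower < candidate[i] < upper:
                    continue  # the move would leave a class empty
                score = sdcm(data, candidate)
                if score < best:  # keep the move only if SDCM decreases
                    breaks, best = candidate, score
                    improved = True
    return breaks, best


breaks, score = iterative_breaks([1, 2, 3, 4, 10, 11, 12, 50, 51], 3)
print(breaks, score)
```

For the nine sample values, the search ends with the classes {1, 2, 3, 4}, {10, 11, 12}, and {50, 51}. A local search of this kind can stop at a division that is only locally best, so on its own it does not guarantee the lowest possible SDCM.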

Alternatively, all break combinations may be examined, SDCM calculated for each combination, and the combination with the lowest SDCM selected. Since all break combinations are examined, this guarantees that the one with the lowest SDCM is found.
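The exhaustive alternative can be sketched as follows. The function names `sdcm` and `exhaustive_breaks` are hypothetical, and breaks are represented as indices into the sorted data, which is an assumption of this example. The number of combinations grows quickly with the size of the data set, so the sketch is only practical for small inputs.

```python
from itertools import combinations


def sdcm(sorted_values, breaks):
    """Sum of squared deviations from the class means (SDCM).

    `breaks` holds the start index of every class after the first.
    """
    bounds = [0, *breaks, len(sorted_values)]
    total = 0.0
    for start, end in zip(bounds, bounds[1:]):
        cls = sorted_values[start:end]
        mean = sum(cls) / len(cls)
        total += sum((v - mean) ** 2 for v in cls)
    return total


def exhaustive_breaks(values, n_classes):
    """Hypothetical helper: score every break combination, keep the best."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    best_breaks, best_score = None, float("inf")
    # Each combination picks n_classes - 1 increasing cut positions,
    # so every class contains at least one value.
    for candidate in combinations(range(1, n), n_classes - 1):
        score = sdcm(data, candidate)
        if score < best_score:
            best_breaks, best_score = list(candidate), score
    return best_breaks, best_score


breaks, score = exhaustive_breaks([1, 2, 3, 4, 10, 11, 12, 50, 51], 3)
print(breaks, score)
```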

Finally, the sum of squared deviations from the mean of the complete data set (SDAM) and the goodness of variance fit (GVF) may be calculated. GVF is defined as (SDAM - SDCM) / SDAM and ranges from 0 (worst fit) to 1 (perfect fit).
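As a worked illustration of the GVF formula, the sketch below computes SDAM, SDCM, and GVF for a classification given as a list of classes. The function names and the list-of-lists input are assumptions chosen for the example, as is the decision to raise an error when all values are identical (SDAM is then zero and the ratio is undefined).

```python
def sum_sq_dev(values):
    """Sum of squared deviations of `values` from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def goodness_of_variance_fit(classes):
    """GVF = (SDAM - SDCM) / SDAM for a list of non-empty classes."""
    all_values = [v for cls in classes for v in cls]
    sdam = sum_sq_dev(all_values)                    # whole data set
    sdcm = sum(sum_sq_dev(cls) for cls in classes)   # within classes
    if sdam == 0:
        raise ValueError("GVF is undefined when all values are identical")
    return (sdam - sdcm) / sdam


classes = [[1, 2, 3, 4], [10, 11, 12], [50, 51]]
print(goodness_of_variance_fit(classes))
```

With this sample, SDAM is 3192 and SDCM is 7.5, so GVF is close to 1. Putting every value in a single class gives SDCM equal to SDAM and therefore a GVF of 0, while giving each value its own class gives an SDCM of 0 and a GVF of 1.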

Use in cartography

Choropleth map showing the estimated percent of the population below 150% poverty in the contiguous United States by county, 2020, classified using Jenks natural breaks

Jenks' goal in developing this method was to create a map that was absolutely accurate in its representation of the data's spatial attributes. By following this process, Jenks claimed, the “blanket of error” can be uniformly distributed across the mapped surface. He developed the method with the intention of using relatively few data classes, fewer than seven, because that was the limit when using monochromatic shading on a choroplethic map. [1]

The Jenks classification method is commonly used in thematic maps, especially choropleth maps, as one of several available classification methods. It can be advantageous for choropleth maps because it identifies clusters in the data values when they exist. In current versions of ArcGIS software from Esri, Jenks is the default classification method. However, the Jenks classification is not recommended for data with low variance. The resulting classes provide a more meaningful visualization of map data because their boundaries follow the "natural breaks" identified by the iterative process.

Alternative methods

Other methods of data classification include Head/tail Breaks, Natural Breaks (without Jenks Optimization), Equal Interval, Quantile, and Standard Deviation.

References

  1. Jenks, George F. 1967. "The Data Model Concept in Statistical Mapping", International Yearbook of Cartography 7: 186–190.
  2. McMaster, Robert. "In Memoriam: George F. Jenks (1916–1996)", Cartography and Geographic Information Science 24(1): 56–59.
  3. McMaster, Robert and McMaster, Susanna. 2002. "A History of Twentieth-Century American Academic Cartography", Cartography and Geographic Information Science 29(3): 312–315.
  4. CSUN Cartography Specialty Group, Winter 1997 Newsletter. Archived 2010-06-07 at the Wayback Machine.
  5. ESRI FAQ, "What is the Jenks Optimization method". Archived 2007-11-16 at the Wayback Machine.
  6. "Chapter 9". Archived from the original on 2004-08-21.