Treemapping

Last updated
Treemap of soft drink preference in a small group of people. Color and gradients are used to group items, while still identifying individual items. Gradient grouped treemap.jpg
Treemap of soft drink preference in a small group of people. Color and gradients are used to group items, while still identifying individual items.
Treemap showing changes in waiting times for patients of English PCTs. Heatmap incito.png
Treemap showing changes in waiting times for patients of English PCTs.
TreeSize Treemap visualizing hard disk space usage. Tree Map.png
TreeSize Treemap visualizing hard disk space usage.

In information visualization and computing, treemapping is a method for displaying hierarchical data using nested figures, usually rectangles.

Information visualization study of visual representations of data

Information visualization or information visualisation is the study of (interactive) visual representations of abstract data to reinforce human cognition. The abstract data include both numerical and non-numerical data, such as text and geographic information. However, information visualization differs from scientific visualization: "it’s infovis [information visualization] when the spatial representation is chosen, and it’s scivis [scientific visualization] when the spatial representation is given".

Computing activity requiring, benefiting from, or creating computers

Computing is any activity that uses computers. It includes developing hardware and software, and using computers to manage, process, and communicate information for various purposes. Computing is a critically important, integral component of modern industrial technology. Major computing disciplines include computer engineering, software engineering, computer science, information systems, and information technology.

In computing science and informatics, nesting is where information is organized in layers, or where objects contain other similar objects. It almost always refers to self-similar or recursive structures in some sense.

Contents

Main idea

Treemaps display hierarchical (tree-structured) data as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches. A leaf node's rectangle has an area proportional to a specified dimension of the data. Often the leaf nodes are colored to show a separate dimension of the data.

Tree (data structure) Abstract data type simulating a hierarchical tree structure and represented as a set of linked nodes

In computer science, a tree is a widely used abstract data type (ADT)—or data structure implementing this ADT—that simulates a hierarchical tree structure, with a root value and subtrees of children with a parent node, represented as a set of linked nodes.

In metadata, dimension is a set of equivalent units of measure, where equivalence between two units of measure is determined by the existence of a quantity preserving one-to-one correspondence between values measured in one unit of measure and values measured in the other unit of measure, independent of context, and where characterizing operations are the same.

When the color and size dimensions are correlated in some way with the tree structure, one can often easily see patterns that would be difficult to spot in other ways, such as if a certain color is particularly relevant. A second advantage of treemaps is that, by construction, they make efficient use of space. As a result, they can legibly display thousands of items on the screen simultaneously.

Tiling algorithms

To create a treemap, one must define a tiling algorithm, that is, a way to divide a region into sub-regions of specified areas. Ideally, a treemap algorithm would create regions that satisfy the following criteria:

Tessellation tiling of a plane using one or more geometric shapes, called tiles, with no overlaps and no gaps

A tessellation of a flat surface is the tiling of a plane using one or more geometric shapes, called tiles, with no overlaps and no gaps. In mathematics, tessellations can be generalized to higher dimensions and a variety of geometries.

Algorithm An unambiguous specification of how to solve a class of problems

In mathematics and computer science, an algorithm is an unambiguous specification of how to solve a class of problems. Algorithms can perform calculation, data processing, automated reasoning, and other tasks.

  1. A small aspect ratio—ideally close to one. Regions with a small aspect ratio (i.e, fat objects) are easier to perceive. [1]
  2. Preserve some sense of the ordering in the input data.
  3. Change to reflect changes in the underlying data.

Unfortunately, these properties have an inverse relationship. As the aspect ratio is optimized, the order of placement becomes less predictable. As the order becomes more stable, the aspect ratio is degraded.[ example needed ]

Rectangular treemaps

To date, six primary rectangular treemap algorithms have been developed:

Treemap algorithms [2]
AlgorithmOrderAspect ratiosStability
BinaryTreepartially orderedhighstable
Mixed Treemaps [3] unorderedloweststable
Ordered and Quantum [4] partially orderedmediummedium stability
Slice And Dice [5] orderedvery highstable
Squarified [6] unordered[ further explanation needed ]lowestmedium stability
Strip [7] orderedmediummedium stability

Convex treemaps

Rectangular treemaps have the disadvantage that their aspect ratio might be arbitrarily high in the worst case. As a simple example, if the tree root has only two children, one with weight and one with weight , then the aspect ratio of the smaller child will be , which can be arbitrarily high. To cope with this problem, several algorithms have been proposed that use regions that are general convex polygons, not necessarily rectangular.

Convex polygon convex hull of a finite set of points in a plane

A convex polygon is a simple polygon in which no line segment between two points on the boundary ever goes outside the polygon. Equivalently, it is a simple polygon whose interior is a convex set. In a convex polygon, all interior angles are less than or equal to 180 degrees, while in a strictly convex polygon all interior angles are strictly less than 180 degrees.

Convex treemaps were developed in several steps, each step improved the upper bound on the aspect ratio. The bounds are given as a function of - the total number of nodes in the tree, and - the total depth of the tree.

1. Onak and Sidiropoulos [8] proved an upper bound of .

2. De-Berg and Onak and Sidiropoulos [9] improve the upper bound to , and prove a lower bound of .

3. De-Berg and Speckmann and van-der-Weele [10] improve the upper bound to , matching the theoretical lower bound.

The latter two algorithms operate in two steps (greatly simplified for clarity):

Orthoconvex treemaps

In convex treemaps, the aspect ratio cannot be constant - it grows with the depth of the tree. To attain a constant aspect-ratio, Orthoconvex treemaps [10] can be used. There, all regions are orthoconvex rectilinear polygons with aspect ratio at most 64; and the leaves are either rectangles with aspect ratio at most 8, or L-shapes or S-shapes with aspect ratio at most 32.

Other treemaps

Voronoi Treemaps [11] - based on Voronoi diagram calculations. The algorithm is iterative and does not give any upper bound on the aspect ratio.

Jigsaw Treemaps [12] - based on the geometry of space-filling curves. They assume that the weights are integers and that their sum is a square number. The regions of the map are rectilinear polygons and highly non-ortho-convex. Their aspect ratio is guaranteed to be at most 4.

GosperMaps [13] - based on the geometry of Gosper curves. It is ordered and stable, but has a very high aspect ratio.

History

Area-based visualizations have existed for decades. For example, mosaic plots (also known as Marimekko diagrams) use rectangular tilings to show joint distributions (i.e., most commonly they are essentially stacked column plots where the columns are of different widths). The main distinguishing feature of a treemap, however, is the recursive construction that allows it to be extended to hierarchical data with any number of levels. This idea was invented by professor Ben Shneiderman at the University of Maryland Human – Computer Interaction Lab in the early 1990s. [14] [2] Shneiderman and his collaborators then deepened the idea by introducing a variety of interactive techniques for filtering and adjusting treemaps.

These early treemaps all used the simple "slice-and-dice" tiling algorithm. Despite many desirable properties (it is stable, preserves ordering, and is easy to implement), the slice-and-dice method often produces tilings with many long, skinny rectangles. In 1994 Mountaz Hascoet & Michel Beaudouin-Lafon invented a "squarifying" algorithm, later popularized by Jarke van Wijk, that created tilings whose rectangles were closer to square. In 1999 Martin Wattenberg used a variation of the "squarifying" algorithm that he called "pivot and slice" to create the first Web-based treemap, the SmartMoney Map of the Market, which displayed data on hundreds of companies in the U.S. stock market. Following its launch, treemaps enjoyed a surge of interest, especially in financial contexts.[ citation needed ]

A third wave of treemap innovation came around 2004, after Marcos Weskamp created the Newsmap, a treemap that displayed news headlines. This example of a non-analytical treemap inspired many imitators, and introduced treemaps to a new, broad audience.[ citation needed ] In recent years, treemaps have made their way into the mainstream media, including usage by the New York Times. [15] [16] The Treemap Art Project produced 12 framed images for the National Academies (United States), shown the Every AlgoRiThm has ART in It exhibit in Washington, DC and another set for the collection of Museum of Modern Art in New York.

Treemap of Benin's exports by product category, 2009. The Product Exports Treemaps are one of the most recent applications of these kind of visualizations, developed by the Harvard-MIT Observatory of Economic Complexity Benin English.png
Treemap of Benin's exports by product category, 2009. The Product Exports Treemaps are one of the most recent applications of these kind of visualizations, developed by the Harvard-MIT Observatory of Economic Complexity

See also

Related Research Articles

A perimeter is a path that surrounds a two-dimensional shape. The term may be used either for the path itself or its length—it can be thought of as the length of the outline of a shape. The perimeter of a circle or ellipse is called its circumference.

Tree structure a way of representing the hierarchical nature of a structure in a graphical form

A tree structure or tree diagram is a way of representing the hierarchical nature of a structure in a graphical form. It is named a "tree structure" because the classic representation resembles a tree, even though the chart is generally upside down compared to an actual tree, with the "root" at the top and the "leaves" at the bottom.

Rectangle Quadrilateral with four right angles

In Euclidean plane geometry, a rectangle is a quadrilateral with four right angles. It can also be defined as an equiangular quadrilateral, since equiangular means that all of its angles are equal. It can also be defined as a parallelogram containing a right angle. A rectangle with four sides of equal length is a square. The term oblong is occasionally used to refer to a non-square rectangle. A rectangle with vertices ABCD would be denoted as  ABCD.

Binary space partitioning Method for recursively subdividing a space into two subsets using hyperplanes

In computer science, binary space partitioning (BSP) is a method for recursively subdividing a space into two convex sets by using hyperplanes as partitions. This process of subdividing gives rise to a representation of objects within the space in the form of a tree data structure known as a BSP tree.

Quadtree geometric data structure based on the subdivision of squares onto four smaller squares

A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are the two-dimensional analog of octrees and are most often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. The data associated with a leaf cell varies by application, but the leaf cell represents a "unit of interesting spatial information".

Bounding volume

In computer graphics and computational geometry, a bounding volume for a set of objects is a closed volume that completely contains the union of the objects in the set. Bounding volumes are used to improve the efficiency of geometrical operations by using simple volumes to contain more complex objects. Normally, simpler volumes have simpler ways to test for overlap.

R-tree A tree-based datastructure, used to index spatial information

R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons. The R-tree was proposed by Antonin Guttman in 1984 and has found significant use in both theoretical and applied contexts. A common real-world usage for an R-tree might be to store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc. and then find answers quickly to queries such as "Find all museums within 2 km of my current location", "retrieve all road segments within 2 km of my location" or "find the nearest gas station". The R-tree can also accelerate nearest neighbor search for various distance metrics, including great-circle distance.

The art gallery problem or museum problem is a well-studied visibility problem in computational geometry. It originates from a real-world problem of guarding an art gallery with the minimum number of guards who together can observe the whole gallery. In the geometric version of the problem, the layout of the art gallery is represented by a simple polygon and each guard is represented by a point in the polygon. A set of points is said to guard a polygon if, for every point in the polygon, there is some such that the line segment between and does not leave the polygon.

<i>k</i>-d tree multidimensional search tree for points in k dimensional space

In computer science, a k-d tree is a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key. k-d trees are a special case of binary space partitioning trees.

Ben Shneiderman American computer scientist

Ben Shneiderman is an American computer scientist, a Distinguished University Professor in the Department of Computer Science, which is part of the University of Maryland College of Computer, Mathematical, and Natural Sciences at the University of Maryland, College Park, and the founding director (1983-2000) of the University of Maryland Human-Computer Interaction Lab. He conducted fundamental research in the field of human–computer interaction, developing new ideas, methods, and tools such as the direct manipulation interface, and his eight rules of design.

Martin M. Wattenberg is an American scientist and artist known for his work with data visualization. Along with Fernanda Viégas, he worked at the Cambridge location of IBM's Thomas J. Watson Research Center as part of the Visual Communication Lab, and created Many Eyes. In April 2010, Wattenberg and Viégas started a new venture called Flowing Media, Inc., to focus on visualization aimed at consumers and mass audiences. Four months later, both of them joined Google as the co-leaders of the Google's "Big Picture" data visualization group in Cambridge, Massachusetts.

The Hive Group is a software company that applies visualization technology in operational intelligence (OI), business intelligence (BI), and complex event processing (CEP) contexts. The company primarily develops enterprise treemapping software used by major corporations and public agencies such as Intel Corporation, Procter & Gamble, Sun Microsystems, the United States Army, the United States Marine Corps, and the United States Coast Guard.

Marc Frons is the Chief Technology Officer of News Corp. He was named to the role in May 2017 after serving in an interim capacity since October 2016. Prior to that he was SVP, Deputy Head of Technology and Global Head of Mobile Platform at the company. Mr. Frons was previously Chief Information Officer of the New York Times and former head of both NYTimes.com Digital Technology and The Times's corporate systems and technology. From July 2006 to March 2012, Mr. Frons served as the Chief Technology Officer of the New York Times. Since 2006, he has overseen technology and product development at NYTimes.com while continuing to be involved in broader digital strategy initiatives at the company. Before he joined The Times, Mr. Frons was the chief technology officer for The Wall Street Journal Online and other Dow Jones consumer Web sites. He is credited for the development of Map of the Market, an innovative financial data visualization interface for smartmoney.com and more recently the latest advancements in the customization algorithms that introduce readers to content based on their archived readings as well as the influx of interactivity within a media rich foundation of NYTimes.com.

Jarke J. (Jack) van Wijk is a Dutch computer scientist, a professor in the Department of Mathematics and Computer Science at the Eindhoven University of Technology, and an expert in information visualization.

In computer science, one approach to the dynamic optimality problem on online algorithms for binary search trees involves reformulating the problem geometrically, in terms of augmenting a set of points in the plane with as few additional points as possible in order to avoid rectangles with only two points on their boundary.

NodeXL

NodeXL Basic is a free and open-source network analysis and visualization software package for Microsoft Excel 2007/2010/2013/2016. NodeXL Pro is a fee-based fully featured version of NodeXL that includes access to social media network data importers, advanced network metrics, and automation. It is a popular package similar to other network visualization tools such as Pajek, UCINet, and Gephi.

A software map represents static, dynamic, and evolutionary information of software systems and their software development processes by means of 2D or 3D map-oriented information visualization. It constitutes a fundamental concept and tool in software visualization, software analytics, and software diagnosis. Its primary applications include risk analysis for and monitoring of code quality, team activity, or software development progress and, generally, improving effectiveness of software engineering with respect to all related artifacts, processes, and stakeholders throughout the software engineering process and software maintenance.

Opaque forest problem

In computational geometry, the opaque forest problem can be stated as follows: "Given a convex polygon C in the plane, determine the minimal forest T of closed, bounded line segments such that every line through C also intersects T". T is said to be the opaque forest, or barrier of C. C is said to be the coverage of T. While any forest that covers C is a barrier of C, we wish to find the one with shortest length.

A covering of a polygon is a set of primitive units whose union equals the polygon. A polygon covering problem is a problem of finding a covering with a smallest number of units for a given polygon. This is an important class of problems in computational geometry. There are many different polygon covering problems, depending on the type of polygon being covered and on the types of units allowed in the covering. An example polygon covering problem is: given a rectilinear polygon, find a smallest set of squares whose union equals the polygon.

A partition of a polygon is a set of primitive units, which do not overlap and whose union equals the polygon. A polygon partition problem is a problem of finding a partition which is minimal in some sense, for example a partition with a smallest number of units or with units of smallest total side-length.

References

  1. Kong, N; Heer, J; Agrawala, M (2010). "Perceptual Guidelines for Creating Rectangular Treemaps". IEEE Transactions on Visualization and Computer Graphics. 16 (6): 990–8. CiteSeerX   10.1.1.688.4140 . doi:10.1109/TVCG.2010.186. PMID   20975136.
  2. 1 2 Ben Shneiderman; Catherine Plaisant (June 25, 2009). "Treemaps for space-constrained visualization of hierarchies ~ Including the History of Treemap Research at the University of Maryland" . Retrieved February 23, 2010.
  3. Roel Vliegen; Erik-Jan van der Linden; Jarke J. van Wijk. "Visualizing Business Data with Generalized Treemaps" (PDF). Archived from the original (PDF) on July 24, 2011. Retrieved February 24, 2010.
  4. Bederson, Benjamin B.; Shneiderman, Ben; Wattenberg, Martin (2002). "Ordered and quantum treemaps: Making effective use of 2D space to display hierarchies". ACM Transactions on Graphics. 21 (4): 833. CiteSeerX   10.1.1.145.2634 . doi:10.1145/571647.571649.
  5. Shneiderman, Ben (2001). "Ordered treemap layouts" (PDF). Infovis: 73.
  6. Bruls, Mark; Huizing, Kees; van Wijk, Jarke J. (2000). "Squarified treemaps". In de Leeuw, W.; van Liere, R. (eds.). Data Visualization 2000: Proc. Joint Eurographics and IEEE TCVG Symp. on Visualization (PDF). Springer-Verlag. pp. 33–42{{inconsistent citations}}.
  7. Benjamin, Bederson; Shneiderman, Ben; Wattenberg, Martin (2002). "Ordered and quantum treemaps: Making effective use of 2D space to display hierarchies" (PDF). AcM Transactions on Graphics (TOG). 21 (4): 833–854. CiteSeerX   10.1.1.145.2634 . doi:10.1145/571647.571649.
  8. Krzysztof Onak; Anastasios Sidiropoulos. "Circular Partitions with Applications to Visualization and Embeddings" . Retrieved June 26, 2011.
  9. Mark de Berg; Onak, Krzysztof; Sidiropoulos, Anastasios (2010). "Fat Polygonal Partitions with Applications to Visualization and Embeddings". arXiv: 1009.1866 [cs.CG].
  10. 1 2 De Berg, Mark; Speckmann, Bettina; Van Der Weele, Vincent (2014). "Treemaps with bounded aspect ratio". Computational Geometry. 47 (6): 683. arXiv: 1012.1749 . doi:10.1016/j.comgeo.2013.12.008.. Conference version: Convex Treemaps with Bounded Aspect Ratio (PDF). EuroCG. 2011.
  11. Balzer, Michael; Deussen, Oliver (2005). "Voronoi Treemaps". In Stasko, John T.; Ward, Matthew O. (eds.). IEEE Symposium on Information Visualization (InfoVis 2005), 23-25 October 2005, Minneapolis, MN, USA (PDF). IEEE Computer Society. p. 7..
  12. Wattenberg, Martin (2005). "A Note on Space-Filling Visualizations and Space-Filling Curves". In Stasko, John T.; Ward, Matthew O. (eds.). IEEE Symposium on Information Visualization (InfoVis 2005), 23-25 October 2005, Minneapolis, MN, USA (PDF). IEEE Computer Society. p. 24..
  13. Auber, David; Huet, Charles; Lambert, Antoine; Renoust, Benjamin; Sallaberry, Arnaud; Saulnier, Agnes (2013). "Gosper Map: Using a Gosper Curve for laying out hierarchical data". IEEE Transactions on Visualization and Computer Graphics. 19 (11): 1820–1832. doi:10.1109/TVCG.2013.91. PMID   24029903..
  14. Shneiderman, Ben (1992). "Tree visualization with tree-maps: 2-d space-filling approach". ACM Transactions on Graphics. 11: 92–99. doi:10.1145/102377.115768.
  15. Cox, Amanda; Fairfield, Hannah (February 25, 2007). "The health of the car, van, SUV, and truck market". The New York Times. Retrieved March 12, 2010.
  16. Carter, Shan; Cox, Amanda (February 14, 2011). "Obama's 2012 Budget Proposal: How $3.7 Trillion is Spent". The New York Times. Retrieved February 15, 2011.