Bubble chart

Bubble chart displaying the relationship between poverty and violent and property crime rates by state. Larger bubbles indicate higher percentage of state residents at or below the poverty level. Trend suggests higher crime rates in states with higher percentages of people living below the poverty level.

A bubble chart is a type of chart that displays three dimensions of data. Each entity with its triplet (v1, v2, v3) of associated data is plotted as a disk that expresses two of the three values through the disk's xy location and the third through its size. Bubble charts can facilitate the understanding of social, economic, medical, and other scientific relationships.

Bubble charts can be considered a variation of the scatter plot, in which the data points are replaced with bubbles. As the documentation for Microsoft Office explains, "You can use a bubble chart instead of a scatter chart if your data has three data series that each contain a set of values. The sizes of the bubbles are determined by the values in the third data series." [1]

Choosing bubble sizes correctly

Using bubbles to represent scalar (one-dimensional) values can be misleading. The human visual system most naturally experiences a disk's size in terms of its diameter, rather than area. [2] This is why most charting software requests the radius or diameter of the bubble as the third data value (after horizontal and vertical axis data). Scaling the size of bubbles based on area can be misleading. [2]

This scaling issue can lead to extreme misinterpretations, especially where the range of the data has a large spread. Because many people are unfamiliar with the issue and its impact on perception, or do not stop to consider it, those who are aware of it often hesitate when interpreting a bubble chart, since they cannot assume that the scaling correction was indeed made. It is therefore important that bubble charts not only be scaled correctly, but also be clearly labeled to document that it is area, rather than radius or diameter, that conveys the data. [3]
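
The two scaling conventions discussed above can be compared with a small sketch. The function names `radius_linear` and `radius_by_area` and the sample values are illustrative assumptions, not part of any charting library; the code only shows how a disk's radius follows from a data value under each convention, and how far the two diverge when the data has a large spread.

```python
import math


def radius_linear(value, scale=1.0):
    """Radius proportional to the value (radius/diameter conveys the data)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return scale * value


def radius_by_area(value, scale=1.0):
    """Radius chosen so that the disk area (pi * r**2) is proportional to the value."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return scale * math.sqrt(value / math.pi)


if __name__ == "__main__":
    # Hypothetical data values with a large spread.
    for v in [1, 4, 100]:
        r_lin = radius_linear(v)
        r_area = radius_by_area(v)
        print(
            f"value={v:>3}  "
            f"linear radius={r_lin:7.2f} (area={math.pi * r_lin ** 2:10.2f})  "
            f"area-based radius={r_area:5.2f} (area={math.pi * r_area ** 2:6.2f})"
        )
```

Under the linear convention, a value 100 times larger yields a disk with 10,000 times the area; under the area convention, the same value yields a disk with only 10 times the radius. Which of the two a given chart uses cannot be read off the picture alone, which is why explicit labeling matters.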

Judgments based on bubble sizes can be problematic regardless of whether area or diameter is used. For example, bubble charts can lead to misinterpretations such as the weighted average illusion, [4] where the sizes of bubbles are taken into account when estimating the mean x- and y-values of the scatterplot. The range of bubble sizes used is often arbitrary. For example, the maximum bubble size is often set to some fraction of the total width of the chart, and therefore will not equal the true measurement value.
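
To illustrate what the weighted average illusion refers to, the sketch below compares the plain mean of the x-values with a mean weighted by bubble size. The data points and function names are made up for illustration. The code does not model perception; it only computes the two quantities that a viewer may conflate.

```python
def plain_mean(values):
    """Unweighted arithmetic mean: what the scatterplot's mean position actually is."""
    return sum(values) / len(values)


def size_weighted_mean(values, sizes):
    """Mean in which each value counts in proportion to its bubble size."""
    return sum(v * s for v, s in zip(values, sizes)) / sum(sizes)


# Hypothetical (x, y, size) triplets: two small bubbles and one large one.
points = [(1.0, 2.0, 1.0), (2.0, 3.0, 1.0), (9.0, 8.0, 10.0)]
xs = [p[0] for p in points]
sizes = [p[2] for p in points]

true_mean_x = plain_mean(xs)
biased_mean_x = size_weighted_mean(xs, sizes)

if __name__ == "__main__":
    print("plain mean of x:        ", true_mean_x)
    print("size-weighted mean of x:", biased_mean_x)
```

In this example the single large bubble pulls the size-weighted mean well to the right of the plain mean, which is the direction of bias the illusion describes.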

Displaying zero or negative data values in bubble charts

The metaphoric representation of data values as disk areas cannot be extended to values that are negative or zero. As a fallback, some users of bubble charts resort to graphic symbology to express nonpositive data values. For example, a negative value v can be represented by a disk of area |v| in which some chosen symbol such as "×" is centered, indicating that the size of the bubble represents the absolute value of a negative data value. This approach can be reasonably effective where the magnitudes (absolute values) of the data values are themselves somewhat important, in other words, where the values v and −v are similar in some context-specific way, so that representing them by congruent disks makes sense.

To represent zero-valued data, some users dispense with disks altogether, using, say, a square centered at the appropriate location. Others use full circles for positive, and empty circles for negative values.
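
One of the conventions described above (a "×" overlay for negative values and a square for zero) can be sketched as a small mapping from data value to drawing instructions. The `Glyph` type and the `encode` function are hypothetical names introduced for illustration; a real chart would pass the result on to whatever drawing routine it uses.

```python
from typing import NamedTuple


class Glyph(NamedTuple):
    shape: str    # "disk" or "square"
    area: float   # disk area; 0.0 for the zero marker
    overlay: str  # symbol drawn at the center, "" if none


def encode(value: float) -> Glyph:
    """Map a data value to a glyph, using absolute value for the disk area."""
    if value > 0:
        return Glyph("disk", value, "")
    if value < 0:
        # Same area as the positive value of equal magnitude, marked with "×".
        return Glyph("disk", abs(value), "×")
    # Zero has no meaningful disk area, so a fixed square marker is used instead.
    return Glyph("square", 0.0, "")


if __name__ == "__main__":
    for v in (3.0, -3.0, 0.0):
        print(v, encode(v))
```

With this encoding, 3 and −3 are drawn as congruent disks that differ only in the overlay symbol, which matches the case where magnitudes matter more than sign.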

A series of bubbles on a map is called a proportional symbol map or sometimes "bubble map"

Incorporating further dimensions of data

Additional information about the entities beyond their three primary values can often be incorporated by rendering their disks in colors and patterns that are chosen in a systematic way. And, of course, supplemental information can be added by annotating disks with textual information, sometimes as simple as unique identifying labels for cross-referencing to explanatory keys and the like.

Other uses

Circular Packing chart, sometimes called a "bubble chart," showing the proportions of professions of people who create programming languages

References

  1. "Present your data in a bubble chart". Microsoft Office Online. Accessed 16 August 2015.
  2. Raidvee et al. (2020). "Perception of means, sums, and areas".
  3. Tufte, Edward (2001). The Visual Display of Quantitative Information (2nd ed.). Cheshire, CT: Graphics Press. ISBN 0-9613921-4-2.
  4. Hong, M.-H.; Witt, J.K.; Szafir, D.A. (2022). "The Weighted Average Illusion: Biases in Perceived Mean Position in Scatterplots". IEEE Transactions on Visualization and Computer Graphics. 28 (1): 987–997. arXiv:2108.03766. doi:10.1109/TVCG.2021.3114783. ISSN 1077-2626. PMID 34596541. S2CID 236956848.
  5. Lawson, Bryan (2004). What Designers Know. Elsevier. ISBN 0-7506-6448-7. p. 44.
  6. Viégas, Fernanda B.; Wattenberg, Martin; van Ham, Frank; Kriss, Jesse; McKeon, Matt (2007). "Many Eyes: A Site for Visualization at Internet Scale". IEEE Symposium on Information Visualization.
  7. "Circular Packing". d3-graph-gallery.com. Retrieved 9 September 2020.
  8. Carter, Shan. "Four Ways to Slice Obama's 2013 Budget Proposal". New York Times.

Wolfer, Tom (5 May 2017). "Bubble Chart Analytics for Business". LinkedIn. Retrieved 20 July 2018.