In statistics, a misleading graph, also known as a distorted graph, is a graph that misrepresents data, constituting a misuse of statistics that may lead the reader to an incorrect conclusion.
Graphs may be misleading by being excessively complex or poorly constructed. Even when constructed to display the characteristics of their data accurately, graphs can be subject to different interpretations, or unintended kinds of data may seemingly, and ultimately erroneously, be derived from them. [1]
Misleading graphs may be created intentionally to hinder the proper interpretation of data, or accidentally through unfamiliarity with graphing software, misinterpretation of data, or because the data cannot be accurately conveyed. Misleading graphs are often used in false advertising. One of the first authors to write about misleading graphs was Darrell Huff, author of the 1954 book How to Lie with Statistics.
The field of data visualization describes ways to present information that avoid creating misleading graphs.
[A misleading graph] is vastly more effective, however, because it contains no adjectives or adverbs to spoil the illusion of objectivity, there's nothing anyone can pin on you.
There are numerous ways in which a misleading graph may be constructed. [3]
The use of graphs where they are not needed can lead to unnecessary confusion or misinterpretation. [4] Generally, the more explanation a graph needs, the less the graph itself is needed. [4] Graphs do not always convey information better than tables. [5]
The use of biased or loaded words in the graph's title, axis labels, or caption may inappropriately prime the reader. [4] [6]
Similarly, attempting to draw trend lines through uncorrelated data may mislead the reader into believing a trend exists where there is none. This can be both the result of intentionally attempting to mislead the reader or due to the phenomenon of illusory correlation.
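A minimal sketch of why a fitted line alone proves little: the Python below computes a least-squares line together with the Pearson correlation coefficient for a small made-up data set. The data values and the function name `fit_and_correlate` are illustrative assumptions, not taken from any cited source. A line can always be fitted, but an r² near zero shows that it explains almost none of the variation.

```python
import math

def fit_and_correlate(xs, ys):
    """Return (slope, intercept, r) for a least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r

# Hypothetical, essentially uncorrelated observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 1, 4, 3]
slope, intercept, r = fit_and_correlate(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r^2={r * r:.2f}")
```

Reporting r² next to any drawn trend line lets the reader judge whether the line reflects a real relationship.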
Comparing data on bar charts is generally much easier. In the image below, it is very hard to tell whether the blue sector is bigger than the green sector on the pie charts.
A perspective (3D) pie chart is used to give the chart a 3D look. Often used for aesthetic reasons, the third dimension does not improve the reading of the data; on the contrary, these plots are difficult to interpret because of the distorting effect of perspective. The use of superfluous dimensions that do not display the data of interest is discouraged for charts in general, not only for pie charts. [10] In a 3D pie chart, the slices closer to the reader appear larger than those at the back because of the angle at which they are presented. [11] This effect makes readers less accurate at judging the relative magnitude of each slice in 3D than in 2D. [12]
Item C appears to be at least as large as Item A in the misleading pie chart, whereas in actuality, it is less than half as large. Item D looks a lot larger than item B, but they are the same size.
Edward Tufte, a prominent American statistician, noted why tables may be preferred to pie charts in The Visual Display of Quantitative Information: [5]
Tables are preferable to graphics for many small data sets. A table is nearly always better than a dumb pie chart; the only thing worse than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies – Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used.
Pictograms in bar graphs should not be scaled uniformly, as this creates a perceptually misleading comparison. [13] The reader interprets the area of the pictogram rather than only its height or width. [14] As a result, the scaling makes the difference appear to be squared. [14]
In the improperly scaled pictogram bar graph, the image for B is actually 9 times as large as A.
The perceived size increases when scaling.
The effect of improper scaling of pictograms is further exemplified when the pictogram has 3 dimensions, in which case the effect is cubed. [15]
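The squaring and cubing effects can be checked with a few lines of arithmetic. The sketch below assumes a pictogram scaled uniformly by a linear factor, so that its apparent size grows with that factor raised to the number of dimensions drawn. The function name `perceived_ratio` and the factor of 3 are illustrative assumptions.

```python
def perceived_ratio(linear_scale, dimensions):
    """Apparent size ratio when a pictogram is scaled uniformly by linear_scale.

    dimensions=1: only the height changes (a plain bar);
    dimensions=2: the drawn area changes; dimensions=3: the apparent volume changes.
    """
    return linear_scale ** dimensions

# Hypothetical case: B is meant to be 3 times A.
for dims, label in [(1, "height only"), (2, "2D pictogram"), (3, "3D pictogram")]:
    print(f"{label}: looks {perceived_ratio(3, dims)}x as large")
```

Scaling only the height, or repeating a fixed-size icon, keeps the apparent ratio equal to the ratio in the data.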
The graph of house sales (left) is misleading. It appears that home sales have grown eightfold in 2001 over the previous year, whereas they have actually grown twofold. Besides, the number of sales is not specified.
An improperly scaled pictogram may also suggest that the item itself has changed in size. [16]
Assuming the pictures represent equivalent quantities, the misleading graph shows that there are more bananas because the bananas occupy the most area and are furthest to the right.
Logarithmic (or log) scales are a valid means of representing data. But when used without being clearly labeled as log scales, or when displayed to a reader unfamiliar with them, they can be misleading. Log scales express each data value as the power to which a chosen number (the base of the log) must be raised. The base is often e (2.71828...) or 10. For example, a base-10 log scale gives a height of 1 for a value of 10 in the data and a height of 6 for a value of 1,000,000 ($10^6$). Log scales and variants are commonly used, for instance, for the volcanic explosivity index, the Richter scale for earthquakes, the magnitude of stars, and the pH of acidic and alkaline solutions. Even in these cases, the log scale can make the data less apparent to the eye.
Often the reason for using a log scale is that the graph's author wishes to display vastly different magnitudes on the same axis. Without log scales, comparing quantities such as 1,000 ($10^3$) and 1,000,000,000 ($10^9$) becomes visually impractical. However, a log scale that is not clearly labeled as such, or that is presented to a viewer unfamiliar with logarithmic scales, generally makes data values look similar in size when they are in fact of widely differing magnitudes. Misuse of a log scale can make vastly different values (such as 10 and 10,000) appear close together (on a base-10 log scale, they would be only 1 and 4). It can also make small values appear to be negative, because the logarithm of any number smaller than 1 is negative.
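As a rough illustration of the positions involved, the sketch below prints where a handful of values would sit on a base-10 log axis. The helper name `log_position` and the sample values are assumptions chosen for the example.

```python
import math

def log_position(value, base=10):
    """Position of a positive value on a logarithmic axis with the given base."""
    return math.log(value) / math.log(base)

# Hypothetical values spanning eight orders of magnitude.
for v in [0.01, 0.5, 1, 10, 10_000, 1_000_000]:
    print(f"value={v:>9}  position on base-10 log axis={log_position(v):6.2f}")
```

Values of 10 and 10,000 land at about 1 and 4, while values below 1 land below zero, which is why an unlabeled log axis can be misread.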
Misuse of log scales may also cause relationships between quantities to appear linear whilst those relationships are exponentials or power laws that rise very rapidly towards higher values. It has been stated, although mainly in a humorous way, that "anything looks linear on a log-log plot with thick marker pen". [17]
Both graphs show an identical exponential function, $f(x) = 2^x$. The graph on the left uses a linear scale, showing clearly an exponential trend. The graph on the right, however, uses a logarithmic scale, which generates a straight line. If the graph viewer were not aware of this, the graph would appear to show a linear trend.
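The straight line can be confirmed numerically. The following sketch, which assumes a base-2 log axis purely for convenience, compares the step between successive points of the doubling function on a linear axis and on the log axis.

```python
import math

xs = list(range(0, 6))
ys = [2 ** x for x in xs]                # 1, 2, 4, 8, 16, 32
log_ys = [math.log2(y) for y in ys]      # positions on a base-2 log axis

# Difference between neighbouring points on each kind of axis.
linear_steps = [b - a for a, b in zip(ys, ys[1:])]
log_steps = [b - a for a, b in zip(log_ys, log_ys[1:])]

print("steps on linear axis:", linear_steps)  # each step is larger than the last
print("steps on log axis:   ", log_steps)     # every step is the same size
```

Equal steps on the log axis are what make the exponential curve plot as a straight line.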
A truncated graph (also known as a torn graph) has a y-axis that does not start at 0. These graphs can create the impression of important change where there is relatively little change.
While truncated graphs can be used to exaggerate differences or to save space, their use is often discouraged. Commercial software such as MS Excel will tend to truncate graphs by default if the values are all within a narrow range, as in this example. To show relative differences in values over time, an index chart can be used. Truncated diagrams will always distort the underlying numbers visually. Several studies found that even if people were correctly informed that the y-axis was truncated, they still overestimated the actual differences, often substantially. [18]
These graphs display identical data; however, in the truncated bar graph on the left, the data appear to show significant differences, whereas, in the regular bar graph on the right, these differences are hardly visible.
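The size of the distortion follows directly from where the axis starts. The sketch below assumes vertical bars drawn from a chosen baseline and compares the ratio of the drawn bar heights with the ratio of the underlying values. The function name `apparent_ratio` and the numbers are illustrative.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights for values a and b when the axis starts at baseline."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must lie below both values")
    return (b - baseline) / (a - baseline)

# Hypothetical values: 100 versus 102.
print(apparent_ratio(100, 102))               # axis starts at 0: bars differ by 2%
print(apparent_ratio(100, 102, baseline=99))  # axis starts at 99: second bar is 3 times as tall
```

A 2% difference in the data becomes a threefold difference in bar height once the axis starts at 99.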
There are several ways to indicate y-axis breaks:
Changing the y-axis maximum affects how the graph appears. A higher maximum will cause the graph to appear to have less volatility, less growth, and a less steep line than a lower maximum.
[Figure panels: original graph; half width, twice the height; twice the width, half the height]
Changing the ratio of a graph's dimensions will affect how the graph appears.
The scales of a graph are often used to exaggerate or minimize differences. [19] [20]
[Figure panels: less difference; more difference]
The lack of a starting value for the y axis makes it unclear whether the graph is truncated. Additionally, the lack of tick marks prevents the reader from determining whether the graph bars are properly scaled. Without a scale, the visual difference between the bars can be easily manipulated.
Though all three graphs share the same data, and hence the actual slope of the (x, y) data is the same, the way that the data is plotted can change the visual appearance of the angle made by the line on the graph. This is because each plot has a different scale on its vertical axis. Because the scale is not shown, these graphs can be misleading.
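To make the dependence on scale concrete, the sketch below computes the angle a line segment makes on the page from the axis ranges and the plot's physical dimensions. It assumes linear axes and a rectangular plot area; the function name `apparent_angle_deg` and all numbers are hypothetical.

```python
import math

def apparent_angle_deg(dx, dy, x_range, y_range, width, height):
    """Angle on the page of a segment with data-space run dx and rise dy.

    x_range and y_range are the spans of the axes in data units;
    width and height are the plot's size in any common unit (e.g. cm).
    """
    run = dx / x_range * width     # horizontal distance on the page
    rise = dy / y_range * height   # vertical distance on the page
    return math.degrees(math.atan2(rise, run))

# Same data (a rise of 10 over a run of 10) under three hypothetical set-ups.
print(apparent_angle_deg(10, 10, 10, 10, 10, 10))   # y-axis spans 10: 45 degrees
print(apparent_angle_deg(10, 10, 10, 100, 10, 10))  # y-axis spans 100: under 6 degrees
print(apparent_angle_deg(10, 10, 10, 10, 5, 20))    # half width, twice height: about 76 degrees
```

Because the drawn angle depends on the axis ranges and the aspect ratio as well as the data, a visible and labeled scale is needed to read the slope correctly.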
The intervals and units used in a graph may be manipulated to exaggerate or downplay the appearance of change. [11]
Graphs created with omitted data remove information on which to base a conclusion.
In the scatter plot with missing categories on the left, the growth appears to be more linear with less variation.
In financial reports, negative returns or data that do not correlate with a positive outlook may be excluded to create a more favorable visual impression. [citation needed]
The use of a superfluous third dimension, which does not contain information, is strongly discouraged, as it may confuse the reader. [9]
Graphs are designed to allow easier interpretation of statistical data. However, graphs with excessive complexity can obfuscate the data and make interpretation difficult.
Poorly constructed graphs can make data difficult to discern and thus interpret.
Misleading graphs may be used in turn to extrapolate misleading trends. [21]
Several methods have been developed to determine whether graphs are distorted and to quantify this distortion. [22] [23]
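The lie factor is defined as
$$\text{lie factor} = \frac{\text{size of effect shown in graphic}}{\text{size of effect shown in data}}$$
where
$$\text{size of effect} = \frac{|\text{second value} - \text{first value}|}{\text{first value}}$$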
A graph with a high lie factor (>1) would exaggerate change in the data it represents, while one with a small lie factor (>0, <1) would obscure change in the data. [24] A perfectly accurate graph would exhibit a lie factor of 1.
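A short sketch of the calculation, assuming the usual definition of the lie factor as the relative change shown in the graphic divided by the relative change in the data. The function names and the example numbers are hypothetical.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)

def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect present in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)

# Hypothetical case: the data double (100 -> 200),
# but the drawn size grows eightfold (1 -> 8).
print(lie_factor(1, 8, 100, 200))  # 7.0
```

A result of 7 means the graphic overstates the change sevenfold, whereas a faithful graphic would give a value of 1.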
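A second measure, the graph discrepancy index, is computed as
$$\text{GDI} = 100\left(\frac{a}{b} - 1\right)$$
where $a$ is the percentage change depicted in the graph and $b$ is the percentage change in the data.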
The graph discrepancy index, also known as the graph distortion index (GDI), was originally proposed by Paul John Steinbart in 1998. GDI is calculated as a percentage ranging from −100% to positive infinity, with zero percent indicating that the graph has been properly constructed and anything outside the ±5% margin considered to be distorted. [22] Research into the use of GDI as a measure of graphical distortion has found it to be inconsistent and discontinuous, which makes GDI difficult to use for comparisons. [22]
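As an illustration, the sketch below computes the index in its commonly given form, 100 × (a/b − 1), where a is the percentage change depicted in the graph and b is the percentage change in the data, and then applies the ±5% margin mentioned above. The function names and sample figures are assumptions for the example.

```python
def graph_discrepancy_index(graph_change_pct, data_change_pct):
    """GDI in percent: 100 * (a / b - 1), with a and b as percentage changes."""
    if data_change_pct == 0:
        raise ValueError("GDI is undefined when the data do not change")
    return 100.0 * (graph_change_pct / data_change_pct - 1.0)

def is_distorted(gdi, tolerance=5.0):
    """Flag a graph whose GDI falls outside the +/- tolerance margin."""
    return abs(gdi) > tolerance

# Hypothetical case: the data rise by 40%, but the drawn bars rise by 60%.
gdi = graph_discrepancy_index(graph_change_pct=60.0, data_change_pct=40.0)
print(round(gdi, 1), is_distorted(gdi))
```

The division by the data change shows one source of the discontinuity noted above: the index grows without bound as the change in the data approaches zero.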
The data-ink ratio should be relatively high. Otherwise, the chart may have unnecessary graphics. [24]
The data density should be relatively high, otherwise a table may be better suited for displaying the data. [24]
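For reference, the two measures above are usually stated as simple ratios. The wording of the terms below is paraphrased rather than quoted from the cited source.

```latex
\text{data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used in the graphic}},
\qquad
\text{data density} = \frac{\text{number of entries in the data matrix}}{\text{area of the data graphic}}
```

Both ratios rise when ink or area that carries no data is removed.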
Graphs are useful in the summary and interpretation of financial data. [25] Graphs allow trends in large data sets to be seen while also allowing the data to be interpreted by non-specialists. [25] [26]
Graphs are often used in corporate annual reports as a form of impression management. [27] In the United States, graphs do not have to be audited, as they fall under AU Section 550 Other Information in Documents Containing Audited Financial Statements. [27]
Several published studies have looked at the usage of graphs in corporate reports for different corporations in different countries and have found frequent usage of improper design, selectivity, and measurement distortion within these reports. [27] [28] [29] [30] [31] [32] [33] The presence of misleading graphs in annual reports has led to requests for standards to be set. [34] [35] [36]
Research has found that while readers with poor levels of financial understanding have a greater chance of being misinformed by misleading graphs, [37] even those with financial understanding, such as loan officers, may be misled. [34]
The perception of graphs is studied in psychophysics, cognitive psychology, and computational vision. [38]