Plot (graphics)

Last updated February 06, 2025

Scatterplot of the eruption interval for the Old Faithful geyser

A plot is a graphical technique for representing a data set, usually as a graph showing the relationship between two or more variables. The plot can be drawn by hand or by a computer; in the past, mechanical or electronic plotters were sometimes used. Graphs are a visual representation of the relationship between variables, which lets readers quickly grasp patterns that may not be apparent from lists of values. Given a scale or ruler, graphs can also be used to read off the value of an unknown variable plotted as a function of a known one, although this can also be done with data presented in tabular form. Graphs of functions are used in mathematics, sciences, engineering, technology, finance, and other areas.

Overview

Plots play an important role in statistics and data analysis. Statistical procedures can broadly be split into two parts: quantitative and graphical. Quantitative techniques are a set of statistical procedures that yield numeric or tabular output. Examples of quantitative techniques include:^[1]

hypothesis testing
analysis of variance
point estimates and confidence intervals
least squares regression

These and similar techniques are all valuable and are mainstream in terms of classical analysis. There are also many statistical tools generally referred to as graphical techniques.^[1]

Graphical procedures such as plots are a short path to gaining insight into a data set in terms of testing assumptions, model selection, model validation, estimator selection, relationship identification, factor effect determination, and outlier detection. Statistical graphics give insight into aspects of the underlying structure of the data.^[1]

Graphs can also be used to solve some mathematical equations, typically by finding where two plots intersect.
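
The idea of solving an equation by finding where two plots intersect can be sketched numerically. The following is a minimal illustration, not a general-purpose solver: the helper name `intersection`, the bracket [0, 1], and the example equation cos(x) = x are all assumptions chosen for the demonstration. It narrows down the crossing point by repeatedly halving the interval in which the difference of the two curves changes sign.

```python
import math


def intersection(f, g, lo, hi, tol=1e-10):
    """Find x in [lo, hi] where f(x) == g(x) by bisecting on f - g.

    Assumes the two curves cross exactly once in the bracket, i.e.
    f - g has opposite signs at lo and hi.
    """
    def diff(x):
        return f(x) - g(x)

    if diff(lo) * diff(hi) > 0:
        raise ValueError("curves do not cross inside the bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Keep the half-interval whose endpoints still straddle the crossing.
        if diff(lo) * diff(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Solve cos(x) = x by locating where y = cos(x) meets y = x.
root = intersection(math.cos, lambda x: x, 0.0, 1.0)
print(f"cos(x) = x near x = {root:.6f}")
```

Reading the crossing point off a drawn graph gives the same answer to within the resolution of the plot; the bisection simply automates the "zoom in on the crossing" step.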

Types of plots

Biplot : These are a type of graph used in statistics. A biplot allows information on both samples and variables of a data matrix to be displayed graphically. Samples are displayed as points while variables are displayed either as vectors, linear axes or nonlinear trajectories. In the case of categorical variables, category level points may be used to represent the levels of a categorical variable. A generalised biplot displays information on both continuous and categorical variables.
Bland–Altman plot : In analytical chemistry and biostatistics this plot is a method of data plotting used in analysing the agreement between two different assays. It is identical to a Tukey mean-difference plot, which is what it is still known as in other fields, but was popularised in medical statistics by Bland and Altman.^[2]^[3]
Bode plots are used in control theory.
Box plot : In descriptive statistics, a boxplot, also known as a box-and-whisker diagram or plot, is a convenient way of graphically depicting groups of numerical data through their five-number summaries (the smallest observation, lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation). A boxplot may also indicate which observations, if any, might be considered outliers.
Carpet plot : A two-dimensional plot that illustrates the interaction between two to three independent variables and one to three dependent variables.
Comet plot : A two- or three-dimensional animated plot in which the data points are traced on the screen.
Contour plot : A two-dimensional plot which shows the one-dimensional curves, called contour lines, on which the plotted quantity q is a constant. Optionally, the plotted values can be color-coded.
Dalitz plot : This is a scatterplot often used in particle physics to represent the relative frequency of various (kinematically distinct) manners in which the products of certain (otherwise similar) three-body decays may move apart.
Drain plot : A two-dimensional plot where the data are presented in a hierarchy with multiple levels. The levels are nested in the sense that the pieces in each pie chart add up to 100%. A waterfall or waterdrop metaphor is used to link each layer to the one below, visually conveying the hierarchical structure.^[4]
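
The five-number summary that a box plot depicts can be computed directly. The sketch below uses only the Python standard library; the function names, the sample data, and the 1.5 × IQR rule for flagging possible outliers are assumptions for illustration (the rule is a common convention, not something the description above prescribes), and other quartile conventions give slightly different values for Q1 and Q3.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a sequence of numbers."""
    data = sorted(values)
    # "inclusive" is one of several quartile conventions; others differ slightly.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]


def flag_outliers(values, k=1.5):
    """Flag points more than k * IQR outside the box (an assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample with one unusually large observation.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
print(five_number_summary(sample))
print(flag_outliers(sample))
```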

Funnel plot : This is a useful graph designed to check the existence of publication bias in meta-analyses. Funnel plots, introduced by Light and Pillemer in 1984^[5] and discussed in detail by Egger and colleagues,^[6] are useful adjuncts to meta-analyses. A funnel plot is a scatterplot of treatment effect against a measure of study size. It is used primarily as a visual aid to detecting bias or systematic heterogeneity.
Dot plot (statistics) : A dot chart or dot plot is a statistical chart consisting of a group of data points plotted on a simple scale. Dot plots are used for continuous, quantitative, univariate data. Data points may be labelled if there are few of them. Dot plots are one of the simplest plots available, and are suitable for small to moderate sized data sets. They are useful for highlighting clusters and gaps, as well as outliers.
Forest plot : A forest plot is a graphical display that shows the strength of the evidence in quantitative scientific studies. It was developed for use in medical research as a means of graphically representing a meta-analysis of the results of randomized controlled trials. In the last twenty years, similar meta-analytical techniques have been applied in observational studies (e.g. environmental epidemiology) and forest plots are often used in presenting the results of such studies also.
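
Because a dot plot is so simple, it can be rendered even as plain text. The following sketch is one possible rendering, assuming integer-valued data so that every value between the minimum and maximum gets a row and gaps stay visible; the function name `dot_plot` and the observations are hypothetical.

```python
from collections import Counter


def dot_plot(values):
    """Render a text dot plot: one row per integer value, one mark per observation."""
    counts = Counter(values)
    lo, hi = min(counts), max(counts)
    width = max(len(str(lo)), len(str(hi)))
    lines = []
    for value in range(lo, hi + 1):
        # Values that never occur produce an empty row, which exposes gaps.
        lines.append((f"{value:>{width}} | " + "*" * counts[value]).rstrip())
    return "\n".join(lines)


# Hypothetical small univariate data set.
observations = [3, 5, 4, 3, 7, 4, 4, 5, 3, 4, 10, 4]
print(dot_plot(observations))
```

In the printed result the cluster around 3 to 5, the empty rows, and the isolated point at 10 correspond to the clusters, gaps, and outliers mentioned in the dot plot description.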

Galbraith plot : In statistics, a Galbraith plot (also known as Galbraith's radial plot or just radial plot), is one way of displaying several estimates of the same quantity that have different standard errors.^[7] It can be used to examine heterogeneity in a meta-analysis, as an alternative or supplement to a forest plot.
Heat map
Lollipop plot
Nichols plot : This is a graph used in signal processing in which the logarithm of the magnitude is plotted against the phase of a frequency response on orthogonal axes.
Normal probability plot : The normal probability plot is a graphical technique for assessing whether or not a data set is approximately normally distributed. The data are plotted against a theoretical normal distribution in such a way that the points should form an approximate straight line. Departures from this straight line indicate departures from normality. The normal probability plot is a special case of the probability plot.
Nyquist plot : This plot is used in automatic control and signal processing for assessing the stability of a system with feedback. It is represented by a graph in polar coordinates in which the gain and phase of a frequency response are plotted. The plot of these phasor quantities shows the phase as the angle and the magnitude as the distance from the origin.
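
The coordinates of a normal probability plot can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the plotting positions (i − 0.5)/n are one common choice among several, the helper names and both samples are hypothetical, and the correlation of the plotted points is used here only as a rough numerical stand-in for judging straightness by eye.

```python
from statistics import NormalDist, mean


def normal_probability_points(values):
    """Pair each sorted observation with a theoretical standard-normal quantile.

    Plotting positions (i - 0.5) / n are an assumed convention; other
    conventions shift the points slightly.
    """
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, data))


def straightness(points):
    """Pearson correlation of the plotted points; near 1 suggests a straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical samples: one roughly symmetric, one strongly right-skewed.
symmetric = [4.1, 4.6, 4.9, 5.0, 5.2, 5.3, 5.5, 5.8, 6.0, 6.4]
skewed = [1, 1, 1, 2, 2, 3, 4, 7, 15, 40]
r_symmetric = straightness(normal_probability_points(symmetric))
r_skewed = straightness(normal_probability_points(skewed))
print(round(r_symmetric, 3), round(r_skewed, 3))
```

The skewed sample bends away from a straight line, which shows up as a visibly lower correlation than for the symmetric sample.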

Partial regression plot : In applied statistics, a partial regression plot attempts to show the effect of adding another variable to the model (given that one or more independent variables are already in the model). Partial regression plots are also referred to as added variable plots, adjusted variable plots, and individual coefficient plots.
Partial residual plot : In applied statistics, a partial residual plot is a graphical technique that attempts to show the relationship between a given independent variable and the response variable given that other independent variables are also in the model.
Probability plot : The probability plot is a graphical technique for assessing whether or not a data set follows a given distribution such as the normal or Weibull, and for visually estimating the location and scale parameters of the chosen distribution. The data are plotted against a theoretical distribution in such a way that the points should form approximately a straight line. Departures from this straight line indicate departures from the specified distribution.
Ridgeline plot : Several line plots, vertically stacked and slightly overlapping.
Q–Q plot : In statistics, a Q–Q plot (Q stands for quantile) is a graphical method for diagnosing differences between the probability distribution of a statistical population from which a random sample has been taken and a comparison distribution. An example of the kind of differences that can be tested for is non-normality of the population distribution.
Recurrence plot : In descriptive statistics and chaos theory, a recurrence plot (RP) is a plot showing, for a given moment in time, the times at which a phase space trajectory visits roughly the same area in the phase space. In other words, it is a graph of $\vec{x}(i)\approx \vec{x}(j)$, showing $i$ on a horizontal axis and $j$ on a vertical axis, where $\vec{x}$ is a phase space trajectory.
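
The recurrence condition can be turned into a small matrix computation. In the sketch below, "approximately equal" is interpreted as "within a distance eps", which is an assumption (the threshold and the distance measure are choices left open by the definition above); the function name and the circular test trajectory are likewise hypothetical.

```python
import math


def recurrence_matrix(trajectory, eps):
    """R[i][j] is 1 when states i and j lie within eps of each other, else 0."""
    n = len(trajectory)
    return [
        [1 if math.dist(trajectory[i], trajectory[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical trajectory: a point moving around the unit circle, period 8 samples.
states = [
    (math.cos(2 * math.pi * t / 8), math.sin(2 * math.pi * t / 8))
    for t in range(16)
]
matrix = recurrence_matrix(states, eps=0.1)

# Print with j increasing upwards so the layout matches the usual plot orientation.
for j in reversed(range(len(states))):
    print("".join("#" if matrix[i][j] else "." for i in range(len(states))))
```

For a periodic trajectory such as this one, the marks form diagonal lines spaced one period apart, which is the characteristic signature of periodicity in a recurrence plot.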

Scatterplot : A scatter graph or scatter plot is a type of display that shows the values of two variables for a set of data. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.^[8]
Shmoo plot : In electrical engineering, a shmoo plot is a graphical display of the response of a component or system varying over a range of conditions and inputs. Often used to represent the results of the testing of complex electronic systems such as computers, ASICs or microprocessors. The plot usually shows the range of conditions in which the device under test will operate.
Spaghetti plots are a method of viewing data to visualize possible flows through systems. Flows depicted in this manner appear like noodles, hence the coining of this term.^[9] This method of statistics was first used to track routing through factories. Visualizing flow in this manner can reduce inefficiency within the flow of a system.
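
A shmoo plot is essentially a pass/fail grid over two swept conditions, so it can be sketched as text. Everything specific in the example below is hypothetical: the `toy_device` pass criterion, the voltage and frequency sweep values, and the `P`/`.` symbols stand in for measurements that would normally come from test equipment.

```python
def shmoo(passes, voltages_mv, frequencies_mhz):
    """Build text rows marking where the device under test passes ('P') or fails ('.')."""
    rows = []
    for freq in sorted(frequencies_mhz, reverse=True):  # highest frequency on top
        cells = "".join("P" if passes(v, freq) else "." for v in voltages_mv)
        rows.append(f"{freq:>5} MHz  {cells}")
    return rows


def toy_device(voltage_mv, freq_mhz):
    """Hypothetical criterion: the maximum clock rises linearly with supply voltage.

    Integer arithmetic keeps the pass/fail boundary exact.
    """
    return freq_mhz * 10 <= 4 * (voltage_mv - 600)


voltages = [800, 900, 1000, 1100, 1200]   # supply voltage in millivolts (x-axis)
frequencies = [50, 100, 150, 200, 250]    # clock frequency in MHz (y-axis)
for row in shmoo(toy_device, voltages, frequencies):
    print(row)
```

The region of `P` cells is the operating range of the (here imaginary) device under test.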

Figures: Weibull probability plot, normal Q–Q plot, scatterplot, spaghetti plot

Stemplot : A stemplot (or stem-and-leaf plot), in statistics, is a device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution. They evolved from Arthur Bowley's work in the early 1900s, and are useful tools in exploratory data analysis. Unlike histograms, stemplots retain the original data to at least two significant digits, and put the data in order, thereby easing the move to order-based inference and non-parametric statistics.
Star plot : A graphical method of displaying multivariate data. Each star represents a single observation. Typically, star plots are generated in a multi-plot format with many stars on each page and each star representing one observation.
Surface plot : In this visualization of the graph of a bivariate function, a surface is plotted to fit a set of data triplets (X, Y, Z), where Z is obtained by the function to be plotted, Z=f(X, Y). Usually, the set of X and Y values are equally spaced. Optionally, the plotted values can be color-coded.
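
A stemplot is easy to build by hand or in a few lines of code, which is part of its appeal in exploratory work. The sketch below assumes non-negative integer data, takes the last digit as the leaf and the remaining digits as the stem, and keeps empty stems so that gaps remain visible; the function name and the scores are hypothetical.

```python
def stemplot(values, leaf_unit=1):
    """Return stem-and-leaf rows; each leaf is the last digit, each stem the rest.

    Assumes non-negative integers (after integer division by leaf_unit).
    """
    leaves = {}
    for value in sorted(values):  # sorting puts the leaves in order on each stem
        stem, leaf = divmod(value // leaf_unit, 10)
        leaves.setdefault(stem, []).append(str(leaf))
    lo, hi = min(leaves), max(leaves)
    width = len(str(hi))
    rows = []
    for stem in range(lo, hi + 1):
        digits = "".join(leaves.get(stem, []))
        rows.append(f"{stem:>{width}} | {digits}".rstrip())
    return rows


# Hypothetical test scores.
scores = [12, 15, 21, 23, 23, 28, 30, 34, 51, 58]
print("\n".join(stemplot(scores)))
```

Each original value can still be read back from the display (stem 2 with leaves 1, 3, 3, 8 is 21, 23, 23, 28), which is the property that distinguishes a stemplot from a histogram.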

Figures: star plot, surface plot

Ternary plot : A ternary plot, ternary graph, triangle plot, simplex plot, or de Finetti diagram is a barycentric plot on three variables which sum to a constant. It graphically depicts the ratios of the three variables as positions in an equilateral triangle. It is used in petrology, mineralogy, metallurgy, and other physical sciences to show the compositions of systems composed of three species. In population genetics, it is often called a de Finetti diagram. In game theory, it is often called a simplex plot.
Vector field : Vector field plots (or quiver plots) show the direction and the strength of a vector associated with 2D or 3D points. They are typically used to show the strength of the gradient over the plane or a surface area.
Violin plot : Violin plots are a method of plotting numeric data. They are similar to box plots, except that they also show the probability density of the data at different values (in the simplest case this could be a histogram). Typically violin plots will include a marker for the median of the data and a box indicating the interquartile range, as in standard box plots. Overlaid on this box plot is a kernel density estimation. Violin plots are available as extensions to a number of software packages, including R through the vioplot library, and Stata through the vioplot add-in.^[10]
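
Placing a point on a ternary plot amounts to converting three shares that sum to a constant into a position inside an equilateral triangle. The sketch below shows one way to do this; the vertex placement (pure a at the lower left, pure b at the lower right, pure c at the apex) is an assumed convention, and the function name and example mixture are hypothetical.

```python
import math


def ternary_to_cartesian(a, b, c):
    """Map a three-part composition to (x, y) inside a unit equilateral triangle.

    Assumed vertex placement: pure a at (0, 0), pure b at (1, 0),
    pure c at the apex (0.5, sqrt(3)/2).
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("components must sum to a positive constant")
    # Normalise so the three shares sum to 1 (barycentric coordinates).
    b_share, c_share = b / total, c / total
    x = b_share + 0.5 * c_share
    y = (math.sqrt(3) / 2) * c_share
    return x, y


# Hypothetical three-component mixture given in percent.
print(ternary_to_cartesian(20, 30, 50))
```

Because only the ratios matter, the same point results whether the composition is given as percentages, fractions, or raw amounts.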

Plots for specific quantities

Arrhenius plot : This plot shows the logarithm of a reaction rate constant ($\ln(k)$, ordinate axis) against inverse temperature ($1/T$, abscissa). Arrhenius plots are often used to analyze the effect of temperature on the rates of chemical reactions.
Dot plot (bioinformatics) : This plot compares two biological sequences and is a graphical method that allows the identification of regions of close similarity between them. It is a kind of recurrence plot.
Lineweaver–Burk plot : This plot compares the reciprocals of reaction rate and substrate concentration. It is used to represent and determine enzyme kinetics.
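
The reason for plotting $\ln(k)$ against $1/T$ is that the Arrhenius equation, $k = A e^{-E_a/(RT)}$, becomes a straight line in those coordinates, with slope $-E_a/R$ and intercept $\ln A$. The sketch below fits that line by ordinary least squares. The data are synthetic, generated from assumed values of the activation energy and pre-exponential factor purely to show that the fit recovers them; the function name is hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def arrhenius_fit(temperatures_k, rate_constants):
    """Least-squares line through (1/T, ln k); returns (Ea in J/mol, A)."""
    xs = [1.0 / t for t in temperatures_k]
    ys = [math.log(k) for k in rate_constants]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    # ln k = ln A - (Ea / R) * (1 / T), so slope = -Ea / R and intercept = ln A.
    return -slope * R, math.exp(intercept)


# Synthetic data generated from assumed values Ea = 50 kJ/mol and A = 1e10 1/s.
temps = [280.0, 300.0, 320.0, 340.0, 360.0]
rates = [1e10 * math.exp(-50_000 / (R * t)) for t in temps]
ea, a = arrhenius_fit(temps, rates)
print(f"Ea = {ea / 1000:.1f} kJ/mol, A = {a:.2e} 1/s")
```

With real measurements the points scatter around the line, and the quality of the straight-line fit is itself a check on whether the reaction follows Arrhenius behaviour over the temperature range.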

3D plots

SteamTube plot

Examples

Types of graphs and their uses vary widely. A few typical examples are:

Simple graph: Supply and demand curves, simple graphs used in economics to relate supply and demand to price. The graphs can be used together to determine the economic equilibrium (essentially, to solve an equation).
Simple graph used for reading values: the bell-shaped normal or Gaussian probability distribution, from which, for example, the probability of a man's height being in a specified range can be derived, given data for the adult male population.
Very complex graph: the psychrometric chart, relating temperature, pressure, humidity, and other quantities.
Non-rectangular coordinates: the above all use two-dimensional rectangular coordinates; an example of a graph using polar coordinates, sometimes in three dimensions, is the antenna radiation pattern chart, which represents the power radiated in all directions by an antenna of specified type.
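
Reading a probability off the bell-shaped curve, as in the height example above, corresponds to taking the area under the curve between two values. A minimal sketch using Python's standard library follows; the mean of 175 cm and standard deviation of 7 cm are hypothetical parameters chosen for illustration, not figures from any survey.

```python
from statistics import NormalDist

# Hypothetical population parameters (not taken from any real survey).
heights = NormalDist(mu=175.0, sigma=7.0)  # centimetres


def probability_between(dist, low, high):
    """Area under the bell curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)


p = probability_between(heights, 170.0, 180.0)
print(f"P(170 cm <= height <= 180 cm) = {p:.3f}")
```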

References

This article incorporates public domain material from the National Institute of Standards and Technology

[1] NIST/SEMATECH (2003). "The Role of Graphics". In: e-Handbook of Statistical Methods. 6 January 2003 (date created).
[2] Altman DG, Bland JM (1983). "Measurement in medicine: the analysis of method comparison studies". The Statistician. 32 (3). Blackwell Publishing: 307–317. doi:10.2307/2987937. JSTOR 2987937.
[3] Bland JM, Altman DG (1986). "Statistical methods for assessing agreement between two methods of clinical measurement". Lancet. 1 (8476): 307–10. doi:10.1016/S0140-6736(86)90837-8. PMID 2868172. S2CID 2844897.
[4] Flegel WA, Srivastava K (2024). "40 years of researching the Del phenotype results in a change of transfusion practice". Transfusion. 64 (7). Wiley: 1187–1190. doi:10.1111/trf.17913.
[5] R. J. Light; D. B. Pillemer (1984). Summing up: The Science of Reviewing Research. Cambridge, Massachusetts: Harvard University Press.
[6] M. Egger, G. Davey Smith, M. Schneider & C. Minder (September 1997). "Bias in meta-analysis detected by a simple, graphical test". BMJ. 315 (7109): 629–634. doi:10.1136/bmj.315.7109.629. PMC 2127453. PMID 9310563.
[7] Galbraith, Rex (1988). "Graphical display of estimates having differing standard errors". Technometrics. 30 (3). American Society for Quality: 271–281. doi:10.2307/1270081. JSTOR 1270081.
[8] Utts, Jessica M. Seeing Through Statistics, 3rd Edition, Thomson Brooks/Cole, 2005, pp. 166–167. ISBN 0-534-39402-7.
[9] Theodore T. Allen (2010). Introduction to Engineering Statistics and Lean Sigma: Statistical Quality Control and Design of Experiments and Systems. Springer. p. 128. ISBN 978-1-84882-999-2. Retrieved 2011-02-17.
[10] Hintze Jerry L.; Nelson Ray D. (1998). "Violin Plots: A Box Plot-Density Trace Synergism". The American Statistician. 52 (2): 181–84. doi:10.1080/00031305.1998.10480559.

External links

Dataplot gallery of some useful graphical techniques at itl.nist.gov.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
