Plot (graphics)

Last updated
Scatterplot of the eruption interval for Old Faithful (a geyser) Oldfaithful3.png
Scatterplot of the eruption interval for Old Faithful (a geyser)

A plot is a graphical technique for representing a data set, usually as a graph showing the relationship between two or more variables. The plot can be drawn by hand or by a computer. In the past, sometimes mechanical or electronic plotters were used. Graphs are a visual representation of the relationship between variables, which are very useful for humans who can then quickly derive an understanding which may not have come from lists of values. Given a scale or ruler, graphs can also be used to read off the value of an unknown variable plotted as a function of a known one, but this can also be done with data presented in tabular form. Graphs of functions are used in mathematics, sciences, engineering, technology, finance, and other areas.

Contents

Overview

Plots play an important role in statistics and data analysis. The procedures here can broadly be split into two parts: quantitative and graphical. Quantitative techniques are a set of statistical procedures that yield numeric or tabular output. Examples of quantitative techniques include: [1]

These and similar techniques are all valuable and are mainstream in terms of classical analysis. There are also many statistical tools generally referred to as graphical techniques. These include: [1]

Graphical procedures such as plots are a short path to gaining insight into a data set in terms of testing assumptions, model selection, model validation, estimator selection, relationship identification, factor effect determination, outlier detection. Statistical graphics give insight into aspects of the underlying structure of the data. [1]

Graphs can also be used to solve some mathematical equations, typically by finding where two plots intersect.

Types of plots

showing on a horizontal axis and on a vertical axis, where is a phase space trajectory.

Plots for specific quantities

3D plots

Examples

Types of graphs and their uses vary very widely. A few typical examples are:

See also

Related Research Articles

<span class="mw-page-title-main">Descriptive statistics</span> Type of statistics

A descriptive statistic is a summary statistic that quantitatively describes or summarizes features from a collection of information, while descriptive statistics is the process of using and analysing those statistics. Descriptive statistics is distinguished from inferential statistics by its aim to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent. This generally means that descriptive statistics, unlike inferential statistics, is not developed on the basis of probability theory, and are frequently nonparametric statistics. Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented. For example, in papers reporting on human subjects, typically a table is included giving the overall sample size, sample sizes in important subgroups, and demographic or clinical characteristics such as the average age, the proportion of subjects of each sex, the proportion of subjects with related co-morbidities, etc.

<span class="mw-page-title-main">Interquartile range</span> Measure of statistical dispersion

In descriptive statistics, the interquartile range (IQR) is a measure of statistical dispersion, which is the spread of the data. The IQR may also be called the midspread, middle 50%, fourth spread, or H‑spread. It is defined as the difference between the 75th and 25th percentiles of the data. To calculate the IQR, the data set is divided into quartiles, or four rank-ordered even parts via linear interpolation. These quartiles are denoted by Q1 (also called the lower quartile), Q2 (the median), and Q3 (also called the upper quartile). The lower quartile corresponds with the 25th percentile and the upper quartile corresponds with the 75th percentile, so IQR = Q3 − Q1.

<span class="mw-page-title-main">Box plot</span> Data visualization

In descriptive statistics, a box plot or boxplot is a method for demonstrating graphically the locality, spread and skewness groups of numerical data through their quartiles. In addition to the box on a box plot, there can be lines extending from the box indicating variability outside the upper and lower quartiles, thus, the plot is also called the box-and-whisker plot and the box-and-whisker diagram. Outliers that differ significantly from the rest of the dataset may be plotted as individual points beyond the whiskers on the box-plot. Box plots are non-parametric: they display variation in samples of a statistical population without making any assumptions of the underlying statistical distribution. The spacings in each subsection of the box-plot indicate the degree of dispersion (spread) and skewness of the data, which are usually described using the five-number summary. In addition, the box-plot allows one to visually estimate various L-estimators, notably the interquartile range, midhinge, range, mid-range, and trimean. Box plots can be drawn either horizontally or vertically.

<span class="mw-page-title-main">Chart</span> Graphical representation of data

A chart is a graphical representation for data visualization, in which "the data is represented by symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart". A chart can represent tabular numeric data, functions or some kinds of quality structure and provides different info.

<span class="mw-page-title-main">Scatter plot</span> Plot using the dispersal of scattered dots to show the relationship between variables

A scatter plot, also called a scatterplot, scatter graph, scatter chart, scattergram, or scatter diagram, is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. If the points are coded (color/shape/size), one additional variable can be displayed. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.

In statistics, exploratory data analysis (EDA) is an approach of analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling and thereby contrasts traditional hypothesis testing. Exploratory data analysis has been promoted by John Tukey since 1970 to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments. EDA is different from initial data analysis (IDA), which focuses more narrowly on checking assumptions required for model fitting and hypothesis testing, and handling missing values and making transformations of variables as needed. EDA encompasses IDA.

<span class="mw-page-title-main">Normal probability plot</span> Graphical technique in statistics

The normal probability plot is a graphical technique to identify substantive departures from normality. This includes identifying outliers, skewness, kurtosis, a need for transformations, and mixtures. Normal probability plots are made of raw data, residuals from model fits, and estimated parameters.

A stem-and-leaf display or stem-and-leaf plot is a device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution. They evolved from Arthur Bowley's work in the early 1900s, and are useful tools in exploratory data analysis. Stemplots became more commonly used in the 1980s after the publication of John Tukey's book on exploratory data analysis in 1977. The popularity during those years is attributable to their use of monospaced (typewriter) typestyles that allowed computer technology of the time to easily produce the graphics. Modern computers' superior graphic capabilities have meant these techniques are less often used.

This glossary of statistics and probability is a list of definitions of terms and concepts used in the mathematical sciences of statistics and probability, their sub-disciplines, and related fields. For additional related terms, see Glossary of mathematics and Glossary of experimental design.

<span class="mw-page-title-main">Data and information visualization</span> Visual representation of data

Data and information visualization is the practice of designing and creating easy-to-communicate and easy-to-understand graphic or visual representations of a large amount of complex quantitative and qualitative data and information with the help of static, dynamic or interactive visual items. Typically based on data and information collected from a certain domain of expertise, these visualizations are intended for a broader audience to help them visually explore and discover, quickly understand, interpret and gain important insights into otherwise difficult-to-identify structures, relationships, correlations, local and global patterns, trends, variations, constancy, clusters, outliers and unusual groupings within data. When intended for the general public to convey a concise version of known, specific information in a clear and engaging manner, it is typically called information graphics.

<span class="mw-page-title-main">Q–Q plot</span> Plot of the empirical distribution of p-values against the theoretical one

In statistics, a Q–Q plot (quantile–quantile plot) is a probability plot, a graphical method for comparing two probability distributions by plotting their quantiles against each other. A point (x, y) on the plot corresponds to one of the quantiles of the second distribution (y-coordinate) plotted against the same quantile of the first distribution (x-coordinate). This defines a parametric curve where the parameter is the index of the quantile interval.

In statistics, the frequency or absolute frequency of an event is the number of times the observation has occurred/recorded in an experiment or study. These frequencies are often depicted graphically or in tabular form.

A dot chart or dot plot is a statistical chart consisting of data points plotted on a fairly simple scale, typically using filled in circles. There are two common, yet very different, versions of the dot chart. The first has been used in hand-drawn graphs to depict distributions going back to 1884. The other version is described by William S. Cleveland as an alternative to the bar chart, in which dots are used to depict the quantitative values associated with categorical variables.

GGobi is a free statistical software tool for interactive data visualization. GGobi allows extensive exploration of the data with Interactive dynamic graphics. It is also a tool for looking at multivariate data. R can be used in sync with GGobi. The GGobi software can be embedded as a library in other programs and program packages using an application programming interface (API) or as an add-on to existing languages and scripting environments, e.g., with the R command line or from a Perl or Python scripts. GGobi prides itself on its ability to link multiple graphs together.

<span class="mw-page-title-main">Biplot</span> Type of exploratory graph used in statistics

Biplots are a type of exploratory graph used in statistics, a generalization of the simple two-variable scatterplot. A biplot overlays a score plot with a loading plot. A biplot allows information on both samples and variables of a data matrix to be displayed graphically. Samples are displayed as points while variables are displayed either as vectors, linear axes or nonlinear trajectories. In the case of categorical variables, category level points may be used to represent the levels of a categorical variable. A generalised biplot displays information on both continuous and categorical variables.

Statistical graphics, also known as statistical graphical techniques, are graphics used in the field of statistics for data visualization.

In statistics, regression validation is the process of deciding whether the numerical results quantifying hypothesized relationships between variables, obtained from regression analysis, are acceptable as descriptions of the data. The validation process can involve analyzing the goodness of fit of the regression, analyzing whether the regression residuals are random, and checking whether the model's predictive performance deteriorates substantially when applied to data that were not used in model estimation.

In statistics, bivariate data is data on each of two variables, where each value of one of the variables is paired with a value of the other variable. It is a specific but very common case of multivariate data. The association can be studied via a tabular or graphical display, or via sample statistics which might be used for inference. Typically it would be of interest to investigate the possible association between the two variables. The method used to investigate the association would depend on the level of measurement of the variable. This association that involves exactly two variables can be termed a bivariate correlation, or bivariate association.

References

PD-icon.svg This article incorporates public domain material from the National Institute of Standards and Technology

  1. 1 2 3 NIST/SEMATECH (2003). "The Role of Graphics". In: e-Handbook of Statistical Methods 6 January 2003 (Date created).
  2. Altman DG, Bland JM (1983). "Measurement in medicine: the analysis of method comparison studies". The Statistician. 32 (3). Blackwell Publishing: 307–317. doi:10.2307/2987937. JSTOR   2987937.
  3. Bland JM, Altman DG (1986). "Statistical methods for assessing agreement between two methods of clinical measurement". Lancet. 1 (8476): 307–10. doi:10.1016/S0140-6736(86)90837-8. PMID   2868172. S2CID   2844897.
  4. Flegel WA, Srivastava K (2024). "40 years of researching the Del phenotype results in a change of transfusion practice". Transfusion. 64 (7). Wiley: 1187–1190. doi:10.1111/trf.17913.
  5. 1 2 Simionescu, P.A. (2014). Computer Aided Graphing and Simulation Tools for AutoCAD Users (1st ed.). Boca Raton, FL: CRC Press. ISBN   978-1-4822-5290-3.
  6. R. J. Light; D. B. Pillemer (1984). Summing up: The Science of Reviewing Research. Cambridge, Massachusetts.: Harvard University Press.
  7. M. Egger, G. Davey Smith, M. Schneider & C. Minder (September 1997). "Bias in meta-analysis detected by a simple, graphical test". BMJ . 315 (7109): 629–634. doi:10.1136/bmj.315.7109.629. PMC   2127453 . PMID   9310563.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  8. Galbraith, Rex (1988). "Graphical display of estimates having differing standard errors". Technometrics. 30 (3). American Society for Quality: 271–281. doi:10.2307/1270081. JSTOR   1270081.
  9. Utts, Jessica M. Seeing Through Statistics 3rd Edition, Thomson Brooks/Cole, 2005, pp 166–167. ISBN   0-534-39402-7
  10. Theodore T. Allen (2010). Introduction to Engineering Statistics and Lean Sigma: Statistical Quality Control and Design of Experiments and Systems. Springer. p. 128. ISBN   978-1-84882-999-2 . Retrieved 2011-02-17.
  11. Hintze Jerry L.; Nelson Ray D. (1998). "Violin Plots: A Box Plot-Density Trace Synergism". The American Statistician. 52 (2): 181–84. doi:10.1080/00031305.1998.10480559.