Fan chart (statistics)

A dispersion fan diagram (left) in comparison with a box plot

A fan chart is made of a group of dispersion fan diagrams, which may be positioned according to two categorising dimensions. A dispersion fan diagram is a circular diagram that reports the same information about dispersion as a box plot, namely the median, the quartiles, and two extreme values.

Box plot: method for graphically depicting groups of numerical data through their quartiles

In descriptive statistics, a box plot or boxplot is a method for graphically depicting groups of numerical data through their quartiles. Box plots may also have lines extending vertically from the boxes (whiskers) indicating variability outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-and-whisker diagram. Outliers may be plotted as individual points. Box plots are non-parametric: they display variation in samples of a statistical population without making any assumptions of the underlying statistical distribution. The spacings between the different parts of the box indicate the degree of dispersion (spread) and skewness in the data, and show outliers. In addition to the points themselves, they allow one to visually estimate various L-estimators, notably the interquartile range, midhinge, range, mid-range, and trimean. Box plots can be drawn either horizontally or vertically. Box plots received their name from the box in the middle.
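The quantities a box plot displays, and the L-estimators it lets one read off, can be computed directly. The sketch below assumes Python's `statistics.quantiles` with `method="inclusive"` as the quartile convention; this is only one of several conventions, and others (for example Tukey's hinges) can give slightly different Q1 and Q3. The function name `box_plot_summary` is hypothetical.

```python
import statistics


def box_plot_summary(data):
    """Five-number summary plus the L-estimators a box plot lets you read off."""
    xs = sorted(data)
    # Quartile convention: Python's "inclusive" method. Other conventions
    # (e.g. Tukey's hinges) can give slightly different Q1 and Q3.
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    lo, hi = xs[0], xs[-1]
    return {
        "min": lo,
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": hi,
        "iqr": q3 - q1,                     # interquartile range
        "midhinge": (q1 + q3) / 2,          # midpoint of the box
        "range": hi - lo,
        "mid_range": (lo + hi) / 2,
        "trimean": (q1 + 2 * q2 + q3) / 4,  # Tukey's trimean
    }


summary = box_plot_summary([1, 3, 3, 6, 7, 8, 9])
print(summary["q1"], summary["median"], summary["q3"])  # 3.0 6.0 7.5
```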

Median: quantile

The median is the value separating the higher half from the lower half of a data sample. For a data set, it may be thought of as the "middle" value. For example, in the data set {1, 3, 3, 6, 7, 8, 9}, the median is 6, the fourth largest, and also the fourth smallest, number in the sample. For a continuous probability distribution, the median is the value such that a number is equally likely to fall above or below it.
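A minimal sketch of the sample median, reproducing the example above. The text does not cover samples with an even number of values; the sketch assumes the common convention of averaging the two middle values in that case.

```python
def median(values):
    """Median of a sample: the middle value after sorting.

    For an even number of values there is no single middle value; this sketch
    uses the common convention of averaging the two middle values.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2


print(median([1, 3, 3, 6, 7, 8, 9]))  # 6
```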

A quartile is a type of quantile. The first quartile (Q1) is defined as the middle number between the smallest number and the median of the data set. The second quartile (Q2) is the median of the data. The third quartile (Q3) is the middle value between the median and the highest value of the data set.
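The definition above (Q1 as the middle of the lower half, Q3 as the middle of the upper half) can be sketched as follows. For an odd number of values, this sketch leaves the overall median out of both halves; that is one of several conventions in use, so other tools may report slightly different quartiles for the same data.

```python
def _median(sorted_xs):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_xs)
    mid = n // 2
    if n % 2 == 1:
        return sorted_xs[mid]
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2


def quartiles(values):
    """(Q1, Q2, Q3) as medians of the lower half, the whole sample and the upper half.

    For an odd number of values the overall median is excluded from both halves;
    this is one of several conventions in use.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    lower = xs[: n // 2]          # values below the median position
    upper = xs[(n + 1) // 2 :]    # values above the median position
    return _median(lower), _median(xs), _median(upper)


print(quartiles([1, 3, 3, 6, 7, 8, 9]))  # (3, 6, 8)
```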


Elements

The elements of a dispersion fan diagram [1] are:

  1. a circular line serving as the scale
  2. a diameter which indicates the median
  3. a fan (a sector of a circle) which indicates the quartiles
  4. two feathers which indicate the extreme values.

The scale on the circular line begins at the left with the starting value (e.g. zero). Subsequent values are plotted clockwise. The white tail of the diameter indicates the median. The dark fan indicates the dispersion of the middle half of the observed values; thus it spans the values from the first to the third quartile. The white feathers indicate the dispersion of the middle 90% of the observed values.
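The geometry described above can be sketched by mapping each summary value to an angle on the circular scale. This is an illustration under stated assumptions, not the construction from the cited book: it assumes the full circle spans a hypothetical range `scale_start..scale_end`, measures angles clockwise from the 9 o'clock position, and takes the feathers' "middle 90%" to mean the 5th and 95th percentiles (computed with Python's `statistics.quantiles`, inclusive method). All function names are hypothetical.

```python
import math
import statistics


def scale_angle(value, scale_start, scale_end):
    """Clockwise angle in degrees, measured from the 9 o'clock position.

    Assumption: the whole circle (360 degrees) spans scale_start..scale_end.
    """
    return 360.0 * (value - scale_start) / (scale_end - scale_start)


def to_xy(clockwise_deg, radius=1.0):
    """Point on the circle; 9 o'clock is 180 degrees in the usual math convention."""
    theta = math.radians(180.0 - clockwise_deg)
    return radius * math.cos(theta), radius * math.sin(theta)


def fan_angles(data, scale_start, scale_end):
    """Angles of the median line, the dark fan and the two feathers."""
    # 19 cut points at 5 %, 10 %, ..., 95 %
    cuts = statistics.quantiles(data, n=20, method="inclusive")
    p05, q1, med, q3, p95 = cuts[0], cuts[4], cuts[9], cuts[14], cuts[18]

    def angle(v):
        return scale_angle(v, scale_start, scale_end)

    return {
        "median": angle(med),                  # direction of the diameter
        "fan": (angle(q1), angle(q3)),         # first to third quartile
        "feathers": (angle(p05), angle(p95)),  # middle 90 % of the values
    }


print(fan_angles(range(101), 0, 100)["fan"])  # (90.0, 270.0)
```

With this mapping, a value a quarter of the way along the scale points straight up (12 o'clock), consistent with a scale that starts at the left and runs clockwise.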

The length of the white part of the diameter corresponds to the number of observations.

Application

A fan chart gives a quick summary of observed values that depend on two variables. This is possible thanks to a dense representation and a constant size, which does not depend on the size of the individual dispersion fan diagrams.

A key advantage over a sequence of box plots is that dispersion fan diagrams can be compared not only in one direction but in two (horizontally and vertically).
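The data preparation behind such a two-way layout can be sketched as grouping observations by two categorical keys and summarising each cell. The record layout `(row_key, col_key, value)` and the function name `fan_chart_cells` are hypothetical, and reading the feathers as the 5th and 95th percentiles is an assumption based on the "middle 90%" description; drawing the diagrams is not shown.

```python
import statistics
from collections import defaultdict


def fan_chart_cells(records):
    """Summaries for a two-way grid of dispersion fan diagrams.

    records: iterable of (row_key, col_key, value) tuples (hypothetical layout).
    Returns {(row_key, col_key): summary} with the numbers one fan diagram shows.
    """
    cells = defaultdict(list)
    for row, col, value in records:
        cells[(row, col)].append(value)

    grid = {}
    for key, values in cells.items():
        if len(values) < 2:
            continue  # too few observations to summarise
        cuts = statistics.quantiles(values, n=20, method="inclusive")
        grid[key] = {
            "n": len(values),  # shown by the length of the white part of the diameter
            "p05": cuts[0],
            "q1": cuts[4],
            "median": cuts[9],
            "q3": cuts[14],
            "p95": cuts[18],
        }
    return grid


records = [("female", "minority", v) for v in range(101)]
records += [("male", "minority", 1.0)]   # single observation: skipped
grid = fan_chart_cells(records)
print(sorted(grid))  # [('female', 'minority')]
```

Because every cell is reduced to the same small set of numbers, cells can be laid out in rows and columns and compared along either axis.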

Example

The following example presents data from the data set MathAchieve, which is part of the R package nlme by José Pinheiro et al. [2] It contains the mathematics achievement scores of 7185 students. The students are categorised by sex and by membership of a minority ethnic group.

7185 mathematics achievement scores: Results according to sex and membership of a minority ethnic group

The graphic shows the mathematics achievement scores as a function of the socio-economic status of the students (x axis) and of the average socio-economic status of all students at the same school (y axis). The four panels distinguish the students by sex and by membership of a minority ethnic group.

The fan charts clearly reveal how the median values partly follow a strong overall trend, while the values within the individual subgroups (the cells) scatter widely, which could cast doubt on a possible correlation.

See also

Related Research Articles

Abbe number: material dispersion property

In optics and lens design, the Abbe number, also known as the V-number or constringence of a transparent material, is a measure of the material's dispersion, with high values of V indicating low dispersion. It is named after Ernst Abbe (1840–1905), the German physicist who defined it.

A descriptive statistic is a summary statistic that quantitatively describes or summarizes features of a collection of information, while descriptive statistics in the mass noun sense is the process of using and analyzing those statistics. Descriptive statistics is distinguished from inferential statistics, in that descriptive statistics aims to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent. This generally means that descriptive statistics, unlike inferential statistics, is not developed on the basis of probability theory, and descriptive statistics are frequently nonparametric. Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented. For example, in papers reporting on human subjects, typically a table is included giving the overall sample size, sample sizes in important subgroups, and demographic or clinical characteristics such as the average age, the proportion of subjects of each sex, the proportion of subjects with related comorbidities, etc.

Interquartile range: measure of statistical dispersion

In descriptive statistics, the interquartile range (IQR), also called the midspread or middle 50%, or technically H-spread, is a measure of statistical dispersion, being equal to the difference between the 75th and 25th percentiles, or between the upper and lower quartiles, IQR = Q3 − Q1. In other words, the IQR is the first quartile subtracted from the third quartile; these quartiles can be clearly seen on a box plot of the data. It is a trimmed estimator, defined as the 25% trimmed range, and is a commonly used robust measure of scale.
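A short sketch of IQR = Q3 − Q1 using Python's standard `statistics.quantiles`. It also illustrates that the numerical value depends on which quartile convention is used: the two methods that function offers give different answers for the same small sample. The wrapper name `iqr` is hypothetical.

```python
import statistics


def iqr(data, method="exclusive"):
    """Interquartile range Q3 - Q1.

    The result depends on the quartile convention; "exclusive" is the default
    of statistics.quantiles and "inclusive" is the other method it offers.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method=method)
    return q3 - q1


sample = [1, 3, 3, 6, 7, 8, 9]
print(iqr(sample))                      # 5.0
print(iqr(sample, method="inclusive"))  # 4.5
```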

Quantile: cutpoint dividing a set of observations into equal-sized groups

In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one fewer quantile than the number of groups created. Thus quartiles are the three cut points that divide a dataset into four equal-sized groups. Common quantiles have special names, for instance quartiles and deciles. The groups created are termed halves, thirds, quarters, etc., though sometimes the terms for the quantile are used for the groups created, rather than for the cut points.
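The "one fewer cut point than groups" rule can be checked with Python's `statistics.quantiles`, whose `n` argument is the number of groups. The helper name `cut_points` is hypothetical; the default (exclusive) method is used, and other methods would move the cut points slightly but not change how many there are.

```python
import statistics


def cut_points(data, groups):
    """Cut points dividing the sample into `groups` equal-probability groups."""
    return statistics.quantiles(data, n=groups)


data = list(range(1, 21))
for groups, name in [(2, "median"), (4, "quartiles"), (10, "deciles")]:
    cuts = cut_points(data, groups)
    print(name, len(cuts))  # always groups - 1 cut points
```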

Chart: graphical representation of data

A chart is a graphical representation of data, in which "the data is represented by symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart". A chart can represent tabular numeric data, functions or some kinds of qualitative structure, and different types of chart convey different information.

In fluid dynamics, the Darcy–Weisbach equation is an empirical equation, which relates the head loss, or pressure loss, due to friction along a given length of pipe to the average velocity of the fluid flow for an incompressible fluid. The equation is named after Henry Darcy and Julius Weisbach.
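The excerpt does not state the equation itself. A minimal sketch of its usual textbook form, head loss h_f = f_D · (L/D) · v²/(2g) and the equivalent pressure loss Δp = f_D · (L/D) · ρv²/2, is shown below, assuming SI units. The function names and the pipe values in the usage lines are hypothetical, chosen only for illustration.

```python
G = 9.80665  # standard gravity, m/s^2


def head_loss(f_d, length, diameter, velocity, g=G):
    """Darcy-Weisbach head loss h_f = f_D * (L / D) * v**2 / (2 g), in metres."""
    return f_d * (length / diameter) * velocity ** 2 / (2.0 * g)


def pressure_loss(f_d, length, diameter, velocity, density):
    """Equivalent pressure loss dp = f_D * (L / D) * rho * v**2 / 2, in pascals."""
    return f_d * (length / diameter) * density * velocity ** 2 / 2.0


# Hypothetical pipe: f_D = 0.02, 100 m long, 0.1 m diameter, water at 2 m/s
print(head_loss(0.02, 100.0, 0.1, 2.0))
print(pressure_loss(0.02, 100.0, 0.1, 2.0, density=1000.0))
```

The two forms are related by Δp = ρ g h_f; the friction factor f_D itself has to come from elsewhere, for example a Moody chart.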

A percentile is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations falls. For example, the 20th percentile is the value below which 20% of the observations may be found.
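Percentile definitions differ in whether the boundary value counts as "below" and in whether they interpolate between observations. One simple, widely used variant is the nearest-rank method, sketched here; it returns the smallest observation with at least p% of the data at or below it. The function name is hypothetical, and the rank is computed as `p * n / 100` to avoid rounding surprises from expressions like `0.7 * 10`.

```python
import math


def percentile_nearest_rank(data, p):
    """Nearest-rank p-th percentile (0 < p <= 100).

    Returns the smallest observation such that at least p % of the data are
    less than or equal to it. Other definitions interpolate between values.
    """
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    xs = sorted(data)
    if not xs:
        raise ValueError("empty data")
    rank = math.ceil(p * len(xs) / 100)  # 1-based rank
    return xs[rank - 1]


scores = list(range(1, 11))
print(percentile_nearest_rank(scores, 20))  # 2
```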

In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task. Exploratory data analysis was promoted by John Tukey to encourage statisticians to explore the data, and possibly formulate hypotheses that could lead to new data collection and experiments. EDA is different from initial data analysis (IDA), which focuses more narrowly on checking assumptions required for model fitting and hypothesis testing, and handling missing values and making transformations of variables as needed. EDA encompasses IDA.

Wilhelm Lexis: German statistician and economist

Wilhelm Lexis, full name Wilhelm Hector Richard Albrecht Lexis, was a German statistician, economist, and social scientist. The Oxford Dictionary of Statistics cites him as a "pioneer of the analysis of demographic time series". Lexis is largely remembered for two items that bear his name—the Lexis ratio and the Lexis diagram.

Pie chart: circular statistical graphic

A pie chart is a circular statistical graphic, which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice is proportional to the quantity it represents. While it is named for its resemblance to a sliced pie, there are variations in the way it can be presented. The earliest known pie chart is generally credited to William Playfair's Statistical Breviary of 1801.
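Because arc length equals radius times central angle (in radians), making each slice's angle proportional to its quantity also makes its arc length proportional. A minimal sketch, with a hypothetical function name, that returns the start angle, end angle and arc length of each slice:

```python
import math


def pie_slices(quantities, radius=1.0):
    """(start_deg, end_deg, arc_length) for each slice of a pie chart."""
    if any(q < 0 for q in quantities):
        raise ValueError("quantities must be non-negative")
    total = sum(quantities)
    if total == 0:
        raise ValueError("at least one quantity must be positive")
    slices = []
    start = 0.0
    for q in quantities:
        sweep = 360.0 * q / total            # central angle proportional to q
        arc = radius * math.radians(sweep)   # arc length = r * angle in radians
        slices.append((start, start + sweep, arc))
        start += sweep
    return slices


for s in pie_slices([1, 1, 2]):
    print(s)
```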


Radar chart: chart displaying multivariate data with values represented on axes starting from the same point

A radar chart is a graphical method of displaying multivariate data in the form of a two-dimensional chart of three or more quantitative variables represented on axes starting from the same point. The relative position and angle of the axes are typically uninformative.
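A sketch of how one observation becomes a radar-chart polygon. Two common but assumed choices are made here: the axes are equally spaced around the centre (first axis pointing up, then clockwise), and each value is scaled to [0, 1] by a per-axis maximum. The function name is hypothetical.

```python
import math


def radar_vertices(values, maxima):
    """Polygon vertices for one observation on a radar chart.

    Axis k points at angle 90 - k * 360 / n degrees (first axis straight up,
    then clockwise); each value is scaled to [0, 1] by its axis maximum.
    """
    n = len(values)
    if n < 3 or len(maxima) != n:
        raise ValueError("need at least three axes and one maximum per axis")
    points = []
    for k, (v, m) in enumerate(zip(values, maxima)):
        r = v / m                                # distance from the common origin
        theta = math.radians(90.0 - 360.0 * k / n)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


print(radar_vertices([3, 5, 2], [5, 5, 5]))
```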

In mathematics and statistics, deviation is a measure of difference between the observed value of a variable and some other value, often that variable's mean. The sign of the deviation reports the direction of that difference. The magnitude of the value indicates the size of the difference.
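A minimal sketch of deviations from the mean, separating the sign (direction) from the magnitude (size). The function name and the optional `reference` parameter (for deviations from some value other than the mean) are hypothetical.

```python
import statistics


def deviations(data, reference=None):
    """Signed deviations x - reference (reference defaults to the sample mean)."""
    ref = statistics.mean(data) if reference is None else reference
    return [x - ref for x in data]


d = deviations([2, 4, 4, 4, 5, 5, 7, 9])   # the mean here is 5
signs = ["above" if x > 0 else "below" if x < 0 else "equal" for x in d]
sizes = [abs(x) for x in d]                # magnitude of each difference
print(d)
```

Deviations from the mean always sum to zero, which is one reason dispersion measures work with their squares or absolute values instead.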

Minkowski diagram

The Minkowski diagram, also known as a spacetime diagram, was developed in 1908 by Hermann Minkowski and provides an illustration of the properties of space and time in the special theory of relativity. It allows a qualitative understanding of the corresponding phenomena like time dilation and length contraction without mathematical equations.

Moody chart

In engineering, the Moody chart or Moody diagram is a graph in non-dimensional form that relates the Darcy–Weisbach friction factor fD, Reynolds number Re, and surface roughness for fully developed flow in a circular pipe. It can be used to predict pressure drop or flow rate down such a pipe.

Plot (graphics): graphical technique for representing a data set, usually as a graph showing the relationship between two or more variables

A plot is a graphical technique for representing a data set, usually as a graph showing the relationship between two or more variables. The plot can be drawn by hand or by a mechanical or electronic plotter. Graphs are a visual representation of the relationship between variables, very useful for humans who can quickly derive an understanding which would not come from lists of values. Graphs can also be used to read off the value of an unknown variable plotted as a function of a known one. Graphs of functions are used in mathematics, sciences, engineering, technology, finance, and other areas.

Rattle GUI is a free and open source software package providing a graphical user interface (GUI) for data mining using the R statistical programming language. Rattle is used in a variety of situations. Currently there are 15 different government departments in Australia, in addition to various other organisations around the world, which use Rattle in their data mining activities and as a statistical package.

The following comparison of Adobe Flex charts provides charts classification, compares Flex chart products for different chart type availability and for different visual features like 3D versions of charts.

References

  1. Fischer, Wolfram (2010). Neue Grafiken zur Datenvisualisierung. Band 1. Speichengrafiken, Streuungsfächerkarten, Differenz-, Sequenz- und Wechseldiagramme [New Graphics For Data Visualisation. Volume 1. Spoke Plots, Fan Charts, Difference, Sequence and Change Diagrams]. Wolfertswil: ZIM. ISBN 978-3-905764-06-2.
  2. Pinheiro, José; Bates, Douglas; et al. (2013) [1999]. "nlme: Linear and Nonlinear Mixed Effects Models". CRAN (The Comprehensive R Archive Network).