Line chart

Last updated
Line chart showing the population of the town of Pushkin, Saint Petersburg from 1800 to 2010, measured at various intervals Pushkin population history.svg
Line chart showing the population of the town of Pushkin, Saint Petersburg from 1800 to 2010, measured at various intervals

A line chart or line graph, also known as curve chart, [1] is a type of chart that displays information as a series of data points called 'markers' connected by straight line segments. [2] It is a basic type of chart common in many fields. It is similar to a scatter plot except that the measurement points are ordered (typically by their x-axis value) and joined with straight line segments. A line chart is often used to visualize a trend in data over intervals of time – a time series – thus the line is often drawn chronologically. In these cases they are known as run charts.

Contents

History

Some of the earliest known line charts are generally credited to Francis Hauksbee, Nicolaus Samuel Cruquius, Johann Heinrich Lambert and the Scottish engineer William Playfair. [3] Line charts often display time as a variable on the x-axis. Playfair was one of the first to visualize data this way. In 1786, he plotted ten years of money spent by the Royal Navy. He supplemented the chart with a detailed description, telling his readers how to interpret the change over time because they were unfamiliar with this form of abstract visualization. In addition to line charts, Playfair invented and popularized bar charts and pie charts. [4]

Example

In the experimental sciences, data collected from experiments are often visualized by a graph. For example, if one collects data on the speed of an object at certain points in time, one can visualize the data in a data table such as the following:

Graph of speed versus time ScientificGraphSpeedVsTime.svg
Graph of speed versus time
Speed (ms−1)
0.01
0.03
0.05
0.08
0.12
0.16
0.18

Such a table representation of data is a great way to display exact values, but it can prevent the discovery and understanding of patterns in the values. In addition, a table display is often erroneously considered to be an objective, neutral collection or storage of the data (and may in that sense even be erroneously considered to be the data itself) whereas it is in fact just one of various possible visualizations of the data.

Understanding the process described by the data in the table is aided by producing a graph or line chart of speed versus time. Such a visualisation appears in the figure to the right. This visualization can let the viewer quickly understand the entire process at a glance.

This visualization can however be misunderstood, especially when expressed as showing the mathematical function that expresses the speed (the dependent variable) as a function of time . This can be misunderstood as showing speed to be a variable that is dependent only on time. This would however only be true in the case of an object being acted on only by a constant force acting in a vacuum.

Best-fit

A best-fit line chart (simple linear regression) Okuns law quarterly differences.svg
A best-fit line chart (simple linear regression)
A parody line graph (1919) by William Addison Dwiggins. Dwiggins graph.jpg
A parody line graph (1919) by William Addison Dwiggins.

Charts often include an overlaid mathematical function depicting the best-fit trend of the scattered data. This layer is referred to as a best-fit layer and the graph containing this layer is often referred to as a line graph.

It is simple to construct a "best-fit" layer consisting of a set of line segments connecting adjacent data points; however, such a "best-fit" is usually not an ideal representation of the trend of the underlying scatter data for the following reasons:

  1. It is highly improbable that the discontinuities in the slope of the best-fit would correspond exactly with the positions of the measurement values.
  2. It is highly unlikely that the experimental error in the data is negligible, yet the curve falls exactly through each of the data points.

In either case, the best-fit layer can reveal trends in the data. Further, measurements such as the gradient or the area under the curve can be made visually, leading to more conclusions or results from the data table.

A true best-fit layer should depict a continuous mathematical function whose parameters are determined by using a suitable error-minimization scheme, which appropriately weights the error in the data values. Such curve fitting functionality is often found in graphing software or spreadsheets. Best-fit curves may vary from simple linear equations to more complex quadratic, polynomial, exponential, and periodic curves. [5]

See also

Related Research Articles

A histogram is a visual representation of the distribution of quantitative data. To construct a histogram, the first step is to "bin" the range of values— divide the entire range of values into a series of intervals—and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) are adjacent and are typically of equal size.

<span class="mw-page-title-main">Interpolation</span> Method for estimating new data within known data points

In the mathematical field of numerical analysis, interpolation is a type of estimation, a method of constructing (finding) new data points based on the range of a discrete set of known data points.

<span class="mw-page-title-main">Chart</span> Graphical representation of data

A chart is a graphical representation for data visualization, in which "the data is represented by symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart". A chart can represent tabular numeric data, functions or some kinds of quality structure and provides different info.

<span class="mw-page-title-main">Bar chart</span> Type of chart

A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally. A vertical bar chart is sometimes called a column chart.

<span class="mw-page-title-main">Time series</span> Sequence of data points over time

In mathematics, a time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.

<span class="mw-page-title-main">Scatter plot</span> Plot using the dispersal of scattered dots to show the relationship between variables

A scatter plot, also called a scatterplot, scatter graph, scatter chart, scattergram, or scatter diagram, is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. If the points are coded (color/shape/size), one additional variable can be displayed. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.

In mathematics, a piecewise linear or segmented function is a real-valued function of a real variable, whose graph is composed of straight-line segments.

Linear trend estimation is a statistical technique used to analyze data patterns. Data patterns, or trends, occur when the information gathered tends to increase or decrease over time or is influenced by changes in an external factor. Linear trend estimation essentially creates a straight line on a graph of data that models the general direction that the data is heading.

<span class="mw-page-title-main">Curve fitting</span> Process of constructing a curve that has the best fit to a series of data points

Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints. Curve fitting can involve either interpolation, where an exact fit to the data is required, or smoothing, in which a "smooth" function is constructed that approximately fits the data. A related topic is regression analysis, which focuses more on questions of statistical inference such as how much uncertainty is present in a curve that is fitted to data observed with random errors. Fitted curves can be used as an aid for data visualization, to infer values of a function where no data are available, and to summarize the relationships among two or more variables. Extrapolation refers to the use of a fitted curve beyond the range of the observed data, and is subject to a degree of uncertainty since it may reflect the method used to construct the curve as much as it reflects the observed data.

<span class="mw-page-title-main">Contour line</span> Curve along which a 3-D surface is at equal elevation

A contour line of a function of two variables is a curve along which the function has a constant value, so that the curve joins points of equal value. It is a plane section of the three-dimensional graph of the function parallel to the -plane. More generally, a contour line for a function of two variables is a curve connecting points where the function has the same particular value.

<span class="mw-page-title-main">Pie chart</span> Circular statistical graph that illustrates numerical proportion

A pie chart is a circular statistical graphic which is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice is proportional to the quantity it represents. While it is named for its resemblance to a pie which has been sliced, there are variations on the way it can be presented. The earliest known pie chart is generally credited to William Playfair's Statistical Breviary of 1801.

<span class="mw-page-title-main">Regression analysis</span> Set of statistical processes for estimating the relationships among variables

In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable and one or more error-free independent variables. The most common form of regression analysis is linear regression, in which one finds the line that most closely fits the data according to a specific mathematical criterion. For example, the method of ordinary least squares computes the unique line that minimizes the sum of squared differences between the true data and that line. For specific mathematical reasons, this allows the researcher to estimate the conditional expectation of the dependent variable when the independent variables take on a given set of values. Less common forms of regression use slightly different procedures to estimate alternative location parameters or estimate the conditional expectation across a broader collection of non-linear models.

<span class="mw-page-title-main">Infographic</span> Graphic visual representation of information

Infographic are graphic visual representations of information, data, or knowledge intended to present information quickly and clearly. They can improve cognition by using graphics to enhance the human visual system's ability to see patterns and trends. Similar pursuits are information visualization, data visualization, statistical graphics, information design, or information architecture. Infographics have evolved in recent years to be for mass communication, and thus are designed with fewer assumptions about the readers' knowledge base than other types of visualizations. Isotypes are an early example of infographics conveying information quickly and easily to the masses.

<span class="mw-page-title-main">Nonlinear regression</span> Regression analysis

In statistics, nonlinear regression is a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables. The data are fitted by a method of successive approximations (iterations).

Random sample consensus (RANSAC) is an iterative method to estimate parameters of a mathematical model from a set of observed data that contains outliers, when outliers are to be accorded no influence on the values of the estimates. Therefore, it also can be interpreted as an outlier detection method. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are allowed. The algorithm was first published by Fischler and Bolles at SRI International in 1981. They used RANSAC to solve the Location Determination Problem (LDP), where the goal is to determine the points in the space that project onto an image into a set of landmarks with known locations.

<span class="mw-page-title-main">Data and information visualization</span> Visual representation of data

Data and information visualization is the practice of designing and creating easy-to-communicate and easy-to-understand graphic or visual representations of a large amount of complex quantitative and qualitative data and information with the help of static, dynamic or interactive visual items. Typically based on data and information collected from a certain domain of expertise, these visualizations are intended for a broader audience to help them visually explore and discover, quickly understand, interpret and gain important insights into otherwise difficult-to-identify structures, relationships, correlations, local and global patterns, trends, variations, constancy, clusters, outliers and unusual groupings within data. When intended for the general public to convey a concise version of known, specific information in a clear and engaging manner, it is typically called information graphics.

<span class="mw-page-title-main">Radar chart</span> Type of chart

A radar chart is a graphical method of displaying multivariate data in the form of a two-dimensional chart of three or more quantitative variables represented on axes starting from the same point. The relative position and angle of the axes is typically uninformative, but various heuristics, such as algorithms that plot data as the maximal total area, can be applied to sort the variables (axes) into relative positions that reveal distinct correlations, trade-offs, and a multitude of other comparative measures.

<span class="mw-page-title-main">Data transformation (statistics)</span> Application of a function to each point in a data set

In statistics, data transformation is the application of a deterministic mathematical function to each point in a data set—that is, each data point zi is replaced with the transformed value yi = f(zi), where f is a function. Transforms are usually applied so that the data appear to more closely meet the assumptions of a statistical inference procedure that is to be applied, or to improve the interpretability or appearance of graphs.

<span class="mw-page-title-main">Plot (graphics)</span> Graphical technique for data sets

A plot is a graphical technique for representing a data set, usually as a graph showing the relationship between two or more variables. The plot can be drawn by hand or by a computer. In the past, sometimes mechanical or electronic plotters were used. Graphs are a visual representation of the relationship between variables, which are very useful for humans who can then quickly derive an understanding which may not have come from lists of values. Given a scale or ruler, graphs can also be used to read off the value of an unknown variable plotted as a function of a known one, but this can also be done with data presented in tabular form. Graphs of functions are used in mathematics, sciences, engineering, technology, finance, and other areas.

Linear least squares (LLS) is the least squares approximation of linear functions to data. It is a set of formulations for solving statistical problems involved in linear regression, including variants for ordinary (unweighted), weighted, and generalized (correlated) residuals. Numerical methods for linear least squares include inverting the matrix of the normal equations and orthogonal decomposition methods.

References

  1. Spear, Mary Eleanor (1952). Charting Statistics. New York: McGraw-Hill. p. 41. OCLC   166502.
  2. Burton G. Andreas (1965). Experimental psychology. p.186
  3. Michael Friendly (2008). "Milestones in the history of thematic cartography, statistical graphics, and data visualization". pp 13–14. Retrieved 7 July 2008.
  4. Fry, Hannah (2021-06-14). "When Graphs Are a Matter of Life and Death". The New Yorker. ISSN   0028-792X . Retrieved 2024-11-11.
  5. Curve fitting. 2023.{{cite book}}: |work= ignored (help)