# Descriptive statistics

Last updated

A descriptive statistic (in the count noun sense) is a summary statistic that quantitatively describes or summarizes features from a collection of information, [1] while descriptive statistics (in the mass noun sense) is the process of using and analysing those statistics. Descriptive statistics is distinguished from inferential statistics (or inductive statistics) by its aim to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent. [2] This generally means that descriptive statistics, unlike inferential statistics, is not developed on the basis of probability theory, and are frequently non-parametric statistics. [3] Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented. [4] For example, in papers reporting on human subjects, typically a table is included giving the overall sample size, sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age, the proportion of subjects of each sex, the proportion of subjects with related co-morbidities, etc.

## Contents

Some measures that are commonly used to describe a data set are measures of central tendency and measures of variability or dispersion. Measures of central tendency include the mean, median and mode, while measures of variability include the standard deviation (or variance), the minimum and maximum values of the variables, kurtosis and skewness. [5]

## Use in statistical analysis

Descriptive statistics provide simple summaries about the sample and about the observations that have been made. Such summaries may be either quantitative, i.e. summary statistics, or visual, i.e. simple-to-understand graphs. These summaries may either form the basis of the initial description of the data as part of a more extensive statistical analysis, or they may be sufficient in and of themselves for a particular investigation.

For example, the shooting percentage in basketball is a descriptive statistic that summarizes the performance of a player or a team. This number is the number of shots made divided by the number of shots taken. For example, a player who shoots 33% is making approximately one shot in every three. The percentage summarizes or describes multiple discrete events. Consider also the grade point average. This single number describes the general performance of a student across the range of their course experiences. [6]

The use of descriptive and summary statistics has an extensive history and, indeed, the simple tabulation of populations and of economic data was the first way the topic of statistics appeared. More recently, a collection of summarisation techniques has been formulated under the heading of exploratory data analysis: an example of such a technique is the box plot.

In the business world, descriptive statistics provides a useful summary of many types of data. For example, investors and brokers may use a historical account of return behaviour by performing empirical and analytical analyses on their investments in order to make better investing decisions in the future.

### Univariate analysis

Univariate analysis involves describing the distribution of a single variable, including its central tendency (including the mean, median, and mode) and dispersion (including the range and quartiles of the data-set, and measures of spread such as the variance and standard deviation). The shape of the distribution may also be described via indices such as skewness and kurtosis. Characteristics of a variable's distribution may also be depicted in graphical or tabular format, including histograms and stem-and-leaf display.

### Bivariate and multivariate analysis

When a sample consists of more than one variable, descriptive statistics may be used to describe the relationship between pairs of variables. In this case, descriptive statistics include:

The main reason for differentiating univariate and bivariate analysis is that bivariate analysis is not only simple descriptive analysis, but also it describes the relationship between two different variables. [7] Quantitative measures of dependence include correlation (such as Pearson's r when both variables are continuous, or Spearman's rho if one or both are not) and covariance (which reflects the scale variables are measured on). The slope, in regression analysis, also reflects the relationship between variables. The unstandardised slope indicates the unit change in the criterion variable for a one unit change in the predictor. The standardised slope indicates this change in standardised (z-score) units. Highly skewed data are often transformed by taking logarithms. Use of logarithms makes graphs more symmetrical and look more similar to the normal distribution, making them easier to interpret intuitively. [8] :47

## Related Research Articles

In probability theory and statistics, kurtosis is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Like skewness, kurtosis describes the shape of a probability distribution and there are different ways of quantifying it for a theoretical distribution and corresponding ways of estimating it from a sample from a population. Different measures of kurtosis may have different interpretations.

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.

A statistic (singular) or sample statistic is any quantity computed from values in a sample which is considered for a statistical purpose. Statistical purposes include estimating a population parameter, describing a sample, or evaluating a hypothesis. The average of sample values is a statistic. The term statistic is used both for the function and for the value of the function on a given sample. When a statistic is being used for a specific purpose, it may be referred to by a name indicating its purpose.

In statistics, a population is a set of similar items or events which is of interest for some question or experiment. A statistical population can be a group of existing objects or a hypothetical and potentially infinite group of objects conceived as a generalization from experience. A common aim of statistical analysis is to produce information about some chosen population.

In descriptive statistics, summary statistics are used to summarize a set of observations, in order to communicate the largest amount of information as simply as possible. Statisticians commonly try to describe the observations in

In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined.

In statistics, a frequency distribution is a list, table or graph that displays the frequency of various outcomes in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval.

Mathematical statistics is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure theory.

The following is a glossary of terms used in the mathematical sciences statistics and probability.

In mathematics, a univariate object is an expression, equation, function or polynomial involving only one variable. Objects involving more than one variable are multivariate. In some cases the distinction between the univariate and multivariate cases is fundamental; for example, the fundamental theorem of algebra and Euclid's algorithm for polynomials are fundamental properties of univariate polynomials that cannot be generalized to multivariate polynomials.

In statistics, the mid-range or mid-extreme is a measure of central tendency of a sample (statistics) defined as the arithmetic mean of the maximum and minimum values of the data set:

In statistics, normality tests are used to determine if a data set is well-modeled by a normal distribution and to compute how likely it is for a random variable underlying the data set to be normally distributed.

In statistics, the concept of the shape of a probability distribution arises in questions of finding an appropriate distribution to use to model the statistical properties of a population, given a sample from that population. The shape of a distribution may be considered either descriptively, using terms such as "J-shaped", or numerically, using quantitative measures such as skewness and kurtosis.

In statistics, the term higher-order statistics (HOS) refers to functions which use the third or higher power of a sample, as opposed to more conventional techniques of lower-order statistics, which use constant, linear, and quadratic terms. The third and higher moments, as used in the skewness and kurtosis, are examples of HOS, whereas the first and second moments, as used in the arithmetic mean (first), and variance (second) are examples of low-order statistics. HOS are particularly used in estimation of shape parameters, such as skewness and kurtosis, as when measuring the deviation of a distribution from the normal distribution. On the other hand, due to the higher powers, HOS are significantly less robust than lower-order statistics.

In statistics, bivariate data is data on each of two variables, where each value of one of the variables is paired with a value of the other variable. Typically it would be of interest to investigate the possible association between the two variables. The association can be studied via a tabular or graphical display, or via sample statistics which might be used for inference. The method used to investigate the association would depend on the level of measurement of the variable.

In statistics, L-moments are a sequence of statistics used to summarize the shape of a probability distribution. They are linear combinations of order statistics (L-statistics) analogous to conventional moments, and can be used to calculate quantities analogous to standard deviation, skewness and kurtosis, termed the L-scale, L-skewness and L-kurtosis respectively. Standardised L-moments are called L-moment ratios and are analogous to standardized moments. Just as for conventional moments, a theoretical distribution has a set of population L-moments. Sample L-moments can be defined for a sample from the population, and can be used as estimators of the population L-moments.

Univariate analysis is perhaps the simplest form of statistical analysis. Like other forms of statistics, it can be inferential or descriptive. The key fact is that only one variable is involved.

A correlation coefficient is a numerical measure of some type of correlation, meaning a statistical relationship between two variables. The variables may be two columns of a given data set of observations, often called a sample, or two components of a multivariate random variable with a known distribution.

Bivariate analysis is one of the simplest forms of quantitative (statistical) analysis. It involves the analysis of two variables, for the purpose of determining the empirical relationship between them.

Univariate is a term commonly used in statistics to describe a type of data which consists of observations on only a single characteristic or attribute. A simple example of univariate data would be the salaries of workers in industry. Like all the other data, univariate data can be visualized using graphs, images or other analysis tools after the data is measured, collected, reported, and analyzed.

## References

1. Mann, Prem S. (1995). Introductory Statistics (2nd ed.). Wiley. ISBN   0-471-31009-3.
2. "Drawing Conclusions From Data: Descriptive Statistics, Inferential Statistics, and Hypothesis Testing", Interpreting and Using Statistics in Psychological Research, 2455 Teller Road, Thousand Oaks California 91320: SAGE Publications, Inc, pp. 145–183, 2017, doi:10.4135/9781506304144.n6, ISBN   978-1-5063-0416-8 , retrieved 2021-06-01CS1 maint: location (link)
3. Dodge, Y. (2003). . OUP. ISBN   0-19-850994-4.
4. "Drawing Conclusions From Data: Descriptive Statistics, Inferential Statistics, and Hypothesis Testing", Interpreting and Using Statistics in Psychological Research, 2455 Teller Road, Thousand Oaks California 91320: SAGE Publications, Inc, pp. 145–183, 2017, doi:10.4135/9781506304144.n6, ISBN   978-1-5063-0416-8 , retrieved 2021-06-01CS1 maint: location (link)
5. Investopedia, Descriptive Statistics Terms
6. Trochim, William M. K. (2006). "Descriptive statistics". Research Methods Knowledge Base. Retrieved 14 March 2011.
7. Babbie, Earl R. (2009). (12th ed.). Wadsworth. pp.  436–440. ISBN   978-0-495-59841-1.
8. Nick, Todd G. (2007). "Descriptive Statistics". Topics in Biostatistics. Methods in Molecular Biology. 404. New York: Springer. pp. 33–52. doi:10.1007/978-1-59745-530-5_3. ISBN   978-1-58829-531-6. PMID   18450044.