Statistics and Computing

Related Research Articles

Computer science: Study of computation

Computer science is the study of computation, information, and automation. It spans theoretical disciplines, such as algorithms and the theory of computation, through applied disciplines, such as the design of hardware and software. Though more often considered an academic discipline, computer science is closely related to computer programming.

Kolmogorov–Smirnov test: Non-parametric statistical test between two distributions

In statistics, the Kolmogorov–Smirnov test is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution, or to compare two samples. In essence, the test answers the question "How likely is it that we would see a collection of samples like this if they were drawn from that probability distribution?" or, in the second case, "How likely is it that we would see two sets of samples like this if they were drawn from the same probability distribution?". It is named after Andrey Kolmogorov and Nikolai Smirnov.
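
As a minimal sketch, both uses of the test can be run with SciPy's kstest and ks_2samp; the normal samples, sizes, and seed below are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=200)

# One-sample test: compare the sample with a reference distribution, N(0, 1).
stat, p = stats.kstest(x, "norm")
print(f"one-sample KS: D={stat:.3f}, p={p:.3f}")

# Two-sample test: compare two samples with each other.
y = rng.normal(loc=0.5, scale=1.0, size=200)
stat2, p2 = stats.ks_2samp(x, y)
print(f"two-sample KS: D={stat2:.3f}, p={p2:.3f}")
```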

Quantile: Cut points dividing a distribution or sample into equal-probability groups

In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one fewer quantile than the number of groups created. Common quantiles have special names, such as quartiles, deciles, and percentiles. The groups created are termed halves, thirds, quarters, etc., though sometimes the terms for the quantile are used for the groups created, rather than for the cut points.
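
A short NumPy illustration of quantiles as cut points; the six-value sample is invented for the example.

```python
import numpy as np

data = np.array([7, 15, 36, 39, 40, 41])

# Quartiles: three cut points creating four groups
# (one fewer quantile than the number of groups created).
quartiles = np.quantile(data, [0.25, 0.5, 0.75])
print(quartiles)

# Deciles: nine cut points creating ten groups.
deciles = np.quantile(data, np.linspace(0.1, 0.9, 9))
print(deciles)
```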

Correlation: Statistical concept

In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense "correlation" may indicate any type of association, in statistics it usually refers to the degree to which a pair of variables are linearly related. Familiar examples of dependent phenomena include the correlation between the height of parents and their offspring, and the correlation between the price of a good and the quantity consumers are willing to purchase, as depicted in the so-called demand curve.
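
A brief sketch of measuring how linearly related two variables are with SciPy; the parent/offspring heights are made-up toy values.

```python
import numpy as np
from scipy import stats

# Hypothetical parent and offspring heights, in centimetres.
parent = np.array([160, 165, 170, 175, 180, 185])
child = np.array([162, 166, 169, 178, 179, 186])

# Pearson's r measures the degree of linear relationship.
r, p = stats.pearsonr(parent, child)
print(f"Pearson r={r:.3f}, p={p:.3f}")
```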

Consumer price index: Statistic to indicate the change in typical household expenditure

A consumer price index (CPI) is a price index that tracks the price of a weighted average market basket of consumer goods and services purchased by households. Changes in the measured CPI track changes in prices over time. The CPI is calculated using a representative basket of goods and services, which is updated periodically to reflect changes in consumer spending habits. The prices of the goods and services in the basket are collected monthly from a sample of retail and service establishments and adjusted for changes in quality or features. Changes in the CPI can be used to track inflation over time and to compare inflation rates between countries. The CPI is not a perfect measure of inflation or the cost of living, but it is a useful tool for tracking these economic indicators.
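
As a minimal sketch, a fixed-basket (Laspeyres-style) index can be computed as below; the goods, base-period quantities, and prices are all hypothetical.

```python
# Hypothetical base-period basket and prices.
quantities = {"bread": 100, "rent": 1, "fuel": 500}
base_prices = {"bread": 2.00, "rent": 800.0, "fuel": 1.50}
current_prices = {"bread": 2.20, "rent": 840.0, "fuel": 1.65}

base_cost = sum(base_prices[g] * quantities[g] for g in quantities)
current_cost = sum(current_prices[g] * quantities[g] for g in quantities)

# Index relative to the base period, which is set to 100.
cpi = 100 * current_cost / base_cost
print(f"CPI = {cpi:.1f}, implied inflation = {cpi - 100:.1f}%")
```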

In economics, the GDP deflator is a measure of the money price of all new, domestically produced, final goods and services in an economy in a year relative to their real value. It can be used as a measure of the value of money. GDP stands for gross domestic product, the total monetary value of all final goods and services produced within the territory of a country over a particular period of time.
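
The deflator reduces to a simple identity, illustrated here with hypothetical figures: nominal GDP divided by real GDP, scaled to 100.

```python
# Hypothetical national-accounts figures, in billions.
nominal_gdp = 22_000  # at current prices
real_gdp = 20_000     # at base-year prices

deflator = 100 * nominal_gdp / real_gdp
print(f"GDP deflator = {deflator:.1f}")  # 110.0: prices up 10% vs the base year
```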

Spearman's rank correlation coefficient: Nonparametric measure of rank correlation

In statistics, Spearman's rank correlation coefficient or Spearman's ρ, named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as r_s, is a nonparametric measure of rank correlation. It assesses how well the relationship between two variables can be described using a monotonic function.
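
A sketch contrasting Spearman's ρ with Pearson's r on a monotonic but nonlinear relationship; the toy data are an assumption.

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6])
y = x**3  # monotonic in x, but not linear

rho, _ = stats.spearmanr(x, y)  # perfect rank agreement: rho = 1.0
r, _ = stats.pearsonr(x, y)     # Pearson r < 1, since the relation is nonlinear
print(f"Spearman rho={rho:.3f}, Pearson r={r:.3f}")
```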

Confidence interval: Range to estimate an unknown parameter

In frequentist statistics, a confidence interval (CI) is a range of estimates for an unknown parameter. A confidence interval is computed at a designated confidence level; the 95% confidence level is most common, but other levels, such as 90% or 99%, are sometimes used. The confidence level, degree of confidence or confidence coefficient represents the long-run proportion of CIs that theoretically contain the true value of the parameter; this is tantamount to the nominal coverage probability. For example, out of all intervals computed at the 95% level, 95% of them should contain the parameter's true value.
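
A minimal sketch of a 95% t-based confidence interval for a mean with SciPy; the simulated sample is an assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=10.0, scale=2.0, size=30)

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean

# 95% interval from the t distribution with n - 1 degrees of freedom.
lo, hi = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"95% CI for the mean: ({lo:.2f}, {hi:.2f})")
```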

Theoretical computer science: Subfield of computer science and mathematics

Theoretical computer science (TCS) is a subfield of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, lambda calculus, and type theory.

In statistical inference, specifically predictive inference, a prediction interval is an estimate of an interval in which a future observation will fall, with a certain probability, given what has already been observed. Prediction intervals are often used in regression analysis.
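
A sketch of a regression prediction interval using statsmodels OLS; the simulated linear data and the query point x = 5 are assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 50)
y = 3.0 + 2.0 * x + rng.normal(scale=1.5, size=x.size)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Interval expected to contain a *future observation* at x = 5,
# not just the mean response there.
frame = fit.get_prediction([1.0, 5.0]).summary_frame(alpha=0.05)
print(frame[["obs_ci_lower", "obs_ci_upper"]])
```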

In null-hypothesis significance testing, the p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis. Even though reporting p-values of statistical tests is common practice in academic publications of many quantitative fields, misinterpretation and misuse of p-values is widespread and has been a major topic in mathematics and metascience. In 2016, the American Statistical Association (ASA) made a formal statement that "p-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone" and that "a p-value, or statistical significance, does not measure the size of an effect or the importance of a result" or "evidence regarding a model or hypothesis." That said, in 2019 a task force convened by the ASA issued a statement on statistical significance and replicability, concluding: "p-values and significance tests, when properly applied and interpreted, increase the rigor of the conclusions drawn from data."
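
A minimal sketch of obtaining a p-value from a two-sided one-sample t-test with SciPy; the simulated sample is an assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.normal(loc=0.3, scale=1.0, size=50)

# Two-sided test of H0: the population mean equals 0.
t, p = stats.ttest_1samp(sample, popmean=0.0)
print(f"t={t:.3f}, p={p:.3f}")
# p is the probability, under H0, of a result at least this extreme;
# it is not the probability that H0 is true.
```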

Computational science, also known as scientific computing, technical computing or scientific computation (SC), is a division of science that uses advanced computing capabilities to understand and solve complex physical problems.

Donald Angus MacKenzie is a Professor of Sociology at the University of Edinburgh, Scotland. His work constitutes a crucial contribution to the field of science and technology studies. He has also developed research in the field of social studies of finance. He has undertaken widely cited work on the history of statistics, eugenics, nuclear weapons, computing and finance, among other things.

The Shapiro–Wilk test is a test of normality. It was published in 1965 by Samuel Sanford Shapiro and Martin Wilk.
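
A brief SciPy sketch applying the test to a simulated normal sample and a deliberately non-normal one:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
samples = {
    "normal": rng.normal(size=100),
    "exponential": rng.exponential(size=100),  # clearly non-normal
}

for name, data in samples.items():
    w, p = stats.shapiro(data)  # small p suggests departure from normality
    print(f"{name}: W={w:.3f}, p={p:.3f}")
```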

In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's τ coefficient, is a statistic used to measure the ordinal association between two measured quantities. A τ test is a non-parametric hypothesis test for statistical dependence based on the τ coefficient. It is a measure of rank correlation: the similarity of the orderings of the data when ranked by each of the quantities. It is named after Maurice Kendall, who developed it in 1938, though Gustav Fechner had proposed a similar measure in the context of time series in 1897.
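
A sketch computing Kendall's τ with SciPy for two hypothetical rank orderings:

```python
import numpy as np
from scipy import stats

# Hypothetical rankings of five items by two judges.
judge_a = np.array([1, 2, 3, 4, 5])
judge_b = np.array([2, 1, 3, 5, 4])

# tau = (concordant pairs - discordant pairs) / total pairs
tau, p = stats.kendalltau(judge_a, judge_b)
print(f"Kendall tau={tau:.3f}, p={p:.3f}")
```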

Computational statistics: Interface between statistics and computer science

Computational statistics, or statistical computing, is the interface between statistics and computer science: statistical methods that are enabled by, or depend on, computational methods. It is the area of computational science specific to the mathematical science of statistics. The area is developing rapidly, leading to calls for a broader concept of computing to be taught as part of general statistical education.

In statistics, the t-statistic is the ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error. It is used in hypothesis testing via Student's t-test, to determine whether to support or reject the null hypothesis. It is very similar to the z-score, with the difference that the t-statistic is used when the sample size is small or the population standard deviation is unknown. For example, the t-statistic is used in estimating the population mean from a sampling distribution of sample means when the population standard deviation is unknown. It is also reported alongside the p-value in hypothesis tests, where the p-value indicates how likely the observed results would be if the null hypothesis were true.
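
A minimal sketch computing the t-statistic directly from its definition and checking it against SciPy's one-sample t-test; the sample and hypothesized mean are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
sample = rng.normal(loc=5.2, scale=1.0, size=20)
hypothesized_mean = 5.0

# t = (estimate - hypothesized value) / standard error of the estimate
t_manual = (sample.mean() - hypothesized_mean) / stats.sem(sample)

t_scipy, p = stats.ttest_1samp(sample, popmean=hypothesized_mean)
print(f"manual t={t_manual:.3f}, scipy t={t_scipy:.3f}, p={p:.3f}")
```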

Data science: Interdisciplinary field of study on deriving knowledge and insights from data

Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured, and unstructured data.

Hadley Wickham: New Zealand statistician

Hadley Alexander Wickham is a New Zealand statistician known for his work on open-source software for the R statistical programming environment. He is the chief scientist at Posit, PBC and an adjunct professor of statistics at the University of Auckland, Stanford University, and Rice University. His work includes the data visualisation system ggplot2 and the tidyverse, a collection of R packages for data science based on the concept of tidy data.