In mathematics and statistics, a quantitative variable may be either continuous or discrete; values of a continuous variable are typically obtained by measuring, while values of a discrete variable are obtained by counting. [1] If it can take on two particular real values such that it can also take on all real values between them (including values that are arbitrarily or infinitesimally close together), the variable is continuous in that interval. [2] If it can take on a value such that there is a non-infinitesimal gap on each side of it containing no values that the variable can take on, then it is discrete around that value. [3] In some contexts, a variable can be discrete in some ranges of the number line and continuous in others.
A continuous variable is a variable such that, between any two permissible values, there are other permissible values.
For example, a variable over a non-empty range of the real numbers is continuous if it can take on any value in that range. [4]
Methods of calculus are often used in problems in which the variables are continuous, for example in continuous optimization problems. [5]
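As a minimal sketch of how calculus enters such problems, the following Python snippet minimizes a smooth function of one continuous variable by stepping against its derivative; the function f(x) = (x - 3)^2 and the step size are illustrative choices, not anything specified above.

```python
# Minimal sketch: gradient descent on a smooth one-variable function.
# The function f(x) = (x - 3)**2 and the step size are illustrative
# assumptions, not taken from the text above.

def f_prime(x):
    """Derivative of f(x) = (x - 3)**2."""
    return 2.0 * (x - 3.0)

x = 0.0          # starting point
step = 0.1       # step size (learning rate)
for _ in range(100):
    x -= step * f_prime(x)   # move against the gradient

print(round(x, 4))  # approaches the minimizer x = 3
```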
In statistical theory, the probability distributions of continuous variables can be expressed in terms of probability density functions. [6]
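As an illustration, the snippet below recovers a probability from a probability density function by numerical integration; the standard normal density and the interval [-1, 1] are illustrative assumptions.

```python
import math

# Sketch: probability of an interval under a probability density
# function, computed by the midpoint rule. The standard normal
# density and the interval are illustrative choices.

def pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

a, b, n = -1.0, 1.0, 10_000
h = (b - a) / n
prob = sum(pdf(a + (i + 0.5) * h) for i in range(n)) * h
print(round(prob, 4))  # P(-1 <= X <= 1), about 0.6827
```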
In continuous-time dynamics, the variable time is treated as continuous, and the equation describing the evolution of some variable over time is a differential equation. [7] The instantaneous rate of change is a well-defined concept that takes the ratio of the change in the dependent variable to the change in the independent variable at a specific instant.
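A small sketch of continuous-time dynamics, assuming the illustrative decay equation dy/dt = -k*y: the differential equation is stepped forward with Euler's method and compared with its exact solution y(t) = y0 * exp(-k*t).

```python
import math

# Sketch: a continuous-time evolution equation approximated by Euler
# steps. The decay equation dy/dt = -k*y and its parameters are
# illustrative assumptions.

k, y0 = 0.5, 1.0
dt, steps = 0.001, 2000       # integrate from t = 0 to t = 2

y = y0
for _ in range(steps):
    y += dt * (-k * y)        # y(t + dt) ~ y(t) + dt * dy/dt

print(round(y, 4), round(y0 * math.exp(-k * 2.0), 4))  # Euler vs exact
```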
In contrast, a variable is a discrete variable if and only if there exists a one-to-one correspondence between this variable and a subset of ℕ, the set of natural numbers. [8] In other words, a discrete variable over a particular interval of real values is one for which, for any value in the range that the variable is permitted to take on, there is a positive minimum distance to the nearest other permissible value. The value of a discrete variable can be obtained by counting, and the number of permitted values is either finite or countably infinite. Common examples are variables that must be integers, non-negative integers, positive integers, or only the integers 0 and 1. [9]
Methods of calculus do not readily lend themselves to problems involving discrete variables. Especially in multivariable calculus, many models rely on the assumption of continuity. [10] Examples of problems involving discrete variables include integer programming.
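By way of contrast, here is a toy integer program solved by exhaustive search over its finite set of integer points; the objective and constraint are invented for illustration.

```python
# Sketch: a tiny integer program solved by exhaustive search, since
# the feasible set is a finite collection of integer points rather
# than a continuum. Objective and constraint are illustrative.
from itertools import product

best = None
for x, y in product(range(5), repeat=2):   # x, y in {0, ..., 4}
    if x + y <= 4:                          # feasibility check
        value = 3 * x + 2 * y               # objective to maximize
        if best is None or value > best[0]:
            best = (value, x, y)

print(best)  # (12, 4, 0): maximize 3x + 2y subject to x + y <= 4
```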
In statistics, the probability distributions of discrete variables can be expressed in terms of probability mass functions. [6]
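For example, the snippet below evaluates a probability mass function, here a binomial PMF with illustrative parameters, and checks that the masses over all outcomes sum to one.

```python
from math import comb

# Sketch: probability mass function of a discrete variable, here a
# binomial count of successes in n trials (illustrative parameters).

def binomial_pmf(k, n=10, p=0.3):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

masses = [binomial_pmf(k) for k in range(11)]
print(round(sum(masses), 10))  # masses over all outcomes sum to 1
```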
In discrete time dynamics, the variable time is treated as discrete, and the equation of evolution of some variable over time is called a difference equation. [11] For certain discrete-time dynamical systems, the system response can be modelled by solving the difference equation for an analytical solution.
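A minimal sketch, assuming the illustrative linear difference equation x[t+1] = a*x[t] + b: the equation is iterated step by step and the result compared against its analytical solution.

```python
# Sketch: a first-order linear difference equation x[t+1] = a*x[t] + b,
# iterated numerically and checked against its analytical solution
# x[t] = a**t * x0 + b * (1 - a**t) / (1 - a), valid for a != 1.
# All parameter values are illustrative assumptions.

a, b, x0, T = 0.9, 1.0, 0.0, 50

x = x0
for _ in range(T):
    x = a * x + b                      # one discrete time step

analytic = a**T * x0 + b * (1 - a**T) / (1 - a)
print(round(x, 6), round(analytic, 6))  # the two agree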
In econometrics and more generally in regression analysis, sometimes some of the variables being empirically related to each other are 0-1 variables, being permitted to take on only those two values. [12] The purpose of the discrete values of 0 and 1 is to use the dummy variable as a ‘switch’ that can ‘turn on’ and ‘turn off’ by assigning the two values to different parameters in an equation. A variable of this type is called a dummy variable. If the dependent variable is a dummy variable, then logistic regression or probit regression is commonly employed. In the case of regression analysis, a dummy variable can be used to represent subgroups of the sample in a study (e.g. the value 0 corresponding to a constituent of the control group). [13]
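A hypothetical sketch of this 'switch' behaviour: the coefficients below are made-up values rather than estimates from any study, and the logistic function converts the resulting log-odds into a probability, as in logistic regression.

```python
import math

# Sketch of a dummy variable acting as a 'switch' in a regression
# equation. The coefficients are invented illustrative values, not
# estimates from any data set.

beta0, beta1, beta_dummy = -1.0, 0.8, 1.5

def log_odds(x, treated):
    """Linear predictor with a 0-1 dummy for group membership."""
    d = 1 if treated else 0            # dummy variable: 'on' or 'off'
    return beta0 + beta1 * x + beta_dummy * d

def probability(z):
    """Logistic function: converts log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

print(round(probability(log_odds(1.0, treated=False)), 3))  # control group
print(round(probability(log_odds(1.0, treated=True)), 3))   # treatment group
```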
A mixed multivariate model can contain both discrete and continuous variables. For instance, a simple mixed multivariate model could have a discrete variable x, which takes on only the values 0 or 1, and a continuous variable y. [14] An example of a mixed model could be a research study on the risk of psychological disorders based on one binary measure of psychiatric symptoms and one continuous measure of cognitive performance. [15] Mixed models may also involve a single variable that is discrete over some range of the number line and continuous over another range.
In probability theory and statistics, the probability distribution of a mixed random variable consists of both discrete and continuous components. A mixed random variable does not have a cumulative distribution function that is discrete or everywhere-continuous. An example of a mixed type random variable is the wait time T of a customer in a queue. The probability that a customer experiences zero wait time is a discrete point mass, while non-zero wait times are evaluated on a continuous time scale. [16] In physics (particularly quantum mechanics, where this sort of distribution often arises), Dirac delta functions are often used to treat continuous and discrete components in a unified manner. For example, the previous example might be described by a probability density f(t) = p δ(t) + g(t), such that P(T = 0) = p and P(T > 0) = ∫₀^∞ g(t) dt = 1 − p.
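A simulation sketch of this wait-time example, with an illustrative atom of mass p at zero and an exponential density for the continuous part; both parameter choices are assumptions.

```python
import random

# Sketch: simulating a mixed wait-time variable. With probability p
# the customer waits zero time (the discrete atom); otherwise the
# wait is drawn from a continuous exponential density. The values of
# p and the rate are illustrative assumptions.

p, rate, n = 0.3, 1.0, 100_000
random.seed(0)

waits = [0.0 if random.random() < p else random.expovariate(rate)
         for _ in range(n)]

print(sum(w == 0.0 for w in waits) / n)   # close to p, the discrete mass
print(sum(w > 1.0 for w in waits) / n)    # continuous tail, P(T > 1)
```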
In probability theory and statistics, the cumulative distribution function (CDF) of a real-valued random variable X, or just distribution function of X, evaluated at x, is the probability that X will take a value less than or equal to x.
Probability theory or probability calculus is the branch of mathematics concerned with probability. Although there are several different probability interpretations, probability theory treats the concept in a rigorous mathematical manner by expressing it through a set of axioms. Typically these axioms formalise probability in terms of a probability space, which assigns a measure taking values between 0 and 1, termed the probability measure, to a set of outcomes called the sample space. Any specified subset of the sample space is called an event.
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events.
A random variable is a mathematical formalization of a quantity or object which depends on random events. The term 'random variable' in its mathematical definition refers to neither randomness nor variability but instead is a mathematical function in which the domain is the set of possible outcomes in a sample space and the range is a measurable space.
In descriptive statistics, the range of a set of data is the size of the smallest interval which contains all the data. It is calculated as the difference between the largest and smallest values. It is expressed in the same units as the data. The range provides an indication of statistical dispersion. Since it only depends on two of the observations, it is most useful in representing the dispersion of small data sets.
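As a one-line illustration with invented data, the range is simply the largest observation minus the smallest:

```python
# Sketch: the range of a small data set (illustrative values).
data = [4.2, 7.9, 5.5, 6.1, 3.8]
data_range = max(data) - min(data)   # largest minus smallest observation
print(round(data_range, 1))          # 4.1, in the same units as the data
```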
In probability theory and related fields, a stochastic or random process is a mathematical object usually defined as a family of random variables in a probability space, where the index of the family often has the interpretation of time. Stochastic processes are widely used as mathematical models of systems and phenomena that appear to vary in a random manner. Examples include the growth of a bacterial population, an electrical current fluctuating due to thermal noise, or the movement of a gas molecule. Stochastic processes have applications in many disciplines such as biology, chemistry, ecology, neuroscience, physics, image processing, signal processing, control theory, information theory, computer science, and telecommunications. Furthermore, seemingly random changes in financial markets have motivated the extensive use of stochastic processes in finance.
In probability and statistics, a probability mass function is a function that gives the probability that a discrete random variable is exactly equal to some value. Sometimes it is also known as the discrete probability density function. The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for either scalar or multivariate random variables whose domain is discrete.
In statistics, the logistic model is a statistical model that models the log-odds of an event as a linear combination of one or more independent variables. In regression analysis, logistic regression estimates the parameters of a logistic model. In binary logistic regression there is a single binary dependent variable, coded by an indicator variable, where the two values are labeled "0" and "1", while the independent variables can each be a binary variable or a continuous variable. The corresponding probability of the value labeled "1" can vary between 0 and 1, hence the labeling; the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names.
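A small sketch of the logit/logistic relationship described above, showing that the two functions are inverses of one another; the probability value is an illustrative choice.

```python
import math

# Sketch: the logit (log-odds) and logistic functions as mutual
# inverses. The probability value below is illustrative.

def logit(prob):
    """Log-odds of a probability in (0, 1), measured in logits."""
    return math.log(prob / (1.0 - prob))

def logistic(z):
    """Converts log-odds back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

prob = 0.8
print(round(logit(prob), 4))            # 1.3863 logits
print(round(logistic(logit(prob)), 4))  # recovers 0.8
```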
In probability theory and statistics, a Gaussian process is a stochastic process (a collection of random variables indexed by time or space), such that every finite collection of those random variables has a multivariate normal distribution. The distribution of a Gaussian process is the joint distribution of all those random variables, and as such, it is a distribution over functions with a continuous domain, e.g. time or space.
In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.
Given two random variables that are defined on the same probability space, the joint probability distribution is the corresponding probability distribution on all possible pairs of outputs. The joint distribution can just as well be considered for any given number of random variables. The joint distribution encodes the marginal distributions, i.e. the distributions of each of the individual random variables and the conditional probability distributions, which deal with how the outputs of one random variable are distributed when given information on the outputs of the other random variable(s).
Mathematical statistics is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure theory.
In statistics, classification is the problem of identifying which of a set of categories an observation belongs to. When classification is performed by a computer, statistical methods are normally used to develop the algorithm.
This glossary of statistics and probability is a list of definitions of terms and concepts used in the mathematical sciences of statistics and probability, their sub-disciplines, and related fields. For additional related terms, see Glossary of mathematics and Glossary of experimental design.
In statistics, binomial regression is a regression analysis technique in which the response has a binomial distribution: it is the number of successes in a series of independent Bernoulli trials, where each trial has probability of success p. In binomial regression, the probability of a success is related to explanatory variables: the corresponding concept in ordinary regression is to relate the mean value of the unobserved response to explanatory variables.
The term kernel is used in statistical analysis to refer to a window function. The term "kernel" has several distinct meanings in different branches of statistics.
In statistics, multivariate adaptive regression splines (MARS) is a form of regression analysis introduced by Jerome H. Friedman in 1991. It is a non-parametric regression technique and can be seen as an extension of linear models that automatically models nonlinearities and interactions between variables.
In mathematical dynamics, discrete time and continuous time are two alternative frameworks within which variables that evolve over time are modeled.
In statistics, linear regression is a model that estimates the linear relationship between a scalar response and one or more explanatory variables. A model with exactly one explanatory variable is a simple linear regression; a model with two or more explanatory variables is a multiple linear regression. This term is distinct from multivariate linear regression, which predicts multiple correlated dependent variables rather than a single dependent variable.