Continuous or discrete variable

Variables can be divided into two main categories: qualitative (categorical) and quantitative (numerical). Continuous and discrete variables are subcategories of quantitative variables. Note that this schematic is not exhaustive in terms of the types of variables.

In mathematics and statistics, a quantitative variable may be continuous or discrete. Continuous variables are typically obtained by measuring, and discrete variables by counting. [1] If a variable can take on two particular real values such that it can also take on all real values between them (including values that are arbitrarily or infinitesimally close together), the variable is continuous in that interval. [2] If it can take on a value such that there is a non-infinitesimal gap on each side of it containing no values that the variable can take on, then it is discrete around that value. [3] In some contexts, a variable can be discrete in some ranges of the number line and continuous in others.

Continuous variable

A continuous variable is a variable such that there are possible values between any two values.

For example, a variable over a non-empty range of the real numbers is continuous if it can take on any value in that range. [4]

Methods of calculus are often used in problems in which the variables are continuous, for example in continuous optimization problems. [5]
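
As a minimal sketch of how calculus applies when the variable is continuous, the following uses gradient descent to minimize a smooth function of one real variable. The objective (x - 3)^2, the step size, and the function name `gradient_descent` are illustrative assumptions, not taken from the cited source.

```python
def gradient_descent(grad, x0, step=0.1, iterations=200):
    """Follow the negative derivative from x0 for a fixed number of steps."""
    x = x0
    for _ in range(iterations):
        x -= step * grad(x)
    return x

# Illustrative objective f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

The update relies on the derivative existing at every point the iterate visits, which is only meaningful because x can move by arbitrarily small amounts.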

In statistical theory, the probability distributions of continuous variables can be expressed in terms of probability density functions. [6]
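
A density assigns probability to intervals rather than to single points. The sketch below approximates P(a ≤ X ≤ b) by numerically integrating a density; the exponential density and the helper names are assumptions chosen only for illustration.

```python
import math

def exponential_pdf(x, rate=1.0):
    """Density of an exponential distribution (an illustrative choice)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def probability_between(pdf, a, b, steps=10_000):
    """Approximate P(a <= X <= b) with the midpoint rule."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

p = probability_between(exponential_pdf, 0.0, 1.0)
exact = 1.0 - math.exp(-1.0)  # closed form for this particular density
```

Shrinking the interval to a single point drives the integral to zero, which is why a continuous variable has probability zero of equalling any one exact value.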

In continuous-time dynamics, the variable time is treated as continuous, and the equation describing the evolution of some variable over time is a differential equation. [7] The instantaneous rate of change is a well-defined concept: it is the limit of the ratio of the change in the dependent variable to the change in the independent variable as the latter change shrinks to zero at a specific instant.
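
Written out, with y denoting the dependent variable and t_0 the instant of interest (symbols introduced here for illustration), the instantaneous rate of change is the usual derivative:

```latex
\left.\frac{dy}{dt}\right|_{t=t_0} = \lim_{h \to 0} \frac{y(t_0 + h) - y(t_0)}{h}
```

The limit only makes sense when t can take values arbitrarily close to t_0, that is, when time is a continuous variable.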

This is an image of vials with different amounts of liquid. A continuous variable could be the volume of liquid in the vials. A discrete variable could be the number of vials.

Discrete variable

In contrast, a variable is a discrete variable if and only if there exists a one-to-one correspondence between this variable and a subset of $\mathbb{N}$, the set of natural numbers. [8] In other words, a discrete variable over a particular interval of real values is one for which, for any value in the range that the variable is permitted to take on, there is a positive minimum distance to the nearest other permissible value. The value of a discrete variable can be obtained by counting, and the number of permitted values is either finite or countably infinite. Common examples are variables that must be integers, non-negative integers, positive integers, or only the integers 0 and 1. [9]

Methods of calculus do not readily lend themselves to problems involving discrete variables. Especially in multivariable calculus, many models rely on the assumption of continuity. [10] Examples of problems involving discrete variables include integer programming.
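
To illustrate why integer restrictions change the problem, the sketch below solves a tiny integer program by exhaustive enumeration. The objective, the two constraints, and the function name are made up for this example; practical solvers use branch-and-bound or cutting planes rather than brute force.

```python
from itertools import product

def solve_small_integer_program(max_value=10):
    """Maximize 5x + 4y subject to 6x + 4y <= 24 and x + 2y <= 6,
    with x and y restricted to non-negative integers (toy problem)."""
    best_point, best_objective = None, float("-inf")
    for x, y in product(range(max_value + 1), repeat=2):
        feasible = 6 * x + 4 * y <= 24 and x + 2 * y <= 6
        objective = 5 * x + 4 * y
        if feasible and objective > best_objective:
            best_point, best_objective = (x, y), objective
    return best_point, best_objective

best_point, best_objective = solve_small_integer_program()
```

If x and y were allowed to be continuous, the same problem would attain a value of 21 at (3, 1.5); the integer optimum found here is 20 at (4, 0), which is not obtained by simply rounding the continuous solution.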

In statistics, the probability distributions of discrete variables can be expressed in terms of probability mass functions. [6]
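
A probability mass function gives a probability to each individual value, and those probabilities sum to one. The following sketch tabulates one for the number of heads in four fair coin tosses; the binomial model and the name `binomial_pmf` are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# One probability per permitted value 0, 1, 2, 3, 4.
pmf = {k: binomial_pmf(k, n=4, p=0.5) for k in range(5)}
total = sum(pmf.values())
```

Unlike a density, each entry of `pmf` is itself a probability, so a single value such as 2 heads can have non-zero probability.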

In discrete time dynamics, the variable time is treated as discrete, and the equation of evolution of some variable over time is called a difference equation. [11] For certain discrete-time dynamical systems, the system response can be modelled by solving the difference equation for an analytical solution.
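
As a sketch of what an analytical solution looks like, consider the first-order linear difference equation x_{t+1} = a x_t + b with a ≠ 1 (an example chosen here, not taken from the cited source). Its closed form is x_t = x* + a^t (x_0 - x*), where x* = b / (1 - a) is the fixed point. The code compares step-by-step iteration with that formula.

```python
def iterate(a, b, x0, steps):
    """Apply x_{t+1} = a * x_t + b repeatedly, starting from x0."""
    x = x0
    for _ in range(steps):
        x = a * x + b
    return x

def closed_form(a, b, x0, steps):
    """Analytical solution of the same recurrence, valid for a != 1."""
    fixed_point = b / (1 - a)
    return fixed_point + a**steps * (x0 - fixed_point)

simulated = iterate(0.5, 1.0, 10.0, 20)
analytical = closed_form(0.5, 1.0, 10.0, 20)
```

Both routes agree up to floating-point error, and because |a| < 1 in this example the response settles toward the fixed point 2.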

In econometrics and more generally in regression analysis, some of the variables being empirically related to each other are sometimes 0-1 variables, permitted to take on only those two values. [12] A variable of this type is called a dummy variable. The discrete values 0 and 1 allow the dummy variable to act as a ‘switch’ that can ‘turn on’ and ‘turn off’ parameters in an equation. In regression analysis, a dummy variable can be used to represent subgroups of the sample in a study (e.g. the value 0 corresponding to a member of the control group). [13] If the dependent variable is a dummy variable, then logistic regression or probit regression is commonly employed.
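
The switching behaviour can be seen in a one-regressor least-squares fit, sketched below with invented data. When the only regressor is a 0-1 variable, the intercept equals the mean outcome of the group coded 0 and the slope equals the difference between the two group means. The data and the name `ols_simple` are hypothetical.

```python
def ols_simple(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Invented data: 0 marks the control group, 1 marks the treated group.
treated = [0, 0, 0, 1, 1, 1]
outcome = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
intercept, slope = ols_simple(treated, outcome)
```

Here the control-group mean is 3 and the treated-group mean is 6, so the fit returns an intercept of 3 and a slope of 3: setting the dummy to 1 switches the slope term on.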

Mixture of continuous and discrete variables

A mixed multivariate model can contain both discrete and continuous variables. For instance, a simple mixed multivariate model could have a discrete variable $x$, which only takes on values 0 or 1, and a continuous variable $y$. [14] An example of a mixed model could be a research study on the risk of psychological disorders based on one binary measure of psychiatric symptoms and one continuous measure of cognitive performance. [15] Mixed models may also involve a single variable that is discrete over some range of the number line and continuous over another range.

In probability theory and statistics, the probability distribution of a mixed random variable consists of both discrete and continuous components. The cumulative distribution function of a mixed random variable is neither discrete nor everywhere continuous. An example of a mixed-type random variable is the wait time in a queue. The likelihood of a customer experiencing a zero wait time is discrete, while non-zero wait times are evaluated on a continuous time scale. [16] In physics (particularly quantum mechanics, where this sort of distribution often arises), Dirac delta functions are often used to treat continuous and discrete components in a unified manner. For example, the wait time above might be described by a probability density $p(t) = \alpha \delta(t) + g(t)$, such that $P(t > 0) = \int_0^\infty g(t)\,dt = 1 - \alpha$, and $P(t = 0) = \alpha$.
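
The mixed character of the queueing example shows up as a jump in the cumulative distribution function at zero. The sketch below assumes, purely for illustration, that a customer waits zero time with probability 0.3 and otherwise waits an exponentially distributed time; neither the numbers nor the exponential form come from the cited source.

```python
import math

def wait_time_cdf(t, p_zero=0.3, rate=2.0):
    """P(wait <= t) for a point mass p_zero at t = 0 plus an
    exponential continuous part (illustrative assumptions)."""
    if t < 0:
        return 0.0
    return p_zero + (1.0 - p_zero) * (1.0 - math.exp(-rate * t))

# The CDF jumps by the discrete mass at zero, then rises smoothly.
jump_at_zero = wait_time_cdf(0.0) - wait_time_cdf(-1e-12)
```

The size of the jump equals the probability of a zero wait, and for t > 0 the function increases continuously toward 1, so it is neither a step function nor continuous everywhere.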

References

  1. Ali, Zulfiqar; Bhaskar, S. Bala (September 2016). "Basic statistical tools in research and data analysis". Indian Journal of Anaesthesia. 60 (9): 662–669. doi:10.4103/0019-5049.190623. PMC 5037948. PMID 27729694.
  2. Kaliyadan, Feroze; Kulkarni, Vinay (January 2019). "Types of Variables, Descriptive Statistics, and Sample Size". Indian Dermatology Online Journal. 10 (1): 82–86. doi:10.4103/idoj.IDOJ_468_18. PMC 6362742. PMID 30775310.
  3. K.D. Joshi, Foundations of Discrete Mathematics, 1989, New Age International Limited, page 7.
  4. Brzychczy, Stanisław; Górniewicz, Lech (2011). "Continuous and discrete models of neural systems in infinite-dimensional abstract spaces". Neurocomputing. 74 (17): 2711–2715. doi:10.1016/j.neucom.2010.11.005.
  5. Griva, Igor; Nash, Stephen; Sofer, Ariela (2009). Linear and nonlinear optimization (2nd ed.). Philadelphia: Society for Industrial and Applied Mathematics. p. 7. ISBN 978-0-89871-661-0. OCLC 236082842.
  6. Dekking, Frederik Michel; Kraaikamp, Cornelis; Lopuhaä, Hendrik Paul; Meester, Ludolf Erwin (2005). "A Modern Introduction to Probability and Statistics". Springer Texts in Statistics. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1. ISSN 1431-875X.
  7. Poyton, A. A.; Varziri, Mohammad Saeed; McAuley, Kimberley B.; McLellan, Pat James; Ramsay, James O. (February 15, 2006). "Parameter estimation in continuous-time dynamic models using principal differential analysis". Computers & Chemical Engineering. 30 (4): 698–708. doi:10.1016/j.compchemeng.2005.11.008.
  8. Odifreddi, Piergiorgio (February 18, 1992). Classical Recursion Theory: The Theory of Functions and Sets of Natural Numbers. North Holland Publishing Company. p. 18. ISBN 978-0444894830.
  9. van Douwen, Eric (1984). Handbook of Set-Theoretic Topology. North Holland: Elsevier. pp. 113–167. ISBN 978-0-444-86580-9.
  10. Clogg, Clifford C.; Shockey, James W. (1988). Handbook of Multivariate Experimental Psychology. Boston, Massachusetts: Springer Publishing Company. pp. 337–365. ISBN 978-1-4613-0893-5.
  11. Thyagarajan, K.S. (2019). Introduction to Digital Signal Processing Using MATLAB with Application to Digital Communications (1 ed.). Springer Publishing Company. pp. 21–63. ISBN 978-3319760285.
  12. Miller, Jerry L.L.; Erickson, Maynard L. (May 1974). "On Dummy Variable Regression Analysis". Sociological Methods & Research. 2 (4): 395–519. doi:10.1177/004912417400200402.
  13. Hardy, Melissa A. (February 25, 1993). Regression with Dummy Variables (Quantitative Applications in the Social Sciences) (1st ed.). Newbury Park: Sage Publications, Inc. p. v. ISBN 0803951280.
  14. Olkin, Ingram; Tate, Robert (June 1961). "Multivariate Correlation Models with Mixed Discrete and Continuous Variables". The Annals of Mathematical Statistics. 32 (2): 448–465. doi:10.1214/aoms/1177705052.
  15. Fitzmaurice, Garrett M.; Laird, Nan M. (March 1997). "Regression Models for Mixed Discrete and Continuous Responses with Potentially Missing Values". Biometrics. 53 (1): 110–122. doi:10.2307/2533101. JSTOR 2533101.
  16. Sharma, Shalendra D. (March 1975). "On a Continuous/Discrete Time Queueing System with Arrivals in Batches of Variable Size and Correlated Departures". Journal of Applied Probability. 12 (1): 115–129. doi:10.2307/3212413. JSTOR 3212413.