Data generating process

In statistics and in empirical sciences, a data generating process is a process in the real world that "generates" the data one is interested in. [1] This process encompasses the underlying mechanisms, factors, and randomness that contribute to the production of observed data. Usually, scholars do not know the real data generating model and instead rely on assumptions, approximations, or inferred models to analyze and interpret the observed data effectively. However, it is assumed that those real models have observable consequences: the distributions of the data in the population. These distributions, or models of them, can be represented by mathematical functions, such as the normal, Bernoulli, or Poisson distributions.
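The distinction between the unknown real process and the sample an analyst actually sees can be made concrete in a few lines of code. The following minimal Python sketch, with entirely illustrative parameter values, simulates a "true" process and shows that the analyst works backwards from its observable consequence:

```python
# A minimal sketch: a "true" data generating process that the analyst never
# sees directly, only its observable consequence (the sample). The process
# and parameters here are illustrative assumptions, not a definitive model.
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical true process: heights drawn from a normal distribution.
true_mean, true_sd = 170.0, 8.0          # unknown to the analyst
sample = rng.normal(true_mean, true_sd, size=500)

# The analyst infers properties of the process from the observed data.
print(f"sample mean: {sample.mean():.2f}, sample sd: {sample.std(ddof=1):.2f}")
```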

Related Research Articles

Probability distribution: Mathematical function for the probability a given outcome occurs in an experiment

In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events.
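As a small concrete illustration, assuming a fair six-sided die, the distribution assigns a probability to each outcome in the sample space, and an event gets the sum of its outcomes' probabilities:

```python
# A small illustration with an assumed fair six-sided die: the distribution
# assigns a probability to each outcome in the sample space.
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]
pmf = {outcome: Fraction(1, 6) for outcome in sample_space}

# Probability of the event "roll is even" is the sum over its outcomes.
p_even = sum(pmf[o] for o in sample_space if o % 2 == 0)
print(p_even)  # 1/2
```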

Statistics: Study of the collection and analysis of data

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.

A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data. A statistical model represents, often in considerably idealized form, the data-generating process. When referring specifically to probabilities, the corresponding term is probabilistic model. All statistical hypothesis tests and all statistical estimators are derived via statistical models. More generally, statistical models are part of the foundation of statistical inference. A statistical model is usually specified as a mathematical relationship between one or more random variables and other non-random variables. As such, a statistical model is "a formal representation of a theory".
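The following hedged sketch, with invented parameter values, writes a statistical model in exactly this form: a relationship between a non-random covariate and a random response, whose parameters are then estimated from a sample:

```python
# A sketch of a statistical model as a mathematical relationship:
# y = a + b*x + noise, where x is non-random and the noise term makes y a
# random variable. All parameter values here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)              # non-random covariate
a_true, b_true = 1.0, 2.5                # assumed "true" parameters
y = a_true + b_true * x + rng.normal(0, 1.0, size=x.size)

# Estimating the model's parameters from the sample via least squares.
b_hat, a_hat = np.polyfit(x, y, deg=1)
print(f"estimated intercept {a_hat:.2f}, slope {b_hat:.2f}")
```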

Statistical inference: Process of using data analysis to infer properties of an underlying probability distribution

Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population.

Statistics, like all mathematical disciplines, does not infer valid conclusions from nothing. Inferring interesting conclusions about real statistical populations almost always requires some background assumptions. Those assumptions must be made carefully, because incorrect assumptions can generate wildly inaccurate conclusions.
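As a brief illustration of inference under explicit assumptions, the sketch below treats simulated placeholder data as an i.i.d. sample from a larger population and derives an estimate and a 95% confidence interval for the population mean:

```python
# A minimal sketch of inference under explicit assumptions: treating the
# observed data as an i.i.d. sample, estimate the population mean and a 95%
# confidence interval. The data here are simulated placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(50, 10, size=40)       # stand-in for an observed sample

mean = data.mean()
sem = stats.sem(data)                    # standard error of the mean
ci = stats.t.interval(0.95, df=data.size - 1, loc=mean, scale=sem)
print(f"mean {mean:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f})")
```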

White noise: Type of signal in signal processing

In signal processing, white noise is a random signal having equal intensity at different frequencies, giving it a constant power spectral density. The term is used with this or similar meanings in many scientific and technical disciplines, including physics, acoustical engineering, telecommunications, and statistical forecasting. White noise refers to a statistical model for signals and signal sources, not to any specific signal. White noise draws its name from white light, although light that appears white generally does not have a flat power spectral density over the visible band.
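The flat power spectral density can be checked numerically. The short sketch below, with arbitrary sample size and variance, generates Gaussian white noise and compares average power across frequency bands:

```python
# A short sketch: Gaussian white noise has roughly equal power at all
# frequencies, which shows up as a flat periodogram. Sample size and
# variance are arbitrary choices.
import numpy as np

rng = np.random.default_rng(2)
noise = rng.normal(0, 1, size=4096)

spectrum = np.abs(np.fft.rfft(noise)) ** 2
# Compare average power in the lower and upper halves of the spectrum;
# for white noise the two should be of similar magnitude.
half = spectrum.size // 2
print(spectrum[:half].mean(), spectrum[half:].mean())
```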

In software engineering and computer science, abstraction is the process of generalizing concrete details, such as attributes, away from the study of objects and systems to focus attention on details of greater importance. Abstraction is a fundamental concept in computer science and software engineering, especially within the object-oriented programming paradigm, where a common example is programming against a general interface rather than a concrete implementation, as sketched below.
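The following sketch, with class names invented for the example, shows callers depending only on the general interface while the concrete detail stays hidden:

```python
# An illustrative sketch of abstraction in the object-oriented sense:
# callers program against the general interface and ignore the concrete
# details. The class names are invented for the example.
from abc import ABC, abstractmethod

class Storage(ABC):
    @abstractmethod
    def save(self, key: str, value: str) -> None: ...

class InMemoryStorage(Storage):
    def __init__(self):
        self._data = {}
    def save(self, key: str, value: str) -> None:
        self._data[key] = value   # concrete detail hidden from callers

def record(store: Storage) -> None:
    store.save("status", "ok")    # depends only on the abstraction

record(InMemoryStorage())
```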

A hidden Markov model (HMM) is a Markov model in which the observations are dependent on a latent Markov process, here denoted X. An HMM requires that there be an observable process Y whose outcomes depend on the outcomes of X in a known way. Since X cannot be observed directly, the goal is to learn about the state of X by observing Y. By definition of being a Markov model, an HMM has an additional requirement that the outcome of Y at time t = t0 must be "influenced" exclusively by the outcome of X at t = t0, and that the outcomes of X and Y at t < t0 must be conditionally independent of Y at t = t0 given X at time t = t0. Estimation of the parameters in an HMM can be performed using maximum likelihood estimation. For linear chain HMMs, the Baum–Welch algorithm can be used to estimate parameters.
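A minimal simulation of this structure, with invented transition and emission probabilities, makes the latent/observable split explicit:

```python
# A minimal sketch of the structure described above, with invented
# parameters: a hidden two-state Markov chain X and observations Y whose
# distribution depends only on the current hidden state.
import numpy as np

rng = np.random.default_rng(3)
transition = np.array([[0.9, 0.1],        # P(X_t | X_{t-1})
                       [0.2, 0.8]])
emission = np.array([[0.8, 0.2],          # P(Y_t | X_t)
                     [0.3, 0.7]])

T = 200
x = np.empty(T, dtype=int)
y = np.empty(T, dtype=int)
x[0] = 0
y[0] = rng.choice(2, p=emission[x[0]])
for t in range(1, T):
    x[t] = rng.choice(2, p=transition[x[t - 1]])   # latent step
    y[t] = rng.choice(2, p=emission[x[t]])         # observable step

# Only y is available to the analyst; x must be inferred, e.g. via the
# Baum-Welch algorithm mentioned above.
print(y[:20])
```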

In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.
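A hedged sketch of a GLM, using a Poisson response with a log link fitted via statsmodels on simulated data with illustrative coefficients:

```python
# A sketch of a GLM: Poisson regression with a log link, fitted with
# statsmodels. The simulated covariate and coefficients are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(0, 2, size=200)
mu = np.exp(0.5 + 1.2 * x)               # link: log(E[y]) = 0.5 + 1.2*x
y = rng.poisson(mu)

X = sm.add_constant(x)
model = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(model.params)                      # should be near [0.5, 1.2]
```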

Mathematical statistics: Branch of statistics

Mathematical statistics is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data. Specific mathematical techniques which are used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure theory.

The term conceptual model refers to any model that is formed after a conceptualization or generalization process. Conceptual models are often abstractions of things in the real world, whether physical or social. Semantic studies are relevant to various stages of concept formation. Semantics is fundamentally a study of concepts, the meaning that thinking beings give to various elements of their experience.

In time series analysis, the Box–Jenkins method, named after the statisticians George Box and Gwilym Jenkins, applies autoregressive moving average (ARMA) or autoregressive integrated moving average (ARIMA) models to find the best fit of a time-series model to past values of a time series.
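As an illustration in the spirit of this approach, the sketch below simulates an AR(1) series and fits an ARIMA model to it with statsmodels; the order and coefficient are assumptions for the example, not a recommended specification:

```python
# A brief sketch of fitting an ARIMA model to a simulated time series.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)
# Simulate an AR(1) process: y_t = 0.7 * y_{t-1} + e_t
n, phi = 300, 0.7
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + rng.normal()

fit = ARIMA(y, order=(1, 0, 0)).fit()    # estimate an AR(1) specification
print(fit.params)                        # AR coefficient should be near 0.7
```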

In robust statistics, robust regression seeks to overcome some limitations of traditional regression analysis. A regression analysis models the relationship between one or more independent variables and a dependent variable. Standard types of regression, such as ordinary least squares, have favourable properties if their underlying assumptions are true, but can give misleading results otherwise. Robust regression methods are designed to limit the effect that violations of assumptions by the underlying data-generating process have on regression estimates.
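The sketch below, on simulated data with injected outliers, contrasts ordinary least squares with one robust alternative, Huber's M-estimator as implemented in statsmodels; all simulated values are placeholders:

```python
# A minimal sketch of robust regression: Huber's M-estimator via
# statsmodels' RLM, compared against OLS on data with injected outliers.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = np.linspace(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 0.5, size=x.size)
y[::10] += 15                            # contaminate with outliers

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print("OLS slope:", ols.params[1], "robust slope:", rlm.params[1])
```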

Preferential attachment: Stochastic process formalizing cumulative advantage

A preferential attachment process is any of a class of processes in which some quantity, typically some form of wealth or credit, is distributed among a number of individuals or objects according to how much they already have, so that those who are already wealthy receive more than those who are not. "Preferential attachment" is only the most recent of many names that have been given to such processes. They are also referred to under the names Yule process, cumulative advantage, the rich get richer, and the Matthew effect. They are also related to Gibrat's law. The principal reason for scientific interest in preferential attachment is that it can, under suitable circumstances, generate power law distributions. If preferential attachment is non-linear, measured distributions may deviate from a power law. These mechanisms may generate distributions which are approximately power law over transient periods.
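A small simulation, with arbitrary sizes, shows the mechanism producing a heavy-tailed degree distribution:

```python
# An illustrative simulation of preferential attachment: each new node
# attaches to an existing node with probability proportional to that
# node's current degree. Sizes are arbitrary.
from collections import Counter
import numpy as np

rng = np.random.default_rng(7)
# Represent the network as a multiset in which each node ID appears once
# per unit of degree, so a uniform draw is proportional to degree.
attachment_pool = [0, 1]                 # two initial nodes, one link

for new_node in range(2, 10_000):
    target = attachment_pool[rng.integers(len(attachment_pool))]
    attachment_pool.extend([target, new_node])   # both endpoints gain degree

degree = Counter(attachment_pool)
# The degree distribution is heavy-tailed (approximately power law).
print("max degree:", max(degree.values()))
```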

Dirichlet process: Family of stochastic processes

In probability theory, Dirichlet processes are a family of stochastic processes whose realizations are probability distributions. In other words, a Dirichlet process is a probability distribution whose range is itself a set of probability distributions. It is often used in Bayesian inference to describe the prior knowledge about the distribution of random variables—how likely it is that the random variables are distributed according to one or another particular distribution.
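A draw from a Dirichlet process can be sketched via truncated stick-breaking; the concentration parameter, base measure, and truncation level below are illustrative choices:

```python
# A short sketch of a draw from a Dirichlet process via truncated
# stick-breaking: the realization is itself a discrete probability
# distribution over randomly located atoms.
import numpy as np

rng = np.random.default_rng(8)
alpha, truncation = 2.0, 100               # concentration, truncation level

betas = rng.beta(1, alpha, size=truncation)
remaining = np.concatenate([[1.0], np.cumprod(1 - betas)[:-1]])
weights = betas * remaining                # stick-breaking weights
atoms = rng.normal(0, 1, size=truncation)  # atoms drawn from the base measure

print(weights.sum())                       # near 1 for a large truncation
print(atoms[weights.argmax()])             # location of the heaviest atom
```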

In probability theory and statistics, a categorical distribution is a discrete probability distribution that describes the possible results of a random variable that can take on one of K possible categories, with the probability of each category separately specified. There is no innate underlying ordering of these outcomes, but numerical labels are often attached for convenience in describing the distribution. The K-dimensional categorical distribution is the most general distribution over a K-way event; any other discrete distribution over a size-K sample space is a special case. The parameters specifying the probabilities of each possible outcome are constrained only by the fact that each must be in the range 0 to 1, and all must sum to 1.
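A minimal illustration, sampling from an assumed categorical distribution over K = 3 unordered categories:

```python
# Sampling from a categorical distribution over three unordered categories
# with explicitly specified probabilities (illustrative values).
import numpy as np

rng = np.random.default_rng(9)
categories = ["red", "green", "blue"]    # no innate ordering
probs = [0.5, 0.3, 0.2]                  # each in [0, 1], summing to 1

draws = rng.choice(categories, size=1000, p=probs)
values, counts = np.unique(draws, return_counts=True)
print(dict(zip(values, counts / 1000)))  # empirical frequencies near probs
```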

Production is the process of combining various inputs, both material and immaterial, in order to create output. Ideally this output will be a good or service which has value and contributes to the utility of individuals. The area of economics that focuses on production is called production theory, and it is closely related to the consumption (or consumer) theory of economics.

Species distribution modelling: Algorithmic prediction of the distribution of a species across geographic space

Species distribution modelling (SDM), also known as environmental (or ecological) niche modelling (ENM), habitat modelling, predictive habitat distribution modelling, and range mapping, uses ecological models to predict the distribution of a species across geographic space and time using environmental data. The environmental data are most often climate data (e.g. temperature, precipitation), but can include other variables such as soil type, water depth, and land cover. SDMs are used in several research areas in conservation biology, ecology and evolution. These models can be used to understand how environmental conditions influence the occurrence or abundance of a species, and for predictive purposes (ecological forecasting). Predictions from an SDM may be of a species’ future distribution under climate change, a species’ past distribution in order to assess evolutionary relationships, or the potential future distribution of an invasive species. Predictions of current and/or future habitat suitability can be useful for management applications (e.g. reintroduction or translocation of vulnerable species, reserve placement in anticipation of climate change).
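A highly simplified sketch of the idea, modelling occurrence as a logistic function of two environmental covariates; the data, covariates, and coefficients are all invented for illustration:

```python
# A simplified sketch of the SDM idea: model species occurrence as a
# function of environmental covariates with logistic regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
temperature = rng.uniform(0, 30, size=500)
precipitation = rng.uniform(200, 2000, size=500)

# Hypothetical true suitability: the species favors warm, wet sites.
logit = -6 + 0.2 * temperature + 0.002 * precipitation
occurrence = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([temperature, precipitation]))
sdm = sm.Logit(occurrence, X).fit(disp=0)
print(sdm.params)                        # recovered environmental responses
```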

Design of robust and reliable networks and network services relies on an understanding of the traffic characteristics of the network. Throughout history, different models of network traffic have been developed and used for evaluating existing and proposed networks and services.

In statistics, a zero-inflated model is a statistical model based on a zero-inflated probability distribution, i.e. a distribution that allows for frequent zero-valued observations.
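A brief sketch of such a data generating process, with illustrative parameter values: with probability pi the observation is a structural zero, otherwise it is drawn from a Poisson distribution:

```python
# A sketch of a zero-inflated data generating process: a mixture of
# structural zeros and Poisson counts. Parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(11)
pi, lam, n = 0.3, 4.0, 10_000            # inflation prob., Poisson mean

structural_zero = rng.random(n) < pi
counts = np.where(structural_zero, 0, rng.poisson(lam, size=n))

print("share of zeros:", (counts == 0).mean())
# Exceeds the Poisson-only share exp(-lam) because of the inflation term.
print("Poisson-only share would be:", np.exp(-lam))
```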

References

  1. Tu, Jun; Zhou, Guofu (2004). "Data-generating process uncertainty: What difference does it make in portfolio decisions?". Journal of Financial Economics. 72 (2): 385–421. doi:10.1016/j.jfineco.2003.05.003.