Social statistics


Social statistics is the use of statistical measurement systems to study human behavior in a social environment. This can be accomplished by polling a group of people, by evaluating a subset of data obtained about a group of people, or by observing and statistically analyzing a set of data that relates to people and their behaviors.


Statistics in the social sciences


Figure: Statue erected in memory of Adolphe Quetelet, who published data on the European population.

Adolphe Quetelet was a proponent of social physics. In his book Physique sociale [1], he presents distributions of human heights, age at marriage, and time of birth and death; time series of marriages, births and deaths; a survival density for humans; and a curve describing fecundity as a function of age. He also developed the Quetelet Index.
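The Quetelet Index is what is now called the body mass index: weight in kilograms divided by the square of height in metres. A minimal sketch (the function name `quetelet_index` is illustrative):

```python
def quetelet_index(weight_kg: float, height_m: float) -> float:
    """Quetelet index (body mass index): weight divided by height squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


print(round(quetelet_index(70.0, 1.75), 1))  # 22.9
```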

Francis Ysidro Edgeworth published "On Methods of Ascertaining Variations in the Rate of Births, Deaths, and Marriages" in 1885, [2] which uses squares of differences to study fluctuations. George Udny Yule published "On the Correlation of total Pauperism with Proportion of Out-Relief" in 1895. [3]
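The idea of measuring the fluctuation of a rate through squared differences can be sketched as follows. This is a generic illustration (the mean squared deviation of yearly rates from their average), not a reconstruction of Edgeworth's own procedure; the function name and the yearly rates are hypothetical.

```python
from statistics import fmean


def mean_squared_deviation(rates):
    """Average squared difference between each yearly rate and the series mean."""
    m = fmean(rates)
    return fmean((r - m) ** 2 for r in rates)


# Hypothetical yearly birth rates per 1,000 population (illustrative, not historical data)
region_a = [30.0, 31.0, 29.0, 30.0]
region_b = [26.0, 34.0, 28.0, 32.0]
print(mean_squared_deviation(region_a))  # 0.5
print(mean_squared_deviation(region_b))  # 10.0
```

Both series have the same mean (30), so the squared-difference measure is what separates the steady region from the fluctuating one.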

A numerical calibration of the fertility curve was given by Karl Pearson in 1897 in The Chances of Death, and Other Studies in Evolution. [4] In this book, Pearson also uses standard deviation, correlation and skewness to study humans.
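The three descriptive tools named above can be sketched in a few lines. The sketch uses the modern moment-based skewness coefficient and the product-moment correlation; Pearson's own definitions in the 1890s may differ in detail, so treat this as an illustration of the concepts rather than of his calculations. Function names and data are illustrative.

```python
from math import sqrt
from statistics import fmean, pstdev


def skewness(xs):
    """Moment coefficient of skewness (population form): mean of cubed z-scores."""
    m, s = fmean(xs), pstdev(xs)
    return fmean(((x - m) / s) ** 3 for x in xs)


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0
print(skewness([1, 1, 1, 10]) > 0)                   # True: long right tail
```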

Vilfredo Pareto published his analysis of the distribution of income in Great Britain and Ireland in 1897. [5] The pattern he described is now known as the Pareto principle.
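The link between a Pareto income distribution and the "80/20" form of the principle can be made concrete. Assuming incomes follow a Pareto (type I) distribution with shape parameter alpha > 1, the share of total income held by the richest fraction p of earners is p^(1 - 1/alpha); choosing alpha = log 5 / log 4 (about 1.16) makes the top 20% hold exactly 80%. This is an illustrative sketch: real income data follow a Pareto law only approximately, mainly in the upper tail, and the function name is hypothetical.

```python
from math import log


def top_share(p, alpha):
    """Share of total income held by the richest fraction p of earners when
    incomes follow a Pareto (type I) distribution with shape alpha > 1."""
    if not (0 < p <= 1) or alpha <= 1:
        raise ValueError("need 0 < p <= 1 and alpha > 1")
    return p ** (1 - 1 / alpha)


alpha_80_20 = log(5) / log(4)  # about 1.16
print(round(top_share(0.2, alpha_80_20), 6))  # 0.8
print(round(top_share(0.2, 2.0), 3))          # 0.447: a larger alpha means less concentration
```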

Louis Guttman proposed that the values of ordinal variables can be represented by a Guttman scale, which is useful if the number of variables is large and allows the use of techniques such as ordinary least squares. [6]
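A Guttman scale is cumulative: items are ordered so that anyone who endorses a harder item is expected to endorse all easier ones, and a respondent's scale value is the number of items endorsed. How well real answers fit that pattern is usually summarized by a coefficient of reproducibility, 1 - errors / (respondents x items); a commonly cited rule of thumb treats values of about 0.90 or more as acceptable. The sketch below assumes one simple error-counting convention (mismatches against the ideal pattern for each total score); other conventions exist, and the names and answers are hypothetical.

```python
def coefficient_of_reproducibility(responses):
    """Coefficient of reproducibility for a cumulative (Guttman) scale.

    `responses` holds one 0/1 list per respondent, with items ordered from the
    most to the least frequently endorsed. A respondent with total score s is
    expected to endorse exactly the first s items; every deviation from that
    ideal pattern counts as one error.
    """
    n_items = len(responses[0])
    errors = 0
    for row in responses:
        score = sum(row)
        ideal = [1] * score + [0] * (n_items - score)
        errors += sum(observed != expected for observed, expected in zip(row, ideal))
    return 1 - errors / (len(responses) * n_items)


# Hypothetical yes/no answers to three items of increasing severity
perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(coefficient_of_reproducibility(perfect))                # 1.0
print(coefficient_of_reproducibility(perfect + [[0, 1, 0]]))  # about 0.867
```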

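Macroeconomic statistical research has provided stylized facts, which include:
Bowley's law: the share of wages in national income is roughly constant over time. [7]
The Phillips curve: an inverse relationship between unemployment and the rate of change of money wage rates. [8]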

Statistics and statistical analyses have become a key feature of social science: statistics is employed in economics, psychology, political science, sociology and anthropology.

Statistical methods in social sciences

Figure: Diagram illustrating path analysis: causal paths link endogenous variables and exogenous variables.
Figure: Cluster analysis showing two main clusters.
Figure: A classification performed using the perceptron algorithm.
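The perceptron in the classification figure learns a linear decision boundary by adjusting its weights whenever it misclassifies a case. On linearly separable data it stops once every case is classified correctly, but which separating line it ends up with depends on the order of the data and the starting weights. A minimal sketch of the classic learning rule, with hypothetical two-dimensional points labelled +1 and -1 (function names are illustrative):

```python
def train_perceptron(points, labels, epochs=20, lr=1.0):
    """Classic perceptron learning rule for two classes labelled +1 / -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary: move the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: data separated
            break
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


points = [(2, 3), (3, 3), (-1, -2), (-2, -1)]
labels = [1, 1, -1, -1]
w, b = train_perceptron(points, labels)
print(w, b)  # [2.0, 3.0] 1.0
```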

Methods and concepts used in quantitative social sciences include: [9]

Statistical techniques include: [9]

Covariance-based methods

Probability-based methods

Distance-based methods
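Distance-based methods start from a measure of dissimilarity between cases. One example is single-linkage agglomerative clustering, for which efficient algorithms such as SLINK exist. The naive sketch below computes the same kind of grouping for small data sets using Euclidean distance; the function name, the choice of distance, and the data are assumptions for illustration.

```python
from itertools import combinations
from math import dist


def single_linkage(points, k):
    """Naive agglomerative clustering with single linkage.

    Starts with every point in its own cluster and repeatedly merges the two
    clusters whose closest members are nearest to each other, until k remain.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(pair):
        a, b = pair
        return min(dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)  # a < b
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


# Hypothetical 2-D data with two well-separated groups
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(single_linkage(data, k=2))  # [[0, 1, 2], [3, 4, 5]]
```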

Methods for categorical data
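For categorical data, a common starting point is Pearson's chi-squared test of independence on a contingency table: each cell's observed count is compared with the count expected if rows and columns were independent (row total x column total / grand total). The sketch computes only the statistic and its degrees of freedom, (r - 1)(c - 1); turning it into a p-value requires the chi-squared distribution (the 5% critical value for 1 degree of freedom is about 3.84). The function name and counts are hypothetical.

```python
def chi_square_independence(table):
    """Pearson's chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical 2 x 2 table: rows are two groups, columns are yes/no answers
print(chi_square_independence([[20, 30], [30, 20]]))  # (4.0, 1)
```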

Usage and applications

Social scientists use social statistics for many purposes, including:


The use of statistics has become so widespread in the social sciences that many universities, such as Harvard, have developed institutes focusing on "quantitative social science". Harvard's Institute for Quantitative Social Science focuses mainly on fields like political science that incorporate the advanced causal statistical models that Bayesian methods provide. However, some experts in causality feel that these claims of causal statistics are overstated. [13] [14]

There is a debate regarding the uses and value of statistical methods in social science, especially in political science. Some statisticians question practices such as data dredging, which can lead to unreliable policy conclusions when political partisans overestimate the interpretive power of non-robust statistical methods such as simple and multiple linear regression. Indeed, an important axiom that social scientists cite, but often forget, is that "correlation does not imply causation." For example, it appears widely accepted that the lower number of women in decision-making positions in politics, business and science is good evidence of gender discrimination. But where men suffer adverse statistical indicators, such as higher imprisonment rates or a higher suicide rate, that is not usually accepted as evidence of gender bias acting against them.
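Data dredging can be illustrated with a small simulation: when many unrelated variables are tested against the same outcome, roughly a fraction alpha of them will look "significant" by chance alone. The sketch assumes independent, normally distributed data and uses the Fisher z approximation for p-values; the function names and default parameters are illustrative.

```python
import random
from math import atanh, sqrt
from statistics import NormalDist, fmean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def count_false_discoveries(n_obs=50, n_predictors=1000, alpha=0.05, seed=1):
    """Correlate one random outcome with many unrelated random predictors and
    count how many correlations pass a significance test purely by chance."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(n_obs)]  # independent of outcome by construction
        z = atanh(pearson_r(outcome, predictor)) * sqrt(n_obs - 3)  # approx. N(0, 1) under no correlation
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        if p_value < alpha:
            hits += 1
    return hits


hits = count_false_discoveries()
print(f"{hits} of 1000 unrelated predictors were 'significant' at the 5% level")
```

With these settings one expects around 50 spurious "findings", none of which reflects a real relationship.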

Further reading

Related Research Articles

Econometrics is the application of statistical methods to economic data in order to give empirical content to economic relationships. More precisely, it is "the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference". An introductory economics textbook describes econometrics as allowing economists "to sift through mountains of data to extract simple relationships". Jan Tinbergen is one of the two founding fathers of econometrics. The other, Ragnar Frisch, also coined the term in the sense in which it is used today.
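The simplest such relationship is a straight line fitted by ordinary least squares: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the means. A minimal sketch (the function name and data are illustrative; applied econometric work adds standard errors, further regressors, and diagnostics):

```python
def simple_ols(xs, ys):
    """Ordinary least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data lying exactly on y = 1 + 2x
print(simple_ols([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```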

Psychological statistics

Psychological statistics is the application of formulas, theorems, numbers and laws to psychology. Statistical methods for psychology include the development and application of statistical theory and methods for modeling psychological data. These methods include psychometrics, factor analysis, experimental designs, and Bayesian statistics. The article also discusses journals in the same field.
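As a small psychometric illustration, the sketch below computes Cronbach's alpha, a widely used reliability coefficient for a multi-item test: alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores), where k is the number of items. Cronbach's alpha is chosen here only as an example; the text above does not single it out, and the scores are made up.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` holds one list of item scores per respondent."""
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("need at least two items")
    item_variance_sum = sum(pvariance(item) for item in zip(*scores))
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of four respondents to three items scored 1-5
answers = [[4, 5, 4], [2, 2, 3], [3, 3, 3], [5, 4, 5]]
print(round(cronbach_alpha(answers), 3))
```

Using population variances throughout is a choice of convenience: the ratio is the same with sample variances, as long as one form is used consistently.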

Statistics: Study of the collection, analysis, interpretation, and presentation of data

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.
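A basic survey-design calculation is estimating a population mean from a sample. Assuming a simple random sample drawn without replacement, the sample mean estimates the population mean, and its standard error is s/sqrt(n) multiplied by the finite population correction sqrt(1 - n/N). A minimal sketch (the function name and numbers are hypothetical):

```python
from math import sqrt
from statistics import fmean, stdev


def srs_mean(sample, population_size):
    """Estimate a population mean and its standard error from a simple random
    sample drawn without replacement from a population of `population_size`."""
    n = len(sample)
    fpc = 1 - n / population_size             # finite population correction
    se = sqrt(fpc) * stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    return fmean(sample), se


# Hypothetical household sizes from 8 randomly chosen households out of 1,000
mean, se = srs_mean([2, 4, 4, 4, 5, 5, 7, 9], population_size=1000)
print(mean, round(se, 3))
# Normal approximation; crude for a sample this small
print("approximate 95% interval:", (mean - 1.96 * se, mean + 1.96 * se))
```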

A case study is an in-depth, detailed examination of a particular case within a real-world context. For example, case studies in medicine may focus on an individual patient or ailment; case studies in business might cover a particular firm's strategy or a broader market; similarly, case studies in politics can range from a narrow happening over time to an enormous undertaking.

Quantitative research: All procedures for the numerical representation of empirical facts

Quantitative research is a research strategy that focuses on quantifying the collection and analysis of data. It is formed from a deductive approach where emphasis is placed on the testing of theory, shaped by empiricist and positivist philosophies.

Spurious relationship: Apparent, but false, correlation between causally-independent variables

In statistics, a spurious relationship or spurious correlation is a mathematical relationship in which two or more events or variables are associated but not causally related, due to either coincidence or the presence of a certain third, unseen factor.
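A short simulation makes the "third, unseen factor" case concrete. In this sketch (variable names and distributions are assumptions chosen for illustration), a hidden variable z drives both x and y, while x has no effect on y. The two are still correlated; under these settings the theoretical correlation is 1 / sqrt(2 x 2) = 0.5.

```python
import random
from math import sqrt
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]  # hidden common cause
x = [zi + rng.gauss(0, 1) for zi in z]   # x depends only on z plus noise
y = [zi + rng.gauss(0, 1) for zi in z]   # y depends only on z plus noise, never on x
r = pearson_r(x, y)
print(round(r, 2))  # close to 0.5 even though x has no effect on y
```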

In statistics, path analysis is used to describe the directed dependencies among a set of variables. This includes models equivalent to any form of multiple regression analysis, factor analysis, canonical correlation analysis, discriminant analysis, as well as more general families of models in the multivariate analysis of variance and covariance analyses.
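For the simplest recursive path model, X -> M -> Y with an additional direct path X -> Y, the standardized path coefficients can be computed from the three pairwise correlations by least-squares regression on standardized variables (M on X; Y on X and M). The sketch assumes that model and hypothetical correlations; it also shows the decomposition of the total effect of X into direct and indirect parts, which for this model adds back up to the correlation between X and Y.

```python
def path_coefficients(r_xm, r_xy, r_my):
    """Standardized path coefficients for the recursive model X -> M -> Y plus X -> Y.

    M is regressed on X; Y is regressed on X and M (all variables standardized).
    """
    denom = 1 - r_xm ** 2  # requires |r_xm| < 1
    m_on_x = r_xm
    y_on_x = (r_xy - r_xm * r_my) / denom
    y_on_m = (r_my - r_xm * r_xy) / denom
    indirect = m_on_x * y_on_m
    return {"m_on_x": m_on_x, "y_on_x_direct": y_on_x, "y_on_m": y_on_m,
            "indirect": indirect, "total": y_on_x + indirect}


# Hypothetical correlations among X, M and Y
paths = path_coefficients(r_xm=0.5, r_xy=0.4, r_my=0.6)
print({name: round(value, 3) for name, value in paths.items()})
# the total effect equals r_xy = 0.4 for this model
```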

Structural equation modeling: Form of causal modeling that fits networks of constructs to data

Structural equation modeling (SEM) is a label for a diverse set of methods used by scientists in both experimental and observational research across the sciences, business, and other fields. It is used most in the social and behavioral sciences. A definition of SEM is difficult without reference to highly technical language, but a good starting place is the name itself.

Quantitative psychology: Field of scientific study

Quantitative psychology is a field of scientific study that focuses on the mathematical modeling, research design and methodology, and statistical analysis of psychological processes. It includes tests and other devices for measuring cognitive abilities. Quantitative psychologists develop and analyze a wide variety of research methods, including those of psychometrics, a field concerned with the theory and technique of psychological measurement.

Confounding: Variable in statistics

In statistics, a confounder is a variable that influences both the dependent variable and the independent variable, causing a spurious association. Confounding is a causal concept and, as such, cannot be described in terms of correlations or associations. The existence of confounders is an important quantitative explanation of why correlation does not imply causation.
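If a confounder is known and measured, adjusting for it can remove the spurious association. The sketch below simulates a confounder z that drives both x and y (x has no effect on y) and compares the raw correlation with the partial correlation after linear adjustment for z. The adjustment works here only because we assumed, from outside the data, that z is the confounder and that the relationships are linear; names and distributions are illustrative.

```python
import random
from math import sqrt
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after linearly adjusting both for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]  # confounder: influences both x and y
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]   # x has no causal effect on y
raw = pearson_r(x, y)
adjusted = partial_r(raw, pearson_r(x, z), pearson_r(y, z))
print(round(raw, 2), round(adjusted, 2))  # raw near 0.5, adjusted near 0
```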

Designing Social Inquiry: Scientific Inference in Qualitative Research is an influential 1994 book written by Gary King, Robert Keohane, and Sidney Verba that lays out guidelines for conducting qualitative research. The central thesis of the book is that qualitative and quantitative research share the same "logic of inference." The book primarily applies lessons from regression-oriented analysis to qualitative research, arguing that the same logics of causal inference can be used in both types of research.

Statistics, in the modern sense of the word, began evolving in the 18th century in response to the novel needs of industrializing sovereign states. The evolution of statistics was, in particular, intimately connected with the development of European states following the peace of Westphalia (1648), and with the development of probability theory, which put statistics on a firm theoretical basis.

Quantitative methods provide the primary research methods for studying the distribution and causes of crime. They also offer numerous ways to obtain data that are useful to many aspects of society. These methods include survey research, field research, and evaluation research, among others. The data can be, and often are, used by criminologists and other social scientists in making causal statements about the variables being researched.

David Collier (political scientist): American political scientist (born 1942)

David Collier is an American political scientist specializing in comparative politics. He is Chancellor's Professor Emeritus at the University of California, Berkeley. He works in the fields of comparative politics, Latin American politics, and methodology. His father was the anthropologist Donald Collier.

Andrew Gelman: American statistician

Andrew Gelman is an American statistician, professor of statistics and political science at Columbia University. He earned a bachelor's degree in mathematics and in physics from MIT, where he was a National Merit Scholar, in 1986. He then earned a Ph.D. in statistics from Harvard University in 1990 under the supervision of Donald Rubin.

Causal analysis is the field of experimental design and statistics pertaining to establishing cause and effect. Typically it involves establishing four elements: correlation, sequence in time, a plausible physical or information-theoretical mechanism for an observed effect to follow from a possible cause, and eliminating the possibility of common and alternative ("special") causes. Such analysis usually involves one or more artificial or natural experiments.

Causal inference is the process of determining the independent, actual effect of a particular phenomenon that is a component of a larger system. The main difference between causal inference and inference of association is that causal inference analyzes the response of an effect variable when a cause of the effect variable is changed. The science of why things occur is called etiology. Causal inference is said to provide the evidence of causality theorized by causal reasoning.
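The simplest setting in which a cause is "changed" by the investigator is a randomized experiment. There, assuming assignment is truly random, the difference between the mean outcomes of treated and control units estimates the average causal effect. The simulation below builds in a hypothetical true effect of 2.0 and recovers it; names and distributions are illustrative, and observational data require further assumptions.

```python
import random
from math import sqrt
from statistics import fmean, variance


def difference_in_means(treated, control):
    """Estimate an average treatment effect and its standard error from a randomized experiment."""
    estimate = fmean(treated) - fmean(control)
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return estimate, se


rng = random.Random(42)
TRUE_EFFECT = 2.0  # hypothetical effect built into the simulation
treated, control = [], []
for _ in range(2000):
    baseline = rng.gauss(10, 3)  # outcome the unit would have without treatment
    if rng.random() < 0.5:       # random assignment by coin flip
        treated.append(baseline + TRUE_EFFECT)
    else:
        control.append(baseline)

estimate, se = difference_in_means(treated, control)
print(f"estimated effect {estimate:.2f} (standard error {se:.2f})")
```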

In statistics, econometrics, epidemiology, genetics and related disciplines, causal graphs are probabilistic graphical models used to encode assumptions about the data-generating process.

Exploratory causal analysis (ECA), also known as data causality or causal discovery, is the use of statistical algorithms to infer associations in observed data sets that are potentially causal under strict assumptions. ECA is a type of causal inference distinct from causal modeling and treatment effects in randomized controlled trials. It is exploratory research, usually preceding more formal causal research in the same way that exploratory data analysis often precedes statistical hypothesis testing in data analysis.


  1. A. Quetelet, Physique Sociale.
  2. Edgeworth, F. Y. (1885). "On Methods of Ascertaining Variations in the Rate of Births, Deaths, and Marriages". Journal of the Statistical Society of London. 48 (4): 628–649. doi:10.2307/2979201. JSTOR 2979201.
  3. Yule, G. U. (1895). "On the Correlation of total Pauperism with Proportion of Out-Relief". The Economic Journal. 5 (20): 603–611. doi:10.2307/2956650. JSTOR 2956650.
  4. K. Pearson, The Chances of Death, and Other Studies in Evolution, 1897.
  5. V. Pareto, Cours d'Économie Politique, vol. II, 1897.
  6. Guttman, L. (1944). "A Basis for Scaling Qualitative Data". American Sociological Review. 9 (2): 139–150. JSTOR 2086306.
  7. A. Bowley, Wages and Income in the United Kingdom since 1860, 1937.
  8. A. W. Phillips, The Relation Between Unemployment and the Rate of Change of Money Wage Rates in the United Kingdom, 1861–1957, published 1958.
  9. Miller, Delbert C., & Salkind, Neil J. (2002). Handbook of Research Design and Social Measurement. California: Sage. ISBN 0-7619-2046-3.
  10. Hoffman, Frederick (1908). "Problems of Social Statistics and Social Research". Publications of the American Statistical Association. 11 (82).
  11. Willcox, Walter (1908). "The Need of Social Statistics as an Aid to the Courts". Publications of the American Statistical Association. 13 (82).
  12. Mitchell, Wesley (1919). "Statistics and Government". Publications of the American Statistical Association. 16 (125).
  13. Pearl, Judea (2001). "Bayesianism and Causality, or, Why I am only a Half-Bayesian". In D. Corfield and J. Williamson (Eds.), Foundations of Bayesianism. Kluwer Applied Logic Series, Vol. 24. Kluwer Academic Publishers. pp. 19–36.
  14. J. Pearl, Bayesianism and Causality, or, Why I am only a Half-Bayesian.
Social science statistics centers
Statistical databases for social science