Social statistics


Social statistics is the use of statistical measurement systems to study human behavior in a social environment. This can be accomplished by polling a group of people, by evaluating a subset of data obtained about a group of people, or by observing and statistically analyzing a set of data that relates to people and their behaviors.


Statistics in the social sciences

History

Adolphe Quetelet published data on European population.

Adolphe Quetelet was a proponent of social physics. In his book Physique sociale [1] he presents distributions of human heights, age of marriage, and time of birth and death; time series of human marriages, births and deaths; a survival density for humans; and a curve describing fecundity as a function of age. He also developed the Quetelet Index, now better known as the body mass index.
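As a brief illustration, the Quetelet Index divides a person's weight by the square of their height. The sketch below assumes metric units (kilograms and metres); the function name `quetelet_index` and the sample values are hypothetical and are not taken from Quetelet's book.

```python
def quetelet_index(weight_kg: float, height_m: float) -> float:
    """Return weight divided by height squared, in kg/m^2."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical person: 70 kg and 1.75 m tall.
print(round(quetelet_index(70.0, 1.75), 2))  # prints 22.86
```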

Francis Ysidro Edgeworth published "On Methods of Ascertaining Variations in the Rate of Births, Deaths, and Marriages" in 1885, [2] which uses squares of differences to study fluctuations. George Udny Yule published "On the Correlation of total Pauperism with Proportion of Out-Relief" in 1895. [3]

A numerical calibration for the fertility curve was given by Karl Pearson in 1897 in his "The Chances of Death, and Other Studies in Evolution". [4] In this book Pearson also uses standard deviation, correlation and skewness to study human populations.
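The three summary measures named above can be sketched in a few lines of Python. This is a minimal illustration that assumes the population (divide-by-n) definitions; the function names and the sample data are hypothetical and do not reproduce Pearson's own calculations.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def std_dev(xs):
    """Population standard deviation (divides by n, an assumption here)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def correlation(xs, ys):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (std_dev(xs) * std_dev(ys))


def skewness(xs):
    """Third standardized moment; positive when the right tail is longer."""
    m, s = mean(xs), std_dev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


# Made-up illustrative values, not data from Pearson's book.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(sample))   # prints 2.0
print(skewness(sample))  # positive: the sample has a longer right tail
```

A symmetric sample has skewness zero, and two variables that are exact positive linear functions of each other have correlation one, which gives two quick sanity checks for the functions.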

Vilfredo Pareto published his analysis of the distribution of income in Great Britain and Ireland in 1897; [5] the pattern he described is now known as the Pareto principle.

Louis Guttman proposed that the values of ordinal variables can be represented by a Guttman scale, which is useful if the number of variables is large and allows the use of techniques such as ordinary least squares. [6]

Macroeconomic statistical research has provided stylized facts, which include:

Statistics and statistical analyses have become a key feature of social science: statistics is employed in economics, psychology, political science, sociology and anthropology.

Statistical methods in social sciences

Diagram illustrating path analysis: causal paths link endogenous variables and exogenous variables.
Cluster analysis showing two main clusters
A classification performed using the perceptron algorithm

Methods and concepts used in quantitative social sciences include: [9]

Statistical techniques include: [9]

Covariance based methods

Probability based methods

Distance based methods

Methods for categorical data

Usage and applications

Social scientists use social statistics for many purposes, including:

Reliability

The use of statistics has become so widespread in the social sciences that many universities, such as Harvard, have developed institutes focusing on "quantitative social science." Harvard's Institute for Quantitative Social Science focuses mainly on fields like political science that incorporate the advanced causal statistical models that Bayesian methods provide. However, some experts in causality feel that these claims of causal statistics are overstated. [13] [14] There is a debate regarding the uses and value of statistical methods in social science, especially in political science. Some statisticians question practices such as data dredging, which can lead to unreliable policy conclusions when political partisans overestimate the interpretive power of non-robust statistical methods such as simple and multiple linear regression. Indeed, an important axiom that social scientists cite, but often forget, is that "correlation does not imply causation."
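The axiom can be illustrated with a small simulation. The sketch below is a toy example under stated assumptions: a hypothetical unobserved variable `z` drives both `x` and `y`, which have no direct effect on each other. All names, the sample size and the seed are illustrative choices, not part of any cited study.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, zs):
    """Residuals from a simple least-squares regression of ys on zs."""
    n = len(ys)
    my, mz = sum(ys) / n, sum(zs) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for y, z in zip(ys, zs)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]   # hypothetical common cause
x = [zi + rng.gauss(0, 1) for zi in z]    # depends on z only
y = [zi + rng.gauss(0, 1) for zi in z]    # depends on z only

raw = pearson(x, y)                                    # clearly nonzero
adjusted = pearson(residuals(x, z), residuals(y, z))   # close to zero
print(f"raw correlation: {raw:.3f}")
print(f"correlation after adjusting for z: {adjusted:.3f}")
```

In this setup `x` and `y` are correlated only because they share the cause `z`; once the linear effect of `z` is removed from both, the remaining association is close to zero. In real social data the confounder is often unmeasured, so this adjustment is not available.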

Further reading

Related Research Articles

Econometrics is an application of statistical methods to economic data in order to give empirical content to economic relationships. More precisely, it is "the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference." An introductory economics textbook describes econometrics as allowing economists "to sift through mountains of data to extract simple relationships." Jan Tinbergen is one of the two founding fathers of econometrics. The other, Ragnar Frisch, also coined the term in the sense in which it is used today.

Statistics: Study of the collection, analysis, interpretation, and presentation of data

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.

A case study is an in-depth, detailed examination of a particular case within a real-world context. For example, case studies in medicine may focus on an individual patient or ailment; case studies in business might cover a particular firm's strategy or a broader market; similarly, case studies in politics can range from a narrow happening over time like the operations of a specific political campaign, to an enormous undertaking like world war, or more often the policy analysis of real-world problems affecting multiple stakeholders.

Quantitative research: All procedures for the numerical representation of empirical facts

Quantitative research is a research strategy that focuses on quantifying the collection and analysis of data. It is formed from a deductive approach where emphasis is placed on the testing of theory, shaped by empiricist and positivist philosophies.

Spurious relationship: Apparent, but false, correlation between causally-independent variables

In statistics, a spurious relationship or spurious correlation is a mathematical relationship in which two or more events or variables are associated but not causally related, due to either coincidence or the presence of a certain third, unseen factor.

In statistics, path analysis is used to describe the directed dependencies among a set of variables. This includes models equivalent to any form of multiple regression analysis, factor analysis, canonical correlation analysis, discriminant analysis, as well as more general families of models in the multivariate analysis of variance and covariance analyses.

Francis Ysidro Edgeworth: Irish economist (1845–1926)

Francis Ysidro Edgeworth was an Anglo-Irish philosopher and political economist who made significant contributions to the methods of statistics during the 1880s. In 1891 he was appointed the founding editor of The Economic Journal.

Structural equation modeling: Form of causal modeling that fits networks of constructs to data

Structural equation modeling (SEM) is a diverse set of methods used by scientists doing both observational and experimental research. SEM is used mostly in the social and behavioral sciences but it is also used in epidemiology, business, and other fields. A definition of SEM is difficult without reference to technical language, but a good starting place is the name itself.

Mathematical psychology is an approach to psychological research that is based on mathematical modeling of perceptual, thought, cognitive and motor processes, and on the establishment of law-like rules that relate quantifiable stimulus characteristics with quantifiable behavior. The mathematical approach is used with the goal of deriving hypotheses that are more exact and thus yield stricter empirical validations. There are five major research areas in mathematical psychology: learning and memory, perception and psychophysics, choice and decision-making, language and thinking, and measurement and scaling.

This glossary of statistics and probability is a list of definitions of terms and concepts used in the mathematical sciences of statistics and probability, their sub-disciplines, and related fields. For additional related terms, see Glossary of mathematics and Glossary of experimental design.

Quantitative psychology is a field of scientific study that focuses on the mathematical modeling, research design and methodology, and statistical analysis of psychological processes. It includes tests and other devices for measuring cognitive abilities. Quantitative psychologists develop and analyze a wide variety of research methods, including those of psychometrics, a field concerned with the theory and technique of psychological measurement.

Confounding: Variable or factor in causal inference

In causal inference, a confounder is a variable that influences both the dependent variable and independent variable, causing a spurious association. Confounding is a causal concept, and as such, cannot be described in terms of correlations or associations. The existence of confounders is an important quantitative explanation why correlation does not imply causation. Some notations are explicitly designed to identify the existence, possible existence, or non-existence of confounders in causal relationships between elements of a system.

Designing Social Inquiry: Scientific Inference in Qualitative Research is an influential 1994 book written by Gary King, Robert Keohane, and Sidney Verba that lays out guidelines for conducting qualitative research. The central thesis of the book is that qualitative and quantitative research share the same "logic of inference." The book primarily applies lessons from regression-oriented analysis to qualitative research, arguing that the same logics of causal inference can be used in both types of research.

Statistics, in the modern sense of the word, began evolving in the 18th century in response to the novel needs of industrializing sovereign states.

Causal analysis is the field of experimental design and statistics pertaining to establishing cause and effect. Typically it involves establishing four elements: correlation, sequence in time, a plausible physical or information-theoretical mechanism for an observed effect to follow from a possible cause, and eliminating the possibility of common and alternative ("special") causes. Such analysis usually involves one or more artificial or natural experiments.

There have been many criticisms of econometrics' usefulness as a discipline and perceived widespread methodological shortcomings in econometric modelling practices.

Causal inference is the process of determining the independent, actual effect of a particular phenomenon that is a component of a larger system. The main difference between causal inference and inference of association is that causal inference analyzes the response of an effect variable when a cause of the effect variable is changed. The study of why things occur is called etiology, and can be described using the language of scientific causal notation. Causal inference is said to provide the evidence of causality theorized by causal reasoning.

In statistics, econometrics, epidemiology, genetics and related disciplines, causal graphs are probabilistic graphical models used to encode assumptions about the data-generating process.

Causal analysis is the field of experimental design and statistical analysis pertaining to establishing cause and effect. Exploratory causal analysis (ECA), also known as data causality or causal discovery, is the use of statistical algorithms to infer associations in observed data sets that are potentially causal under strict assumptions. ECA is a type of causal inference distinct from causal modeling and treatment effects in randomized controlled trials. It is exploratory research usually preceding more formal causal research, in the same way that exploratory data analysis often precedes statistical hypothesis testing in data analysis.

References

  1. A. Quetelet, Physique Sociale, https://archive.org/details/physiquesociale00quetgoog
  2. Edgeworth, F. Y. (1885). "On Methods of Ascertaining Variations in the Rate of Births, Deaths, and Marriages". Journal of the Statistical Society of London. 48 (4): 628–649. doi:10.2307/2979201. JSTOR 2979201.
  3. Yule, G. U. (1895). "On the Correlation of total Pauperism with Proportion of Out-Relief". The Economic Journal. 5 (20): 603–611. doi:10.2307/2956650. JSTOR 2956650.
  4. K. Pearson, The Chances of Death, and Other Studies in Evolution, 1897 https://archive.org/details/chancesdeathand00peargoog
  5. V. Pareto, Cours d'Économie Politique, vol. II, 1897
  6. Guttman, L. (1944). "A Basis for Scaling Qualitative Data". American Sociological Review. 9 (2): 139–150. doi:10.2307/2086306. JSTOR 2086306.
  7. A. Bowley, Wages and income in the United kingdom since 1860, 1937
  8. W. Phillips, The Relation Between Unemployment and the Rate of Change of Money Wage Rates in the United Kingdom, 1861–1957, published 1958
  9. Miller, Delbert C., & Salkind, Neil J. (2002), Handbook of Research Design and Social Measurement, California: Sage, ISBN 0-7619-2046-3
  10. Hoffman, Frederick (1908). "Problems of Social Statistics and Social Research". Publications of the American Statistical Association. 11 (82): 105–132. doi:10.2307/2276101. JSTOR 2276101.
  11. Willcox, Walter (1908). "The Need of Social Statistics as an Aid to the Courts". Publications of the American Statistical Association. 13 (82).
  12. Mitchell, Wesley (1919). "Statistics and Government". Publications of the American Statistical Association. 16 (125): 223–235. doi:10.2307/2965000. JSTOR 2965000.
  13. Pearl, Judea (2001). "Bayesianism and Causality, or, Why I am only a Half-Bayesian". In D. Corfield and J. Williamson (Eds.), Foundations of Bayesianism, Kluwer Applied Logic Series, Vol. 24, Kluwer Academic Publishers, pp. 19–36.
  14. J. Pearl, Bayesianism and causality, or, why I am only a half-bayesian http://ftp.cs.ucla.edu/pub/stat_ser/r284-reprint.pdf