How to Read Numbers

How to Read Numbers: A Guide to Statistics in the News (and Knowing When to Trust Them)
Author
  • Tom Chivers
  • David Chivers
Subject Statistics in journalism, healthcare and politics
Publisher Weidenfeld & Nicolson
Publication date March 2021
Pages 200
ISBN 9781474619974

How to Read Numbers: A Guide to Statistics in the News (and Knowing When to Trust Them) is a 2021 British book by Tom and David Chivers. It describes misleading uses of statistics in the news, with contemporary examples about the COVID-19 pandemic, healthcare, politics and crime. The authors, who are cousins, conceived the book in early 2020. It received positive reviews for its readability, engaging style, accessibility to non-mathematicians and applicability to journalistic writing.

Background

Tom and David Chivers, who are cousins, wrote a proposal for the book in the first months of 2020 after complaining to each other about a news story that interpreted numerical data poorly. The proposal used a case study of deaths at a university, which was cut from the final book, and briefly mentioned the emerging COVID-19 pandemic. [1] At the time of writing, Tom Chivers was a science editor for UnHerd [2] and the author of one previous book, The Rationalist's Guide to the Galaxy; [5] he had won Statistical Excellence in Journalism Awards from the Royal Statistical Society (RSS) in 2018 and 2020. [3] [4] David Chivers was an assistant professor of economics at the University of Durham. [2] Tom Chivers saw journalists as generally more literate than numerate, and as incentivised to make information sound dramatic; David Chivers said that the "publish or perish" pressure in academia could have a similar effect. [1]

The authors believed that statistics could be given more prominence in school curricula and that numeracy should be valued in the same way as literacy. Tom Chivers heard from some school and university teachers that they had used the book in their teaching. David Chivers said it was common to view maths as calculation rather than as the interpretation of what numerical information means in context. [1]

The book was released in March 2021. [6] It concludes with a "statistical style guide" recommended for journalists, which the authors presented at the 2021 Significance lecture. [2]

Synopsis

An introduction explains why the authors believe interpreting statistics is an important skill, using information about the COVID-19 pandemic as illustration. Each chapter then covers a misleading use of statistics that can be found in the news:

  1. Simpson's paradox, a type of ecological fallacy, means that an average like the basic reproduction number of SARS-CoV-2 can disguise a different trend in subgroups.
  2. Anecdotal evidence can guide individual decision-making, but extraordinary anecdotes are more likely to be reported, as with claims about the effectiveness of alternative medicine.
  3. Small samples drawn from normal distributions can reliably detect only large effect sizes.
  4. Biased samples can be reweighted to be representative, but polls are often not reweighted.
  5. Hypothesis testing can demonstrate statistical significance only when the hypothesis is fixed before data collection; p-hacking, such as that practised by Brian Wansink, misuses this framework.
  6. Studies with small effect sizes should not be used to make major lifestyle changes, though they may be important to scientific understanding.
  7. Confounders must be controlled for to establish causality: ice cream sales and drowning deaths are positively correlated, but neither causes the other. Some studies identified "sensation seeking" as a confounder in the association between vaping and smoking marijuana.
  8. Observational studies show correlation, not causation. Randomised controlled trials can measure causality; natural experiments or instrumental variables may be used where this is infeasible.
  9. Large numbers do not indicate high frequency without knowledge of the population size, as in misleading reports of cycling deaths, murders committed by undocumented migrants or money sent to the European Union when the UK was a member state.
  10. The false positive paradox, a consequence of Bayes' theorem, yields counterintuitive conclusions: for instance, the probability that a person has SARS-CoV-2 after a positive test result depends on how prevalent the virus is in the population.
  11. Relative risk can make a risk sound more alarming than absolute risk, which is often omitted from news reports: for instance, an 18% increase in seizures could be a rise from 0.024% to 0.028%.
  12. Changes in measurement can create a false impression of trends: for example, widening of the DSM criteria increased the proportion of the population diagnosed with autism, and recorded crime can rise even while actual crime falls if more crimes are reported.
  13. With GDP and PISA rankings as case studies, changes in ranking position are not always statistically significant.
  14. Individual studies should be placed in context, as literature reviews and meta-analyses do, but are often reported in isolation in the news. This can make the scientific consensus on health appear more changeable than it really is. The Lancet MMR autism fraud was amplified by this lack of context.
  15. Publication bias – which can be detected with a funnel plot – leads to overrepresentation of studies that report a correlation or large effect size. Daryl Bem's claimed evidence of effect preceding cause is used as an example.
  16. For data that fluctuate regularly, such as the weather, choosing extreme values as starting points can disguise real trends or create false ones.
  17. Confidence intervals measure uncertainty, and Brier scores assess a forecaster over many predictions rather than judging a single prediction that may seem wrong, such as rain falling when a forecast gave a 5% chance of rain.
  18. It is important to know the assumptions made by a model, as some can drastically change the resultant forecast. For instance, predictions of COVID-19 pandemic deaths differ based on whether they model unchanged human behaviour or compliance with lockdowns.
  19. The Texas sharpshooter fallacy can make one prediction seem remarkably accurate in hindsight when many diverse predictions have been made, as with predictions of the 2007–2008 financial crisis and the 2017 UK hung parliament result.
  20. Identifying patterns by selecting on the dependent variable produces survivorship bias, as when the traits of successful companies are inferred by studying only successful companies.
  21. Colliders, variables that are affected by two others rather than affecting both as confounders do, can yield false results if controlled for. If admission to a college requires either high academic scores or sporting achievement, its students may show a negative correlation between academic and sporting success even where none exists in the wider population.
  22. Goodhart's law, "when a measure becomes a target, it ceases to be a good measure", can be seen in healthcare, politics and education.
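Simpson's paradox, described in the first chapter above, is easy to reproduce with a short script. This is a minimal sketch using invented counts for two hypothetical subgroups ("young" and "old") in two periods; the figures are illustrative only and are not taken from the book. The rate falls within each subgroup, yet the pooled rate rises because the population shifts towards the higher-risk subgroup.

```python
from fractions import Fraction

# Hypothetical (cases, population) counts per subgroup and period.
# Invented for illustration; not data from the book.
data = {
    "period_1": {"young": (90, 900), "old": (50, 100)},
    "period_2": {"young": (40, 500), "old": (200, 500)},
}


def rate(cases: int, population: int) -> Fraction:
    """Exact proportion of the population that are cases."""
    return Fraction(cases, population)


def overall_rate(groups: dict[str, tuple[int, int]]) -> Fraction:
    """Pool all subgroups together, then compute a single rate."""
    cases = sum(c for c, _ in groups.values())
    population = sum(p for _, p in groups.values())
    return rate(cases, population)


for period, groups in data.items():
    by_group = {name: float(rate(*counts)) for name, counts in groups.items()}
    print(period, by_group, "overall:", float(overall_rate(groups)))
```

Run as written, it prints a lower rate for each subgroup in period 2 (0.1 to 0.08 and 0.5 to 0.4) while the pooled rate rises from 0.14 to 0.24.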
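The false positive paradox follows directly from Bayes' theorem. A minimal sketch, assuming a hypothetical test with 80% sensitivity and 99% specificity (illustrative values, not figures from the book), shows how the probability of infection given a positive result changes with prevalence.

```python
def positive_predictive_value(prevalence: float, sensitivity: float,
                              specificity: float) -> float:
    """P(infected | positive test), computed with Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical test characteristics, chosen for illustration only.
SENSITIVITY = 0.80  # P(positive | infected)
SPECIFICITY = 0.99  # P(negative | not infected)

for prevalence in (0.001, 0.01, 0.1):
    ppv = positive_predictive_value(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"prevalence {prevalence:.1%}: P(infected | positive) = {ppv:.1%}")
```

With these assumed test characteristics, a positive result implies roughly a 7% chance of infection at 0.1% prevalence, about 45% at 1% and about 90% at 10%: the same test result means very different things depending on how common the virus is.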
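The gap between relative and absolute risk can be checked with simple arithmetic. This sketch takes the 0.024% baseline and the 18% relative increase from the seizure example above and derives the absolute figures; the helper name `apply_relative_increase` is an illustrative choice.

```python
def apply_relative_increase(baseline_risk: float, relative_increase: float) -> float:
    """Absolute risk after a relative increase (0.18 means +18%)."""
    return baseline_risk * (1 + relative_increase)


baseline = 0.024 / 100  # 0.024% expressed as a proportion
relative = 0.18         # an "18% increase"

new_risk = apply_relative_increase(baseline, relative)
absolute_change = new_risk - baseline

print(f"new risk: {new_risk:.4%}")
print(f"absolute change: {absolute_change * 100:.5f} percentage points")
print(f"extra cases per million people: {absolute_change * 1_000_000:.1f}")
```

The new risk comes out at about 0.0283%, which rounds to the 0.028% quoted, and the absolute change is about 0.004 percentage points, or roughly 43 extra cases per million people.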
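A Brier score is the mean squared difference between forecast probabilities and outcomes (1 if the event happened, 0 if not), so lower is better. A sketch with hypothetical forecasts illustrates the point about judging many predictions: a single 5% rain forecast followed by rain scores badly, but a forecaster who says 5% on 100 days, with rain on 5 of them, scores well.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between probabilistic forecasts and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# One day: a 5% chance of rain was forecast, and it rained.
single = brier_score([0.05], [1])

# Hypothetical record: a 5% forecast on 100 days, with rain on 5 of them.
many = brier_score([0.05] * 100, [1] * 5 + [0] * 95)

print(f"single forecast: {single:.4f}")
print(f"100 forecasts:   {many:.4f}")
```

The single forecast scores 0.9025, while the 100-day record scores 0.0475. For comparison, a perfect forecaster scores 0 and one who always says 50% scores 0.25.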
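The college-admissions example of collider bias can be simulated. The sketch below assumes, hypothetically, that academic and sporting ability are independent standard-normal scores in the population and that the college admits anyone scoring above 1 on either; the threshold, seed and sample size are arbitrary choices.

```python
import math
import random
import statistics


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(42)  # fixed seed so the run is reproducible
N = 100_000
THRESHOLD = 1.0  # hypothetical admission cut-off on either score

# Independent abilities: no true relationship in the population.
academic = [rng.gauss(0, 1) for _ in range(N)]
sport = [rng.gauss(0, 1) for _ in range(N)]

# Admission depends on both variables, which makes it a collider.
# Keeping only admitted students amounts to conditioning on it.
admitted = [(a, s) for a, s in zip(academic, sport) if a > THRESHOLD or s > THRESHOLD]
adm_academic = [a for a, _ in admitted]
adm_sport = [s for _, s in admitted]

print(f"population correlation: {pearson(academic, sport):+.3f}")
print(f"admitted correlation:   {pearson(adm_academic, adm_sport):+.3f}")
```

The population correlation comes out close to zero, while the correlation among admitted students is strongly negative: an admitted student with a low academic score must, by the admission rule, have a high sporting score, and vice versa.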

The authors end with a recommended "statistical style guide" for journalists.

Reception

In a nomination for Chalkdust's 2021 Book of the Year, a reviewer praised the "readable and enjoyable" brevity of its chapters, the clarity and concision of its explanations and its usefulness for non-mathematicians. [7] Writing in The Big Issue, Stephen Bush approved of its light tone, its informativeness and its separation of expository mathematical material into optional sections. [5] Vivek Kaul of Mint praised its simplicity and the importance of its final chapter. [8]

Martin Chilton recommended the book in The Independent as informative and enjoyable, saying that the authors "make sense of dense material and offer engrossing insights". [6] [9] In The Times, Manjit Kumar wrote that "the authors do a splendid job of stringing words together so smartly that even difficult concepts are explained and understood with deceptive ease". [10] Rainer Hank of the Frankfurter Allgemeine Zeitung said that he had learned much from the book and that such engaging educational material, requiring little mathematical knowledge, could lead to better journalism. [11]

References

  1. Bennett, Daniel (12 April 2021). "Tom Chivers and David Chivers on how to understand statistics". BBC Science Focus. Retrieved 13 February 2024.
  2. "News". Significance. 18 (5): 2–3. 29 September 2021.
  3. "Statistical Excellence in Journalism Awards 2018: Winners" (PDF). Royal Statistical Society. Retrieved 13 February 2024.
  4. "Statistical Excellence in Journalism Awards 2020: Winners". Royal Statistical Society. 12 October 2020. Retrieved 13 February 2024.
  5. Bush, Stephen (16 May 2021). "How to Read Numbers by Tom Chivers and David Chivers: Light and fun". The Big Issue. Retrieved 13 February 2024.
  6. Chilton, Martin (28 December 2020). "The books to look out for in 2021: From Snow Country to Milk Fed". The Independent. Retrieved 13 February 2024.
  7. "How to Read Numbers (Review)". Chalkdust. 12 April 2022. Retrieved 13 February 2024.
  8. Kaul, Vivek (27 December 2021). "Ten books of 2021 that you must not miss". Mint. Retrieved 13 February 2024.
  9. Chilton, Martin (7 March 2021). "Here Comes the Sun". The Independent. ProQuest 2497617029.
  10. Kumar, Manjit (10 April 2021). "How to Read Numbers by Tom Chivers and David Chivers review — a beginner's guide to statistics". The Times. Retrieved 13 February 2024.
  11. Hank, Rainer (21 June 2021). "Lügen mit der Corona-Statistik" [Lying with Coronavirus Statistics]. Frankfurter Allgemeine Zeitung (in German). Retrieved 13 February 2024.

Further reading